Introduction
What's Prometheus?
Observability focuses on understanding the internal state of your systems based on the data they produce, which helps determine whether your infrastructure is healthy. Prometheus is a core technology for monitoring and observability of systems, but the term “Prometheus” can be confusing because it is used in different contexts. Understanding the basics of Prometheus, why it is valuable for system observability, and how people use it in practice will help you understand it better and use Grafana more effectively.
Prometheus began in 2012 at SoundCloud because existing technologies were insufficient for the company’s observability needs. Prometheus offers both a robust data model and a query language, and it is also simple and scalable. In 2018, Prometheus graduated from Cloud Native Computing Foundation (CNCF) incubation, and today it has a thriving community.
The following panel in a Grafana dashboard shows how much disk bandwidth on a Mac laptop is being used. The green line represents disk reads, and the yellow line represents writes.
Data like these form a time series: the X-axis represents moments in time, and the Y-axis represents a number or measurement, for example, 5 megabytes per second. This type of time series data appears everywhere in systems monitoring, as well as in places such as seasonal temperature charts and stock prices. Such data is simply a measurement (such as a company’s stock price or disk I/O) taken at a series of points in time.
Prometheus is a technology that collects and stores time series data. Time series are fundamental to Prometheus; its data model is arranged into:
- metrics, whose data points each consist of a timestamp and a sample (the numeric value), such as how many disk bytes have been read or a stock price
- a set of labels, also called dimensions, for example, job and device
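To make the data model concrete, here is a minimal Python sketch of how a single series could be represented: a metric name, a label set, and a list of timestamped samples. This is a toy model and not Prometheus's internal representation. The class names `Sample` and `TimeSeries`, the device name `disk0`, and the timestamps and values are all hypothetical. The sketch does mirror two real details: Prometheus stores sample timestamps in milliseconds and sample values as 64-bit floats.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    """One data point: a timestamp in Unix milliseconds and a float value."""
    timestamp_ms: int
    value: float


@dataclass
class TimeSeries:
    """Hypothetical model: a series is identified by metric name plus labels."""
    metric: str
    labels: dict[str, str]
    samples: list[Sample] = field(default_factory=list)

    def series_id(self) -> str:
        # Sort labels so the same label set always yields the same identity.
        pairs = ",".join(f'{k}="{v}"' for k, v in sorted(self.labels.items()))
        return f"{self.metric}{{{pairs}}}"


# Illustrative values only; "disk0" and the numbers are made up.
written = TimeSeries(
    metric="node_disk_written_bytes_total",
    labels={"job": "integrations/macos-node", "device": "disk0"},
)
written.samples.append(Sample(timestamp_ms=1_700_000_000_000, value=1.0e9))
written.samples.append(Sample(timestamp_ms=1_700_000_015_000, value=1.2e9))

print(written.series_id())
```

Printing `written.series_id()` gives `node_disk_written_bytes_total{device="disk0",job="integrations/macos-node"}`, which is the same name-plus-labels notation PromQL uses. Sorting the labels ensures that a series' identity does not depend on the order in which its labels were added.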
You can store time series data in any relational database; however, these systems are not designed to store and query large volumes of time series data. Prometheus and similar software provide tools to compact and optimize time series data.
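Specialized time series storage compresses well partly because samples usually arrive at a regular interval. Prometheus's storage engine encodes timestamps as "deltas of deltas" and encodes values with an XOR-based scheme, an approach taken from Facebook's Gorilla paper. The Python sketch below shows only the integer delta-of-delta step for timestamps. It uses hypothetical 15-second scrapes and omits the bit packing that turns the small numbers into compact storage. It is an illustration of the idea, not Prometheus's code.

```python
def delta_of_delta(timestamps: list[int]) -> list[int]:
    """Encode as [first timestamp, first delta, delta-of-delta, ...].

    Regularly spaced timestamps turn into mostly zeros, which a bit-level
    encoder could then store in very few bits (not shown here).
    """
    if len(timestamps) < 2:
        return list(timestamps)
    out = [timestamps[0], timestamps[1] - timestamps[0]]
    prev_delta = out[1]
    for prev, cur in zip(timestamps[1:], timestamps[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # only the change in spacing is kept
        prev_delta = delta
    return out


def decode(encoded: list[int]) -> list[int]:
    """Invert delta_of_delta by accumulating the deltas back up."""
    if len(encoded) < 2:
        return list(encoded)
    ts = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dod in encoded[2:]:
        delta += dod
        ts.append(ts[-1] + delta)
    return ts


# Hypothetical scrapes every 15 s (in ms), with one scrape arriving 1 ms late.
ts = [0, 15_000, 30_000, 45_001, 60_000]
enc = delta_of_delta(ts)
print(enc)
```

For these timestamps, the encoded form is `[0, 15000, 0, 1, -2]`. After the first two entries, only the jitter remains, and `decode(enc)` restores the original list.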
Simple dashboard using PromQL
The following Grafana dashboard image shows a Disk I/O graph of raw data that Prometheus collected from a laptop.
The Metrics browser field contains the following query:
node_disk_written_bytes_total{job="integrations/macos-node", device!=""}
In this example, the Y-axis shows the total number of bytes written, and the X-axis shows dates and times. As the laptop runs, the number of bytes written increases over time. The graph below the Metrics browser field plots a counter: a cumulative metric that tracks the number of bytes written and only increases, except when it is reset, for example after a restart.
The query is a simple example of PromQL, the Prometheus Query Language. It identifies the metric of interest (node_disk_written_bytes_total) and provides two label matchers (job and device), which together filter the metrics. The matcher job="integrations/macos-node" narrows the results to metrics coming from the macOS integration job, and the matcher device!="" specifies that the device label cannot be empty. The result of this query is the raw stream of numbers that the graph displays.
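The matching rules in this query can be modeled in a few lines of Python. In this sketch, each series is a plain dictionary of labels, and the metric name is stored under `__name__`, as Prometheus does internally. A missing label is treated as an empty string, which matches how Prometheus label matchers behave. The helper names `eq`, `neq`, and `select`, and the example series, are hypothetical.

```python
from typing import Callable

Labels = dict[str, str]
Matcher = Callable[[Labels], bool]


def eq(name: str, value: str) -> Matcher:
    # label="value": the label must equal value (a missing label counts as "").
    return lambda labels: labels.get(name, "") == value


def neq(name: str, value: str) -> Matcher:
    # label!="value": the label must differ from value.
    return lambda labels: labels.get(name, "") != value


def select(series: list[Labels], matchers: list[Matcher]) -> list[Labels]:
    # A series is kept only if every matcher accepts it (logical AND).
    return [s for s in series if all(m(s) for m in matchers)]


# Hypothetical series; device names and the Linux job are made up.
series = [
    {"__name__": "node_disk_written_bytes_total", "job": "integrations/macos-node", "device": "disk0"},
    {"__name__": "node_disk_written_bytes_total", "job": "integrations/macos-node"},
    {"__name__": "node_disk_written_bytes_total", "job": "integrations/linux-node", "device": "sda"},
    {"__name__": "node_disk_read_bytes_total", "job": "integrations/macos-node", "device": "disk0"},
]

# node_disk_written_bytes_total{job="integrations/macos-node", device!=""}
query = [
    eq("__name__", "node_disk_written_bytes_total"),
    eq("job", "integrations/macos-node"),
    neq("device", ""),
]
result = select(series, query)
print(result)
```

Only the first series survives. The second has no device label, the third comes from a different job, and the fourth is a different metric.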
Although the raw counter graph provides some insight into the performance of the system, it doesn’t tell the full story. A clearer picture of system performance requires the rate of change, which shows how fast data is being written. To monitor disk performance properly, you also need to see spikes in activity that show whether and when the system is under load, and whether disk performance is at risk. PromQL includes a rate() function that calculates the per-second average rate of increase of a counter over a time window, such as 5m (5 minutes). This view provides a much clearer picture of what’s happening with the system.
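In PromQL, a range selector sets the time window, as in `rate(node_disk_written_bytes_total{job="integrations/macos-node", device!=""}[5m])`. The Python sketch below approximates what `rate()` computes for a single series. It sums the increases between consecutive samples in the window, treating any drop in value as a counter reset, and then divides by the elapsed time. This is a simplification. The real `rate()` also extrapolates to the edges of the window, which the hypothetical `simple_rate` helper does not do, and the sample data is invented.

```python
def simple_rate(samples: list[tuple[float, float]], window_s: float, now_s: float) -> float:
    """Approximate per-second increase of a counter over the last window_s seconds.

    samples are (unix_seconds, counter_value) pairs sorted by time.
    A decrease is treated as a counter reset: the post-reset value is
    counted as the increase since the reset. No edge extrapolation.
    """
    # Keep samples with timestamps in (now - window, now].
    in_window = [(t, v) for t, v in samples if now_s - window_s < t <= now_s]
    if len(in_window) < 2:
        raise ValueError("need at least two samples in the window")
    increase = 0.0
    for (_, prev), (_, cur) in zip(in_window, in_window[1:]):
        increase += cur - prev if cur >= prev else cur  # reset: counter restarted near 0
    elapsed = in_window[-1][0] - in_window[0][0]
    return increase / elapsed


# Hypothetical counter scraped every 15 s, growing by 100,000 bytes per second.
steady = [(float(t), 100_000.0 * t) for t in range(0, 301, 15)]
print(simple_rate(steady, window_s=300, now_s=300))

# Hypothetical counter that resets between t=15 and t=30.
reset = [(0.0, 1000.0), (15.0, 2500.0), (30.0, 500.0), (45.0, 2000.0)]
print(simple_rate(reset, window_s=60, now_s=45))
```

For the steady counter, the sketch returns `100000.0`. The rate turns an ever-growing total into a per-second throughput, and that flattening is what makes load spikes visible on a graph.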