Earlier posts covered installing and configuring scrape targets in detail, so please check them first.
This post covers the following Prometheus topics:
- PromQL Queries
- Saving/Recording Rules
- Alerting Rules
PromQL Queries
The query language used in Prometheus is called PromQL (Prometheus Query Language). Query results can be viewed as a graph or as tabular data in the Prometheus expression browser, or consumed by external systems such as Grafana, Zabbix and others.
Simple examples
node_cpu_seconds_total{}
node_cpu_seconds_total{instance="promonode01.devtestlabs.in:80"}
Regular Expressions
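List only the nodes in a specific domain. The =~ operator matches a label value against a regular expression, and the pattern must match the whole value.
node_cpu_seconds_total{instance=~".*.devtestlabs.*"}
node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq"}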
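Prometheus anchors regex matchers at both ends, so a pattern has to cover the entire label value. The Python sketch below imitates that behaviour with re.fullmatch to show which mode values a pattern would keep. The list of modes is an assumed sample of what node_exporter reports, and Python's re module only stands in for the RE2 engine that Prometheus uses; the two agree for simple patterns like these.

```python
import re

# Assumed sample of values for the "mode" label of node_cpu_seconds_total.
MODES = ["idle", "iowait", "irq", "nice", "softirq", "steal", "system", "user"]

def select(pattern, values):
    """Return the values a PromQL =~ matcher would keep.

    re.fullmatch imitates the implicit ^...$ anchoring of PromQL matchers.
    """
    return [v for v in values if re.fullmatch(pattern, v)]

print(select("irq", MODES))    # only the exact value "irq"
print(select(".*irq", MODES))  # every value ending in "irq"
print(select("i.*", MODES))    # every value starting with "i"
```

Because of the anchoring, mode=~"irq" would not return the softirq series; the leading .* is what lets it through.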
Data Types
Scalar
A numeric floating point value
Instant vector
A set of time series containing a single sample for each time series.
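Range vector
A set of time series containing a range of data points over time for each time series. Appending a duration in square brackets to a selector turns it into a range vector, for example the last 1 minute or the last 5 minutes of samples:
node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq"}[1m]
node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq"}[5m]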
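To make the difference between an instant vector and a range vector concrete, here is a minimal Python sketch for a single time series. The sample data and the helper names instant_value and range_values are hypothetical, and the sketch ignores details such as staleness handling and the exact window boundary rules, which differ between Prometheus versions.

```python
# Hypothetical samples for one time series: (unix_timestamp, value) pairs,
# sorted by timestamp, roughly one scrape every 15 seconds.
SAMPLES = [(0, 10.0), (15, 12.0), (30, 15.0), (45, 15.0), (60, 21.0)]

def instant_value(samples, at):
    """Instant vector element: the most recent sample at or before `at`."""
    older = [s for s in samples if s[0] <= at]
    return older[-1] if older else None

def range_values(samples, at, window):
    """Range vector element: every sample in the `window` seconds before `at`."""
    return [s for s in samples if at - window < s[0] <= at]

print(instant_value(SAMPLES, 50))     # (45, 15.0)
print(range_values(SAMPLES, 60, 30))  # [(45, 15.0), (60, 21.0)]
```

An instant vector holds one such sample per series, while a range vector holds one such list per series.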
Functions/Subqueries
- Start with this instant vector:
node_netstat_Tcp_InSegs{instance="localhost:9100"}
- Convert it to a range vector with [1m], then back to an instant vector with rate, which returns the per-second average rate of increase over that window:
rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m])
- Wrap it in the ceiling function to round the result up to the nearest integer:
ceil(rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m]))
- Convert the result to a range vector again with a subquery ([1m:]) and get the per-second derivative of the time series:
deriv(ceil(rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m]))[1m:])
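The rate and ceil steps above can be sketched in Python to show what happens to the samples inside the window. This is a simplified illustration rather than the Prometheus implementation: the real rate() also extrapolates to the window boundaries, which is left out here, and the function name simple_rate and the sample values are hypothetical.

```python
import math

def simple_rate(samples):
    """Per-second increase of a counter over (timestamp, value) samples.

    A drop in value is treated as a counter reset, so the value seen
    after the reset counts as the increase for that step.
    """
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed

# Hypothetical one-minute window of a counter scraped every 15 seconds.
window = [(0, 100.0), (15, 130.0), (30, 165.0), (45, 190.0), (60, 230.0)]

per_second = simple_rate(window)
print(per_second)             # roughly 2.17 segments per second
print(math.ceil(per_second))  # rounded up, as ceil() does in PromQL
```

The window goes in as a list of samples (a range vector element) and a single number comes out (an instant vector element), which is why rate can be wrapped directly in ceil.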
More Info: https://prometheus.io/docs/prometheus/latest/querying/functions/
Saving Rules
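Typing the same query every time is tedious, and complex expressions are expensive to evaluate on demand. Recording rules let Prometheus evaluate an expression at regular intervals and save the result as a new time series. Custom rules are defined in a rules file that the main config file references.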
Examples
Memory free percentage: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
Root disk free space percentage: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
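A quick worked example of the arithmetic behind these expressions, as a Python sketch. The byte counts are hypothetical readings standing in for the node_exporter metrics. Note that 100 * free / total gives the free share, while subtracting that from 100 gives the share that is not free.

```python
def percent(part, total):
    """Return `part` as a percentage of `total`."""
    return 100 * part / total

GIB = 1024 ** 3

# Hypothetical readings in bytes.
mem_free, mem_total = 2 * GIB, 8 * GIB      # MemFree, MemTotal
disk_free, disk_size = 12 * GIB, 40 * GIB   # free and size of "/"

mem_free_pct = percent(mem_free, mem_total)
print(mem_free_pct)                   # 25.0 percent of memory is free
print(100 - mem_free_pct)             # 75.0 percent of memory is not free
print(percent(disk_free, disk_size))  # 30.0 percent of the root disk is free
```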
Log in to your Prometheus server and add the rules as shown below.
vim /etc/prometheus/prometheus_rules.yml
groups:
  - name: custom_rules
    rules:
      - record: node_memory_free_percent
        expr: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
Check the syntax of the rules file:
promtool check rules prometheus_rules.yml
Checking prometheus_rules.yml
SUCCESS: 2 rules found
Include the rules file in the main Prometheus configuration.
vim /etc/prometheus/prometheus.yml
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "prometheus_rules.yml"
Check the syntax of the main config file:
promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 1 rule files found
Checking prometheus_rules.yml
SUCCESS: 2 rules found
Since everything checks out, restart the Prometheus server:
systemctl restart prometheus
You should now be able to see the custom rules in the Prometheus web UI, and query the recorded series by name like any other metric.
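Recorded series can also be read outside the web UI through the Prometheus HTTP API, which serves instant queries at /api/v1/query. The Python sketch below only builds the request URL. The helper name instant_query_url is hypothetical, and the address assumes Prometheus is listening on its default port 9090 on the local host.

```python
from urllib.parse import urlencode

def instant_query_url(base_url, expr):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"

# Assumed address: Prometheus on its default port on this host.
url = instant_query_url("http://localhost:9090", "node_filesystem_free_percent")
print(url)
```

Fetching that URL, for instance with urllib.request.urlopen, should return a JSON document whose data.result field lists one entry per matching series.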
Alerts
We will create a new group named alert_rules and add it to the same rules file that holds the recording rules created earlier, under the existing groups: key.
vim /etc/prometheus/prometheus_rules.yml
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."
      - alert: DiskSpaceFree10Percent
        expr: node_filesystem_free_percent <= 10
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} has 10% or less Free disk space"
          description: "{{ $labels.instance }} has only {{ $value }}% or less free."
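The for: clause is the main difference between the two alerts: InstanceDown has to stay true for a full minute before it fires, while DiskSpaceFree10Percent fires on the first evaluation that matches. The Python sketch below models that as a small state machine. It is a simplified illustration, not Prometheus code; the function name alert_states and the 15 second evaluation interval are assumptions.

```python
def alert_states(results, interval, hold):
    """Return the alert state after each rule evaluation.

    results:  condition outcome (True/False) for each evaluation
    interval: seconds between evaluations
    hold:     the rule's `for:` duration in seconds (0 if omitted)
    """
    states = []
    active_since = None  # time the condition first became true
    for i, condition in enumerate(results):
        now = i * interval
        if not condition:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= hold else "pending")
    return states

# Target goes down at the second evaluation and stays down.
outcomes = [False, True, True, True, True, True]
print(alert_states(outcomes, interval=15, hold=60))  # like InstanceDown (for: 1m)
print(alert_states(outcomes, interval=15, hold=0))   # like a rule without for:
```

With a hold of 60 seconds the alert stays pending for four evaluations and only fires on the fifth consecutive match, and a single false evaluation in between resets the timer.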
Check the syntax of the config file and restart the service once it reports SUCCESS.
promtool check config prometheus.yml
Checking prometheus.yml
SUCCESS: 1 rule files found
Checking prometheus_rules.yml
SUCCESS: 4 rules found
sudo service prometheus restart
sudo service prometheus status