Saturday 7 November 2020

Prometheus Monitoring - Part 3

Please check my previous posts, where we discussed the installation and configuration of scrape targets in detail.

In this post we will discuss the following Prometheus topics:
  • PromQL Queries
  • Saving/Recording Rules
  • Alerting Rules

PromQL Queries

The query language used in Prometheus is called PromQL (Prometheus Query Language). Query results can be viewed as a graph or as a table in the Prometheus expression browser, or consumed by external systems such as Grafana, Zabbix and others.

Simple examples

node_cpu_seconds_total{}
node_cpu_seconds_total{instance="promonode01.devtestlabs.in:80"} 

Regular Expressions
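List only the nodes in a specific domain. The =~ operator matches a label value against a regular expression, and the expression must match the whole value.

node_cpu_seconds_total{instance=~".*.devtestlabs.*"}
node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq"}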
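
The anchoring behaviour is easy to get wrong, so here is a minimal Python sketch of it. It is only an approximation: Prometheus uses RE2 syntax while Python's re module is a different engine, and the second instance name (db01.example.com:9100) is made up for illustration.

```python
import re


def promql_regex_match(pattern: str, value: str) -> bool:
    """Approximate PromQL's =~ matcher: the pattern must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


# One instance from this post plus a hypothetical one from another domain
instances = [
    "promonode01.devtestlabs.in:80",
    "db01.example.com:9100",
]

# Escaping the dots makes them match a literal "." instead of any character
matching = [i for i in instances if promql_regex_match(r".*\.devtestlabs\..*", i)]
print(matching)

# Without the leading and trailing .* the fully anchored pattern does not match
print(promql_regex_match("devtestlabs", "promonode01.devtestlabs.in:80"))

# ".*irq" covers both the irq and softirq CPU modes
print([m for m in ["idle", "irq", "softirq", "user"] if promql_regex_match(".*irq", m)])
```

This is why the selectors above carry .* on both sides of the domain name.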

Data Types

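Scalar
A numeric floating point value.

Instant vector
A set of time series containing a single sample for each time series, all sharing the same timestamp.

Range vector
A set of time series containing a range of data points over time for each time series. It is written by appending a duration in square brackets to a selector, for example: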

node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq"}[1m]
node_cpu_seconds_total{instance=~".*.devtestlabs.*",mode=~".*irq"}[5m]

Functions/Subqueries
  • Start with this instant vector:
    node_netstat_Tcp_InSegs{instance="localhost:9100"}
  • Convert it to a range vector with [1m], then back to an instant vector using rate():
    rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m])
  • Wrap it in the ceiling function:
    ceil(rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m]))
  • Convert the result to a range vector with a subquery ([1m:]) and get the per-second derivative of the time series:
    deriv(ceil(rate(node_netstat_Tcp_InSegs{instance="localhost:9100"}[1m]))[1m:])
More Info: https://prometheus.io/docs/prometheus/latest/querying/functions/
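
To get a feel for what these functions do with the samples, here is a rough Python sketch. It is a simplification under stated assumptions: the real rate() also handles counter resets and extrapolates to the window boundaries, which this version ignores, and the sample values and the 15 second scrape interval are invented for illustration. The deriv() stand-in uses a least-squares slope, since the Prometheus documentation describes deriv() as based on simple linear regression.

```python
import math


def simple_rate(samples):
    """Per-second increase between the first and last sample in the window.

    samples is a list of (timestamp_seconds, counter_value) tuples.
    Simplified: no counter-reset handling and no extrapolation.
    """
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)


def simple_deriv(samples):
    """Least-squares slope of the values over time, in units per second."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


# Hypothetical counter samples inside a 1m window, scraped every 15 s
window = [(0, 1000), (15, 1150), (30, 1290), (45, 1460)]

per_second = simple_rate(window)   # 460 segments over 45 s
rounded = math.ceil(per_second)    # ceil() rounds up to the next integer
print(per_second, rounded)

# Hypothetical series of ceil(rate(...)) results, as a subquery would produce
rate_series = [(0, 11), (15, 12), (30, 13), (45, 14)]
print(simple_deriv(rate_series))   # the slope: +1 every 15 s
```

The point is the shape of the data: rate() and deriv() both take a range of samples per series and reduce it to one value per series.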

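Saving/Recording Rules
Typing a long query every time is tedious, and evaluating it on demand can be slow. Recording rules let Prometheus evaluate an expression at regular intervals and store the result as a new time series. Custom rules are created in a rule file, which is then referenced from the main config file.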

Examples
Memory free percentage: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
Root disk space free percentage: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}

Log in to your Prometheus server and configure the rules as below:

vim /etc/prometheus/prometheus_rules.yml
groups:
  - name: custom_rules
    rules:
      - record: node_memory_free_percent
        expr: 100 * node_memory_MemFree_bytes / node_memory_MemTotal_bytes
      - record: node_filesystem_free_percent
        expr: 100 * node_filesystem_free_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}
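
It is worth checking the arithmetic of such expressions by hand before saving them as rules. An expression of the form 100 * free / total gives the free share, while 100 - (100 * free / total) gives the used share, so the record name should say which one it is. A small Python sketch with made-up byte values:

```python
def free_percent(free_bytes: float, total_bytes: float) -> float:
    """Same arithmetic as 100 * free / total in the rule expressions."""
    return 100 * free_bytes / total_bytes


def used_percent(free_bytes: float, total_bytes: float) -> float:
    """Same arithmetic as 100 - (100 * free / total)."""
    return 100 - free_percent(free_bytes, total_bytes)


GIB = 1024 ** 3

# Hypothetical host: 2 GiB of 8 GiB memory free, 5 GiB of 50 GiB root disk free
print(free_percent(2 * GIB, 8 * GIB))    # 25.0
print(used_percent(2 * GIB, 8 * GIB))    # 75.0
print(free_percent(5 * GIB, 50 * GIB))   # 10.0
```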


Check the syntax of the rule file:
promtool check rules prometheus_rules.yml
Checking prometheus_rules.yml
  SUCCESS: 2 rules found


Include these rules in the main Prometheus configuration:
vim /etc/prometheus/prometheus.yml
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "prometheus_rules.yml"


Check the syntax of the main config file:

promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 1 rule files found

Checking prometheus_rules.yml
  SUCCESS: 2 rules found


Since both checks pass, restart the Prometheus server:
systemctl restart prometheus

You should now be able to see the custom rules in the Prometheus dashboard.
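
The recorded series can also be read through the Prometheus HTTP API, which is handy for a quick scripted check. The sketch below is only an illustration: it assumes Prometheus listens on http://localhost:9090 (adjust to your setup), and the helper names are my own. Only the URL is built when the script runs; fetch_instant_query() needs a running server and is therefore not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def fetch_instant_query(base_url: str, promql: str) -> list:
    """Run the query and return the list of result series (needs a live server)."""
    with urlopen(instant_query_url(base_url, promql)) as response:
        body = json.load(response)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


# Assumed address of the Prometheus server
url = instant_query_url("http://localhost:9090", "node_filesystem_free_percent")
print(url)
```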
 


Alerts
We will create a new group named alert_rules and add it under groups: in the same rule file that holds the recording rules created earlier.
vim /etc/prometheus/prometheus_rules.yml
  - name: alert_rules
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      - alert: DiskSpaceFree10Percent
        expr: node_filesystem_free_percent <= 10
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }} has 10% or less Free disk space"
          description: "{{ $labels.instance }} has only {{ $value }}% free."
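
The for: field decides how long the expression must stay true before the alert fires. Until then the alert is in the pending state, and a rule without for: fires on the first evaluation where the expression is true. Here is a simplified Python sketch of that behaviour. The 15 second evaluation interval and the sequence of results are assumptions for illustration, and this is not how Prometheus itself is implemented.

```python
def alert_states(condition_results, interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_results: one boolean per evaluation (True = expression matched).
    interval_s: seconds between evaluations.
    for_s: the rule's "for" duration in seconds (0 if the field is absent).
    """
    states = []
    active_since = None
    for i, is_true in enumerate(condition_results):
        now = i * interval_s
        if not is_true:
            # The condition cleared, so the pending timer resets
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# Hypothetical results of "up == 0", evaluated every 15 s
results = [False, True, True, True, True, True, False]

# InstanceDown has for: 1m
print(alert_states(results, 15, 60))

# DiskSpaceFree10Percent has no "for", so it fires immediately
print(alert_states([False, True, True], 15, 0))
```

With for: 1m a short blip that recovers within a minute never leaves the pending state, which cuts down on noisy alerts.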


Check the syntax of the config file, and restart the service once you see the SUCCESS messages.

promtool check config prometheus.yml
Checking prometheus.yml
  SUCCESS: 1 rule files found

Checking prometheus_rules.yml
  SUCCESS: 4 rules found


sudo service prometheus restart
sudo service prometheus status

 


