6.2. Working with Metrics

This section describes the steps required to manage metrics.

6.2.1. General Setup

6.2.1.1. Adding and Configuring the postgrespro Receiver

To collect metrics from the database instance, add the postgrespro receiver to the receivers section and specify its configuration.

Required configuration:

  • Specify the database instance connection parameters.

  • Specify the list of plugins for data collection.

Additional configuration:

  • Collection parameters: parallelism, delay, interval.

receivers:
  postgrespro:
    max_threads: 3
    collection_interval: 60s
    initial_delay: 1s
    transport: tcp
    endpoint: localhost:5432
    database: postgres
    username: postgres
    password: ${env:POSTGRESQL_PASSWORD}
    metrics: null
    plugins:
      activity:
        enabled: true
      bgwriter:
        enabled: true
      locks:
        enabled: true
      version:
        enabled: true
      wal:
        enabled: true
      cache:
        enabled: true

Some plugins have additional configuration parameters. For example, the plugins that collect metrics from DBMS objects (tablespaces, databases, tables, indexes) can be configured so that metrics are collected only for a specified number of objects. This allows controlling the load on the database instance and the amount of data sent through the pipeline to an exporter. A detailed description of the configuration parameters for each plugin can be found in the /usr/share/doc/pgpro-otel-collector/examples directory.
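
As an illustration only, such a per-object limit might look like the sketch below. The tables plugin name follows the object types listed above, but the top_objects parameter is hypothetical; check the files in /usr/share/doc/pgpro-otel-collector/examples for the parameters each plugin actually supports.

receivers:
  postgrespro:
    plugins:
      tables:
        enabled: true
        # Hypothetical parameter, shown for illustration only: collect
        # metrics for a limited number of objects. See the examples
        # directory for the parameter names each plugin actually supports.
        top_objects: 10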

The receiver can also connect over a Unix socket when the transport and endpoint parameters are defined as shown below.

receivers:
  postgrespro:
    ...
    transport: unix
    endpoint: /tmp:5432 # Or 'tmp:5432'
    ...

6.2.1.2. Adding and Configuring the hostmetrics Receiver

The hostmetrics receiver is an open-source component of the OpenTelemetry Collector and is used for collecting metrics from the operating system. For detailed information about this receiver, refer to the OpenTelemetry documentation.

To configure the hostmetrics receiver, it is sufficient to list the plugins (scrapers) for data collection. Collection parameters are also available: delay and interval.

Some plugins also have additional configuration parameters.

receivers:
  hostmetrics:
    collection_interval: 60s
    initial_delay: 1s
    scrapers:
      cpu:
        metrics:
          system.cpu.utilization:
            enabled: true
      disk: null
      load: null
      memory: null
      network: null

6.2.1.3. Adding and Configuring the SQL Query Receiver

The sqlquery receiver is an open-source component of the OpenTelemetry Collector for gathering metrics and/or logs from custom SQL queries.

Warning

The sqlquery receiver is currently experimental and is not recommended for production use.

To set up the sqlquery receiver for collecting metrics, follow the example procedure below.

  1. Create the sqlquery.yml configuration file.

  2. In the created configuration file, specify database connection parameters in the receivers.sqlquery section:

    receivers:
      sqlquery:
        driver: postgres
        host: localhost
        port: 5432
        database: postgres
        username: postgres
        password: ${env:POSTGRESQL_PASSWORD}
        # Additional driver-specific connection parameters
        additional_params:
          application_name: pgpro-otel-collector
          sslmode: disable
        # The time interval between query executions. Default: 10s
        collection_interval: 60s
        # Defines setup for the component's own telemetry
        telemetry:
          logs:
            # If true, each executed query is logged at debug level
            query: false
        # The maximum number of open connections to the Postgres Pro server. Default: 0 (unlimited)
        max_open_conn: 5
    

    The password parameter supports environment variable substitution, as shown in the example. Special characters in the credentials are automatically URL-encoded to ensure proper connection string formatting.

    Alternatively, use the datasource parameter to provide a complete connection string:

    datasource: "host=localhost port=5432 user=postgres password=postgres application_name=pgpro-otel-collector sslmode=disable"
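
    Environment variable substitution also works inside the datasource string, which avoids storing the password in plain text:

    datasource: "host=localhost port=5432 user=postgres password=${env:POSTGRESQL_PASSWORD} application_name=pgpro-otel-collector sslmode=disable"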
    
  3. List the queries used to collect metrics. Each query consists of an SQL statement and a metrics section containing one or more metric definitions; at least one definition is required. Each metric definition produces one OpenTelemetry metric per row returned by the SQL query.

    receivers:
      sqlquery:
        ...
        queries:
          - sql: >-
              SELECT p.id, p.name, p.price, sum(s.units) AS sold
              FROM products p LEFT JOIN sales s ON (p.id = s.product_id)
              GROUP BY p.id, p.name, p.price
            metrics:
              # Gauge metric example
              # The name assigned to the OpenTelemetry metric
              - metric_name: postgresql.products.price
                value_column: price
                data_type: gauge
                value_type: double
                description: Price of the product
                unit: rub
                static_attributes:
                  database: postgres
                attribute_columns: ["name"]
              # Sum metric example
              - metric_name: postgresql.products.sold
                value_column: sold
                data_type: sum
                value_type: int
                description: Total count of sold products
                # Whether a cumulative sum value is monotonically increasing (i.e. never rolls over or resets)
                # Default = false
                monotonic: true
                aggregation: cumulative
                static_attributes:
                  database: postgres
                attribute_columns: ["name"]
    

    This example assumes the following database schema:

    CREATE TABLE products (id INTEGER, name TEXT, price NUMERIC);
    INSERT INTO products VALUES (1, 'Cheese', 9.99), (2, 'Bread', 1.99), (3, 'Milk', 2.99);
    
    CREATE TABLE sales (id BIGINT, product_id BIGINT, units INT, sold_at TIMESTAMP DEFAULT now());
    INSERT INTO sales VALUES (1, 1, 10), (2, 2, 20), (2, 2, 50), (3, 3, 30);
    

    The query creates two metrics: a gauge for current product prices and a cumulative sum for total units sold per product.
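
    With the sample data above, each collection cycle produces the following values per product (shown here in Prometheus exposition style purely for illustration; actual metric names and attributes depend on the exporter and its translation settings):

    postgresql_products_price{name="Cheese",database="postgres"} 9.99
    postgresql_products_price{name="Bread",database="postgres"} 1.99
    postgresql_products_price{name="Milk",database="postgres"} 2.99
    postgresql_products_sold{name="Cheese",database="postgres"} 10
    postgresql_products_sold{name="Bread",database="postgres"} 70
    postgresql_products_sold{name="Milk",database="postgres"} 30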

    Use data_type to specify the metric type. Set it to gauge for metrics representing a current value, or sum for metrics representing an aggregated total. When using sum, set the aggregation parameter to either cumulative (default) or delta.

    Note

    Avoid queries that produce NULL values. If a query returns NULL in a column referenced in the configuration, errors will be logged, but the receiver will continue operating.
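
    For example, the LEFT JOIN in the query above returns NULL in the sold column for any product that has no sales; wrapping the aggregate in coalesce prevents this:

    receivers:
      sqlquery:
        ...
        queries:
          - sql: >-
              SELECT p.id, p.name, p.price, coalesce(sum(s.units), 0) AS sold
              FROM products p LEFT JOIN sales s ON (p.id = s.product_id)
              GROUP BY p.id, p.name, p.price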

  4. Configure exporters, processors, and the service pipeline:

    ...
    exporters:
      prometheus:
        endpoint: :8889
        send_timestamps: true
      otlphttp/sqlquery/metrics:
        compression: ''
        endpoint: https://metrics.example.org:8080
        headers:
          X-Ppem-Source-Agent-Name: local
          X-Ppem-Source-Instance-Port: '5432'
    processors:
      batch/sqlquery:
        send_batch_size: 2048
        timeout: 10s
    service:
      # Telemetry for the collector itself
      telemetry:
        logs:
          # Sets the minimum enabled logging level
          # Values: debug, info, warn, error
          # Default = info
          level: info
      pipelines:
        metrics/sqlquery:
          receivers: [ sqlquery ]
          processors: [ batch/sqlquery ]
          exporters: [ prometheus, otlphttp/sqlquery/metrics ]
    
  5. Start pgpro-otel-collector with the created configuration file:

    build/pgpro-otel-collector/pgpro-otel-collector --config configs/sqlquery.yml
    

For the full list of the sqlquery configuration parameters, refer to the OpenTelemetry documentation.

6.2.1.4. Adding and Configuring the metricstransform Processor

The metricstransform processor is an open-source component of the OpenTelemetry Collector for renaming metrics and managing labels, including scaling and aggregation operations. For more details, refer to the OpenTelemetry documentation.

An example setup of the metricstransform processor is shown in the procedure below.

  1. Create the metrics_transform.yml configuration file.

  2. Add the configuration sections that match your needs from the examples below:

    • Rename a metric — for example, postgresql.databases.size_bytes to postgresql.db.size_bytes:

      processors:
        metricstransform/rename:
          transforms:
            - include: postgresql.databases.size_bytes
              # Default = strict
              match_type: strict
              action: update
              new_name: postgresql.db.size_bytes
      

      The action parameter controls the transformation type: update modifies existing metrics, insert creates cloned metrics, and combine merges multiple metrics into a new one.

    • Use regular expressions to rename multiple metrics at once:

        metricstransform/rename_by_regexp:
          transforms:
            - include: ^postgresql\.databases\.(.*)$$
              match_type: regexp
              action: update
              new_name: postgresql.db.$${1}
      
    • Create a duplicate of the metric with a new name:

        metricstransform/create_new:
          transforms:
            - include: postgresql.databases.size_bytes
              match_type: strict
              action: insert
              new_name: postgresql.db.size_bytes
      
    • Add a new label to a metric:

        metricstransform/add_label:
          transforms:
            - include: postgresql.databases.size_bytes
              action: update
              operations:
                - action: add_label
                  new_label: new_label
                  new_value: "value 1"
      
    • Rename an existing label:

        metricstransform/rename_label:
          transforms:
            - include: postgresql.databases.size_bytes
              action: update
              operations:
                - action: update_label
                  label: database
                  new_label: db
      
    • Rename multiple labels using a regular expression:

        metricstransform/rename_label_multiple:
          transforms:
            - include: ^postgresql\.databases\.(.*)$$
              match_type: regexp
              action: update
              operations:
                - action: update_label
                  label: database
                  new_label: db
      
    • Rename a specific label value:

        metricstransform/rename_label_value:
          transforms:
            - include: postgresql.databases.size_bytes
              action: update
              operations:
                - action: update_label
                  label: database
                  value_actions:
                    - value: postgres
                      new_value: default
      
    • Delete data points that have a certain label value:

        metricstransform/delete_by_label_value:
          transforms:
            - include: postgresql.databases.size_bytes
              action: update
              operations:
                - action: delete_label_value
                  label: database
                  # Specifies the label value whose data points will be removed
                  label_value: db11
      
    • Convert data type from double to int and vice versa:

        metricstransform/convert_data_type:
          transforms:
            - include: postgresql.databases.size_bytes
              action: update
              operations:
                - action: toggle_scalar_data_type
      
    • Aggregate data points across all labels except those kept in label_set:

        metricstransform/aggregate_labels:
          transforms:
            - include: postgresql.activity.connections
              action: update
              operations:
                - action: aggregate_labels
                  # Contains a list of labels that will remain after aggregation
                  label_set:
                    #- database
                    - user
                    #- backend_process_state
                  # Defines how combined data points will be aggregated
                  # Possible values: sum, mean, min, max, count, median
                  aggregation_type: sum
      

      Note that only the sum aggregation function is supported for histogram and exponential histogram data types.

    • Aggregate data points that have a specific label value:

        metricstransform/aggregate_label_values:
          transforms:
            - include: postgresql.activity.connections
              action: update
              operations:
                - action: aggregate_label_values
                  label: database
                  # Contains a list of label values that will be aggregated
                  aggregated_values: [ db11, db12, db13 ]
                  new_value: all_db1
                  aggregation_type: sum
      

      Note that only the sum aggregation function is supported for histogram and exponential histogram data types.

    • Combine several related metrics into one. For example, combine multiple tuples_* metrics into tuples_total with a type label:

        metricstransform/combine_metrics:
          transforms:
            - include: ^postgresql\.databases\.tuples_(?P<type>.*)$$
              match_type: regexp
              action: combine
              new_name: postgresql.databases.tuples_total
              submatch_case: lower
      

      The submatch_case parameter controls the case of label values extracted from regular expression submatches during combine operations. Leave empty to preserve the original case. Possible values: lower or upper.

    • Group metrics from a single resource and report them as multiple resource metrics:

        metricstransform/group_metrics:
          transforms:
            - include: ^postgresql\.databases\.size_bytes$$
              match_type: regexp
              action: group
              group_resource_labels: {"resource.type": "default", "source": "default"}
            - include: ^postgresql\.databases\.orphaned_files_size_bytes$$
              match_type: regexp
              action: group
              group_resource_labels: {"resource.type": "orphaned", "source": "orphaned"}
      
    • Create a new metric filtered by a label value (experimental feature):

        metricstransform/experimental_create_by_label:
          transforms:
            - include: postgresql.databases.size_bytes
              match_type: regexp
              experimental_match_labels: {"database": "db1.*"}
              action: insert
              new_name: postgresql.db1.size
      

      If experimental_match_labels is specified, transformations apply only to metrics that match the specified label values. This works with both strict and regexp match_type.
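
      A sketch of the strict variant, where the label value must match exactly (the processor name and new metric name here are arbitrary):

        metricstransform/experimental_create_by_label_strict:
          transforms:
            - include: postgresql.databases.size_bytes
              match_type: strict
              experimental_match_labels: {"database": "postgres"}
              action: insert
              new_name: postgresql.postgres.size_bytes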

    • Scale metric values from bytes to bits (experimental feature):

        metricstransform/experimental_scale_value:
          transforms:
            - include: postgresql.databases.size_bytes
              match_type: strict
              action: insert
              new_name: postgresql.databases.size_bits
              operations:
                - action: experimental_scale_value
                  experimental_scale: 8
      

      The experimental_scale parameter defines the scalar to apply to metric values. Note that scaling exponential histograms inherently involves some loss of accuracy.

  3. Configure exporters and the service pipeline in accordance with the chosen sections:

    exporters:
      prometheus:
        endpoint: :8889
        send_timestamps: true
        # translation_strategy: UnderscoreEscapingWithoutSuffixes
        # If true, all the resource attributes will be converted to metric labels by default
        # resource_to_telemetry_conversion:
        #   enabled: true
    service:
      pipelines:
        metrics:
          receivers:
            - postgrespro
          processors:
            - metricstransform/rename
            # - metricstransform/rename_by_regexp
            # - metricstransform/create_new
            # - metricstransform/add_label
            # - metricstransform/rename_label
            # - metricstransform/rename_label_multiple
            # - metricstransform/rename_label_value
            # - metricstransform/delete_by_label_value
            # - metricstransform/convert_data_type
            # - metricstransform/aggregate_labels
            # - metricstransform/aggregate_label_values
            # - metricstransform/combine_metrics
            # - metricstransform/group_metrics
            # - metricstransform/experimental_create_by_label
            # - metricstransform/experimental_scale_value
          exporters:
            - prometheus
    
  4. Start pgpro-otel-collector with both the main configuration file and metrics_transform.yml:

    build/pgpro-otel-collector/pgpro-otel-collector --config configs/basic.yml --config configs/metrics_transform.yml
    

For a complete list of configuration parameters, refer to the OpenTelemetry documentation.

6.2.1.5. Adding and Configuring the prometheus Exporter

The prometheus exporter is an open-source component of the OpenTelemetry Collector. For detailed information, refer to the OpenTelemetry documentation.

prometheus is the easiest exporter to use: it does not require any external components and can be enabled with minimal configuration. To set it up, it is sufficient to specify the address on which to listen for incoming requests:

exporters:
  prometheus:
    endpoint: "1.2.3.4:8889"
    send_timestamps: true
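
On the Prometheus server side, a minimal scrape configuration for this endpoint might look as follows (the job name is arbitrary):

scrape_configs:
  - job_name: pgpro-otel-collector
    static_configs:
      - targets: ["1.2.3.4:8889"]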

6.2.1.6. Adding and Configuring the otlphttp Exporter

The otlphttp exporter is an open-source component of the OpenTelemetry Collector and is used for exporting collected telemetry to an OTLP-compatible storage or monitoring system, which must be deployed and accessible in advance. For more details, refer to the OpenTelemetry documentation.

To configure the otlphttp exporter, it is sufficient to specify the address of the target system where data should be sent:

exporters:
  otlphttp:
    endpoint: https://otlp.example.org
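
The upstream exporter accepts further optional settings; the sketch below shows commonly used ones. The header name and the OTLP_TOKEN variable are placeholders for whatever the target system expects.

exporters:
  otlphttp:
    endpoint: https://otlp.example.org
    # Compress request payloads
    compression: gzip
    # Headers required by the target system (placeholder values)
    headers:
      Authorization: "Bearer ${env:OTLP_TOKEN}"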

6.2.1.7. Adding and Configuring the kafka Exporter

The kafka exporter is an open-source component of the OpenTelemetry Collector for sending metrics and logs to Apache Kafka. For more details, refer to the OpenTelemetry documentation.

Below is an example of setting it up to send metrics.

receivers:
  postgrespro:
    max_threads: 3
    collection_interval: 60s
    initial_delay: 1s
    transport: tcp
    endpoint: localhost:5432
    database: postgres
    username: postgres
    password: ${env:POSTGRESQL_PASSWORD}
    metrics: null
    plugins:
      activity:
        enabled: true
      bgwriter:
        enabled: true
      locks:
        enabled: true
      version:
        enabled: true
      wal:
        enabled: true
      cache:
        enabled: true
exporters:
  kafka:
    brokers:
      - localhost:9092
    protocol_version: 2.1.0
    client_id: pgpro-otel-collector
    metrics:
      topic: otlp_metrics
      encoding: otlp_json # otlp_proto is also supported
    include_metadata_keys:
      - service.name
      - service.instance.id
    tls:
      insecure: true
    timeout: 30s
    producer:
      max_message_bytes: 1000000
      required_acks: 1
      compression: none # or gzip, snappy, lz4, zstd
processors:
  batch/kafka:
    send_batch_size: 1024
    timeout: 1s
  resource:
    attributes:
      - key: service.name
        action: upsert
        value: postgresql
      - key: service.instance.id
        action: upsert
        value: address-of-postgres-instance:5432
service:
  pipelines:
    metrics/kafka:
      receivers: [ postgrespro ]
      processors: [ batch/kafka, resource ]
      exporters: [ kafka ]

6.2.1.8. Setting up a Pipeline

Once receivers and exporters are added and configured, they need to be combined into a pipeline. The pipeline is configured in the service section. The pipeline contents depend entirely on the previously added components (there is no default configuration).

The example below shows how to set up a pipeline for metric management. The data is collected by the postgrespro and hostmetrics receivers, processed by the batch processor, and exported by the prometheus and otlphttp exporters.

All components used in the pipeline must also be added and configured in the configuration file.

service:
  extensions: []
  pipelines:
    metrics:
      receivers:
        - postgrespro
        - hostmetrics
      processors:
        - batch
      exporters:
        - prometheus
        - otlphttp
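
Processors run in the order they are listed. If the memory_limiter processor is also used (as in the BiHA example in Section 6.2.2.1), place it first in the list, before batch:

service:
  extensions: []
  pipelines:
    metrics:
      receivers:
        - postgrespro
        - hostmetrics
      processors:
        - memory_limiter
        - batch
      exporters:
        - prometheus
        - otlphttp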

6.2.2. Use Cases

6.2.2.1. Collecting Metrics from a BiHA Cluster

pgpro-otel-collector can be configured to collect metrics from BiHA clusters using the biha plugin. An example setup procedure is as follows:

  1. Create a configuration file called biha.yml with the following content:

    receivers:
      postgrespro/biha:
        transport: tcp
        endpoint: localhost:5432
        database: biha_db
        username: biha_replication_user
        password: ${env:POSTGRESQL_PASSWORD}
        collection_interval: 60s
        initial_delay: 1s
        max_threads: 3
        plugins:
          biha:
            enabled: true
            cluster_name: cluster_name
    exporters:
      prometheus/biha:
        endpoint: :8889
        send_timestamps: true
        # Defines how long metrics remain exposed without updates
        # Since some BiHA metrics include 'biha_state' in their labels,
        # it is best to set this parameter equal to the collection_interval,
        # so that outdated states do not appear on the Prometheus page.
        metric_expiration: 60s
    processors:
      batch/metrics:
        send_batch_size: 8192
        timeout: 10s
      memory_limiter/metrics:
        check_interval: 1s
        limit_mib: 2048
    service:
      pipelines:
        metrics/biha:
          receivers:
            - postgrespro/biha
          processors:
            - memory_limiter/metrics
            - batch/metrics
          exporters:
            - prometheus/biha
    
  2. Start pgpro-otel-collector with the created configuration file:

    build/pgpro-otel-collector/pgpro-otel-collector --config configs/biha.yml
    

To check which metrics are included in the biha plugin, refer to Section 8.18.

Note

Plugins with special access privileges must be configured separately.

If you need to include plugins that require special access privileges, add them to a separate postgrespro receiver section configured with the appropriate username. For example:

receivers:
  postgrespro/biha:
    username: biha_replication_user
    ...
    plugins:
      biha:
        enabled: true
        ...
  postgrespro/otel:
    username: otel
    ...
    plugins:
      databases:
        enabled: true

For more details on the access privileges, refer to Section 3.1.

Note

By default, the postgres database is not copied to a node in the referee or referee_with_wal mode (unless the --referee-with-postgres-db option is used).

When collecting database-level, table-level, index-level, or function-level statistics from these nodes, you must explicitly exclude the postgres database using the acl.databases.deny section, as shown below. This exclusion is not required when collecting per-instance statistics. For more information about allowlists and denylists, refer to Section 6.6.5.

acl:
  databases:
    deny:
      - name: postgres