9.3. The prometheus Exporter Issues #

The prometheus exporter is not designed to handle and transmit large volumes of metrics.

Consider a scenario with a Postgres Pro database containing 10,000 tables and 10,000 indexes, using the following extended plugin set:

  • hostmetrics: cpu (utilization), disk, filesystem, load, memory, network, paging, processes

  • postgrespro: activity, archiver, bgwriter, bloat_indexes, bloat_tables, cache, databases, functions, indexes, io, locks, replication, replication_slots, tables, tablespaces, version, wal

The expected load in this scenario looks as follows:

  • collector RAM usage — at least 3 GiB

  • time to fully load the metrics page — 8-10 seconds

  • CPU load — 30-50% in conducted tests (1 core)

If the server has less than 3 GiB of RAM available, the OOM killer may terminate the collector rather than other processes.

The collector generates over 390,000 metric records in this configuration.

Use the table below to estimate the number of metrics.

Plugin Name     Number of Metrics Generated per Object
tables          31 per table
indexes         6 per index (+1 if invalid)
bloat_tables    1 per table
bloat_indexes   1 per index

Each table therefore contributes 32 metrics (31 from tables plus 1 from bloat_tables) and each valid index 7 (6 from indexes plus 1 from bloat_indexes), i.e. at least 39 metrics per table-index pair. Thus, for 100,000 tables and 100,000 indexes, the number of metrics would be at least 3,900,000 for the plugins listed above.
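The estimate above can be reproduced with a short back-of-the-envelope calculation; the constants come from the table of per-object metric counts, and the object counts are the hypothetical totals used in this section:

```python
# Per-object metric counts, taken from the table above.
PER_TABLE = 31 + 1  # tables plugin + bloat_tables plugin
PER_INDEX = 6 + 1   # indexes plugin (valid index) + bloat_indexes plugin

def estimate_metrics(n_tables: int, n_indexes: int) -> int:
    """Lower-bound metric count for the four object-level plugins."""
    return n_tables * PER_TABLE + n_indexes * PER_INDEX

print(estimate_metrics(10_000, 10_000))    # 390000
print(estimate_metrics(100_000, 100_000))  # 3900000
```

Note that this is a lower bound: invalid indexes, other enabled plugins, and per-database objects all add further records.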

When transmitting hundreds of thousands of metrics through the prometheus exporter using the pull model, you may encounter the following error in the pgpro-otel-collector logs:

{
    "level": "error",
    "ts": "2025-09-05T17:40:25.575+0300",
    "msg": "error encoding and sending metric family: write tcp 127.0.0.1:8889->127.0.0.1:44930: write: broken pipe\n",
    "resource": {
        "service.instance.id": "62cc1e9c-a53f-423e-9c6f-41b1f6a0872a",
        "service.name": "pgpro-otel-collector",
        "service.version": "v0.4.0"
    },
    "otelcol.component.id": "prometheus",
    "otelcol.component.kind": "exporter",
    "otelcol.signal": "metrics"
}

This error indicates that Prometheus cannot scrape the metrics within its allocated timeout period. Use the following workarounds to fix the problem:

  • Increase Timeout

    In the prometheus configuration, specify a larger timeout than the default value:

    global:
      scrape_interval: 15s # default = 1m
      scrape_timeout: 15s # default = 10s; set globally or in a specific scrape_config; must not exceed scrape_interval
    

  • Reduce Metrics Volume

    To reduce the overall volume of transmitted metrics, configure collection from specific objects:

    receivers:
      postgrespro:
        plugins:
          tables:
            enabled: true
            databases:
              - name: database_name
                schemas:
                  - name: schema_name
                    tables:
                      - name: table_name
          indexes:
            enabled: true
            databases:
              - name: database_name
                schemas:
                  - name: schema_name
                    tables:
                      - name: table_name
                        indexes:
                          - name: index_name
          bloat_tables:
            enabled: true
            fetcher:
              batch_size: 10000
            collection_interval: 5m
            databases:
              - name: database_name
                schemas:
                  - name: schema_name
                    tables:
                      - name: table_name
          bloat_indexes:
            enabled: true
            fetcher:
              batch_size: 10000
            collection_interval: 5m
            databases:
              - name: database_name
                schemas:
                  - name: schema_name
                    tables:
                      - name: table_name
                        indexes:
                          - name: index_name
    

  • Use Denylists

    If the previous method requires specifying too many objects, use a denylist to exclude specific objects instead. For implementation examples, refer to Section 6.6.5.

  • Increase Resources

    If all collected metrics are required, allocate more CPU resources to pgpro-otel-collector, for example when the /metrics page loads too slowly because the server lacks resources.
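To pick a suitable scrape_timeout for the first workaround above, it helps to know how long a full scrape of the metrics page actually takes. The following is a minimal sketch; the default URL (port 8889, as in the log example earlier) is an assumption and should be adjusted to your prometheus exporter's configured endpoint:

```python
# Sketch: time a full download of the collector's /metrics page.
# The default URL is an assumption; point it at your exporter.
import time
import urllib.request

def time_scrape(url: str = "http://127.0.0.1:8889/metrics",
                timeout: float = 120.0) -> float:
    """Return the seconds taken to download the whole metrics page."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # drain the full body, as Prometheus would
    return time.monotonic() - start
```

If, say, time_scrape() reports around 9 seconds, a scrape_timeout of 15s leaves reasonable headroom; remember that scrape_timeout must not exceed scrape_interval.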