9.5. PPEM Communication Failures

9.5.1. Error HTTP Status Code 500

When sending a large volume of metrics to PPEM via otlphttp (for example, metrics for 10,000 tables and indexes), you may encounter the following error in the pgpro-otel-collector logs:

{
    "level": "error",
    "ts": "2025-09-05T17:44:37.351+0300",
    "msg": "Exporting failed. Dropping data.",
    "resource": {
        "service.instance.id": "62cc1e9c-a53f-423e-9c6f-41b1f6a0872a",
        "service.name": "pgpro-otel-collector",
        "service.version": "v0.4.0"
    },
    "otelcol.component.id": "otlphttp/ppem_metrics",
    "otelcol.component.kind": "exporter",
    "otelcol.signal": "metrics",
    "error": "not retryable error: Permanent error: rpc error: code = Unknown desc = error exporting items, request to http://192.168.21.147:80/v1/metrics responded with HTTP Status Code 500, Message=, Details=[]",
    "dropped_items": 393313
}

This may indicate that PPEM cannot process the metrics batch within a single request. Use the batch processor to split the request to PPEM into smaller parts. Make sure to specify the send_batch_max_size parameter:

processors:
  batch/metrics:
    send_batch_size: 8192
    send_batch_max_size: 8192
    timeout: 10s
    ...

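With these settings, the batch processor splits large requests into batches of at most 8,192 items (metric data points) each. For example, the 393,313 items dropped in the log above would go out as about 49 requests instead of one. Smaller batches mean more requests, so after setting this parameter, check that the sending queue does not fill up over time: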

curl 127.0.0.1:8888/metrics | grep queue_size
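To judge how close the queue is to overflowing, it helps to compare its current size with its capacity. The following is a minimal sketch, assuming that pgpro-otel-collector exposes the upstream OpenTelemetry Collector metrics otelcol_exporter_queue_size and otelcol_exporter_queue_capacity on this endpoint. The queue_fill function name is hypothetical, and the metric names and labels may differ in your version:

```shell
# Read Prometheus text-format metrics on stdin and print, for each label set,
# the queue size, the queue capacity, and the fill percentage.
# Assumption: the metric names below match what your collector version exposes.
queue_fill() {
    awk '
        /^otelcol_exporter_queue_size[{ ]/ {
            key = $0
            sub(/ [^ ]+$/, "", key)                         # drop the sample value
            sub(/^otelcol_exporter_queue_size/, "", key)    # keep only the labels
            size[key] = $NF
        }
        /^otelcol_exporter_queue_capacity[{ ]/ {
            key = $0
            sub(/ [^ ]+$/, "", key)
            sub(/^otelcol_exporter_queue_capacity/, "", key)
            cap[key] = $NF
        }
        END {
            for (k in size)
                if (cap[k] + 0 > 0)
                    printf "%s size=%d capacity=%d fill=%.1f%%\n", k, size[k], cap[k], 100 * size[k] / cap[k]
        }'
}
```

For example, run curl -s 127.0.0.1:8888/metrics | queue_fill. A fill value that keeps growing between runs suggests that the exporter cannot keep up with the incoming data.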

9.5.2. Error sending queue is full

The default maximum queue size is 1,000 elements. The more elements in the queue, the more RAM pgpro-otel-collector requires.

If the queue is full, the following warning appears in the collector logs:

{
    "level": "warn",
    "ts": "2025-09-22T15:24:45.161+0300",
    "msg": "Sender failed",
    "resource": {
        "service.instance.id": "8859d053-3485-4a61-bc28-8d839bc9e20f",
        "service.name": "pgpro-otel-collector",
        "service.version": "v0.4.0"
    },
    "otelcol.component.id": "batch/metrics",
    "otelcol.component.kind": "processor",
    "otelcol.pipeline.id": "metrics",
    "otelcol.signal": "metrics",
    "error": "sending queue is full"
}

This means that data is being dropped before it is sent to PPEM. To fix this, either reduce the number of metrics sent to PPEM or increase the queue size. Example queue configuration:

exporters:
  otlphttp/ppem_metrics:
    sending_queue:
      enabled: true
      queue_size: 1000
      ...

You can also increase the timeout of the batch/metrics processor so that batches are sent less often, and increase the global collection interval so that metrics are collected less frequently overall.

Retransmission of failed batches is enabled by default. Example configuration:

exporters:
  otlphttp/ppem_metrics:
    retry_on_failure:
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
      multiplier: 1.5
      ...

The example shows default values. The complete set of queue and retry configuration options is available in the OpenTelemetry documentation.
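To get a feel for how long a failed batch is retried with these values, the schedule can be approximated with simple arithmetic. The sketch below assumes plain exponential backoff: each wait is the previous one multiplied by multiplier, capped at max_interval, and retries stop once the total wait would exceed max_elapsed_time. The retry_schedule function name is hypothetical, and the collector also randomizes each interval and spends time on the requests themselves, so real intervals will differ:

```shell
# Print an approximate retry schedule, one line per retry:
#   <retry number> <wait before this retry, s> <cumulative wait, s>
# Assumption: pure exponential backoff; jitter and request time are ignored.
retry_schedule() {
    # $1 initial_interval, $2 max_interval, $3 max_elapsed_time (seconds), $4 multiplier
    awk -v init="$1" -v max="$2" -v total="$3" -v mult="$4" 'BEGIN {
        interval = init
        elapsed = 0
        attempt = 0
        while (elapsed + interval <= total) {
            elapsed += interval
            attempt++
            printf "%d %.4f %.4f\n", attempt, interval, elapsed
            interval *= mult
            if (interval > max) interval = max   # cap at max_interval
        }
    }'
}

# Default values from the configuration example above
retry_schedule 5 30 300 1.5
```

With the default values, the waits in this approximation grow from 5 to 30 seconds and then stay at 30 seconds until about five minutes have passed, after which the batch is dropped.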

9.5.3. Timeout Errors

The following error is related to the timeout specified on the PPEM side:

{
    "level": "info",
    "ts": "2025-09-05T18:20:28.465+0300",
    "msg": "Exporting failed. Will retry the request after interval.",
    "resource": {
        "service.instance.id": "b225f72c-f753-4816-85ef-57a982a0392c",
        "service.name": "pgpro-otel-collector",
        "service.version": "v0.4.0"
    },
    "otelcol.component.id": "otlphttp/ppem_metrics",
    "otelcol.component.kind": "exporter",
    "otelcol.signal": "metrics",
    "error": "failed to make an HTTP request: Post \"http://192.168.21.147:80/v1/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)",
    "interval": "4.905832986s"
}

On the PPEM side, increase the timeout as follows:

  http:
    server:
      timeout: 5m # Specify your value

Another error may be caused by the timeout configured on the pgpro-otel-collector side. It typically occurs when exporting logs and metrics:

{
    "level": "error",
    "ts": 1752593436.7439601,
    "msg": "Exporting failed. Dropping data.",
    "kind": "exporter",
    "data_type": "logs",
    "name": "otlphttp/ppem_logs",
    "error": "no more retries left: failed to make an HTTP request: Post \"http://192.168.22.114:80/v1/logs\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)",
    "dropped_items": 1
}

On the pgpro-otel-collector side, increase the exporter timeout, specifying values that suit your environment:

exporters:
  otlphttp/elastic_logs:
    timeout: 1m
    ...
  otlphttp/ppem_logs:
    timeout: 1m
    ...
  otlphttp/ppem_metrics:
    timeout: 1m
    ...

Both errors may indicate that PPEM or pgpro-otel-collector lacks sufficient CPU resources to function properly with the specified plugin settings.
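To see which of the failures described in this section occurs most often, the collector log can be scanned for the three message fragments shown above. This is a minimal sketch. The count_ppem_failures function name is hypothetical, and the function only counts log lines that contain these fragments, without parsing the JSON:

```shell
# Count the failure signatures from this section in collector logs read on stdin.
# Each signature is matched as a plain text fragment of the "error" field.
count_ppem_failures() {
    awk '
        /HTTP Status Code 500/      { http_500++ }     # see 9.5.1
        /sending queue is full/     { queue_full++ }   # see 9.5.2
        /context deadline exceeded/ { timeout++ }      # see 9.5.3
        END {
            printf "http_500=%d queue_full=%d timeout=%d\n", http_500, queue_full, timeout
        }'
}
```

Pipe the collector log into the function from wherever your installation writes it. A high timeout count alongside queue_full warnings may point to the CPU shortage mentioned above rather than to a single misconfigured parameter.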