9.5. PPEM Communication Failures
9.5.1. Error HTTP Status Code 500
When sending metrics for 10,000 tables and indexes to PPEM via otlphttp, you may encounter the following error in pgpro-otel-collector logs:
{
  "level": "error",
  "ts": "2025-09-05T17:44:37.351+0300",
  "msg": "Exporting failed. Dropping data.",
  "resource": {
    "service.instance.id": "62cc1e9c-a53f-423e-9c6f-41b1f6a0872a",
    "service.name": "pgpro-otel-collector",
    "service.version": "v0.4.0"
  },
  "otelcol.component.id": "otlphttp/ppem_metrics",
  "otelcol.component.kind": "exporter",
  "otelcol.signal": "metrics",
  "error": "not retryable error: Permanent error: rpc error: code = Unknown desc = error exporting items, request to http://192.168.21.147:80/v1/metrics responded with HTTP Status Code 500, Message=, Details=[]",
  "dropped_items": 393313
}
This may indicate that PPEM cannot process the metrics batch within a single request. Use the batch processor to split the request to PPEM into smaller parts. Make sure to specify the send_batch_max_size parameter:
processors:
  batch/metrics:
    send_batch_size: 8192
    send_batch_max_size: 8192
    timeout: 10s
    ...
With this parameter set, the batch processor splits large requests so that no batch contains more than 8,192 metric data points. After setting this parameter, check that the sending queue does not grow and overflow over time:
curl 127.0.0.1:8888/metrics | grep queue_size
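The output of the command above can also be checked programmatically. The following is a minimal sketch that computes how full each exporter queue is. It assumes the collector exposes the upstream OpenTelemetry Collector metrics otelcol_exporter_queue_size and otelcol_exporter_queue_capacity with an exporter label; verify these names against the actual output of your collector version. The function name queue_utilization is illustrative.

```python
import re

# Assumed metric and label names (taken from the upstream OpenTelemetry
# Collector self-telemetry); check them against your own /metrics output.
LINE = re.compile(r'^(otelcol_exporter_queue_(?:size|capacity))\{([^}]*)\}\s+(\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def queue_utilization(metrics_text):
    """Return {exporter: queue_size / queue_capacity} from Prometheus text output."""
    sizes, capacities = {}, {}
    for line in metrics_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue  # comments, other metrics, blank lines
        name, labels, value = match.groups()
        exporter = dict(LABEL.findall(labels)).get("exporter", "")
        target = sizes if name.endswith("_size") else capacities
        target[exporter] = float(value)
    # Report only exporters for which a non-zero capacity is known.
    return {e: sizes[e] / capacities[e] for e in sizes if capacities.get(e)}
```

Pass the text returned by the curl command to queue_utilization. A ratio that keeps climbing toward 1.0 between checks suggests the queue will eventually overflow.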
9.5.2. Error sending queue is full
The default maximum queue size is 1,000 elements. The more elements in the queue, the more RAM pgpro-otel-collector requires.
If the queue is full, the collector logs the following warning:
{
  "level": "warn",
  "ts": "2025-09-22T15:24:45.161+0300",
  "msg": "Sender failed",
  "resource": {
    "service.instance.id": "8859d053-3485-4a61-bc28-8d839bc9e20f",
    "service.name": "pgpro-otel-collector",
    "service.version": "v0.4.0"
  },
  "otelcol.component.id": "batch/metrics",
  "otelcol.component.kind": "processor",
  "otelcol.pipeline.id": "metrics",
  "otelcol.signal": "metrics",
  "error": "sending queue is full"
}
This means data is being dropped before it is sent to PPEM. To resolve this, reduce the number of metrics sent to PPEM or increase the queue size. Example queue configuration:
exporters:
  otlphttp/ppem_metrics:
    sending_queue:
      enabled: true
      queue_size: 1000
      ...
Additionally, you can increase batch/metrics.timeout and the global collection interval to collect metrics less frequently overall.
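To judge whether a given queue size is sufficient, it can help to estimate how many collection cycles the queue can absorb while PPEM is unavailable. The sketch below is a rough estimate under two assumptions: each queue element is one batch of at most send_batch_max_size items, and one collection cycle produces about 393,313 items (the dropped_items value from the log in 9.5.1). The function names are illustrative.

```python
def batches_per_cycle(items_per_cycle, send_batch_max_size):
    """Number of batches one collection cycle is split into (ceiling division)."""
    if items_per_cycle <= 0 or send_batch_max_size <= 0:
        raise ValueError("both arguments must be positive")
    return -(-items_per_cycle // send_batch_max_size)


def cycles_until_full(queue_size, items_per_cycle, send_batch_max_size):
    """Whole collection cycles that fit in an initially empty queue.

    Assumes that nothing is drained from the queue in the meantime,
    that is, the worst case where PPEM accepts no requests at all.
    """
    return queue_size // batches_per_cycle(items_per_cycle, send_batch_max_size)


per_cycle = batches_per_cycle(393313, 8192)
print(per_cycle)                               # 49
print(cycles_until_full(1000, 393313, 8192))   # 20
```

Under these assumptions, the default queue of 1,000 elements holds about 20 collection cycles of backlog. A longer collection interval stretches the same backlog over more time.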
Retransmission of failed batches is enabled by default. Example configuration:
exporters:
  otlphttp/ppem_metrics:
    retry_on_failure:
      initial_interval: 5s
      max_interval: 30s
      max_elapsed_time: 300s
      multiplier: 1.5
      ...
The example shows default values. The complete set of queue and retry configuration options is available in the OpenTelemetry documentation.
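To see what these retry values mean in practice, the following sketch computes an idealized sequence of retry delays. It is a simplification, not the collector's implementation: it ignores the random jitter that the upstream collector applies to each interval, as well as the time spent on the requests themselves, so real intervals will differ somewhat. The function name backoff_schedule is illustrative.

```python
def backoff_schedule(initial_interval, multiplier, max_interval, max_elapsed_time):
    """Return idealized retry delays in seconds (no jitter, no request time).

    Each delay is the previous one times `multiplier`, capped at
    `max_interval`. Retries stop once the next delay would push the
    total waiting time past `max_elapsed_time`.
    """
    delays = []
    elapsed = 0.0
    interval = float(initial_interval)
    while elapsed + interval <= max_elapsed_time:
        delays.append(interval)
        elapsed += interval
        interval = min(interval * multiplier, max_interval)
    return delays


delays = backoff_schedule(5, 1.5, 30, 300)
print(delays[:6])   # [5.0, 7.5, 11.25, 16.875, 25.3125, 30.0]
print(len(delays))  # 12
print(sum(delays))  # 275.9375
```

In this simplified model, a batch is retried about 12 times over roughly 4.5 minutes before it is dropped. During that time the batch stays in the collector, which contributes to queue growth.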
9.5.3. Timeout Errors
The following error is related to the timeout specified on the PPEM side:
{
  "level": "info",
  "ts": "2025-09-05T18:20:28.465+0300",
  "msg": "Exporting failed. Will retry the request after interval.",
  "resource": {
    "service.instance.id": "b225f72c-f753-4816-85ef-57a982a0392c",
    "service.name": "pgpro-otel-collector",
    "service.version": "v0.4.0"
  },
  "otelcol.component.id": "otlphttp/ppem_metrics",
  "otelcol.component.kind": "exporter",
  "otelcol.signal": "metrics",
  "error": "failed to make an HTTP request: Post \"http://192.168.21.147:80/v1/metrics\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)",
  "interval": "4.905832986s"
}
On the PPEM side, increase the timeout as follows:
http:
  server:
    timeout: 5m # Specify your value
A similar error can be caused by the timeout set on the pgpro-otel-collector side, typically when exporting logs or metrics:
{
  "level": "error",
  "ts": 1752593436.7439601,
  "msg": "Exporting failed. Dropping data.",
  "kind": "exporter",
  "data_type": "logs",
  "name": "otlphttp/ppem_logs",
  "error": "no more retries left: failed to make an HTTP request: Post \"http://192.168.22.114:80/v1/logs\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)",
  "dropped_items": 1
}
Increase the timeout by specifying your values on the pgpro-otel-collector side:
exporters:
  otlphttp/elastic_logs:
    timeout: 1m
    ...
  otlphttp/ppem_logs:
    timeout: 1m
    ...
  otlphttp/ppem_metrics:
    timeout: 1m
    ...
Both errors may indicate that PPEM or pgpro-otel-collector lacks sufficient CPU resources to function properly with the specified plugin settings.
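To find out which of the failures described in this section dominates in a collector log, the records can be grouped by the text of their error field. The sketch below assumes the log is written as one JSON object per line, as in the examples above; the category names and the functions classify and summarize are illustrative.

```python
import json
from collections import Counter

# Substrings taken from the error messages shown in this section.
PATTERNS = [
    ("http_500", "HTTP Status Code 500"),
    ("queue_full", "sending queue is full"),
    ("timeout", "context deadline exceeded"),
]


def classify(record):
    """Map one parsed log record to a failure category."""
    error = record.get("error", "")
    for category, needle in PATTERNS:
        if needle in error:
            return category
    return "other"


def summarize(lines):
    """Count failure categories over an iterable of JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(record, dict) and "error" in record:
            counts[classify(record)] += 1
    return counts
```

For example, summarize(open(path)) returns the number of records per category for the log file at path, where path is the location of your collector log. A large timeout count together with high CPU load points to the resource shortage mentioned above.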