Performance regression when adding LIMIT 1 to a query - Mailing list pgsql-general

From Costa Alexoglou
Subject Performance regression when adding LIMIT 1 to a query
Date
Msg-id CAJ+5Ff7s1NxcyZ4x-FdrygRT0KMuyROVPrP+Pm_1Z2DwMdv07Q@mail.gmail.com
Whole thread Raw
Responses Re: Performance regression when adding LIMIT 1 to a query
List pgsql-general
Hey folks,

I faced an interesting regression and would love to have some help understanding why this happened.

The postgres version if it's of any use is PostgreSQL 16.3 .

The following query:
```
EXPLAIN ANALYZE
SELECT "databases_metrics"."metrics"
FROM
  "databases_metrics"
WHERE
  (
    "databases_metrics"."created_at" >= '2023-03-15 10:00:00+00:00'::timestamptz
    AND "databases_metrics"."db_instance_id" = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
    AND "databases_metrics"."created_at" <= '2025-04-03 10:00:00+00:00'::timestamptz
    AND (
      metrics -> 'perf_average_query_runtime' IS NOT NULL
    )
  )
ORDER BY
  "databases_metrics"."id" ASC;
```

with a plan:
```
Gather Merge  (cost=115584.47..118515.35 rows=25120 width=824) (actual time=46.004..74.267 rows=29653 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Sort  (cost=114584.45..114615.85 rows=12560 width=824) (actual time=41.200..47.322 rows=9884 loops=3)
        Sort Key: id
        Sort Method: external merge  Disk: 16360kB
        Worker 0:  Sort Method: external merge  Disk: 15552kB
        Worker 1:  Sort Method: external merge  Disk: 14536kB
        ->  Parallel Bitmap Heap Scan on databases_metrics  (cost=990.77..109175.83 rows=12560 width=824) (actual time=3.326..14.295 rows=9884 loops=3)
              Recheck Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))"},
              Filter: ((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL)
              Rows Removed by Filter: 68
              Heap Blocks: exact=4272
              ->  Bitmap Index Scan on idx_databases_metrics_instance_date_custom_created_debugging  (cost=0.00..983.24 rows=30294 width=0) (actual time=3.786.786 rows=29856 loops=1)"},
                    Index Cond: ((db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid) AND (created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone))"},
Planning Time: 0.126 ms
Execution Time: 79.334 ms
```

Results in an execution time of `79.334ms` which is ok.


Then if I only add a `LIMIT 1` at the end of the query, I get dramatically slower execution:

```
EXPLAIN ANALYZE
SELECT "databases_metrics"."metrics"
FROM
  "databases_metrics"
WHERE
  (
    "databases_metrics"."created_at" >= '2023-03-15 10:00:00+00:00'::timestamptz
    AND "databases_metrics"."db_instance_id" = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::UUID
    AND "databases_metrics"."created_at" <= '2025-04-03 10:00:00+00:00'::timestamptz
    AND (
      metrics -> 'perf_average_query_runtime' IS NOT NULL
    )
  )
ORDER BY
  "databases_metrics"."id" ASC LIMIT 1;
```

With a plan:
```
Limit  (cost=0.43..229.66 rows=1 width=824) (actual time=7538.004..7538.005 rows=1 loops=1)
  ->  Index Scan using databases_metrics_pkey on databases_metrics  (cost=0.43..6909156.38 rows=30142 width=824) (actual time=7538.002..7538.003 rows=1 loops=1)
        Filter: ((created_at >= '2023-03-15 10:00:00+00'::timestamp with time zone) AND (created_at <= '2025-04-03 10:00:00+00'::timestamp with time zone) A((metrics -> 'perf_average_query_runtime'::text) IS NOT NULL) AND (db_instance_id = 'c4c97a60-b88e-4cd3-a2f1-random-uuid'::uuid))"},
        Rows Removed by Filter: 10244795
Planning Time: 0.128 ms
Execution Time: 7538.032 ms
```

Any clue why this may be happening?

pgsql-general by date:

Previous
From: Willy-Bas Loos
Date:
Subject: Re: find replication slots that "belong" to a publication
Next
From: Joe Conway
Date:
Subject: Re: Kubernetes, cgroups v2 and OOM killer - how to avoid?