BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is n - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is n |
Date | |
Msg-id | 18477-55fd294f03d66632@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18477: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is n
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18477 Logged by: Alexander777 Email address: alexander.berezin3000@gmail.com PostgreSQL version: 14.6 Operating system: CentOS 7.6.1810 Description: Summary: A specific SQL query with "ORDER BY ... NULLS FIRST" is performing poorly if an ordering column is not nullable. Description: Recently, I encountered an intriguing performance issue with a SQL query over a table with 30+ millions of rows. This query takes significantly longer to execute than I anticipated. After a thorough investigation, I extracted a minimal example that clearly illustrates the problem, particularly when the ordering column is not NULL-able. Steps to Reproduce: 1. Database Setup: - PostgreSQL version: 14.6 - Client: psql 14.6 - Operating system: CentOS 7 - Architecture: x86-64 2. Create simple table and index CREATE TABLE T1 ( pk BIGSERIAL NOT NULL PRIMARY KEY, updated_at BIGINT NOT NULL DEFAULT 0 ); CREATE INDEX idx_t1_updated_at_pk ON T1 (updated_at, pk); 3. Add big number of rows, in my environment there are ~35M rows SELECT COUNT(*) FROM T1; count ---------- 35493666 (1 row) 4. Problematic Query: SELECT updated_at FROM T1 ORDER BY updated_at NULLS FIRST LIMIT 130000 5. Actual Result: The query takes tens of seconds to execute, which is significantly slower than expected. 5.1 EXPLAIN (ANALYZE,BUFFERS) output from problematic query: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=1774902.21..1790069.93 rows=130000 width=8) (actual time=12107.287..12225.239 rows=130000 loops=1) Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949 written=231607 I/O Timings: read=6287.278 -> Gather Merge (cost=1774902.21..5093645.38 rows=28444384 width=8) (actual time=12107.285..12216.140 rows=130000 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949 written=231607 I/O Timings: read=6287.278 -> Sort (cost=1773902.18..1809457.66 rows=14222192 width=8) (actual time=12043.738..12048.354 rows=44698 loops=3) Sort Key: updated_at NULLS FIRST Sort Method: external merge Disk: 204184kB Buffers: shared hit=894055 read=162956 dirtied=3807, temp read=153949 written=231607 I/O Timings: read=6287.278 Worker 0: Sort Method: external merge Disk: 210344kB Worker 1: Sort Method: external merge Disk: 210576kB -> Parallel Index Only Scan using idx_t1_updated_at_pk on t1 (cost=0.56..494747.42 rows=14222192 width=8) (actual time=0.088..7815.929 rows=11827789 loops=3) Heap Fetches: 858732 Buffers: shared hit=893979 read=162956 dirtied=3807 I/O Timings: read=6287.278 Planning Time: 0.099 ms Execution Time: 12278.073 ms (21 rows) 5.2 Same query but w/o NULLS FIRST SELECT updated_at FROM T1 ORDER BY updated_at LIMIT 130000 5.2.1 EXPLAIN (ANALYZE,BUFFERS) output: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.56..2643.19 rows=130000 width=8) (actual time=0.063..30.874 rows=130000 loops=1) Buffers: shared hit=12 read=501 I/O Timings: read=9.383 -> Index Only Scan using idx_t1_updated_at_pk on t1 (cost=0.56..693858.10 rows=34133260 width=8) (actual time=0.062..21.435 rows=130000 loops=1) Heap Fetches: 0 Buffers: shared hit=12 read=501 I/O Timings: read=9.383 Planning Time: 0.083 ms Execution Time: 35.562 ms (9 rows) 5.3 Same query but with NULLS LAST SELECT updated_at FROM T1 ORDER BY updated_at NULLS LAST LIMIT 130000 5.3.1 EXPLAIN (ANALYZE,BUFFERS) output: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.56..2643.19 rows=130000 width=8) (actual time=1.768..35.431 rows=130000 loops=1) Buffers: shared hit=11 read=502 I/O Timings: read=13.143 -> Index Only Scan using idx_t1_updated_at_pk on t1 (cost=0.56..693858.10 rows=34133260 width=8) (actual time=1.767..26.107 rows=130000 loops=1) Heap Fetches: 0 Buffers: shared hit=11 read=502 I/O Timings: read=13.143 Planning Time: 0.135 ms Execution Time: 40.427 ms (9 rows) 6. Expected Result: The query in 5. should have performance like in 5.2 or 5.3 for non-nullable columns. 7. Workaround: Workaround is to not specify NULLS FIRST explicitly for non-nullable columns. 8. Severity: Medium - This issue causes significant performance issues when querying on big tables (like ERROR: temporary file size exceeds temp_file_limit (5242880kB)).
pgsql-bugs by date: