Hello,
First, let me say this is my first bug report and my first attempt at patching PostgreSQL, fresh after the hacking webinar. Please let me know if I need to clarify anything.
TL;DR: EXPLAIN (ANALYZE, BUFFERS) reports different buffer counts for the Sort node between the first and subsequent executions in a session.
Buffers: shared hit=4
  ->  Seq Scan on sort_buffer_test (cost=0.00..2.50 rows=1 width=32) (actual time=0.042..0.126 rows=1.00 loops=1)
        Filter: (lower(email) = 'email_1@example.com'::text)
        Rows Removed by Filter: 99
        Buffers: shared hit=1
vs
Buffers: shared hit=1
  ->  Seq Scan on sort_buffer_test (cost=0.00..2.50 rows=1 width=32) (actual time=0.021..0.057 rows=1.00 loops=1)
        Filter: (lower(email) = 'email_1@example.com'::text)
        Rows Removed by Filter: 99
        Buffers: shared hit=1
I found it while working on a SQL regression tool (regresql) that uses the total buffer count reported by EXPLAIN (ANALYZE, BUFFERS) to detect deviations in query performance.
The issue surfaced on smaller queries, where test cases showed a consistent "improvement" in performance (fewer buffers) when comparing baseline and actual test runs. I had not noticed it previously on larger data sets because the difference in buffers was negligible relative to the larger operations.
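For context, this is roughly what the tool compares (simplified; "Shared Hit Blocks" is the key PostgreSQL emits in the JSON plan format):

---
-- the tool sums "Shared Hit Blocks" over all nodes of the JSON plan
EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;
-- a fresh session's first run includes the catalog reads in that total,
-- so a warmed baseline vs. a cold test run shows up as a false deviation
---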
The excerpt at the top is from the test case I managed to isolate.
--- how to reproduce
CREATE TABLE sort_buffer_test (id serial PRIMARY KEY, val int);
INSERT INTO sort_buffer_test (val) SELECT generate_series(1, 100);
ANALYZE sort_buffer_test;
-- run twice in a new psql session and observe the reported buffer change
EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;
---
This behaviour is session specific. Tested on 17.x, 18.x and the current development version (just fyi, all verified on both amd64 and aarch64).
For ORDER BY, I traced the potential issue down to pg_amop/pg_amproc lookups. The specific lookups vary by operation, but the pattern is the same: the first execution incurs catalog reads that get cached for subsequent runs. This also applies to DISTINCT (sorted, not hashed), GROUP BY, window functions, etc., which seems to isolate it to the Sort node.
All of those, I believe, are really just metadata resolution (i.e. the planner already knows what to sort by).
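For the int4 btree case in the repro, the entries being resolved can be seen directly in the catalogs (illustrative query only; it is not the internal lookup path, which goes through the syscache):

---
-- the btree integer_ops pg_amop entries the first sort has to resolve
SELECT a.amopopr::regoperator, a.amopstrategy
FROM pg_amop a
JOIN pg_opfamily f ON f.oid = a.amopfamily
JOIN pg_am m ON m.oid = f.opfmethod
WHERE m.amname = 'btree'
  AND f.opfname = 'integer_ops'
  AND a.amoplefttype = 'int4'::regtype;
---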
Having said that - I'm happy to try to fix it. I'm just wondering which direction is the right one to go:
1. Pre-warm the syscache during planning (tried locally and it works, but it does not feel conceptually right and is rather 'hacky').
2. Pre-compute and store the metadata, modifying each plan node struct for each type (lots of work). I believe that's what MergeJoin already does.
3. Or this is 'works as expected', which I don't think is the case and would force me to do pre-warming on the regresql side.
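If it does end up as (3), the tool-side workaround would be along these lines (a sketch of what regresql would have to do per session):

---
-- per session, before capturing a baseline or test measurement:
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- throwaway warm-up run
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;  -- measured run
---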
Kind regards,
Radim