Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS - Mailing list pgsql-hackers

From Radim Marek
Subject Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS
Date
Msg-id CAJgoLkLCCKy1hAFpoUzwaRq91cA=m=-J03MBpUo-d7-oXy6MzA@mail.gmail.com
List pgsql-hackers
Hello,

First, let me say this is my first bug report and attempt at patching PostgreSQL, fresh after the hacking webinar. Please let me know if I need to clarify anything.

TL;DR: EXPLAIN (ANALYZE, BUFFERS) reports different buffer counts for the Sort node between the first and subsequent executions within a session.

         Buffers: shared hit=4
         ->  Seq Scan on sort_buffer_test  (cost=0.00..2.50 rows=1 width=32) (actual time=0.042..0.126 rows=1.00 loops=1)
               Filter: (lower(email) = 'email_1@example.com'::text)
               Rows Removed by Filter: 99
               Buffers: shared hit=1

vs

         Buffers: shared hit=1
         ->  Seq Scan on sort_buffer_test  (cost=0.00..2.50 rows=1 width=32) (actual time=0.021..0.057 rows=1.00 loops=1)
               Filter: (lower(email) = 'email_1@example.com'::text)
               Rows Removed by Filter: 99
               Buffers: shared hit=1

I found it while working on an SQL regression tool (regresql) that uses the total buffers reported by EXPLAIN (ANALYZE, BUFFERS) to detect query performance deviations.

The issue surfaced on smaller queries, where test cases showed a consistent "improvement" in performance (fewer buffers) when comparing baseline and actual test runs. I had not noticed it previously on larger data sets because the difference in buffers is negligible relative to the size of those operations.

The output above is from the test case I managed to isolate.

--- how to reproduce

  CREATE TABLE sort_buffer_test (id serial PRIMARY KEY, val int);
  INSERT INTO sort_buffer_test (val) SELECT generate_series(1, 100);
  ANALYZE sort_buffer_test;

  -- run twice in a new psql session and observe the reported buffer counts change
  EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
  SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;

---
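
For context, here is a minimal sketch (not regresql's actual code) of how such a tool can pull the total shared-hit count out of the JSON plan; in a fresh session the first call reports the higher count, subsequent calls do not:

  DO $$
  DECLARE
    plan_json text;
  BEGIN
    -- run the test query and capture the JSON plan
    EXECUTE 'EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON)
             SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1'
      INTO plan_json;
    -- total buffers for the execution, taken from the top plan node
    RAISE NOTICE 'total shared hit blocks: %',
      plan_json::json -> 0 -> 'Plan' ->> 'Shared Hit Blocks';
  END
  $$;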

This behaviour is session-specific; tested on 17.x, 18.x and the current development version (just FYI, all verified on both amd64 and aarch64).

In the ORDER BY case I traced the potential issue down to pg_amop/pg_amproc lookups. The specific lookups vary by operation, but the pattern is the same: the first execution incurs catalog reads that get cached for subsequent runs. This applies to DISTINCT (sorted, not hashed), GROUP BY, window functions, etc., which seems to isolate it to the Sort node.

All of those, I believe, are really just metadata resolution (i.e. the planner already knows what to sort by).
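
For reference, sort-based variants along these lines should show the same first-run difference, each in a fresh session (examples sketched against the same test table; the DISTINCT one may need enable_hashagg = off to get a sorted plan):

  -- sketched examples, not verified beyond the pattern described above
  SET enable_hashagg = off;  -- so DISTINCT goes through Sort + Unique

  EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
  SELECT DISTINCT val FROM sort_buffer_test;

  EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
  SELECT val, row_number() OVER (ORDER BY val) FROM sort_buffer_test;

  RESET enable_hashagg;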

Having said that, I'm happy to try to fix it. I'm just wondering which direction is the right one to go:

1. Pre-warm the syscache during planning (tried locally and it works, but it does not feel conceptually right and is rather 'hacky')
2. Pre-compute and store the metadata, modifying each plan node struct for each type (lots of work). I believe that's what MergeJoin already does.
3. Or this is 'works as expected', which I don't think is the case, and which would force me to do pre-warming on the regresql side (sketched below)
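
For completeness, the client-side pre-warming from option 3 would look something like the following untested sketch: a throwaway run to warm the relevant syscache entries, then the measured run.

  -- throwaway run, only to warm the pg_amop/pg_amproc syscache entries
  SELECT 1 FROM sort_buffer_test ORDER BY val DESC LIMIT 1;

  -- measured run; the Sort node buffers should now match subsequent executions
  EXPLAIN (ANALYZE, BUFFERS, COSTS OFF, TIMING OFF)
  SELECT * FROM sort_buffer_test ORDER BY val DESC LIMIT 1;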

Kind regards,

Radim

