Thread: Buffer usage detailed by RelKind in EXPLAIN ANALYZE BUFFERS
Hi hackers!

I was asked to prototype a feature that helps to distinguish shared buffer usage between index reads and heap reads. Practically it looks like this:

# explain (analyze,verbose,buffers) select nextval('s');
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=8) (actual time=1.159..1.160 rows=1 loops=1)
   Output: nextval('s'::regclass)
   Buffers: shared hit=7(relation 3, index 4) read=6(relation 1, index 4, sequence 1) dirtied=1
 Planning Time: 0.214 ms
 Execution Time: 1.238 ms
(5 rows)

The change is in these parts: "(relation 3, index 4)" and "(relation 1, index 4, sequence 1)". It should probably help DBAs to better understand complex queries. I think cluttering the output with more text is OK as long as "verbose" is requested.

But there are some caveats:
1. Some more increments on hot paths. We have to add this tiny toll to every single buffer hit, but it will seldom be of any use.
2. It's difficult to measure writes caused by a query, and even dirties. The patch adds "evictions" caused by the query, but they have little practical sense too.

All in all, I do not have an opinion on whether this feature is a good tradeoff. What do you think? Does the feature look useful? Do we want a more polished implementation?

Thanks!

Best regards, Andrey Borodin.
Hi,

On 2023-02-13 16:23:30 -0800, Andrey Borodin wrote:
> But there are some caveats:
> 1. Some more increments on hot paths. We have to add this tiny toll to
> every single buffer hit, but it will seldom be of any use.

Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's already quite expensive...

> All in all, I do not have an opinion on whether this feature is a good
> tradeoff. What do you think? Does the feature look useful? Do we want a
> more polished implementation?

Unless the above issues could be avoided, I don't think so.

Greetings,

Andres Freund
On Mon, Feb 13, 2023 at 4:29 PM Andres Freund <andres@anarazel.de> wrote:
> > 1. Some more increments on hot paths. We have to add this tiny toll to
> > every single buffer hit, but it will seldom be of any use.
>
> Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's
> already quite expensive...

I think the collection of instrumentation is done unconditionally. We always do

    pgBufferUsage.shared_blks_hit++;

when the buffer is in shared_buffers.

Best regards, Andrey Borodin.
Hi,

On 2023-02-13 16:36:25 -0800, Andrey Borodin wrote:
> On Mon, Feb 13, 2023 at 4:29 PM Andres Freund <andres@anarazel.de> wrote:
> > > 1. Some more increments on hot paths. We have to add this tiny toll to
> > > every single buffer hit, but it will seldom be of any use.
> >
> > Additionally, I bet it slows down EXPLAIN (ANALYZE, BUFFERS) noticeably. It's
> > already quite expensive...
>
> I think the collection of instrumentation is done unconditionally.
> We always do pgBufferUsage.shared_blks_hit++; when the buffer is in
> shared_buffers.

The problem I'm talking about is the increased overhead in InstrStopNode(), due to BufferUsageAccumDiff() getting more expensive.

Andres
On Mon, Feb 13, 2023 at 4:39 PM Andres Freund <andres@anarazel.de> wrote:
>
> The problem I'm talking about is the increased overhead in InstrStopNode(),
> due to BufferUsageAccumDiff() getting more expensive.
>

Thanks, now I understand the problem better. According to godbolt.com, my patch doubles the number of instructions in this function, unless we count only tables/indexes/matviews.

Anyway, setting aside the functionality of this particular patch, BufferUsageAccumDiff() does not seem slow to me. It's just a branchless bunch of SIMD instructions. Its performance might matter only when it is called a gazillion times per second.

Best regards, Andrey Borodin.
Hi,

On 2023-02-13 18:14:58 -0800, Andrey Borodin wrote:
> On Mon, Feb 13, 2023 at 4:39 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > The problem I'm talking about is the increased overhead in InstrStopNode(),
> > due to BufferUsageAccumDiff() getting more expensive.
> >
>
> Thanks, now I understand the problem better. According to godbolt.com,
> my patch doubles the number of instructions in this function, unless we
> count only tables/indexes/matviews.
>
> Anyway, setting aside the functionality of this particular patch,
> BufferUsageAccumDiff() does not seem slow to me. It's just a branchless
> bunch of SIMD instructions. Its performance might matter only when it
> is called a gazillion times per second.

It is called gazillions of times per second when you do an EXPLAIN (ANALYZE, BUFFERS). Every invocation of an executor node calls it.

Here's a quick pgbench run, showing today's code, with -O3, without assertions:

298.396  0  SELECT generate_series(1, 10000000) OFFSET 10000000;
397.400  0  EXPLAIN (ANALYZE, TIMING OFF) SELECT generate_series(1, 10000000) OFFSET 10000000;
717.238  0  EXPLAIN (ANALYZE, TIMING ON) SELECT generate_series(1, 10000000) OFFSET 10000000;
419.736  0  EXPLAIN (ANALYZE, BUFFERS, TIMING OFF) SELECT generate_series(1, 10000000) OFFSET 10000000;
761.575  0  EXPLAIN (ANALYZE, BUFFERS, TIMING ON) SELECT generate_series(1, 10000000) OFFSET 10000000;

The effect ends up a lot larger once you add in joins etc., because that adds additional executor nodes that all have their instrumentation started/stopped.

Greetings,

Andres Freund