Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) |
Date | |
Msg-id | 0fd01749-6a6f-4fae-9915-1a4952a2b5d0@postgrespro.ru Whole thread Raw |
In response to | Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
|
List | pgsql-hackers |
Hi! On 27.11.2024 16:36, Matthias van de Meent wrote: > On Wed, 27 Nov 2024 at 14:22, Alena Rybakina <a.rybakina@postgrespro.ru> wrote: >> Sorry it took me so long to answer, I had some minor health complications >> >> On 12.11.2024 23:00, Peter Geoghegan wrote: >> >> On Sun, Nov 10, 2024 at 2:00 PM Alena Rybakina >> <a.rybakina@postgrespro.ru> wrote: >> >> Or maybe I was affected by fatigue, but I don’t understand this point, to be honest. I see from the documentation andyour first letter that it specifies how many times in total the tuple search would be performed during the index execution.Is that not quite right? >> >> Well, nodes that appear on the inner side of a nested loop join (and >> in a few other contexts) generally have their row counts (and a few >> other things) divided by the total number of executions. The idea is >> that we're showing the average across all executions of the node -- if >> the user wants the true absolute number, they're expected to multiply >> nrows by nloops themselves. This is slightly controversial behavior, >> but it is long established (weirdly, we never divide by nloops for >> "Buffers"). >> >> I understood what you mean and I faced this situation before when I saw extremely more number of actual rows that couldbe and it was caused by the number of scanned tuples per cycles. [0] >> >> [0] https://www.postgresql.org/message-id/flat/9f4a159b-f527-465f-b82e-38b4b7df812f@postgrespro.ru >> >> Initial versions of my patch didn't do this. The latest version does >> divide like this, though. In general it isn't all that likely that an >> inner index scan would have more than a single primitive index scan, >> in any case, so which particular behavior I use here (divide vs don't >> divide) is not something that I feel strongly about. >> >> I think we should divide them because by dividing the total buffer usage by the number of loops, user finds the averagebuffer consumption per loop. This gives them a clearer picture of the resource intensity per basic unit of work. > I disagree; I think the whole "dividing by number of loops and > rounding up to integer" was the wrong choice for tuple count, as that > makes it difficult if not impossible to determine the actual produced > count when it's less than the number of loops. Data is lost in the > rounding/processing, and I don't want to have lost that data. > > Same applies for ~scans~ searches: If we do an index search, we should > show it in the count as total sum, not partial processed value. If a > user is interested in per-loopcount values, then they can derive that > value from the data they're presented with; but that isn't true when > we present only the divided-and-rounded value. > To be honest, I didn't understand how it will be helpful because there is an uneven distribution of buffer usage from cycle to cycle, isn't it? I thought that the dividing memory on number of cycles helps us to normalize the metric to account for the repeated iterations. This gives us a clearer picture of the resource intensity per basic unit of work, rather than just the overall total. Each loop may consume a different amount of buffer space, but by averaging it out, we're smoothing those fluctuations into a more representative measure. Moreover, this does not correspond to another metric that is nearby - the number of lines processed by the algorithm for the inner node. Will not the user who evaluates the query plan be confused by such a discrepancy? -- Regards, Alena Rybakina Postgres Professional
pgsql-hackers by date: