Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans) - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
Date
Msg-id 0fd01749-6a6f-4fae-9915-1a4952a2b5d0@postgrespro.ru
Whole thread Raw
In response to Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)
List pgsql-hackers
Hi!

On 27.11.2024 16:36, Matthias van de Meent wrote:
> On Wed, 27 Nov 2024 at 14:22, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>> Sorry it took me so long to answer, I had some minor health complications
>>
>> On 12.11.2024 23:00, Peter Geoghegan wrote:
>>
>> On Sun, Nov 10, 2024 at 2:00 PM Alena Rybakina
>> <a.rybakina@postgrespro.ru> wrote:
>>
>> Or maybe I was affected by fatigue, but I don’t understand this point, to be honest. I see from the documentation
andyour first letter that it specifies how many times in total the tuple search would be performed during the index
execution.Is that not quite right?
 
>>
>> Well, nodes that appear on the inner side of a nested loop join (and
>> in a few other contexts) generally have their row counts (and a few
>> other things) divided by the total number of executions. The idea is
>> that we're showing the average across all executions of the node -- if
>> the user wants the true absolute number, they're expected to multiply
>> nrows by nloops themselves. This is slightly controversial behavior,
>> but it is long established (weirdly, we never divide by nloops for
>> "Buffers").
>>
>> I understood what you mean and I faced this situation before when I saw extremely more number of actual rows that
couldbe and it was caused by the number of scanned tuples per cycles. [0]
 
>>
>> [0] https://www.postgresql.org/message-id/flat/9f4a159b-f527-465f-b82e-38b4b7df812f@postgrespro.ru
>>
>> Initial versions of my patch didn't do this. The latest version does
>> divide like this, though. In general it isn't all that likely that an
>> inner index scan would have more than a single primitive index scan,
>> in any case, so which particular behavior I use here (divide vs don't
>> divide) is not something that I feel strongly about.
>>
>> I think we should divide them because by dividing the total buffer usage by the number of loops, user finds the
averagebuffer consumption per loop. This gives them a clearer picture of the resource intensity per basic unit of
work.
> I disagree; I think the whole "dividing by number of loops and
> rounding up to integer" was the wrong choice for tuple count, as that
> makes it difficult if not impossible to determine the actual produced
> count when it's less than the number of loops. Data is lost in the
> rounding/processing, and I don't want to have lost that data.
>
> Same applies for ~scans~ searches: If we do an index search, we should
> show it in the count as total sum, not partial processed value. If a
> user is interested in per-loopcount values, then they can derive that
> value from the data they're presented with; but that isn't true when
> we present only the divided-and-rounded value.
>
To be honest, I didn't understand how it will be helpful because there 
is an uneven distribution of buffer usage from cycle to cycle, isn't it?

I thought that the dividing memory on number of cycles helps us to 
normalize the metric to account for the repeated iterations. This gives 
us a clearer picture of the resource intensity per basic unit of work, 
rather than just the overall total. Each loop may consume a different 
amount of buffer space, but by averaging it out, we're smoothing those 
fluctuations into a more representative measure.

Moreover, this does not correspond to another metric that is nearby - 
the number of lines processed by the algorithm for the inner node. Will 
not the user who evaluates the query plan be confused by such a discrepancy?

-- 
Regards,
Alena Rybakina
Postgres Professional




pgsql-hackers by date:

Previous
From: Michael Banck
Date:
Subject: Re: [PATCH] Add support for displaying database service in psql prompt
Next
From: David Rowley
Date:
Subject: Re: Added prosupport function for estimating numeric generate_series rows