On Wed, 27 Nov 2024 at 14:22, Alena Rybakina <a.rybakina@postgrespro.ru> wrote:
>
> Sorry it took me so long to answer, I had some minor health complications
>
> On 12.11.2024 23:00, Peter Geoghegan wrote:
>
> On Sun, Nov 10, 2024 at 2:00 PM Alena Rybakina
> <a.rybakina@postgrespro.ru> wrote:
>
> Or maybe I was affected by fatigue, but I don’t understand this point, to be honest. I see from the documentation and
> your first letter that it specifies how many times in total the tuple search would be performed during the index
> execution. Is that not quite right?
>
> Well, nodes that appear on the inner side of a nested loop join (and
> in a few other contexts) generally have their row counts (and a few
> other things) divided by the total number of executions. The idea is
> that we're showing the average across all executions of the node -- if
> the user wants the true absolute number, they're expected to multiply
> nrows by nloops themselves. This is slightly controversial behavior,
> but it is long established (weirdly, we never divide by nloops for
> "Buffers").
>
> I understand what you mean. I faced this situation before, when I saw a far larger number of actual rows than there
> could be, and it was caused by the number of scanned tuples per cycle. [0]
>
> [0] https://www.postgresql.org/message-id/flat/9f4a159b-f527-465f-b82e-38b4b7df812f@postgrespro.ru
>
> Initial versions of my patch didn't do this. The latest version does
> divide like this, though. In general it isn't all that likely that an
> inner index scan would have more than a single primitive index scan,
> in any case, so which particular behavior I use here (divide vs don't
> divide) is not something that I feel strongly about.
>
> I think we should divide them because by dividing the total buffer usage by the number of loops, the user finds the
> average buffer consumption per loop. This gives them a clearer picture of the resource intensity per basic unit of work.

I disagree; I think the whole "dividing by number of loops and
rounding up to integer" was the wrong choice for tuple count, as that
makes it difficult if not impossible to determine the actual produced
count when it's less than the number of loops. Data is lost in the
rounding/processing, and I don't want to have lost that data.
The same applies to ~scans~ searches: if we do an index search, we should
show it in the count as a total sum, not as a partially processed value.
If a user is interested in per-loop values, then they can derive them
from the data they're presented with; but that isn't true when
we present only the divided-and-rounded value.
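
To illustrate the data loss, here is a minimal sketch. The function names
are hypothetical, and rounding the per-loop average to the nearest integer
for display is an assumption made for illustration, not something taken
from the PostgreSQL source:

```python
def displayed_per_loop(total: int, nloops: int) -> int:
    """Per-loop average as it would be displayed: divided, then rounded.

    Assumption: the display rounds to the nearest integer, as a "%.0f"
    style format would.
    """
    return int(f"{total / nloops:.0f}")


def reconstructed_total(displayed: int, nloops: int) -> int:
    """Best guess at the total a reader can make from the displayed value."""
    return displayed * nloops


if __name__ == "__main__":
    nloops = 1000
    for true_total in (3, 400, 1499):
        shown = displayed_per_loop(true_total, nloops)
        guess = reconstructed_total(shown, nloops)
        print(f"true total={true_total} shown per loop={shown} "
              f"reconstructed={guess}")
```

With 1000 loops, true totals of 3 and 400 are both displayed as 0, so the
reader cannot tell them apart or recover either one. Going the other way
is always possible: given the total and nloops, the per-loop average is a
single division.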
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)