Re: explain analyze rows=%.0f - Mailing list pgsql-hackers

From Ilia Evdokimov
Subject Re: explain analyze rows=%.0f
Date
Msg-id 6acda85e-7af0-49fb-a4b1-13cd9f584f99@tantorlabs.com
Whole thread Raw
In response to Re: explain analyze rows=%.0f  (Matheus Alcantara <matheusssilv97@gmail.com>)
Responses Re: explain analyze rows=%.0f
List pgsql-hackers


On 10.02.2025 23:43, Matheus Alcantara wrote:
When the total number of returned tuples is less than the number of
loops currently shows 'rows = 0'. This can mislead users into thinking
that no rows were returned at all, even though some might have appeared
occasionally.

I think that this can happen when the returned rows and the loops are small
enough to result in a 'row' value like 0.00045? I'm not sure if we have
"bigger" values (e.g 1074(ntuples) / 117(nloops) which would result in 9.17
rows) this would also be true, what do you think? If you could provide
an example of this case would be great!

Based on what was discussed earlier in the thread, there are cases with
large loops [0]. However, I believe it's better not to display average
rows with excessively long digits or in scientific notation. And, of
course, I agree with you regarding small values. I think we should also
add a check to ensure that the total rows is actually greater than zero.
When the total rows is zero, we could simply display it as an integer
without decimals. It could help users average rows is very small but not
zero. What do you think about this approach?

Yeah, I agree with you about the long digits. My question is more about why do
we need the planstate->instrument->ntuples < nloops check? I tried to remove
this check and I got a lot of EXPLAIN output that shows 'rows' values with .00,
so I'm just trying to understand the reason. From what I've understood about
this thread is that just avoiding .00 decimals of 'rows' values that could be
just integers would be enough, is that right or I'm missing something here? I'm
just worried if we could have a scenario where nloops > 1 &&
planstate->instrument->ntuples < nloops which would make the 'rows' not be
formatted correctly.


Sorry for missing your question earlier. If you notice in the code above, the variable(average) 'rows' is defined as:

double rows = planstate->instrument->ntuples / nloops;

This represents the total rows divided by the number of loops. The condition means that variable 'rows' will always  between zero and one. Therefore, the average rows under such conditions cannot be greater than or even equal to one. I wrote this condition specifically to avoid the verbose expression 'rows > 0 && rows < 1'. However, since this might not be obvious to everyone, perhaps it'd be better to write is using 'rows' directly or add a comment explaining this logic.


-    executing the index scans on <literal>tenk2</literal>.
+    executing the index scans on <literal>tenk2</literal>.   If a subplan node
+    is executed multiple times and the average number of rows is less than one,
+    the rows and <literal>loops</literal> values are shown as a
decimal fraction
+    (with two digits after the decimal point) to indicate that some rows
+    were actually processed rather than simply rounding down to zero.

* I think that it would be good to mention what a 'row' value in
decimal means. For  example, if its says "0.1 rows" the user should assume that typically 0 rows  will be returned but sometimes it can return 1 or more.

* There are more spaces than necessary before "If a subplan node ..."

* Maybe wrap 'rows' with <literal> </literal>?

I agree with the last two points. As for the first one—maybe we could
simply state that the average rows value can be decimal, especially for
very small values?

I'm just not sure about the "small values"; the 'rows' in decimal will only
happen with small values? What would be a "small value" in this context? My main
point here is more that I think that it would be good to mention *why* the
'rows' can be decimal, not just describe that it could be decimal.


As for 'small values', it means that the average rows is between zero and one, to avoid rounding errors and misunderstanding. I think this would be ideal.

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

pgsql-hackers by date:

Previous
From: Matheus Alcantara
Date:
Subject: Re: explain analyze rows=%.0f
Next
From: Melanie Plageman
Date:
Subject: Re: BitmapHeapScan streaming read user and prelim refactoring