Re: explain analyze rows=%.0f - Mailing list pgsql-hackers

From Ilia Evdokimov
Subject Re: explain analyze rows=%.0f
Date
Msg-id 7ab512ac-eed1-4fc8-9c5f-b42c0b0d313e@tantorlabs.com
In response to Re: explain analyze rows=%.0f  (Andrei Lepikhov <lepihov@gmail.com>)
List pgsql-hackers
On 18.02.2025 23:55, Andrei Lepikhov wrote:
> On 17/2/2025 15:19, Robert Haas wrote:
>> On Mon, Feb 17, 2025 at 3:08 AM Ilia Evdokimov
>> if (nloops > 1)
>>
>> Instead of:
>>
>> if (nloops > 1 && rows_is_fractional)
>>
>> I don't think it's really safe to just cast a double back to int64. In
>> practice, the number of tuples should never be large enough to
>> overflow int64, but if it did, this result would be nonsense. Also, if
>> the double ever lost precision, the result would be nonsense. If we
>> want to have an exact count of tuples, we ought to change ntuples and
>> ntuples2 to be uint64. But I don't think we should do that in this
>> patch, because that adds a whole bunch of new problems to worry about
>> and might cause us to get nothing committed. Instead, I think we
>> should just always show two decimal digits if there's more than one
>> loop.
>>
>> That's simpler than what the patch currently does and avoids this
>> problem. Perhaps it's objectionable for some other reason, but if so,
>> can somebody please spell out what that reason is so we can talk about
>> it?
> I can understand two decimal places. You might be concerned about 
> potential issues with some code that parses PostgreSQL EXPLAIN output.
> However, I believe it would be beneficial to display fractional parts 
> only when iterations yield different numbers of tuples. Given that I 
> often work with enormous EXPLAIN outputs, I think this approach would 
> enhance the readability and comprehension of the output. Frequently, I 
> may see only part of the EXPLAIN on the screen. A floating-point row 
> number format would immediately suggest parameterisation (or another 
> source of the subtree's variability) and let me trace it down to the 
> source.
>

The idea of indicating to the user that different iterations produced 
varying numbers of rows is quite reasonable. Most likely, this would 
require adding a new boolean field to the Instrumentation structure, 
which would track this information by comparing the rows value from the 
current and previous iterations.

However, there is a major issue: this behaviour would be quite difficult 
to document clearly. Even with an example and explanatory text, users may 
still be confused about why rows=100 means every iteration returned the 
same number of rows, while rows=100.00 indicates variation. A user seeing 
rows=100.00 will most likely assume it simply represents an average of 
100 rows per iteration and never realize that the actual number of rows 
varied between loops.

If we want to convey this information more clearly, we should consider a 
more explicit approach. For example, instead of using a fractional value, 
we could display the minimum and maximum row counts observed during 
execution (e.g., rows=10..20; the exact formatting could be discussed). 
However, in my opinion, that discussion is beyond the scope of this 
thread.

Any thoughts?

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.



