Thread: explain analyze rows=%.0f
I have always assumed that there is some very good reason why EXPLAIN
ANALYZE reports the number of rows as an integer rather than a floating
point value, but in reading explain.c it seems that the reason is just
that we decided to round to zero decimal places. Any chance we could
reconsider this decision? I often find myself wanting to know the value
that is here called ntuples, but rounding ntuples/nloops off to the
nearest integer loses too much precision.

(Before someone mentions it, yes that would be a good thing to include
in XML-formatted explain output. But I don't see that including a couple
of decimal places would hurt the text output format either.)

...Robert

Robert Haas wrote:
> I have always assumed that there is some very good reason why EXPLAIN
> ANALYZE reports the number of rows as an integer rather than a
> floating point value, but in reading explain.c it seems that the
> reason is just that we decided to round to zero decimal places. Any
> chance we could reconsider this decision? I often find myself wanting
> to know the value that is here called ntuples, but rounding
> ntuples/nloops off to the nearest integer loses too much precision.

Don't you think it is too strange to have, for example, 6.67 rows? It
would confuse users and programs that parse the EXPLAIN output. However,
I wouldn't object to adding ntuples to an extended explain output (as
discussed in the other thread).

--
Euler Taveira de Oliveira
http://www.timbira.com/

On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira
<euler@timbira.com> wrote:
> Don't you think it is too strange to have, for example, 6.67 rows?

No stranger than having it say 7 when it's really not. Actually mine
mostly come out 1 when the real value is somewhere between 0.5 and
1.49. :-(

...Robert

On Thu, May 28, 2009 at 11:12:42PM -0400, Robert Haas wrote:
> On Thu, May 28, 2009 at 11:00 PM, Euler Taveira de Oliveira
>> Don't you think it is too strange to have, for example, 6.67 rows?
>
> No stranger than having it say 7 when it's really not. Actually mine
> mostly come out 1 when the real value is somewhere between 0.5 and
> 1.49. :-(

+1. It would help users realize more quickly that some of the values in
the EXPLAIN output are, for instance, the *average* number of rows *per
iteration* of a nested loop, say, rather than the total rows found in
all loops. That's an important distinction that isn't immediately clear
to the novice EXPLAIN reader, but would become so very quickly as users
tried to figure out how a scan could come up with a fractional row.

- Josh / eggyknap

Joshua Tolley <eggyknap@gmail.com> writes:
> +1. It would help users realize more quickly that some of the values
> in the EXPLAIN output are, for instance, the *average* number of rows
> *per iteration* of a nested loop, say, rather than the total rows
> found in all loops.

I think it would only be sensible to show fractional digits if nloops is
greater than 1. Otherwise the value must in fact be an integer, and
you're just going to confuse people more by suggesting that it might not
be.

			regards, tom lane

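In code, the rule Tom describes comes down to something like this
minimal sketch; plain printf stands in for PostgreSQL's
appendStringInfo, and the function name and sample counts are
illustrative, not from any actual patch:

    #include <stdio.h>

    /* Fractional digits only when the count is an average over loops. */
    static void show_actual_rows(double ntuples, double nloops)
    {
        double rows = ntuples / nloops;

        if (nloops > 1)
            printf(" rows=%.2f loops=%.0f)", rows, nloops); /* may be fractional */
        else
            printf(" rows=%.0f loops=%.0f)", rows, nloops); /* exact integer */
    }

    int main(void)
    {
        show_actual_rows(980, 3);   /* prints " rows=326.67 loops=3)" */
        show_actual_rows(7, 1);     /* prints " rows=7 loops=1)" */
        return 0;
    }
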
On Fri, May 29, 2009 at 1:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think it would only be sensible to show fractional digits if nloops
> is greater than 1. Otherwise the value must in fact be an integer, and
> you're just going to confuse people more by suggesting that it might
> not be.

That might be over-engineering, but I'll take it.

...Robert

Euler Taveira de Oliveira wrote:
> Robert Haas wrote:
>> ...EXPLAIN ANALYZE reports the number of rows as an integer... Any
>> chance we could reconsider this decision? I often find myself wanting
>> to know the value that is here called ntuples, but rounding
>> ntuples/nloops off to the nearest integer loses too much precision.
>
> Don't you think it is too strange to have, for example, 6.67 rows? It
> would confuse users and programs that parse the EXPLAIN output.
> However, I wouldn't object

I don't think it's that confusing. If it says "0.1 rows", I imagine most
people would infer that this means "typically 0, but sometimes 1 or a
few" rows.

What I'd find strange about "6.67 rows" in your example is more that, on
the estimated rows side, it seems to imply an unrealistically precise
estimate, in the same way that "667 rows" would seem unrealistically
precise to me. Maybe rounding to 2 significant digits would reduce
confusion?

On Mon, 2009-06-01 at 20:30 -0700, Ron Mayer wrote:
> What I'd find strange about "6.67 rows" in your example is more that,
> on the estimated rows side, it seems to imply an unrealistically
> precise estimate, in the same way that "667 rows" would seem
> unrealistically precise to me. Maybe rounding to 2 significant digits
> would reduce confusion?

You're right that the number of significant digits already exceeds the
true accuracy of the computation. I think what Robert wants to see is
the exact value used in the calc, so the estimates can be checked more
thoroughly than is currently possible.

--
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support

On Jun 2, 2009, at 9:41 AM, Simon Riggs <simon@2ndQuadrant.com> wrote:
> You're right that the number of significant digits already exceeds the
> true accuracy of the computation. I think what Robert wants to see is
> the exact value used in the calc, so the estimates can be checked more
> thoroughly than is currently possible.

Bingo.

...Robert

Robert Haas <robertmhaas@gmail.com> writes:
> On Jun 2, 2009, at 9:41 AM, Simon Riggs <simon@2ndQuadrant.com> wrote:
>> You're right that the number of significant digits already exceeds
>> the true accuracy of the computation. I think what Robert wants to
>> see is the exact value used in the calc, so the estimates can be
>> checked more thoroughly than is currently possible.

> Bingo.

Uh, the planner's estimate *is* an integer. What was under discussion
(I thought) was showing some fractional digits in the case where EXPLAIN
ANALYZE is outputting a measured row count that is an average over
multiple loops, and therefore isn't necessarily an integer. In that
case the measured value can be considered arbitrarily precise --- though
I think in practice one or two fractional digits would be plenty.

			regards, tom lane

On Jun 2, 2009, at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh, the planner's estimate *is* an integer. What was under discussion
> (I thought) was showing some fractional digits in the case where
> EXPLAIN ANALYZE is outputting a measured row count that is an average
> over multiple loops, and therefore isn't necessarily an integer. In
> that case the measured value can be considered arbitrarily precise ---
> though I think in practice one or two fractional digits would be
> plenty.

We're in violent agreement here.

...Robert

On Thu, Jun 23, 2022 at 12:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Uh, the planner's estimate *is* an integer. What was under discussion
> (I thought) was showing some fractional digits in the case where
> EXPLAIN ANALYZE is outputting a measured row count that is an average
> over multiple loops, and therefore isn't necessarily an integer. In
> that case the measured value can be considered arbitrarily precise ---
> though I think in practice one or two fractional digits would be
> plenty.

Hi,

I was looking at the TODO list and found that this issue requires a
quick fix. Attached is a patch which shows output like this:

postgres=# explain analyze select * from foo;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=0.00..64414.79 rows=2326379 width=8) (actual time=0.025..277.096 rows=2344671 loops=1)
 Planning Time: 0.516 ms
 Execution Time: 356.993 ms
(3 rows)

postgres=# explain analyze select * from foo where b = (select c from bar where c = 1);
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on foo  (cost=8094.37..78325.11 rows=2326379 width=8) (actual time=72.352..519.159 rows=2344671 loops=1)
   Filter: (b = $1)
   InitPlan 1 (returns $1)
     ->  Gather  (cost=1000.00..8094.37 rows=1 width=4) (actual time=0.872..72.434 rows=1 loops=1)
           Workers Planned: 2
           Workers Launched: 2
           ->  Parallel Seq Scan on bar  (cost=0.00..7094.27 rows=1 width=4) (actual time=41.931..65.382 rows=0.33 loops=3)
                 Filter: (c = 1)
                 Rows Removed by Filter: 245457
 Planning Time: 0.277 ms
 Execution Time: 597.795 ms
(11 rows)

Attachment

On Wed, Jun 22, 2022 at 12:11 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> I was looking at the TODO list and found that this issue requires a
> quick fix. Attached is a patch which shows output like this.

Quick code review:

+ "actual rows=%.0f loops=%.0f": " rows=%.2f loops=%.0f",

The leading space before the else block "rows" does not belong.

There should be a space after the colon.

The word "actual" that you are dropping in the else block seems like it
should stay - it is a header for the entire section, not just a modifier
for the word "rows". This is evidenced by the timing block verbiage,
where rows is standalone and the word actual comes before time. In
short, only the format specifier should change under the current scheme.
Both sections.

- WRITE_FLOAT_FIELD(rows, "%.0f");
+ WRITE_FLOAT_FIELD(rows, "%.2f");

This one looks suspicious, though I haven't dug into the code to see
exactly what all is being touched. That it doesn't have an nloops
condition like everything else stands out.

Tooling that expects an integer is the only downside I see here, but I
concur that the usability of always showing two decimal places when
nloops > 1 overcomes any objection I have on those grounds.

David J.

Attachment
On Thu, Jun 23, 2022 at 2:25 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>
> On Thu, Jun 23, 2022 at 1:04 AM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
>>
>> - WRITE_FLOAT_FIELD(rows, "%.0f");
>> + WRITE_FLOAT_FIELD(rows, "%.2f");
>>
>> This one looks suspicious, though I haven't dug into the code to see
>> exactly what all is being touched. That it doesn't have an nloops
>> condition like everything else stands out.
>>
> I was also thinking about that, but I don't see any harm when we
> ultimately truncate that decimal at a later stage of the code in the
> case of loops = 1.
>

That change is in the path node, which we are not going to target as
part of this change anyway. We only want to change the display of actual
rows in EXPLAIN ANALYZE. So, I can't see how the quoted change can help
in any way.

Few miscellaneous comments:
========================
*
static FullTransactionId XactTopFullTransactionId = {InvalidTransactionId};
-static int nParallelCurrentXids = 0;
+static int nParallelCurrentXids = 0;

I don't see why this change is required.

* Can you please add a comment explaining why we are making this change
for actual rows?

* Can you please write a test case, unless there is some existing test
that covers the change by displaying actual rows values in decimal; in
that case the patch should include that changed test output. If you
don't think we can reliably write such a test, then please let me know
the reason.

--
With Regards,
Amit Kapila.

On Thu, Jun 23, 2022 at 2:25 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>
> On Thu, Jun 23, 2022 at 1:04 AM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
>>
>> Quick code review:
>>
>> + "actual rows=%.0f loops=%.0f": " rows=%.2f loops=%.0f",
>>
>> The leading space before the else block "rows" does not belong.
>>
>> There should be a space after the colon.
>>
> Thanks, David, for your quick response. I have updated the patch.
>
>> - WRITE_FLOAT_FIELD(rows, "%.0f");
>> + WRITE_FLOAT_FIELD(rows, "%.2f");
>>
>> This one looks suspicious, though I haven't dug into the code to see
>> exactly what all is being touched. That it doesn't have an nloops
>> condition like everything else stands out.
>>
> I was also thinking about that, but I don't see any harm when we
> ultimately truncate that decimal at a later stage of the code in the
> case of loops = 1.

Thanks for the patch.

1) There are some existing regression tests that are failing; you
should update the expected files accordingly:

--- /home/vignesh/postgres/src/test/regress/expected/select_parallel.out
2022-05-18 20:51:46.874818044 +0530
+++ /home/vignesh/postgres/src/test/regress/results/select_parallel.out
2022-07-07 15:27:34.450440922 +0530
@@ -545,17 +545,17 @@
 explain (analyze, timing off, summary off, costs off)
   select count(*) from tenk1, tenk2 where tenk1.hundred > 1
         and tenk2.thousand=0;
-                                QUERY PLAN
---------------------------------------------------------------------------
+                                 QUERY PLAN
+-----------------------------------------------------------------------------
 Aggregate (actual rows=1 loops=1)
   ->  Nested Loop (actual rows=98000 loops=1)
         ->  Seq Scan on tenk2 (actual rows=10 loops=1)
               Filter: (thousand = 0)
               Rows Removed by Filter: 9990
-        ->  Gather (actual rows=9800 loops=10)
+        ->  Gather (actual rows=9800.00 loops=10)
               Workers Planned: 4
               Workers Launched: 4
-              ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
+              ->  Parallel Seq Scan on tenk1 (actual rows=1960.00 loops=50)
               Filter: (hundred > 1)

test select_parallel ... FAILED 744 ms
     partition_prune ... FAILED 861 ms
     explain ... FAILED 134 ms
     memoize ... FAILED 250 ms

2) This change is not required as part of this patch:

--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -122,7 +122,7 @@ bool bsysscan = false;
  * lookups as fast as possible.
  */
 static FullTransactionId XactTopFullTransactionId = {InvalidTransactionId};
-static int nParallelCurrentXids = 0;
+static int nParallelCurrentXids = 0;
 static TransactionId *ParallelCurrentXids;

Regards,
Vignesh

> -              ->  Parallel Seq Scan on tenk1 (actual rows=1960 loops=50)
> +              ->  Parallel Seq Scan on tenk1 (actual rows=1960.00 loops=50)

At the not inconsiderable risk of bike-shedding....

I'm wondering if printing something like 0.00 will be somewhat deceptive
when the real value is non-zero but less than 1 row per 200 loops. I
wonder if the number of decimal places should be calculated to produce a
minimum of one non-zero digit for non-zero values.

--
greg

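One way to read that suggestion, as a sketch under assumed behavior
rather than an actual patch: keep two decimal places as the floor, and
widen the precision until a non-zero average shows at least one non-zero
digit.

    #include <math.h>
    #include <stdio.h>

    /* Precision needed so a non-zero value never prints as all zeros. */
    static int row_precision(double rows)
    {
        int prec = 2;   /* the floor proposed elsewhere in this thread */

        while (rows > 0 && rows < pow(10.0, -prec) && prec < 10)
            prec++;
        return prec;
    }

    int main(void)
    {
        double rows = 1.0 / 200.0;  /* one row per 200 loops */

        printf("rows=%.*f\n", row_precision(rows), rows);  /* rows=0.005 */
        printf("rows=%.*f\n", row_precision(0.33), 0.33);  /* rows=0.33 */
        return 0;
    }
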
On Thu, Jul 7, 2022 at 1:53 PM Greg Stark <stark@mit.edu> wrote:
> I'm wondering if printing something like 0.00 will be somewhat
> deceptive when the real value is non-zero but less than 1 row per 200
> loops. I wonder if the number of decimal places should be calculated
> to produce a minimum of one non-zero digit for non-zero values.

I mean, what I really want here if I'm honest is to not have the system
divide the number of rows by the loop count. And it sort of sounds like
maybe that's what you want, too. You want to know whether the row count
is actually zero, not whether it's close to zero when you divide it by
some number that might be gigantic.

Parallel query's treatment of this topic has come in for some criticism,
but I don't know what else it could really do: there could be any number
of loops in each backend, and it need not be the same across all
backends, so all it can do is add up the loop counts just like it adds
up the row counts and times. And if we just printed out those totals,
the result would be understandable by everyone. But we insist on
dividing it by the loop count, and then things get really obscure.

Consider this example, which does not involve parallel query:

Nested Loop (actual time=TIME FOR THIS AND ALL CHILDREN rows=THE REAL
ROW COUNT loops=1)
-> Seq Scan on something (actual time=THE TIME IT REALLY TOOK rows=THE
REAL ROW COUNT loops=1)
-> Index Scan using someidx on somethingelse (actual time=NOT REALLY
HOW LONG IT TOOK rows=NOT REALLY HOW MANY ROWS WE GOT loops=HUGE NUMBER)

If I'm looking at this plan and trying to find out what's gone wrong, I
want to know how much time got spent in the nested loop, how much time
got spent in the Seq Scan, and how much time got spent in the Index
Scan. It's easy to figure out how much time got spent in the Seq Scan,
but to find out how much time got spent in the Index Scan, I have to
multiply the time by the loop count. Then, I have to add that number to
the time spent in the Seq Scan and subtract that from the time from the
nested loop to find the time spent on the nested loop itself. This is
quite a lot of computation, especially if the plan involves a dozen or
two different nested loops, and if we didn't insist on dividing the time
by the loop count, it would be MUCH EASIER to figure out whether the
time spent in the Index Scan is a significant percentage of the total
time or not.

And likewise, if you're trying to understand the row count for the
nested loop, it would be a heck of a lot simpler if you could see the
*raw* row count for the index scan. It's unclear to me what value there
ever is in knowing that the number of rows per iteration was about 0 or
about 1 or about 2. The only thing I'm ever going to do with the row
count that gets printed here is multiply it by the loop count and then
try to figure out how much precision I've lost because of limits on the
number of decimal places. Right now that's basically all of it, because
nearly every case ends up with the index scan having rows=1, so even
just adding 2 decimal places will help a lot. But I'm still just going
to be reverse engineering what I really want to know, which is the
original number, from what the system gives me, which is a
needlessly-obfuscated version of that value.

Grumble, grumble. It's sad that it's been 13 years and we haven't done
anything about this.

--
Robert Haas
EDB: http://www.enterprisedb.com

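As a concrete illustration of the arithmetic Robert describes doing by
hand, here is a sketch with made-up plan numbers (hypothetical values,
not from a real EXPLAIN):

    #include <stdio.h>

    int main(void)
    {
        /* Values as EXPLAIN ANALYZE prints them today (hypothetical). */
        double avg_time_ms = 0.012; /* index scan "actual time" per loop */
        double avg_rows = 1;        /* "rows=" as printed, already rounded */
        double loops = 98000;       /* "loops=" */

        /* Recover the totals the reader actually wants. */
        printf("total time in node: %.1f ms\n", avg_time_ms * loops); /* 1176.0 ms */
        printf("total rows from node: %.0f\n", avg_rows * loops);     /* 98000 */
        return 0;
    }

Note that any rounding already applied to avg_rows gets multiplied by
loops as well, which is exactly the precision loss complained about
above.
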
On Thu, Jul 7, 2022 at 1:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Nested Loop (actual time=TIME FOR THIS AND ALL CHILDREN rows=THE REAL
> ROW COUNT loops=1)
> -> Seq Scan on something (actual time=THE TIME IT REALLY TOOK rows=THE
> REAL ROW COUNT loops=1)
> -> Index Scan using someidx on somethingelse (actual time=NOT REALLY
> HOW LONG IT TOOK rows=NOT REALLY HOW MANY ROWS WE GOT loops=HUGE
> NUMBER)
>
> If I'm looking at this plan and trying to find out what's gone wrong,
> I want to know how much time got spent in the nested loop, how much
> time got spent in the Seq Scan, and how much time got spent in the
> Index Scan. It's easy to figure out how much time got spent in the Seq
> Scan, but to find out how much time got spent in the Index Scan, I
> have to multiply the time by the loop count.

I agree that this general state of affairs is very confusing, and seems
like something that we should still improve. Just because it's a very
old way of presenting the information doesn't mean that it's the best
one, or even a particularly good one. Plus you could probably make some
kind of concession in the direction of maintaining compatibility with
the current approach if you had to. Right?

--
Peter Geoghegan

On Thu, Jul 07, 2022 at 04:21:37PM -0400, Robert Haas wrote:
> I mean, what I really want here if I'm honest is to not have the
> system divide the number of rows by the loop count. And it sort of
> sounds like maybe that's what you want, too. You want to know whether
> the row count is actually zero, not whether it's close to zero when
> you divide it by some number that might be gigantic.
...
> involves a dozen or two different nested loops, and if we didn't
> insist on dividing the time by the loop count, it would be MUCH EASIER
> to figure out whether the time spent in the Index Scan is a
> significant percentage of the total time or not.

I think the guiding principle for what to do should be to reduce how
much is needed to explain about how to interpret what explain is
showing...

The docs say this:

| In such cases, the loops value reports the total number of executions
| of the node, and the actual time and rows values shown are averages
| per-execution. This is done to make the numbers comparable with the
| way that the cost estimates are shown. Multiply by the loops value to
| get the total time actually spent in the node.

On Thu, Jul 07, 2022 at 01:45:19PM -0700, Peter Geoghegan wrote:
> Plus you could probably make some kind of concession in the direction
> of maintaining compatibility with the current approach if you had to.
> Right?

The minimum would be to show the information in a way that makes it
clear that it's "new style" output showing a total and not an average,
so that a person who sees it knows how to interpret it (same for the web
"explain tools").

A concession would be to show the current information *plus* total/raw
values.

This thread is about how to display the existing values. But note that
there's a CF entry for also collecting more values, to show things like
min/max rows per loop:

https://commitfest.postgresql.org/38/2765/
Add extra statistics to explain for Nested Loop

--
Justin

On Fri, Jul 8, 2022 at 3:50 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> The minimum would be to show the information in a way that makes it
> clear that it's "new style" output showing a total and not an average,
> so that a person who sees it knows how to interpret it (same for the
> web "explain tools").
>
> A concession would be to show the current information *plus*
> total/raw values.
>
> This thread is about how to display the existing values.

I feel the discussion has slightly deviated, which makes it unclear
whether this patch is required or not.

--
With Regards,
Amit Kapila.

On Fri, Jul 22, 2022 at 6:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I feel the discussion has slightly deviated, which makes it unclear
> whether this patch is required or not.

My opinion is that showing some fractional digits at least when loops>1
would be better than what we have now. It might not be the best thing we
could do, but it would be better than doing nothing.

--
Robert Haas
EDB: http://www.enterprisedb.com

The following review has been posted through the commitfest application:

make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

Verified patch 'explain_float_row_v3.patch' on master & REL_15_STABLE
branches.

The new status of this patch is: Ready for Committer

The following review has been posted through the commitfest application:

make installcheck-world:  tested, failed
Implements feature:       tested, failed
Spec compliant:           not tested
Documentation:            not tested

LGTM

The following review has been posted through the commitfest application:

make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

The previous review was incorrectly posted. Updating the patch review.

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jul 22, 2022 at 6:47 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> I feel the discussion has slightly deviated, which makes it unclear
>> whether this patch is required or not.

> My opinion is that showing some fractional digits at least when
> loops>1 would be better than what we have now. It might not be the
> best thing we could do, but it would be better than doing nothing.

Yeah, I think that is a reasonable compromise.

I took a brief look through the patch, and I have some review comments:

* Code like this is pretty awful:

        appendStringInfo(es->str,
                         (nloops == 1 || !HAS_DECIMAL(rows)) ?
                         " rows=%.0f loops=%.0f)" : " rows=%.2f loops=%.0f)",
                         rows, nloops);

Don't use variable format strings. They're hard to read and they
probably defeat compile-time checks that the arguments match the format
string. You could use a "*" field width instead, ie

        appendStringInfo(es->str,
                         " rows=%.*f loops=%.0f)",
                         (nloops == 1 || !HAS_DECIMAL(rows)) ? 0 : 2,
                         rows, nloops);

That'd also allow you to reduce the code churn you've added by splitting
some appendStringInfo calls.

* I'm fairly concerned about how stable this'll be in the buildfarm; in
particular I fear HAS_DECIMAL() is not likely to give consistent results
across platforms. I gather that an earlier version of the patch tried to
check whether the fractional part would be zero to two decimal places,
rather than whether it's exactly zero. Probably want to put back
something like that.

* Another thought is that the non-text formats tend to prize output
consistency over readability, so maybe we should just always use 2
fractional digits there, rather than trying to minimize visible changes.

* We need some doc adjustments, surely, to explain what the heck this
means.

			regards, tom lane

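A standalone sketch of that two-decimal-place check (the helper name is
hypothetical, and this is assumed semantics for what would replace
HAS_DECIMAL, not code from the patch):

    #include <math.h>
    #include <stdio.h>

    /* True only if the value would show a non-zero fraction when
     * printed at two decimal places, rather than testing the raw
     * double for an exactly zero fractional part. */
    static int has_visible_decimal(double rows)
    {
        return rint(rows * 100.0) != rint(rows) * 100.0;
    }

    int main(void)
    {
        printf("%d\n", has_visible_decimal(1960.0000001)); /* 0: prints as 1960.00 */
        printf("%d\n", has_visible_decimal(0.33));         /* 1: prints as 0.33 */
        return 0;
    }
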
On 22/7/2022 16:47, Amit Kapila wrote:
> I feel the discussion has slightly deviated, which makes it unclear
> whether this patch is required or not.

After a quick review I want to express my thoughts.

First, we have been waiting for this feature for years. Clients often
give us an explain where we see something like "rows=0, loops=1000000".
Without verbose mode, I can't even understand whether this node produces
any rows or not. So I think this feature is useful mostly for
parameterized plans.

Also, printing two decimal digits or even three isn't meaningful -
sometimes we have a plan where the number of loops is about 1E6 or even
1E7, but the number of real rows is 100 or 1000. To overcome this issue,
I see two options:
1. Show the exact number of tuples without division by loops (fair, but
invasive and bad for automation tools).
2. Show rows in scientific format like X.XXEXX.

I vote for the second option.

--
regards,
Andrey Lepikhov
Postgres Professional

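A minimal illustration of the case Andrey raises, with made-up counts
(1000 rows over ten million loops):

    #include <stdio.h>

    int main(void)
    {
        double rows = 1000.0 / 1e7;

        printf("rows=%.2f\n", rows);    /* rows=0.00 -- looks like no rows */
        printf("rows=%.2e\n", rows);    /* rows=1.00e-04 */
        return 0;
    }
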
On Wed, 4 Jan 2023 at 10:05, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>
> Thanks, I have modified everything as suggested, except one point
>
> > Don't use variable format strings. They're hard to read and they
> > probably defeat compile-time checks that the arguments match the
> > format string. You could use a "*" field width instead, ie
...
> > * Another thought is that the non-text formats tend to prize output
> > consistency over readability, so maybe we should just always use 2
> > fractional digits there, rather than trying to minimize visible
> > changes.
>
> In that case, we need to adjust a lot of test case outputs.
>
> > * We need some doc adjustments, surely, to explain what the heck
> > this means.

That sounds like three points :) But they seem like pretty good
arguments to me, and straightforward if a little tedious to adjust.

This patch was marked Returned with Feedback and then later Waiting on
Author. And it hasn't had any updates since January. What is the state
of this feedback? If it's already done we can set the patch to Ready for
Committer, and if not, do you disagree with the proposed changes?

It's actually the kind of code cleanup change I'm reluctant to bump a
patch for. It's not like a committer can't make these kinds of changes
when committing. But there are so many patches that they're likely to
just focus on a different patch when there are adjustments like this
pending.

--
Gregory Stark
As Commitfest Manager

> On 8 Jun 2023, at 19:49, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
>
> On Mon, Mar 20, 2023 at 7:56 PM Gregory Stark (as CFM)
> <stark.cfm@gmail.com> wrote:
>> This patch was marked Returned with Feedback and then later Waiting
>> on Author. And it hasn't had any updates since January. What is the
>> state of this feedback? If it's already done we can set the patch to
>> Ready for Committer, and if not, do you disagree with the proposed
>> changes?
>
> If there is a consensus to modify the test cases' output, I am willing
> to make the necessary changes and adjust the patch accordingly.
> However, if there is a preference to keep the output of certain test
> cases unchanged, I can rebase and modify the patch accordingly to
> accommodate those preferences.

As there haven't been any other comments, I suggest updating your patch
to address Tom's comments to see if we can make progress here.

--
Daniel Gustafsson

> On 3 Jul 2023, at 18:34, Daniel Gustafsson <daniel@yesql.se> wrote:
>
> As there haven't been any other comments, I suggest updating your
> patch to address Tom's comments to see if we can make progress here.

Since there haven't been any updates here, and the thread has stalled,
I'm marking this returned with feedback. Please feel free to resubmit a
version of the patch addressing comments to a future CF.

--
Daniel Gustafsson

Hi,

On Mon, Jan 13, 2025 at 17:18, Ilia Evdokimov
<ilya.evdokimov@tantorlabs.com> wrote:
> I guess it's not ideal to modify the existing example in the
> documentation of the v5 patch, because readers wouldn't immediately
> understand why decimal fractions appear there. Instead, I'll add a
> brief note in the documentation clarifying how rows and loops are
> displayed when the average row count is below one.
>
> The changes to the documentation are attached in the v6 patch.

v6 is not applying on master, could you please rebase?

git apply v6-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch
error: patch failed: src/test/regress/expected/partition_prune.out:3041
error: src/test/regress/expected/partition_prune.out: patch does not apply

--
Matheus Alcantara

On 07.02.2025 22:59, Matheus Alcantara wrote:
> Hi
> v6 is not applying on master, could you please rebase?
>
> git apply v6-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch
> error: patch failed: src/test/regress/expected/partition_prune.out:3041
> error: src/test/regress/expected/partition_prune.out: patch does not apply

Hi,

Strange, I don't have any problems applying it on master.

postgres$ git branch
* master
postgres$ git pull
Already up to date.
postgres$ git apply v6-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch
postgres$

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

On Fri, Feb 7, 2025 at 17:41, Ilia Evdokimov
<ilya.evdokimov@tantorlabs.com> wrote:
> Strange, I don't have any problems applying it on master.
>
> postgres$ git branch
> * master
> postgres$ git pull
> Already up to date.
> postgres$ git apply
> v6-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch
> postgres$

Just for reference, I'm trying to apply based on commit fb056564ec5.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=fb056564ec5bc1c18dd670c963c893cdb6de927e

--
Matheus Alcantara

Thanks for the new patch version.

-- v7-0001-Clarify-display-of-rows-and-loops-as-decimal-fraction.patch

> + if (nloops > 1 && planstate->instrument->ntuples < nloops)
> +     appendStringInfo(es->str, " rows=%.2f loops=%.2f)", rows, nloops);

Sorry, but why do the 'loops' need to be changed? IIUC the issue is only
with the rows field. When testing this patch I got some results similar
to this:

Index Scan using t_a_idx on t (cost=0.14..0.28 rows=1 width=8) (actual
time=0.049..0.049 rows=0.50 loops=2.00)

> When the total number of returned tuples is less than the number of
> loops, we currently show 'rows = 0'. This can mislead users into
> thinking that no rows were returned at all, even though some might
> have appeared occasionally.

I think that this can happen when the returned rows and the loops are
small enough to result in a 'rows' value like 0.00045? I'm not sure
whether, with "bigger" values (e.g. 1074 (ntuples) / 117 (nloops), which
would result in 9.17 rows), this would also be true; what do you think?
If you could provide an example of this case, that would be great!

-      executing the index scans on <literal>tenk2</literal>.
+      executing the index scans on <literal>tenk2</literal>. If a subplan node
+      is executed multiple times and the average number of rows is less than one,
+      the rows and <literal>loops</literal> values are shown as a decimal fraction
+      (with two digits after the decimal point) to indicate that some rows
+      were actually processed rather than simply rounding down to zero.

* I think that it would be good to mention what a 'rows' value in
decimal means. For example, if it says "0.1 rows" the user should assume
that typically 0 rows will be returned but sometimes it can return 1 or
more.

* There are more spaces than necessary before "If a subplan node ..."

* Maybe wrap 'rows' with <literal> </literal>?

--
Matheus Alcantara

On 10.02.2025 18:32, Matheus Alcantara wrote:
> Sorry, but why do the 'loops' need to be changed? IIUC the issue is
> only with the rows field. When testing this patch I got some results
> similar to this:
>
> Index Scan using t_a_idx on t (cost=0.14..0.28 rows=1 width=8)
> (actual time=0.049..0.049 rows=0.50 loops=2.00)

The only reason I added this was to make it clear to the user that
loops > 1, along with the 'rows' value. If no one finds it useful and it
only raises more questions, I can remove it.

> I think that this can happen when the returned rows and the loops are
> small enough to result in a 'rows' value like 0.00045? I'm not sure
> whether, with "bigger" values (e.g. 1074 (ntuples) / 117 (nloops),
> which would result in 9.17 rows), this would also be true; what do you
> think? If you could provide an example of this case, that would be
> great!

Based on what was discussed earlier in the thread, there are cases with
large loop counts [0]. However, I believe it's better not to display
average rows with excessively long digits or in scientific notation.
And, of course, I agree with you regarding small values. I think we
should also add a check to ensure that the total row count is actually
greater than zero; when it is zero, we could simply display it as an
integer without decimals. That would help users when the average rows
value is very small but not zero. What do you think about this approach?

> * I think that it would be good to mention what a 'rows' value in
> decimal means. For example, if it says "0.1 rows" the user should
> assume that typically 0 rows will be returned but sometimes it can
> return 1 or more.
>
> * There are more spaces than necessary before "If a subplan node ..."
>
> * Maybe wrap 'rows' with <literal> </literal>?

I agree with the last two points. As for the first one - maybe we could
simply state that the average rows value can be decimal, especially for
very small values?

[0]: https://www.postgresql.org/message-id/a9393107-6bb9-c835-50b7-c0f453a514b8%40postgrespro.ru

--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.

On Mon, Feb 10, 2025 at 13:38, Ilia Evdokimov
<ilya.evdokimov@tantorlabs.com> wrote:
> The only reason I added this was to make it clear to the user that
> loops > 1, along with the 'rows' value. If no one finds it useful and
> it only raises more questions, I can remove it.

Ok, I get it. IMHO the current behaviour of using %.0f for 'loops' would
be better, but let's see if anyone else has opinions about it.

> Based on what was discussed earlier in the thread, there are cases
> with large loop counts [0]. However, I believe it's better not to
> display average rows with excessively long digits or in scientific
> notation. And, of course, I agree with you regarding small values. I
> think we should also add a check to ensure that the total row count is
> actually greater than zero; when it is zero, we could simply display
> it as an integer without decimals. That would help users when the
> average rows value is very small but not zero. What do you think about
> this approach?

Yeah, I agree with you about the long digits. My question is more about
why we need the planstate->instrument->ntuples < nloops check. I tried
to remove this check and I got a lot of EXPLAIN output that shows 'rows'
values with .00, so I'm just trying to understand the reason. From what
I've understood about this thread, just avoiding .00 decimals on 'rows'
values that could be plain integers would be enough; is that right, or
am I missing something here? I'm just worried that we could have a
scenario where nloops > 1 && planstate->instrument->ntuples < nloops
would make the 'rows' not be formatted correctly.

> I agree with the last two points. As for the first one - maybe we
> could simply state that the average rows value can be decimal,
> especially for very small values?

I'm just not sure about the "small values"; will the 'rows' in decimal
only happen with small values? What would be a "small value" in this
context? My main point here is more that I think that it would be good
to mention *why* the 'rows' can be decimal, not just describe that it
could be decimal.

--
Matheus Alcantara

When the total number of returned tuples is less than the number of loops currently shows 'rows = 0'. This can mislead users into thinking that no rows were returned at all, even though some might have appeared occasionally.I think that this can happen when the returned rows and the loops are small enough to result in a 'row' value like 0.00045? I'm not sure if we have "bigger" values (e.g 1074(ntuples) / 117(nloops) which would result in 9.17 rows) this would also be true, what do you think? If you could provide an example of this case would be great!Based on what was discussed earlier in the thread, there are cases with large loops [0]. However, I believe it's better not to display average rows with excessively long digits or in scientific notation. And, of course, I agree with you regarding small values. I think we should also add a check to ensure that the total rows is actually greater than zero. When the total rows is zero, we could simply display it as an integer without decimals. It could help users average rows is very small but not zero. What do you think about this approach?Yeah, I agree with you about the long digits. My question is more about why do we need the planstate->instrument->ntuples < nloops check? I tried to remove this check and I got a lot of EXPLAIN output that shows 'rows' values with .00, so I'm just trying to understand the reason. From what I've understood about this thread is that just avoiding .00 decimals of 'rows' values that could be just integers would be enough, is that right or I'm missing something here? I'm just worried if we could have a scenario where nloops > 1 && planstate->instrument->ntuples < nloops which would make the 'rows' not be formatted correctly.
Sorry for missing your question earlier. If you look at the code above, the (average) variable 'rows' is defined as:
double rows = planstate->instrument->ntuples / nloops;
This represents the total rows divided by the number of loops. Under that condition, the variable 'rows' will always be between zero and one (assuming ntuples > 0); the average rows can never reach or exceed one. I wrote the condition this way specifically to avoid the more verbose expression 'rows > 0 && rows < 1'. However, since this might not be obvious to everyone, perhaps it'd be better to write it using 'rows' directly or to add a comment explaining this logic.
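In other words, a simplified sketch of the intended logic (a sketch only; the exact patch text may differ slightly):

double rows = planstate->instrument->ntuples / nloops;

if (nloops > 1 && planstate->instrument->ntuples < nloops)
    /* here 0 <= rows < 1, where "%.0f" would print only 0 or 1 */
    appendStringInfo(es->str, " rows=%.2f loops=%.2f)", rows, nloops);
else
    appendStringInfo(es->str, " rows=%.0f loops=%.0f)", rows, nloops);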
>> - executing the index scans on <literal>tenk2</literal>. >> + executing the index scans on <literal>tenk2</literal>. If a subplan node + is executed multiple times and the average number of rows is less than one, + the rows and <literal>loops</literal> values are shown as a decimal fraction + (with two digits after the decimal point) to indicate that some rows + were actually processed rather than simply rounding down to zero. > * I think that it would be good to mention what a 'row' value in decimal means. For example, if it says "0.1 rows" the user should assume that typically 0 rows will be returned but sometimes it can return 1 or more. > * There are more spaces than necessary before "If a subplan node ..." > * Maybe wrap 'rows' with <literal> </literal>? >> I agree with the last two points. As for the first one: maybe we could simply state that the average rows value can be decimal, especially for very small values? > I'm just not sure about the "small values"; will the decimal 'rows' only happen with small values? What would be a "small value" in this context? My main point here is more that I think that it would be good to mention *why* the 'rows' can be decimal, not just describe that it could be decimal.
As for 'small values', I mean that the average rows is between zero and one; showing the fraction there avoids rounding errors and misunderstanding. I think this would be ideal.
--
Best regards,
Ilia Evdokimov,
Tantor Labs LLC.
Em seg., 10 de fev. de 2025 às 18:14, Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> escreveu: > Sorry for missing your question earlier. If you look at the code above, the (average) variable 'rows' is defined as: > double rows = planstate->instrument->ntuples / nloops; > This represents the total rows divided by the number of loops. Under that condition, the variable 'rows' will always be between zero and one (assuming ntuples > 0); the average rows can never reach or exceed one. I wrote the condition this way specifically to avoid the more verbose expression 'rows > 0 && rows < 1'. However, since this might not be obvious to everyone, perhaps it'd be better to write it using 'rows' directly or to add a comment explaining this logic. Thanks for the details! It makes sense to me now. I think that adding a comment could be a good idea. >>> I agree with the last two points. As for the first one: maybe we could simply state that the average rows value can be decimal, especially for very small values? >> I'm just not sure about the "small values"; will the decimal 'rows' only happen with small values? What would be a "small value" in this context? My main point here is more that I think that it would be good to mention *why* the 'rows' can be decimal, not just describe that it could be decimal. > As for 'small values', I mean that the average rows is between zero and one; showing the fraction there avoids rounding errors and misunderstanding. I think this would be ideal. Got it, sounds reasonable to me. -- Matheus Alcantara
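(For instance, the comment agreed on above might read like this; the wording below is a suggestion, not text from any posted patch:)

/*
 * With nloops > 1 and ntuples < nloops, the per-loop average
 * ntuples/nloops lies below one (and above zero whenever ntuples > 0),
 * so printing it with "%.0f" would flatten it to 0 or 1. Show two
 * decimal places instead.
 */
if (nloops > 1 && planstate->instrument->ntuples < nloops)
    appendStringInfo(es->str, " rows=%.2f loops=%.2f)", rows, nloops);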
On 8/2/2025 04:28, Ilia Evdokimov wrote: > On 08.02.2025 00:01, Matheus Alcantara wrote: >> Just for reference I'm trying to apply based on commit fb056564ec5. > You are right: two commits touching partition_prune.out appeared after the v6 patch was created, so v6 would no longer apply on master. Then I created the v7 patch rebased on fb056564ec5. Thanks for your remark! I support the idea in general, but I believe it should be expanded to cover all cases of parameterised plan nodes. Each rescan iteration may produce a different number of tuples, and rounding can obscure important data. For example, consider five loops of a scan node: the first loop returns nine tuples, and each of the others returns zero. When we calculate the average, 9 divided by 5 equals 1.8. This results in an explanation that indicates "rows = 1", masking almost 40% of the data (a reader multiplying rows by loops reconstructs only 5 of the 9 tuples). Now, apply the same distribution with a total of 900,000 tuples, and the 400,000 masked tuples represent a significant portion of the data. Moreover, switching to a floating-point type for row explanations in each parameterised node would provide a more comprehensive view and add valuable information about the parameterisation of the node, which may not be immediately apparent. -- regards, Andrei Lepikhov
On Tue, Feb 11, 2025 at 12:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: > I support the idea in general, but I believe it should be expanded to > cover all cases of parameterised plan nodes. Each rescan iteration may > produce a different number of tuples, and rounding can obscure important > data. > > For example, consider five loops of a scan node: the first loop returns > nine tuples, and each other - zero tuples. When we calculate the > average, 9 divided by 5 equals 1.8. This results in an explanation that > indicates "rows = 1," masking almost 40% of the data. > > Now, if we apply the same two loops but have a total of 900,000 tuples, > then 400,000 masked tuples represent a significant portion of the data. > > Moreover, switching to a floating-point type for row explanations in > each parameterised node would provide a more comprehensive view and add > valuable information about the parameterisation of the node, which may > not be immediately apparent. I agree strongly with all of this. I believe we should just implement what was agreed here: https://www.postgresql.org/message-id/21013.1243618236%40sss.pgh.pa.us Let's just display 2 fractional digits when nloops>1, else 0, and call it good. -- Robert Haas EDB: http://www.enterprisedb.com
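(For concreteness, the rule Robert describes would amount to roughly the following in explain.c; this is a sketch of the proposal, assuming 'rows' and 'nloops' are the values already computed there, not code from a posted patch:)

if (nloops > 1)
    appendStringInfo(es->str, " rows=%.2f loops=%.0f)", rows, nloops);
else
    appendStringInfo(es->str, " rows=%.0f loops=%.0f)", rows, nloops);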
Hi! Thank you for your valuable work on this! On 11.02.2025 22:18, Ilia Evdokimov wrote: > On 11.02.2025 20:41, Robert Haas wrote: >> On Tue, Feb 11, 2025 at 12:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: >>> I support the idea in general, but I believe it should be expanded to cover all cases of parameterised plan nodes. Each rescan iteration may produce a different number of tuples, and rounding can obscure important data. For example, consider five loops of a scan node: the first loop returns nine tuples, and each of the others returns zero. When we calculate the average, 9 divided by 5 equals 1.8. This results in an explanation that indicates "rows = 1", masking almost 40% of the data. Now, apply the same distribution with a total of 900,000 tuples, and the 400,000 masked tuples represent a significant portion of the data. Moreover, switching to a floating-point type for row explanations in each parameterised node would provide a more comprehensive view and add valuable information about the parameterisation of the node, which may not be immediately apparent. >> I agree strongly with all of this. I believe we should just implement what was agreed here: >> https://www.postgresql.org/message-id/21013.1243618236%40sss.pgh.pa.us >> Let's just display 2 fractional digits when nloops>1, else 0, and call it good. > Thank you for your review! > With such an example, it's hard to disagree. This would really add valuable information. Taking all opinions into account, I have updated the patch v8. I have also included a check for the case where there are only zeros after the decimal point. We do not want to clutter the rows with unnecessary zeros. I looked at the patch and agree with it. I would suggest adding a description of how this helps when analyzing query plans: the commit message currently lacks an explanation of why this is done. I would also add the same to the documentation, with an example. -- Regards, Alena Rybakina Postgres Professional
On 12/2/2025 00:41, Robert Haas wrote: > On Tue, Feb 11, 2025 at 12:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: >> Moreover, switching to a floating-point type for row explanations in each parameterised node would provide a more comprehensive view and add valuable information about the parameterisation of the node, which may not be immediately apparent. > I agree strongly with all of this. I believe we should just implement what was agreed here: > https://www.postgresql.org/message-id/21013.1243618236%40sss.pgh.pa.us > Let's just display 2 fractional digits when nloops>1, else 0, and call it good. Why only two fractional digits? I reviewed user reports where we had to diagnose issues from EXPLAIN output alone, without sufficient data, and typical examples involved loop counts ranging from 1E5 to 1E7 and tuple counts from 1E2 to 1E5. Therefore, it may make sense to display fractional digits up to two meaningful (non-zero) digits. -- regards, Andrei Lepikhov
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Feb 11, 2025 at 12:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: >> I support the idea in general, but I believe it should be expanded to >> cover all cases of parameterised plan nodes. Each rescan iteration may >> produce a different number of tuples, and rounding can obscure important >> data. > I agree strongly with all of this. I believe we should just implement > what was agreed here: > https://www.postgresql.org/message-id/21013.1243618236%40sss.pgh.pa.us > Let's just display 2 fractional digits when nloops>1, else 0, and call it good. Note that that formulation has nothing especially to do with parameterized plan nodes. Any nestloop inner side would end up getting shown with fractional rowcounts. Maybe that's fine. I suggest that when thinking about what to change here, you start by considering how you'd adjust the docs at https://www.postgresql.org/docs/devel/using-explain.html to explain the new behavior. If you can't explain it clearly for users, then maybe it's not such a great idea. regards, tom lane
On 12/2/2025 03:46, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Feb 11, 2025 at 12:14 PM Andrei Lepikhov <lepihov@gmail.com> wrote: >>> I support the idea in general, but I believe it should be expanded to cover all cases of parameterised plan nodes. Each rescan iteration may produce a different number of tuples, and rounding can obscure important data. >> I agree strongly with all of this. I believe we should just implement what was agreed here: >> https://www.postgresql.org/message-id/21013.1243618236%40sss.pgh.pa.us >> Let's just display 2 fractional digits when nloops>1, else 0, and call it good. > Note that that formulation has nothing especially to do with parameterized plan nodes. Any nestloop inner side would end up getting shown with fractional rowcounts. Maybe that's fine. I partly agree with this approach. Playing around a bit, I couldn't invent a case where we get different numbers of tuples without parameters, but I can imagine it is possible, or may become possible in the future. So it is not necessary to tie fractional output to parameterised nodes. I'm also unsure about the inner subtree of a JOIN: a subplan may refer to the upper query and process a different number of tuples on every evaluation without any JOIN operator involved. May we agree on a more general formula that prints at least two meaningful digits whenever there is a fractional part? Examples:
- actual rows = 2, nloops = 2 -> rows = 1
- actual rows = 9, nloops = 5 -> rows = 1.8
- actual rows = 101, nloops = 100 -> rows = 1.0
- actual rows = 99, nloops = 1000000 -> rows = 0.000099
This would guarantee that an EXPLAIN exposes most of the data that passed through the node, enough to tie it to the actual timing and tuple counts at the upper levels of the query. > I suggest that when thinking about what to change here, you start by considering how you'd adjust the docs at https://www.postgresql.org/docs/devel/using-explain.html to explain the new behavior. If you can't explain it clearly for users, then maybe it's not such a great idea. Agree -- regards, Andrei Lepikhov
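(One possible reading of the "two meaningful digits" rule above, sketched in C to match the four examples given; the helper name and exact cutoffs are assumptions for illustration, not code from any posted patch:)

#include <math.h>

/*
 * Precision needed to show roughly two significant digits of 'rows';
 * exact integers keep the old integer-only format.
 */
static int
meaningful_prec(double rows)
{
    if (rows <= 0.0 || rows == floor(rows))
        return 0;                                       /* 2/2 -> "1" */
    if (rows >= 1.0)
    {
        int int_digits = (int) floor(log10(rows)) + 1;

        return (int_digits >= 2) ? 0 : 2 - int_digits;  /* 9/5 -> "1.8" */
    }
    /* 0 < rows < 1: widen until two significant digits are visible */
    return (int) (-floor(log10(rows))) + 1;   /* 99/1000000 -> "0.000099" */
}

Used as, e.g.: appendStringInfo(es->str, " rows=%.*f)", meaningful_prec(rows), rows);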
On 12.02.2025 11:54, Andrei Lepikhov wrote: > May we agree on a more general formula that prints at least two meaningful digits whenever there is a fractional part? Examples:
> - actual rows = 2, nloops = 2 -> rows = 1
> - actual rows = 9, nloops = 5 -> rows = 1.8
> - actual rows = 101, nloops = 100 -> rows = 1.0
> - actual rows = 99, nloops = 1000000 -> rows = 0.000099
> This would guarantee that an EXPLAIN exposes most of the data that passed through the node, enough to tie it to the actual timing and tuple counts at the upper levels of the query.
I think the idea of keeping two significant digits after the decimal point is quite reasonable. The thing is, rows=0.000001 or something similar can only occur when loops is quite large. If we show the order of magnitude in rows, it will be easier for the user to estimate the order of total rows. For example, if we see this: rows=0.000056 loops=4718040 the user can quickly approximate the order of total rows (here roughly 0.000056 * 4718040, or about 264 rows in total) for analyzing the upper levels of the query. However, keep in mind that I am against using the E notation, as many users have mentioned that they are not mathematicians and are not familiar with the concept of "E". >> I suggest that when thinking about what to change here, you start by considering how you'd adjust the docs at https://www.postgresql.org/docs/devel/using-explain.html to explain the new behavior. If you can't explain it clearly for users, then maybe it's not such a great idea. > Agree So do I. First, I'll think about how to explain it. -- Best regards, Ilia Evdokimov, Tantor Labs LLC.
On Tue, Feb 11, 2025 at 2:18 PM Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote: > With such an example, it's hard to disagree. This would really add valuable information. Taking all opinions into account, I have updated the patch v8. I have also included a check for the case where there are only zeros after the decimal point. We do not want to clutter the rows with unnecessary zeros. I disagree. We don't do this for any other fractional value we print in any other part of the system, and I do not think this case should be some kind of weird exception. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Feb 12, 2025 at 5:10 AM Ilia Evdokimov <ilya.evdokimov@tantorlabs.com> wrote: > I think the idea of keeping two significant digits after the decimal > point is quite reasonable. The thing is, rows=0.000001 or something > similar can only occur when loops is quite large. If we show the order > of magnitude in rows, it will be easier for the user to estimate the > order of total rows. For example, if we see this: > > rows=0.000056 loops=4718040 > > the user can quickler approximate the order of total rows for analyzing > the upper levels of the query. I agree that showing 2 digits after the decimal point in all cases is not ideal, but I suggest that we take a practical approach. Figuring out dynamically what number of decimal digits to display in each case sounds complicated and we may spend a bunch of time arguing about the details of that and get nothing committed. If we just show 2 digits after the decimal point, it will not be perfect, but it will be 10^2 times better than what we have now. If I'm honest, what I actually think we should do is stop dividing values by nloops before printing them out. Every time I'm looking at a quantity that has been divided by nloops, the very first thing I do is try to figure out what the original value was. The whole reason I want to display at least a couple of decimal digits here is so that I can do that more accurately, but of course the dream would be not having to reverse engineer it like that at all. However, I expect fierce opposition to that idea, and no matter how misguided I may think that opposition might be, a patch in the tree is worth two in the CommitFest. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > I agree that showing 2 digits after the decimal point in all cases is > not ideal, but I suggest that we take a practical approach. Figuring > out dynamically what number of decimal digits to display in each case > sounds complicated and we may spend a bunch of time arguing about the > details of that and get nothing committed. If we just show 2 digits > after the decimal point, it will not be perfect, but it will be 10^2 > times better than what we have now. I was idly speculating yesterday about letting the Ryu code print the division result, so that we get a variable number of digits. Realistically, that'd probably result in many cases in more digits than anybody wants, so it's not a serious proposal. I'm cool with the fixed-two-digits approach to start with. regards, tom lane
On 13/2/2025 01:40, Tom Lane wrote: > I was idly speculating yesterday about letting the Ryu code print > the division result, so that we get a variable number of digits. > Realistically, that'd probably result in many cases in more digits > than anybody wants, so it's not a serious proposal. I'm cool with > the fixed-two-digits approach to start with. Okay, since no one else voted for the meaningful-numbers approach, I would say that fixed size is better than nothing. It may cover some of my practical cases, but unfortunately, not the most problematic ones. -- regards, Andrei Lepikhov
On Wed, Feb 12, 2025 at 2:55 PM Andrei Lepikhov <lepihov@gmail.com> wrote: > On 13/2/2025 01:40, Tom Lane wrote: > > I was idly speculating yesterday about letting the Ryu code print > > the division result, so that we get a variable number of digits. > > Realistically, that'd probably result in many cases in more digits > > than anybody wants, so it's not a serious proposal. I'm cool with > > the fixed-two-digits approach to start with. > Okay, since no one else voted for the meaningful-numbers approach, I > would say that fixed size is better than nothing. It may cover some of > my practical cases, but unfortunately, not the most problematic ones. I don't love it either, but I do think it is significantly better than nothing. -- Robert Haas EDB: http://www.enterprisedb.com