Re: [PATCH] Add extra statistics to explain for Nested Loop - Mailing list pgsql-hackers

From e.sokolova@postgrespro.ru
Subject Re: [PATCH] Add extra statistics to explain for Nested Loop
Date
Msg-id 32513912fb3f4b4042a3834b2e5afd5c@postgrespro.ru
Whole thread Raw
In response to Re: [PATCH] Add extra statistics to explain for Nested Loop  (Julien Rouhaud <rjuju123@gmail.com>)
Responses Re: [PATCH] Add extra statistics to explain for Nested Loop  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
Thank you all for your feedback and reforms.
I attach a new version of the patch with the some changes and fixes. 
Here's a list of the major changes:
1) New format of extra statistics. This is now contained in a line 
separate from the main statistics.

Julien Rouhaud писал 2021-02-01 08:28:
> On Thu, Jan 28, 2021 at 8:38 PM Yugo NAGATA <nagata@sraoss.co.jp> 
> wrote:
>> 
>> postgres=# explain (analyze, verbose) select * from a,b where a.i=b.j;
>>                                                                        
>>          QUERY PLAN
>>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>  Nested Loop  (cost=0.00..2752.00 rows=991 width=8) (actual 
>> time=0.021..17.651 rows=991 loops=1)
>>    Output: a.i, b.j
>>    Join Filter: (a.i = b.j)
>>    Rows Removed by Join Filter: 99009
>>    ->  Seq Scan on public.b  (cost=0.00..2.00 rows=100 width=4) 
>> (actual time=0.009..0.023 rows=100 loops=1)
>>          Output: b.j
>>    ->  Seq Scan on public.a  (cost=0.00..15.00 rows=1000 width=4) 
>> (actual time=0.005..0.091 min_time=0.065 max_time=0.163 min_rows=1000 
>> rows=1000 max_rows=1000 loops=100)
>>          Output: a.i
>>  Planning Time: 0.066 ms
>>  Execution Time: 17.719 ms
>> (10 rows)
>> 
>> I don't like this format where the extra statistics appear in the same
>> line of existing information because the output format differs 
>> depended
>> on whether the plan node's loops > 1 or not. This makes the length of 
>> a
>> line too long. Also, other information reported by VERBOSE doesn't 
>> change
>> the exiting row format and just add extra rows for new information.
>> 
>> Instead, it seems good for me to add extra rows for the new statistics
>> without changint the existing row format as other VERBOSE information,
>> like below.
>> 
>>    ->  Seq Scan on public.a  (cost=0.00..15.00 rows=1000 width=4) 
>> (actual time=0.005..0.091 rows=1000  loops=100)
>>          Output: a.i
>>          Loops: min_time=0.065 max_time=0.163 min_rows=1000 
>> max_rows=1000
>> 
>> and so  on. What do you think about it?
> 

2) Correction of the case of parallel scan

>> In parallel scan, the extra statistics are not reported correctly.
>> 
>> This reports max/min rows or time of inner scan as 0 in parallel 
>> workers,
>> and as a result only the leader process's ones are accounted. To fix 
>> this,
>> we would change InstrAggNode as below.
>> 
> 

3) Adding extra statistics about total number of rows (total rows). 
There were many wishes for this here.

Please don't hesitate to share any thoughts on this topic.

-- 
Ekaterina Sokolova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

pgsql-hackers by date:

Previous
From: "Joel Jacobson"
Date:
Subject: Re: [PATCH] pg_permissions
Next
From: Fujii Masao
Date:
Subject: Re: wal stats questions