Re: [PATCH] Add extra statistics to explain for Nested Loop - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: [PATCH] Add extra statistics to explain for Nested Loop |
Date | |
Msg-id | CAOBaU_aN4oSGXJTrgN4k44g0Z1_2cASEnaGd2M3pdRdbiQdxuQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Add extra statistics to explain for Nested Loop (Yugo NAGATA <nagata@sraoss.co.jp>) |
Responses |
Re: [PATCH] Add extra statistics to explain for Nested Loop
Re: [PATCH] Add extra statistics to explain for Nested Loop |
List | pgsql-hackers |
On Thu, Jan 28, 2021 at 8:38 PM Yugo NAGATA <nagata@sraoss.co.jp> wrote: > > postgres=# explain (analyze, verbose) select * from a,b where a.i=b.j; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..2752.00 rows=991 width=8) (actual time=0.021..17.651 rows=991 loops=1) > Output: a.i, b.j > Join Filter: (a.i = b.j) > Rows Removed by Join Filter: 99009 > -> Seq Scan on public.b (cost=0.00..2.00 rows=100 width=4) (actual time=0.009..0.023 rows=100 loops=1) > Output: b.j > -> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) (actual time=0.005..0.091 min_time=0.065 max_time=0.163min_rows=1000 rows=1000 max_rows=1000 loops=100) > Output: a.i > Planning Time: 0.066 ms > Execution Time: 17.719 ms > (10 rows) > > I don't like this format where the extra statistics appear in the same > line of existing information because the output format differs depended > on whether the plan node's loops > 1 or not. This makes the length of a > line too long. Also, other information reported by VERBOSE doesn't change > the exiting row format and just add extra rows for new information. > > Instead, it seems good for me to add extra rows for the new statistics > without changint the existing row format as other VERBOSE information, > like below. > > -> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) (actual time=0.005..0.091 rows=1000 loops=100) > Output: a.i > Min Time: 0.065 ms > Max Time: 0.163 ms > Min Rows: 1000 > Max Rows: 1000 > > or, like Buffers, > > -> Seq Scan on public.a (cost=0.00..15.00 rows=1000 width=4) (actual time=0.005..0.091 rows=1000 loops=100) > Output: a.i > Loops: min_time=0.065 max_time=0.163 min_rows=1000 max_rows=1000 > > and so on. What do you think about it? It's true that the current output is a bit long, which isn't really convenient to read. Using one of those alternative format would also have the advantage of not breaking compatibility with tools that process those entries. I personally prefer the 2nd option with the extra "Loops:" line . For non text format, should we keep the current format? > 2) > In parallel scan, the extra statistics are not reported correctly. > > postgres=# explain (analyze, verbose) select * from a,b where a.i=b.j; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Gather (cost=1000.00..2463.52 rows=991 width=8) (actual time=0.823..25.651 rows=991 loops=1) > Output: a.i, b.j > Workers Planned: 2 > Workers Launched: 2 > -> Nested Loop (cost=0.00..1364.42 rows=413 width=8) (actual time=9.426..16.723 min_time=0.000 max_time=22.017 min_rows=0rows=330 max_rows=991 loops=3) > Output: a.i, b.j > Join Filter: (a.i = b.j) > Rows Removed by Join Filter: 33003 > Worker 0: actual time=14.689..14.692 rows=0 loops=1 > Worker 1: actual time=13.458..13.460 rows=0 loops=1 > -> Parallel Seq Scan on public.a (cost=0.00..9.17 rows=417 width=4) (actual time=0.049..0.152 min_time=0.000max_time=0.202 min_rows=0 rows=333 max_rows=452 loops=3) > Output: a.i > Worker 0: actual time=0.040..0.130 rows=322 loops=1 > Worker 1: actual time=0.039..0.125 rows=226 loops=1 > -> Seq Scan on public.b (cost=0.00..2.00 rows=100 width=4) (actual time=0.006..0.026 min_time=0.012 max_time=0.066min_rows=100 rows=100 max_rows=100 loops=1000) > Output: b.j > Worker 0: actual time=0.006..0.024 min_time=0.000 max_time=0.000 min_rows=0 rows=100 max_rows=0 loops=322 > Worker 1: actual time=0.008..0.030 min_time=0.000 max_time=0.000 min_rows=0 rows=100 max_rows=0 loops=226 > Planning Time: 0.186 ms > Execution Time: 25.838 ms > (20 rows) > > This reports max/min rows or time of inner scan as 0 in parallel workers, > and as a result only the leader process's ones are accounted. To fix this, > we would change InstrAggNode as below. > > @@ -167,6 +196,10 @@ InstrAggNode(Instrumentation *dst, Instrumentation *add) > dst->nloops += add->nloops; > dst->nfiltered1 += add->nfiltered1; > dst->nfiltered2 += add->nfiltered2; > + dst->min_t = Min(dst->min_t, add->min_t); > + dst->max_t = Max(dst->max_t, add->max_t); > + dst->min_tuples = Min(dst->min_tuples, add->min_tuples); > + dst->max_tuples = Max(dst->max_tuples, add->max_tuples); Agreed. > 3) > There are garbage lines and I could not apply this patch. > > diff --git a/src/test/regress/expected/timetz.out b/src/test/regress/expected/timetz.out > index 038bb5fa094..5294179aa45 100644 I also switch the patch to "waiting on author" on the commit fest app.
pgsql-hackers by date: