Re: [PATCH] Add extra statistics to explain for Nested Loop - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: [PATCH] Add extra statistics to explain for Nested Loop |
Date | |
Msg-id | CAOBaU_Yj6gdcc7q_CYWWMByMSUt29pO2Vky6xk9eP2g1rw_Eww@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Add extra statistics to explain for Nested Loop (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>) |
Responses |
Re: [PATCH] Add extra statistics to explain for Nested Loop
|
List | pgsql-hackers |
On Sat, Oct 17, 2020 at 6:11 AM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote: > > On 16.10.2020 12:07, Julien Rouhaud wrote: > > Le ven. 16 oct. 2020 à 16:12, Pavel Stehule <pavel.stehule@gmail.com> a écrit : >> >> >> >> pá 16. 10. 2020 v 9:43 odesílatel <e.sokolova@postgrespro.ru> napsal: >>> >>> Hi, hackers. >>> For some distributions of data in tables, different loops in nested loop >>> joins can take different time and process different amounts of entries. >>> It makes average statistics returned by explain analyze not very useful >>> for DBA. >>> To fix it, here is the patch that add printing of min and max statistics >>> for time and rows across all loops in Nested Loop to EXPLAIN ANALYSE. >>> Please don't hesitate to share any thoughts on this topic! >> >> >> +1 >> >> This is great feature - sometimes it can be pretty messy current limited format > > > +1, this can be very handy! > > Cool. > I have added your patch to the commitfest, so it won't get lost. > https://commitfest.postgresql.org/30/2765/ > > I will review the code next week. Unfortunately, I cannot give any feedback about usability of this feature. > > User visible change is: > > - -> Nested Loop (actual rows=N loops=N) > + -> Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2) Thanks for working on this feature! Here are some comments on the patch. First, cosmetic issues. There are a lot of whitespace issues, the new code is space indented while it should be tab indented. Also there are 3 patches included with some fixups, could you instead push a single patch? It also misses some modification in the regression tests. For instance: diff --git a/src/test/regress/expected/partition_prune.out b/src/test/regress/expected/partition_prune.out index 50d2a7e4b9..db0b167ef4 100644 --- a/src/test/regress/expected/partition_prune.out +++ b/src/test/regress/expected/partition_prune.out @@ -2065,7 +2065,7 @@ select explain_parallel_append('select avg(ab.a) from ab inner join lprt_a a on Workers Planned: 1 Workers Launched: N -> Partial Aggregate (actual rows=N loops=N) - -> Nested Loop (actual rows=N loops=N) + -> Nested Loop (actual min_rows=0 rows=0 max_rows=0 loops=2) -> Parallel Seq Scan on lprt_a a (actual rows=N loops=N) You should update the explain_parallel_append() plpgsql function created in that test file to make sure that both "rows" and the two new counters are changed to "N". There might be other similar changes needed. Now, for the changes themselves. For the min/max time, you're aggregating "totaltime - instr->firsttuple". Why removing the startup time from the loop execution time? I think this should be kept. Also, in explain.c you display the counters in the "Nested loop" node, but this should be done in the inner plan node instead, as this is the one being looped on. So the condition should probably be "nloops > 1" rather than testing if it's a NestLoop. I'm switching the patch to WoA.
pgsql-hackers by date: