Home > mailing lists

Re: [PATCH] Add extra statistics to explain for Nested Loop - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: [PATCH] Add extra statistics to explain for Nested Loop
Date	March 28, 2022 22:09:12
Msg-id	CAM-w4HPw1m+6KfjyS-DOv2PjG4s92PNe5Px8axbf5FOehFs3TA@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Add extra statistics to explain for Nested Loop (e.sokolova@postgrespro.ru)
Responses	Re: [PATCH] Add extra statistics to explain for Nested Loop (Justin Pryzby <pryzby@telsasoft.com>) Re: [PATCH] Add extra statistics to explain for Nested Loop (Julien Rouhaud <rjuju123@gmail.com>)
List	pgsql-hackers

Tree view

This patch got some very positive feedback and some significant amount
of work earlier in the release cycle. The feedback from Julien earlier
this month seemed pretty minor.

Ekaterina, is there any chance you'll be able to work on this this
week and do you think it has a chance of making this release? Julien,
do you think it's likely to be possible to polish for this release?

Otherwise I guess we should move it to the next CF but it seems a
shame given how much work has been done and how close it is.

On Mon, 7 Mar 2022 at 00:17, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> On Thu, Feb 03, 2022 at 12:59:03AM +0300, Ekaterina Sokolova wrote:
> >
> > I apply the new version of patch.
> >
> > I wanted to measure overheads, but could't choose correct way. Thanks for
> > idea with auto_explain.
> > I loaded it and made 10 requests of pgbench (number of clients: 1, of
> > threads: 1).
> > I'm not sure I chose the right way to measure overhead, so any suggestions
> > are welcome.
> > Current results are in file overhead_v0.txt.
> >
> > Please feel free to share your suggestions and comments. Regards,
> >
>
> >    | master latency (ms) | master tps |  | new latency (ms) |   new tps
> > --------------------------------------------------------------------------
> >  1 |        2,462        | 406,190341 |  |       4,485      | 222,950527
> >  2 |        3,877        | 257,89813  |  |       4,141      | 241,493395
> >  3 |        3,789        | 263,935811 |  |       2,837      | 352,522297
> >  4 |        3,53         | 283,310196 |  |       5,510      | 181,488203
> >  5 |        3,413        | 292,997363 |  |       6,475      | 154,432999
> >  6 |        3,757        | 266,148564 |  |       4,073      | 245,507218
> >  7 |        3,752        | 266,560043 |  |       3,901      | 256,331385
> >  8 |        4,389        | 227,847524 |  |       4,658      | 214,675196
> >  9 |        4,341        | 230,372282 |  |       4,220      | 236,983672
> > 10 |        3,893        | 256,891104 |  |       7.059      | 141,667139
> > --------------------------------------------------------------------------
> > avg|        3,7203       | 275,215136 |  |       4,03       | 224,8052031
> >
> >
> > master/new latency | 0,92315 |
> > master/new tps     | 1,22424 |
> >
> > new/master latency | 1,08325 |
> > new/master tps     | 0,81683 |
>
> The overhead is quite significant (at least for OLTP-style workload).
>
> I think this should be done with a new InstrumentOption, like
> INSTRUMENT_LOOP_DETAILS or something like that, and set it where appropriate.
> Otherwise you will have to pay that overhead even if you won't use the new
> fields at all.  It could be EXPLAIN (ANALYZE, VERBOSE OFF), but also simply
> using pg_stat_statements which doesn't seem acceptable.
>
> One problem is that some extensions (like pg_stat_statements) can rely on
> INSTRUMENT_ALL but may or may not care about those extra counters.  Maybe we
> should remove that alias and instead provide two (like INSTRUMENT_ALL_VERBOSE
> and INSTRUMENT_ALL_SOMETHINGELSE, I don't have any bright name right now) so
> that authors can decide what they need instead of silently having such
> extension ruin the performance for no reason.
>
> About the implementation itself:
>
> +static void show_loop_info(Instrumentation *instrument, bool isworker,
> +                          ExplainState *es);
>
> I think this should be done as a separate refactoring commit.
>
> +   /*
> +    * this is first loop
> +    *
> +    * We only initialize the min values. We don't need to bother with the
> +    * max, because those are 0 and the non-zero values will get updated a
> +    * couple lines later.
> +    */
> +   if (instr->nloops == 0)
> +   {
> +       instr->min_t = totaltime;
> +       instr->min_tuples = instr->tuplecount;
> +   }
> +
> +   if (instr->min_t > totaltime)
> +       instr->min_t = totaltime;
> +
> +   if (instr->max_t < totaltime)
> +       instr->max_t = totaltime;
> +
>     instr->ntuples += instr->tuplecount;
> +
> +   if (instr->min_tuples > instr->tuplecount)
> +       instr->min_tuples = instr->tuplecount;
> +
> +   if (instr->max_tuples < instr->tuplecount)
> +       instr->max_tuples = instr->tuplecount;
> +
>     instr->nloops += 1;
>
> Why do you need to initialize min_t and min_tuples but not max_t and
> max_tuples while both will initially be 0 and possibly updated afterwards?
>
> I think you should either entirely remove this if (instr->nloops == 0) part, or
> handle some else block.
>
>


-- 
greg

pgsql-hackers by date:

From: Robert Haas
Date: 28 March 2022, 22:08:41
Subject: Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

From: Tom Lane
Date: 28 March 2022, 22:11:15
Subject: Re: Granting SET and ALTER SYSTE privileges for GUCs

Re: [PATCH] Add extra statistics to explain for Nested Loop - Mailing list pgsql-hackers

Previous

Next