Re: [PATCH] Add extra statistics to explain for Nested Loop - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: [PATCH] Add extra statistics to explain for Nested Loop |
Date | |
Msg-id | 20220730125433.b6eebuwa2l5vpfam@jrouhaud Whole thread Raw |
In response to | Re: [PATCH] Add extra statistics to explain for Nested Loop (Ekaterina Sokolova <e.sokolova@postgrespro.ru>) |
Responses |
Re: [PATCH] Add extra statistics to explain for Nested Loop
|
List | pgsql-hackers |
Hi, On Fri, Jun 24, 2022 at 08:16:06PM +0300, Ekaterina Sokolova wrote: > > We started discussion about overheads and how to calculate it correctly. > > Julien Rouhaud wrote: > > Can you give a bit more details on your bench scenario? > > [...] > > Ideally you would need a custom scenario with a single read-only query > > involving a nested loop or something like that to check how much > > overhead you > > really get when you cumulate those values. > I created 2 custom scenarios. First one contains VERBOSE flag so this > scenario uses extra statistics. Second one doesn't use new feature and > doesn't disable its use (therefore still collect data). > I attach scripts for pgbench to this letter. I don't think that this scenario is really representative for the problem I was mentioning as you're only testing the overhead using the EXPLAIN (ANALYZE) command, which doesn't say much about normal query execution. I did a simple benchmark using a scale 50 pgbench on a pg_stat_statements enabled instance, and the following scenario: SET enable_mergejoin = off; SELECT count(*) FROM pgbench_accounts JOIN pgbench_tellers on aid = tid; (which forces a nested loop) and compared the result from this patch and fixing pg_stat_statements to not request INSTRUMENT extra, something like: diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index 049da9fe6d..9a2177e438 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -985,7 +985,7 @@ pgss_ExecutorStart(QueryDesc *queryDesc, int eflags) MemoryContext oldcxt; oldcxt = MemoryContextSwitchTo(queryDesc->estate->es_query_cxt); - queryDesc->totaltime = InstrAlloc(1, INSTRUMENT_ALL, false); + queryDesc->totaltime = InstrAlloc(1, (INSTRUMENT_ALL & ~INSTRUMENT_EXTRA), false); MemoryContextSwitchTo(oldcxt); } } It turns out that having pg_stat_statements with INSTRUMENT_EXTRA indirectly requested by INSTRUMENT_ALL adds a ~27% overhead. I'm not sure that I actually believe these results, but they're really consistent, so maybe that's real. Anyway, even if the overheadwas only 1.5% like in your own benchmark, that still wouldn't be acceptable. Such a feature is in my opinion very welcome, but it shouldn't add *any* overhead outside of EXPLAIN (ANALYZE, VERBOSE). Note that this was done using a "production build" (so with -02, without assert and such). Doing the same on a debug build (and a scale 20 pgbench), the overhead is about 1.75%, which is closer to your result. What was the configure option you used for your benchmark? Also, I don't think it's not acceptable to ask every single extension that currently relies on INSTRUMENT_ALL to be patched and drop some random INSTRUMENT_XXX flags to avoid this overhead. So as I mentioned previously, I think we should keep INSTRUMENT_ALL to mean something like "all instrumentation that gives metrics at the statement level", and have INSTRUMENT_EXTRA be outside of INSTRUMENT_ALL. Maybe this new category should have a global flag to request all of them, and maybe there should be some additional alias to grab all categories. While at it, INSTRUMENT_EXTRA doesn't really seem like a nice name either since there's no guarantee that the next time someone adds a new instrument option for per-node information, she will want to combine it with this one. Maybe INSTRUMENT_MINMAX_LOOPS or something like that?
pgsql-hackers by date: