Re: proposal - log_full_scan - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: proposal - log_full_scan
Date
Msg-id CAFj8pRCyJGpYTU3uLkedhuL2TF2yVOkhZvfGwdE0uk3tT_KcBQ@mail.gmail.com
Whole thread Raw
In response to Re: proposal - log_full_scan  (Daniel Gustafsson <daniel@yesql.se>)
Responses Re: proposal - log_full_scan  (Daniel Gustafsson <daniel@yesql.se>)
List pgsql-hackers
Hi

út 6. 7. 2021 v 16:07 odesílatel Daniel Gustafsson <daniel@yesql.se> napsal:
Looking at this I like the idea in principle, but I'm not convinced that
auto_explain is the right tool for this.  auto_explain is for identifying slow
queries, and what you are proposing is to identify queries with a certain
"shape" (for lack of a better term) even if they aren't slow as per the
log_min_duration setting.  If log_min_duration is deemed to crude due to query
volume then sample_rate is the tool.  If sample_rate is also discarded, then
pg_stat_statements seems a better option.

I don't think so pg_stat_statements can be used - a) it doesn't check execution plan, so this feature can have big overhead against current pg_stat_statements, that works just with AST, b) pg_stat_statements has one entry per AST - but this can be problem on execution plan level, and this is out of perspective of pg_stat_statements.

Also, why just sequential scans (apart from it being this specific usecase)?
If the idea is to track aspects of execution which are deemed slow, then
tracking for example spills etc would be just as valid.  Do you have thoughts
on that?

Yes, I thought about it more, and sometimes bitmap index scans are problematic too, index scans in nested loops can be a problem too.

For my last customer I had to detect queries with a large bitmap index scan. I can do it with a combination of pg_stat_statements and log checking, but this work is not very friendly.

My current idea is to have some extension that can be tran for generally specified executor nodes.

Sometimes I can say - I need to know all queries that does seq scan over tabx where tuples processed > N. In other cases can be interesting to know the queries that uses index x for bitmap index scan,
 

That being said, a few comments on the patch:

-       (auto_explain_log_min_duration >= 0 && \
+       ((auto_explain_log_min_duration >= 0 || auto_explain_log_seqscan != -1) && \
Is there a reason to not follow the existing code and check for >= 0?

+       DefineCustomIntVariable("auto_explain.log_seqscan",
It's only a matter of time before another node is proposed for logging, and
then we'll be stuck adding log_XXXnode GUCs.  Is there a more future-proof way
to do this?

+       "Sets the minimum tuples produced by sequantial scans which plans will be logged",
s/sequantial/sequential/

-       es->analyze = (queryDesc->instrument_options && auto_explain_log_analyze);
+       es->analyze = (queryDesc->instrument_options && (auto_explain_log_analyze || auto_explain_log_seqscan != -1));
Turning on ANALYZE when log_analyze isn't set to True is a no-no IMO.

+ * Colllect relations where log_seqscan limit was exceeded
s/Colllect/Collect/

+       if (*relnames.data != '\0')
+               appendStringInfoString(&relnames, ",");
This should use appendStringInfoChar instead.

+       (errmsg("duration: %.3f ms, over limit seqscans: %s, plan:\n%s",
The "over limit" part is superfluous since it otherwise wouldn't be logged.  If
we're prefixing something wouldn't it be more helpful to include the limit,
like: "seqscans >= %d tuples returned:".  I'm not a fan of "seqscans" but
spelling it out is also quite verbose and this is grep-able.

Documentation and tests are also missing

Unfortunately, this idea is not well prepared. My patch was a proof concept - but I think so it is not a good start point. Maybe it needs some tracing API on executor level and some tool like "perf top", but for executor. Post execution analysis is not a good direction with big overhead, and mainly it is not friendly in critical situations. I need some handy tool like perf, but for executor nodes. I don't know how to do it effectively.

Thank you for your review and for your time, but I think it is better to remove this patch from commit fest. I have no idea how to design this feature well :-/

Regards

Pavel


 

--
Daniel Gustafsson               https://vmware.com/

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Refactor "mutually exclusive options" error reporting code in parse_subscription_options
Next
From: Tom Lane
Date:
Subject: Re: logical replication worker accesses catalogs in error context callback