Re: explain plans with information about (modified) gucs - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: explain plans with information about (modified) gucs
Date
Msg-id CAFj8pRBL6RYhzCnTpmkD=ucKGtzHdRWq6CjkaxVaKx1b47tV5w@mail.gmail.com
Whole thread Raw
In response to explain plans with information about (modified) gucs  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers


pá 14. 12. 2018 v 12:41 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
Hi,

every now and then I have to investigate an execution plan that is
strange in some way and I can't reproduce the same behavior. Usually
it's simply due to data distribution changing since the problem was
observed (say, after a nightly batch load/update).

In many cases it however may be due to some local GUC tweaks, usually
addressing some query specific issues (say, disabling nested loops or
lowering join_collapse_limit). I've repeatedly ran into cases where the
GUC was not properly reset to the "regular" value, and it's rather
difficult to identify this is what's happening. Or cases with different
per-user settings and connection pooling (SET SESSION AUTHORIZATION /
ROLE etc.).

So I propose to extend EXPLAIN output with an additional option, which
would include information about modified GUCs in the execution plan
(disabled by default, of course):

test=# explain (gucs) select * from t;

                                 QUERY PLAN
  --------------------------------------------------------------------
   Seq Scan on t  (cost=0.00..35.50 rows=2550 width=4)
   GUCs: application_name = 'x', client_encoding = 'UTF8',
         cpu_tuple_cost = '0.01'
   (2 rows)

Of course, this directly applies to auto_explain too, which gets a new
option log_gucs.

The patch is quite trivial, but there are about three open questions:

1) names of the options

I'm not particularly happy with calling the option "gucs" - it's an
acronym and many users have little idea what GUC stands for. So I think
a better name would be desirable, but I'm not sure what would that be.
Options? Parameters?

2) format of output

At this point the names/values are simply formatted into a one-line
string. That's not particularly readable, and it's not very useful for
the YAML/JSON formats I guess. So adding each modified GUC as an extra
text property would be better.

3) identifying modified (and interesting) GUCs

We certainly don't want to include all GUCs, so the question is how to
decide which GUCs are interesting. The simplest approach would be to
look for GUCs that changed in the session (source == PGC_S_SESSION), but
that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we
probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE
because that includes irrelevant options like wal_buffers etc.

For now I've used

  /* return only options that were modified (not as in config file) */
  if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE))
    continue;

which generally does the right thing, although it also includes stuff
like application_name or client_encoding. But perhaps it'd be better to
whitelist the GUCs in some way, because some of the user-defined GUCs
may be sensitive and should not be included in plans.

Opinions?

has sense

Pavel



regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Making WAL receiver startup rely on GUC context forprimary_conninfo and primary_slot_name
Next
From: Tomas Vondra
Date:
Subject: Re: valgrind issues on Fedora 28