Thread: explain plans with information about (modified) gucs
Hi, every now and then I have to investigate an execution plan that is strange in some way and I can't reproduce the same behavior. Usually it's simply due to data distribution changing since the problem was observed (say, after a nightly batch load/update). In many cases it however may be due to some local GUC tweaks, usually addressing some query specific issues (say, disabling nested loops or lowering join_collapse_limit). I've repeatedly ran into cases where the GUC was not properly reset to the "regular" value, and it's rather difficult to identify this is what's happening. Or cases with different per-user settings and connection pooling (SET SESSION AUTHORIZATION / ROLE etc.). So I propose to extend EXPLAIN output with an additional option, which would include information about modified GUCs in the execution plan (disabled by default, of course): test=# explain (gucs) select * from t; QUERY PLAN -------------------------------------------------------------------- Seq Scan on t (cost=0.00..35.50 rows=2550 width=4) GUCs: application_name = 'x', client_encoding = 'UTF8', cpu_tuple_cost = '0.01' (2 rows) Of course, this directly applies to auto_explain too, which gets a new option log_gucs. The patch is quite trivial, but there are about three open questions: 1) names of the options I'm not particularly happy with calling the option "gucs" - it's an acronym and many users have little idea what GUC stands for. So I think a better name would be desirable, but I'm not sure what would that be. Options? Parameters? 2) format of output At this point the names/values are simply formatted into a one-line string. That's not particularly readable, and it's not very useful for the YAML/JSON formats I guess. So adding each modified GUC as an extra text property would be better. 3) identifying modified (and interesting) GUCs We certainly don't want to include all GUCs, so the question is how to decide which GUCs are interesting. The simplest approach would be to look for GUCs that changed in the session (source == PGC_S_SESSION), but that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE because that includes irrelevant options like wal_buffers etc. For now I've used /* return only options that were modified (not as in config file) */ if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE)) continue; which generally does the right thing, although it also includes stuff like application_name or client_encoding. But perhaps it'd be better to whitelist the GUCs in some way, because some of the user-defined GUCs may be sensitive and should not be included in plans. Opinions? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pá 14. 12. 2018 v 12:41 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
Hi,
every now and then I have to investigate an execution plan that is
strange in some way and I can't reproduce the same behavior. Usually
it's simply due to data distribution changing since the problem was
observed (say, after a nightly batch load/update).
In many cases it however may be due to some local GUC tweaks, usually
addressing some query specific issues (say, disabling nested loops or
lowering join_collapse_limit). I've repeatedly ran into cases where the
GUC was not properly reset to the "regular" value, and it's rather
difficult to identify this is what's happening. Or cases with different
per-user settings and connection pooling (SET SESSION AUTHORIZATION /
ROLE etc.).
So I propose to extend EXPLAIN output with an additional option, which
would include information about modified GUCs in the execution plan
(disabled by default, of course):
test=# explain (gucs) select * from t;
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on t (cost=0.00..35.50 rows=2550 width=4)
GUCs: application_name = 'x', client_encoding = 'UTF8',
cpu_tuple_cost = '0.01'
(2 rows)
Of course, this directly applies to auto_explain too, which gets a new
option log_gucs.
The patch is quite trivial, but there are about three open questions:
1) names of the options
I'm not particularly happy with calling the option "gucs" - it's an
acronym and many users have little idea what GUC stands for. So I think
a better name would be desirable, but I'm not sure what would that be.
Options? Parameters?
2) format of output
At this point the names/values are simply formatted into a one-line
string. That's not particularly readable, and it's not very useful for
the YAML/JSON formats I guess. So adding each modified GUC as an extra
text property would be better.
3) identifying modified (and interesting) GUCs
We certainly don't want to include all GUCs, so the question is how to
decide which GUCs are interesting. The simplest approach would be to
look for GUCs that changed in the session (source == PGC_S_SESSION), but
that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we
probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE
because that includes irrelevant options like wal_buffers etc.
For now I've used
/* return only options that were modified (not as in config file) */
if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE))
continue;
which generally does the right thing, although it also includes stuff
like application_name or client_encoding. But perhaps it'd be better to
whitelist the GUCs in some way, because some of the user-defined GUCs
may be sensitive and should not be included in plans.
Opinions?
has sense
Pavel
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
You might want to only include the GUCs that are query tuning parameters, i.e., those returned by: SELECT name, setting, category FROM pg_settings WHERE category LIKE 'Query Tuning%' ORDER BY category, name; ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 12/14/18 2:05 PM, Jim Finnerty wrote: > You might want to only include the GUCs that are query tuning parameters, > i.e., those returned by: > > SELECT name, setting, category > FROM pg_settings > WHERE category LIKE 'Query Tuning%' > ORDER BY category, name; > Good idea! Thanks. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/14/18 3:01 PM, Tomas Vondra wrote: > On 12/14/18 2:05 PM, Jim Finnerty wrote: >> You might want to only include the GUCs that are query tuning parameters, >> i.e., those returned by: >> >> SELECT name, setting, category >> FROM pg_settings >> WHERE category LIKE 'Query Tuning%' >> ORDER BY category, name; >> > > Good idea! Thanks. V2 filtering the options to QUERY_TUNING group (and subgroups). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > ... I propose to extend EXPLAIN output with an additional option, which > would include information about modified GUCs in the execution plan > (disabled by default, of course): I'm a bit suspicious about whether this'll have any actual value, if it's disabled by default (which I agree it needs to be, if only for compatibility reasons). The problem you're trying to solve is basically "I forgot that this might have an effect", but stuff that isn't shown by default will not help you un-forget. It certainly won't fix the form of the problem that I run into, which is people sending in EXPLAIN plans and not mentioning their weird local settings. > We certainly don't want to include all GUCs, so the question is how to > decide which GUCs are interesting. The simplest approach would be to > look for GUCs that changed in the session (source == PGC_S_SESSION), but > that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we > probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE > because that includes irrelevant options like wal_buffers etc. Don't you want to show anything that's not the built-in default? (I agree OVERRIDE could be excluded, but that's irrelevant for query tuning parameters.) Just because somebody injected a damfool setting of, say, random_page_cost via the postmaster command line or environment settings doesn't make it not damfool :-( regards, tom lane
On 12/14/18 4:21 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> ... I propose to extend EXPLAIN output with an additional option, which >> would include information about modified GUCs in the execution plan >> (disabled by default, of course): > > I'm a bit suspicious about whether this'll have any actual value, > if it's disabled by default (which I agree it needs to be, if only for > compatibility reasons). The problem you're trying to solve is basically > "I forgot that this might have an effect", but stuff that isn't shown > by default will not help you un-forget. It certainly won't fix the > form of the problem that I run into, which is people sending in EXPLAIN > plans and not mentioning their weird local settings. > Not quite. I agree we'll still have to deal with plans from users without this info, but it's easier to ask for explain with this extra option (just like we regularly ask for explain analyze instead of just plain explain). I'd expect the output to be more complete than trying to figure out which of the GUCs might have effect / been modified here. But more importantly - my personal primary use case here is explains from application connections generated using auto_explain, with some application-level GUC magic. And there I can easily tweak auto_explain config to do (auto_explain.log_gucs = true) of course. >> We certainly don't want to include all GUCs, so the question is how to >> decide which GUCs are interesting. The simplest approach would be to >> look for GUCs that changed in the session (source == PGC_S_SESSION), but >> that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we >> probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE >> because that includes irrelevant options like wal_buffers etc. > > Don't you want to show anything that's not the built-in default? > (I agree OVERRIDE could be excluded, but that's irrelevant for query > tuning parameters.) Just because somebody injected a damfool setting > of, say, random_page_cost via the postmaster command line or > environment settings doesn't make it not damfool :-( > Probably. My assumption here was that I can do select * from pg_settings and then combine it with whatever is included in the plan. But you're right comparing it with the built-in default may be a better option. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 12/14/18 4:32 PM, Tomas Vondra wrote: > > > On 12/14/18 4:21 PM, Tom Lane wrote: >> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>> ... I propose to extend EXPLAIN output with an additional option, which >>> would include information about modified GUCs in the execution plan >>> (disabled by default, of course): >> >> I'm a bit suspicious about whether this'll have any actual value, >> if it's disabled by default (which I agree it needs to be, if only for >> compatibility reasons). The problem you're trying to solve is basically >> "I forgot that this might have an effect", but stuff that isn't shown >> by default will not help you un-forget. It certainly won't fix the >> form of the problem that I run into, which is people sending in EXPLAIN >> plans and not mentioning their weird local settings. >> > > Not quite. > > I agree we'll still have to deal with plans from users without this > info, but it's easier to ask for explain with this extra option (just > like we regularly ask for explain analyze instead of just plain > explain). I'd expect the output to be more complete than trying to > figure out which of the GUCs might have effect / been modified here. > > But more importantly - my personal primary use case here is explains > from application connections generated using auto_explain, with some > application-level GUC magic. And there I can easily tweak auto_explain > config to do (auto_explain.log_gucs = true) of course. > >>> We certainly don't want to include all GUCs, so the question is how to >>> decide which GUCs are interesting. The simplest approach would be to >>> look for GUCs that changed in the session (source == PGC_S_SESSION), but >>> that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we >>> probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE >>> because that includes irrelevant options like wal_buffers etc. >> >> Don't you want to show anything that's not the built-in default? >> (I agree OVERRIDE could be excluded, but that's irrelevant for query >> tuning parameters.) Just because somebody injected a damfool setting >> of, say, random_page_cost via the postmaster command line or >> environment settings doesn't make it not damfool :-( >> > > Probably. My assumption here was that I can do > > select * from pg_settings > > and then combine it with whatever is included in the plan. But you're > right comparing it with the built-in default may be a better option. > FWIW here is a v3 of the patch, using the built-in default, and fixing a silly thinko resulting in the code not being executed from auto_explain. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
what would you think about adding search_path to that list ? Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Hi, On 12/17/18 10:56 PM, legrand legrand wrote: > what would you think about adding > search_path > to that list ? > Yeah, I've been thinking about that too. Currently it gets filtered out because it's in the CLIENT_CONN_STATEMENT group, but the code only includes QUERY_TUNING_*. But we don't want to include everything from CLIENT_CONN_STATEMENT, because that would include various kinds of timeouts, vacuuming parameters, etc. And the same issue applies to work_mem, which is in RESOURCES_MEM. And of course, there's a lot of unrelated stuff in RESOURCES_MEM. So I guess we'll need to enable QUERY_TUNING_* and then selectively a couple of individual options from other groups. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 12/17/18 10:56 PM, legrand legrand wrote: >> what would you think about adding >> search_path >> to that list ? > Yeah, I've been thinking about that too. Currently it gets filtered out > because it's in the CLIENT_CONN_STATEMENT group, but the code only > includes QUERY_TUNING_*. > But we don't want to include everything from CLIENT_CONN_STATEMENT, > because that would include various kinds of timeouts, vacuuming > parameters, etc. > And the same issue applies to work_mem, which is in RESOURCES_MEM. And > of course, there's a lot of unrelated stuff in RESOURCES_MEM. > So I guess we'll need to enable QUERY_TUNING_* and then selectively a > couple of individual options from other groups. This is putting way too much policy into the mechanism, if you ask me. At least for the auto_explain use case, it'd make far more sense for the user to be able to specify which GUCs he wants the query space to be divided according to. While it's possible to imagine giving auto_explain a control setting that's a list of GUC names, I'm not sure how we adapt the idea for other use-cases. But the direction you're headed here will mostly ensure that nobody is happy. regards, tom lane
On 12/17/18 11:16 PM, Tom Lane wrote: > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >> On 12/17/18 10:56 PM, legrand legrand wrote: >>> what would you think about adding >>> search_path >>> to that list ? > >> Yeah, I've been thinking about that too. Currently it gets filtered out >> because it's in the CLIENT_CONN_STATEMENT group, but the code only >> includes QUERY_TUNING_*. >> But we don't want to include everything from CLIENT_CONN_STATEMENT, >> because that would include various kinds of timeouts, vacuuming >> parameters, etc. >> And the same issue applies to work_mem, which is in RESOURCES_MEM. And >> of course, there's a lot of unrelated stuff in RESOURCES_MEM. >> So I guess we'll need to enable QUERY_TUNING_* and then selectively a >> couple of individual options from other groups. > > This is putting way too much policy into the mechanism, if you ask me. > Doesn't that depend on how it'd be implemented? I have not envisioned to make these decisions in explain.c, but rather to keep them in guc.c somehow. Say in a function like this: bool guc_affects_query_planning(config_generic *conf); which would be a fairly simple check outlined before (QUERY_TUNING_* plus a couple of individual GUCs). Other use cases might provide similar filters. An alternative would be to somehow track this in the GUC definitions directly (similarly to how we track the group), but that seems rather inflexible and I'm not sure how would that handle ad-hoc use cases. > At least for the auto_explain use case, it'd make far more sense for > the user to be able to specify which GUCs he wants the query space > to be divided according to. While it's possible to imagine giving > auto_explain a control setting that's a list of GUC names, I'm not > sure how we adapt the idea for other use-cases. But the direction > you're headed here will mostly ensure that nobody is happy. > I certainly don't want to base this on explicitly listing "interesting" GUCs anywhere. That would make it pretty useless for the use case I care about, i.e. using auto_explain to investigate slow plans, when I don't really know what GUC the application might have changed (certainly not in advance). I can't really say how to adopt this to other use cases, considering there are none proposed (and I can't think of any either). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2018-12-18 00:38:16 +0100, Tomas Vondra wrote: > On 12/17/18 11:16 PM, Tom Lane wrote: > > Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > >> Yeah, I've been thinking about that too. Currently it gets filtered out > >> because it's in the CLIENT_CONN_STATEMENT group, but the code only > >> includes QUERY_TUNING_*. > >> But we don't want to include everything from CLIENT_CONN_STATEMENT, > >> because that would include various kinds of timeouts, vacuuming > >> parameters, etc. > >> And the same issue applies to work_mem, which is in RESOURCES_MEM. And > >> of course, there's a lot of unrelated stuff in RESOURCES_MEM. > >> So I guess we'll need to enable QUERY_TUNING_* and then selectively a > >> couple of individual options from other groups. > > > > This is putting way too much policy into the mechanism, if you ask me. > > > > Doesn't that depend on how it'd be implemented? I have not envisioned to > make these decisions in explain.c, but rather to keep them in guc.c > somehow. Say in a function like this: > > bool guc_affects_query_planning(config_generic *conf); > > which would be a fairly simple check outlined before (QUERY_TUNING_* > plus a couple of individual GUCs). Other use cases might provide similar > filters. If we were to do that, I'd suggest implementing a GUC_* flag specified in the definition of the GUC, rather than a separate function containing all the knowledge (but such a function could obviously still be used to return such a fact). > An alternative would be to somehow track this in the GUC definitions > directly (similarly to how we track the group), but that seems rather > inflexible and I'm not sure how would that handle ad-hoc use cases. Not sure what problem you see here? Greetings, Andres Freund
On 12/18/18 12:43 AM, Andres Freund wrote: > Hi, > > On 2018-12-18 00:38:16 +0100, Tomas Vondra wrote: >> On 12/17/18 11:16 PM, Tom Lane wrote: >>> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: >>>> Yeah, I've been thinking about that too. Currently it gets filtered out >>>> because it's in the CLIENT_CONN_STATEMENT group, but the code only >>>> includes QUERY_TUNING_*. >>>> But we don't want to include everything from CLIENT_CONN_STATEMENT, >>>> because that would include various kinds of timeouts, vacuuming >>>> parameters, etc. >>>> And the same issue applies to work_mem, which is in RESOURCES_MEM. And >>>> of course, there's a lot of unrelated stuff in RESOURCES_MEM. >>>> So I guess we'll need to enable QUERY_TUNING_* and then selectively a >>>> couple of individual options from other groups. >>> >>> This is putting way too much policy into the mechanism, if you ask me. >>> >> >> Doesn't that depend on how it'd be implemented? I have not envisioned to >> make these decisions in explain.c, but rather to keep them in guc.c >> somehow. Say in a function like this: >> >> bool guc_affects_query_planning(config_generic *conf); >> >> which would be a fairly simple check outlined before (QUERY_TUNING_* >> plus a couple of individual GUCs). Other use cases might provide similar >> filters. > > If we were to do that, I'd suggest implementing a GUC_* flag specified > in the definition of the GUC, rather than a separate function containing > all the knowledge (but such a function could obviously still be used to > return such a fact). > Seems reasonable. > >> An alternative would be to somehow track this in the GUC definitions >> directly (similarly to how we track the group), but that seems rather >> inflexible and I'm not sure how would that handle ad-hoc use cases. > > Not sure what problem you see here? > My main concern with that was how many flags could we need, if we use this as the way to implement this and similar use cases. There are 32 bits available, and 24 of them are already used for GUC_* flags. So if we use this as an "official" way to support similar use cases, we could run out of space pretty fast. The callback would also allow ad hoc filtering, which is not very practical from extensions (e.g. you can't define a new flag, because it might conflict with something defined in another extension). But I think that's nonsense - so far we have not seen any such use case, so it's pretty pointless to design for it. If that changes and some new use case is proposed in the future, we can rethink this based on it. I'll go with a new flag, marking all GUCs related to query planning, and post a new patch soon. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attached is v4, changing how GUCs are picked for inclusion on the query plans. Instead of picking the GUCs based on group and/or explicitly, a new GUC_EXPLAIN flag is used for that. I went through GUCs defined in guc.c and marked those in QUERY_TUNING* groups accordingly, with the exception of default_statistics_target because that seems somewhat useless without showing the value used to actually analyze the table (and/or columns). I've also included a couple of other GUCs, that I find to be relevant: - parallel_leader_participation - max_parallel_workers_per_gather - max_parallel_workers - search_path - effective_io_concurrency - work_mem - temp_buffers - plan_cache_mode I think this covers the interesting GUCs pretty well, although perhaps I missed something. The one bit that needs fixing is escaping the GUC values when showing them in the plan. Looking at the other places that currently escape stuff, I see they only care about YAML/JSON/XML and leave the regular output unescaped. I was wondering if it's OK with the current format with all GUCs on a single line QUERY PLAN --------------------------------------------------- Seq Scan on t (cost=0.00..54.63 rows=13 width=4) Filter: ('x''y'::text = (a)::text) GUCs: enable_nestloop = 'off', work_mem = '32MB' (3 rows) but I suppose it is, because without the escaping a user can break whatever format we use. So I'll do the same thing, escaping just the structured formats (YAML et al). The question however is whether someone has a better formatting idea? regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
út 1. 1. 2019 v 18:39 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
Attached is v4, changing how GUCs are picked for inclusion on the query
plans. Instead of picking the GUCs based on group and/or explicitly, a
new GUC_EXPLAIN flag is used for that.
I went through GUCs defined in guc.c and marked those in QUERY_TUNING*
groups accordingly, with the exception of default_statistics_target
because that seems somewhat useless without showing the value used to
actually analyze the table (and/or columns).
I've also included a couple of other GUCs, that I find to be relevant:
- parallel_leader_participation
- max_parallel_workers_per_gather
- max_parallel_workers
- search_path
- effective_io_concurrency
- work_mem
- temp_buffers
- plan_cache_mode
when plan_cache_mode is auto, you know maybe too less executed query. Maybe you can read somewhere if plan was custom or generic.
I think this covers the interesting GUCs pretty well, although perhaps I
missed something.
seq_page_cost, random_page_cost, from_collapse_limit, join_collapse_limit, ... enable_***
The one bit that needs fixing is escaping the GUC values when showing
them in the plan. Looking at the other places that currently escape
stuff, I see they only care about YAML/JSON/XML and leave the regular
output unescaped. I was wondering if it's OK with the current format
with all GUCs on a single line
QUERY PLAN
---------------------------------------------------
Seq Scan on t (cost=0.00..54.63 rows=13 width=4)
Filter: ('x''y'::text = (a)::text)
GUCs: enable_nestloop = 'off', work_mem = '32MB'
(3 rows)
but I suppose it is, because without the escaping a user can break
whatever format we use. So I'll do the same thing, escaping just the
structured formats (YAML et al).
The question however is whether someone has a better formatting idea?
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/1/19 6:48 PM, Pavel Stehule wrote: > > > út 1. 1. 2019 v 18:39 odesílatel Tomas Vondra > <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> napsal: > > Attached is v4, changing how GUCs are picked for inclusion on the query > plans. Instead of picking the GUCs based on group and/or explicitly, a > new GUC_EXPLAIN flag is used for that. > > I went through GUCs defined in guc.c and marked those in QUERY_TUNING* > groups accordingly, with the exception of default_statistics_target > because that seems somewhat useless without showing the value used to > actually analyze the table (and/or columns). > > I've also included a couple of other GUCs, that I find to be relevant: > > - parallel_leader_participation > - max_parallel_workers_per_gather > - max_parallel_workers > - search_path > - effective_io_concurrency > - work_mem > - temp_buffers > - plan_cache_mode > > > when plan_cache_mode is auto, you know maybe too less executed query. > Maybe you can read somewhere if plan was custom or generic. > This patch is about showing GUCs, not such additional internal info. Also, you'll see the plan actually used. > > I think this covers the interesting GUCs pretty well, although perhaps I > missed something. > > > seq_page_cost, random_page_cost, from_collapse_limit, > join_collapse_limit, ... enable_*** > All these GUCs are included, of course. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
út 1. 1. 2019 v 20:11 odesílatel Tomas Vondra <tomas.vondra@2ndquadrant.com> napsal:
On 1/1/19 6:48 PM, Pavel Stehule wrote:
>
>
> út 1. 1. 2019 v 18:39 odesílatel Tomas Vondra
> <tomas.vondra@2ndquadrant.com <mailto:tomas.vondra@2ndquadrant.com>> napsal:
>
> Attached is v4, changing how GUCs are picked for inclusion on the query
> plans. Instead of picking the GUCs based on group and/or explicitly, a
> new GUC_EXPLAIN flag is used for that.
>
> I went through GUCs defined in guc.c and marked those in QUERY_TUNING*
> groups accordingly, with the exception of default_statistics_target
> because that seems somewhat useless without showing the value used to
> actually analyze the table (and/or columns).
>
> I've also included a couple of other GUCs, that I find to be relevant:
>
> - parallel_leader_participation
> - max_parallel_workers_per_gather
> - max_parallel_workers
> - search_path
> - effective_io_concurrency
> - work_mem
> - temp_buffers
> - plan_cache_mode
>
>
> when plan_cache_mode is auto, you know maybe too less executed query.
> Maybe you can read somewhere if plan was custom or generic.
>
This patch is about showing GUCs, not such additional internal info.
Also, you'll see the plan actually used.
I understand to your goal, and I understand so collecting some data inside can be hard or impossible.
But if you collect a data important for understanding to planner behave/decision, then some important information is inside plancache too. - and now it is not visible from user space.
It is just my note - so not only GUC are interesting - nothing more.
>
> I think this covers the interesting GUCs pretty well, although perhaps I
> missed something.
>
>
> seq_page_cost, random_page_cost, from_collapse_limit,
> join_collapse_limit, ... enable_***
>
All these GUCs are included, of course.
ok - thank you for info.
Regards
Pavel
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 14/12/2018 12:41, Tomas Vondra wrote: > 1) names of the options > > I'm not particularly happy with calling the option "gucs" - it's an > acronym and many users have little idea what GUC stands for. So I think > a better name would be desirable, but I'm not sure what would that be. > Options? Parameters? "settings" (see pg_settings, SET) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/2/19 4:20 PM, Peter Eisentraut wrote: > On 14/12/2018 12:41, Tomas Vondra wrote: >> 1) names of the options >> >> I'm not particularly happy with calling the option "gucs" - it's an >> acronym and many users have little idea what GUC stands for. So I think >> a better name would be desirable, but I'm not sure what would that be. >> Options? Parameters? > > "settings" > > (see pg_settings, SET) > Yup, that seems like a better choice. Thanks. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> Hi, > > every now and then I have to investigate an execution plan that is > strange in some way and I can't reproduce the same behavior. Usually > it's simply due to data distribution changing since the problem was > observed (say, after a nightly batch load/update). > > In many cases it however may be due to some local GUC tweaks, usually > addressing some query specific issues (say, disabling nested loops or > lowering join_collapse_limit). I've repeatedly ran into cases where the > GUC was not properly reset to the "regular" value, and it's rather > difficult to identify this is what's happening. Or cases with different > per-user settings and connection pooling (SET SESSION AUTHORIZATION / > ROLE etc.). > > So I propose to extend EXPLAIN output with an additional option, which > would include information about modified GUCs in the execution plan > (disabled by default, of course): > > test=# explain (gucs) select * from t; > > QUERY PLAN > -------------------------------------------------------------------- > Seq Scan on t (cost=0.00..35.50 rows=2550 width=4) > GUCs: application_name = 'x', client_encoding = 'UTF8', > cpu_tuple_cost = '0.01' > (2 rows) > > Of course, this directly applies to auto_explain too, which gets a new > option log_gucs. > > The patch is quite trivial, but there are about three open questions: > > 1) names of the options > > I'm not particularly happy with calling the option "gucs" - it's an > acronym and many users have little idea what GUC stands for. So I think > a better name would be desirable, but I'm not sure what would that be. > Options? Parameters? > > 2) format of output > > At this point the names/values are simply formatted into a one-line > string. That's not particularly readable, and it's not very useful for > the YAML/JSON formats I guess. So adding each modified GUC as an extra > text property would be better. > > 3) identifying modified (and interesting) GUCs > > We certainly don't want to include all GUCs, so the question is how to > decide which GUCs are interesting. The simplest approach would be to > look for GUCs that changed in the session (source == PGC_S_SESSION), but > that does not help with SET SESSION AUTHORIZATION / ROLE cases. So we > probably want (source > PGC_S_ARGV), but probably not PGC_S_OVERRIDE > because that includes irrelevant options like wal_buffers etc. > > For now I've used > > /* return only options that were modified (not as in config file) */ > if ((conf->source <= PGC_S_ARGV) || (conf->source == PGC_S_OVERRIDE)) > continue; > > which generally does the right thing, although it also includes stuff > like application_name or client_encoding. But perhaps it'd be better to > whitelist the GUCs in some way, because some of the user-defined GUCs > may be sensitive and should not be included in plans. > > Opinions? > > > regards > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services This patch correlates with my proposal "add session information column to pg_stat_statements" https://www.postgresql.org/message-id/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com They both address the problem to identify the factors that make different execution plans for the same SQL statements. You are interested in the current settings that affect the execution plan, I'm concerned about historical data in pg_stat_statements. From my experience the most often offending settings are current_schemas/search_path and current_user. Please have in mind that probably the same approach that you will use to extend explain plan functionality will be eventually implemented to extend pg_stat_statements. I think that the list of the GUCs that are reported by explain plan should be structured like JSON, something like extended_settings: { "current_schemas" : ["pg_catalog", "s1", "s2", "public"], "current_user" : "user1", "enable_nestloop" : "off", "work_mem" = "32MB" } It is less important for yours use case explain, but is important for pg_stat_statements case. The pg_stat_statements is often accessed by monitoring and reporting tools, and it will be a good idea to have the data here in the structured and easily parsed format. Thank you, Sergei Agalakov
Hello Sergei, > This patch correlates with my proposal > "add session information column to pg_stat_statements" > https://www.postgresql.org/message-id/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com > They both address the problem to identify the factors that make > different execution plans for the same SQL statements. You are > interested in the current settings that affect the execution plan, I'm > concerned about historical data in pg_stat_statements. From my > experience the most often offending settings are > current_schemas/search_path and current_user. Please have in mind that > probably the same approach that you will use to extend explain plan > functionality will be eventually implemented to extend > pg_stat_statements. Possibly, although I don't have an ambition to export the GUCs into pg_stat_statements in this patch. There's an issue with merging different values of GUCs in different executions of a statement, and it's unclear how to solve that. > I think that the list of the GUCs that are reported > by explain plan should be structured like JSON, something like > extended_settings: { "current_schemas" : ["pg_catalog", "s1", "s2", "public"], > "current_user" : "user1", > "enable_nestloop" : "off", > "work_mem" = "32MB" > } > It is less important for yours use case explain, but is important > for pg_stat_statements case. > The pg_stat_statements is often accessed by monitoring and reporting > tools, and it will be a good idea to have > the data here in the > structured and easily parsed format. Yes, that's a good point. I think it's fine to keep the current format for TEXT output, and use a structured format when the explain format is set to json or yaml. That's what we do for data about Hash nodes, for example (see show_hash_info). So I've done that in the attached v5 of the patch, which now produces something like this: test=# explain (gucs, format json) select * from t; QUERY PLAN --------------------------------- [ + { + "Plan": { + "Node Type": "Seq Scan", + "Parallel Aware": false, + "Relation Name": "t", + "Alias": "t", + "Startup Cost": 0.00, + "Total Cost": 61.00, + "Plan Rows": 2550, + "Plan Width": 4 + }, + "GUC": [ + "cpu_tuple_cost": "0.02",+ "work_mem": "1GB" + ] + } + ] (1 row) The one slightly annoying issue is that currently all the options are formatted as text, including e.g. cpu_tuple_cost. That's because the GUC options may have show hook, so I can't access the value directly (not sure if there's an option around it). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-Jan-14, Tomas Vondra wrote: > The one slightly annoying issue is that currently all the options are > formatted as text, including e.g. cpu_tuple_cost. That's because the GUC > options may have show hook, so I can't access the value directly (not > sure if there's an option around it). I think the problem is that you'd have to know how to print the value, which can be in one of several different C types. You'd have to grow some more infrastructure in the GUC tables, I think, and that doesn't seem worth the trouble. Printing as text seems enough. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Tomas Vondra-4 wrote > Hello Sergei, > >> This patch correlates with my proposal >> "add session information column to pg_stat_statements" >> https://www.postgresql.org/message-id/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com >> They both address the problem to identify the factors that make >> different execution plans for the same SQL statements. You are >> interested in the current settings that affect the execution plan, I'm >> concerned about historical data in pg_stat_statements. From my >> experience the most often offending settings are >> current_schemas/search_path and current_user. Please have in mind that >> probably the same approach that you will use to extend explain plan >> functionality will be eventually implemented to extend >> pg_stat_statements. > > Possibly, although I don't have an ambition to export the GUCs into > pg_stat_statements in this patch. There's an issue with merging > different values of GUCs in different executions of a statement, and > it's unclear how to solve that. > > [...] > > -- > Tomas Vondra http://www.2ndQuadrant.com > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services This explain with GUCs feature can be included very easily for historical data management in pg_store_plans or pg_stat_sql_plans extensions (that both use a planid based on the normalized plan text). Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
On 1/14/19 11:13 PM, Alvaro Herrera wrote: > On 2019-Jan-14, Tomas Vondra wrote: > >> The one slightly annoying issue is that currently all the options are >> formatted as text, including e.g. cpu_tuple_cost. That's because the GUC >> options may have show hook, so I can't access the value directly (not >> sure if there's an option around it). > > I think the problem is that you'd have to know how to print the value, > which can be in one of several different C types. You'd have to grow > some more infrastructure in the GUC tables, I think, and that doesn't > seem worth the trouble. Printing as text seems enough. > I don't think the number of formats is such a big issue - the range of formats is quite limited: PGC_BOOL, PGC_INT, PGC_REAL, PGC_STRING and PGC_ENUM. But the show hook simply returns string, and I'm not sure it's guaranteed it matches the raw value (afaik the assign/show hooks can do all kinds of funny stuff). regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> [v5] Hi Tomas, Peter suggested upthread to use 'settings' rather than 'gucs' for the explain option (and output?), and you seemed to agree. Are you going to include that in a future version? Speaking of the output, v5's default text doesn't match that of formatted text ('GUCs' / 'GUC').
Hi John, On 1/16/19 10:08 PM, John Naylor wrote: >> [v5] > > Hi Tomas, > Peter suggested upthread to use 'settings' rather than 'gucs' for the > explain option (and output?), and you seemed to agree. Are you going > to include that in a future version? Speaking of the output, v5's > default text doesn't match that of formatted text ('GUCs' / 'GUC'). > Attached is v6 of the patch, adopting "settings" instead of "guc". regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sun, Jan 20, 2019 at 2:31 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > Attached is v6 of the patch, adopting "settings" instead of "guc". Ok, looks good. I tried changing config values with the .conf file, alter system, and alter database, and tried a few queries with auto_explain. I made a pass through the config parameters, and don't see anything obviously left out. I have no comments on the source code. One thing stands out: For the docs on auto_explain, all other options have "Only superusers can change this setting.", but log_settings doesn't. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 1/22/19 1:35 AM, John Naylor wrote: > On Sun, Jan 20, 2019 at 2:31 PM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: >> Attached is v6 of the patch, adopting "settings" instead of "guc". > > Ok, looks good. I tried changing config values with the .conf file, > alter system, and alter database, and tried a few queries with > auto_explain. I made a pass through the config parameters, and don't > see anything obviously left out. I have no comments on the source > code. > > One thing stands out: For the docs on auto_explain, all other options > have "Only superusers can change this setting.", but log_settings > doesn't. Yes, that's an omission in the docs. Will fix. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jan 22, 2019 at 02:29:06AM +0100, Tomas Vondra wrote: > Yes, that's an omission in the docs. Will fix. Could you fix your patch then? I am moving it to next CF, waiting on author. -- Michael
Attachment
Hi, On 2019-01-15 02:39:49 +0100, Tomas Vondra wrote: > > > On 1/14/19 11:13 PM, Alvaro Herrera wrote: > > On 2019-Jan-14, Tomas Vondra wrote: > > > >> The one slightly annoying issue is that currently all the options are > >> formatted as text, including e.g. cpu_tuple_cost. That's because the GUC > >> options may have show hook, so I can't access the value directly (not > >> sure if there's an option around it). > > > > I think the problem is that you'd have to know how to print the value, > > which can be in one of several different C types. You'd have to grow > > some more infrastructure in the GUC tables, I think, and that doesn't > > seem worth the trouble. Printing as text seems enough. > > > > I don't think the number of formats is such a big issue - the range of > formats is quite limited: PGC_BOOL, PGC_INT, PGC_REAL, PGC_STRING and > PGC_ENUM. But the show hook simply returns string, and I'm not sure it's > guaranteed it matches the raw value (afaik the assign/show hooks can do > all kinds of funny stuff). Yea, but the underlying values are quite useless for humans. E.g. counting shared_buffers, wal_buffers etc in weird units is not helpful. So you'd need to support the different units for each. Greetings, Andres Freund
On 2/14/19 8:55 PM, Andres Freund wrote: > Hi, > > On 2019-01-15 02:39:49 +0100, Tomas Vondra wrote: >> >> >> On 1/14/19 11:13 PM, Alvaro Herrera wrote: >>> On 2019-Jan-14, Tomas Vondra wrote: >>> >>>> The one slightly annoying issue is that currently all the options are >>>> formatted as text, including e.g. cpu_tuple_cost. That's because the GUC >>>> options may have show hook, so I can't access the value directly (not >>>> sure if there's an option around it). >>> >>> I think the problem is that you'd have to know how to print the value, >>> which can be in one of several different C types. You'd have to grow >>> some more infrastructure in the GUC tables, I think, and that doesn't >>> seem worth the trouble. Printing as text seems enough. >>> >> >> I don't think the number of formats is such a big issue - the range of >> formats is quite limited: PGC_BOOL, PGC_INT, PGC_REAL, PGC_STRING and >> PGC_ENUM. But the show hook simply returns string, and I'm not sure it's >> guaranteed it matches the raw value (afaik the assign/show hooks can do >> all kinds of funny stuff). > > Yea, but the underlying values are quite useless for > humans. E.g. counting shared_buffers, wal_buffers etc in weird units is > not helpful. So you'd need to support the different units for each. > True. So you agree printing the values as text (as the patch currently does) seems enough? I guess I'm fine with that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-02-15 17:10:28 +0100, Tomas Vondra wrote: > True. So you agree printing the values as text (as the patch currently > does) seems enough? I guess I'm fine with that. I do agree, yes.
Hi, attached is an updated patch, fixing and slightly tweaking the docs. Barring objections, I'll get this committed later next week. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Sun, 24 Feb 2019 at 00:06, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > Hi, > > attached is an updated patch, fixing and slightly tweaking the docs. > > > Barring objections, I'll get this committed later next week. > I was having a look at this patch, and this kept me wondering, +static void +ExplainShowSettings(ExplainState *es) +{ Is there some reason for not providing any explanation above this function just like the rest of the functions in this file? Similarly, for struct config_generic ** get_explain_guc_options(int *num) { /* also bail out of there are no options */ + if (!num) + return; I think you meant 'if' instead if 'of' here.
On Wed, Mar 27, 2019 at 09:06:04AM +0100, Rafia Sabih wrote: >On Tue, 26 Mar 2019 at 21:04, Tomas Vondra <tomas.vondra@2ndquadrant.com> >wrote: > >> On Mon, Mar 18, 2019 at 11:31:48AM +0100, Rafia Sabih wrote: >> >On Sun, 24 Feb 2019 at 00:06, Tomas Vondra <tomas.vondra@2ndquadrant.com> >> wrote: >> >> >> >> Hi, >> >> >> >> attached is an updated patch, fixing and slightly tweaking the docs. >> >> >> >> >> >> Barring objections, I'll get this committed later next week. >> >> >> >I was having a look at this patch, and this kept me wondering, >> > >> >+static void >> >+ExplainShowSettings(ExplainState *es) >> >+{ >> >Is there some reason for not providing any explanation above this >> >function just like the rest of the functions in this file? >> > >> >Similarly, for >> > >> >struct config_generic ** >> >get_explain_guc_options(int *num) >> >{ >> > >> >/* also bail out of there are no options */ >> >+ if (!num) >> >+ return; >> >I think you meant 'if' instead if 'of' here. >> >> Thanks for the review - attached is a patch adding the missing comments, >> and doing two additional minor improvements: >> >> 1) Renaming ExplainShowSettings to ExplainPrintSettings, to make it more >> consistent with naming of the other functions in explain.c. >> >> 2) Actually counting GUC_EXPLAIN options, and only allocating space for >> those in get_explain_guc_options, instead of using num_guc_variables. The >> diffrence is quite significant (~50 vs. ~300), and considering each entry >> is 8B it makes a difference because such large chunks tend to have higher >> palloc overhed (due to ALLOCSET_SEPARATE_THRESHOLD). >> >> > Looks like the patch is in need of a rebase. >At commit: 1983af8e899389187026cb34c1ca9d89ea986120 > >P.S. reject files attached. > D'oh! That was a stupid mistake - I apparently attched just the delta against the previous patch version, i.e. the improvements I described. Attaches is a correct (and complete) patch. I planned to get this committed today, but considering this I'll wait until early next week to allow for feedback. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, 29 Mar 2019 at 22:07, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Wed, Mar 27, 2019 at 09:06:04AM +0100, Rafia Sabih wrote:
>On Tue, 26 Mar 2019 at 21:04, Tomas Vondra <tomas.vondra@2ndquadrant.com>
>wrote:
>
>> On Mon, Mar 18, 2019 at 11:31:48AM +0100, Rafia Sabih wrote:
>> >On Sun, 24 Feb 2019 at 00:06, Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> wrote:
>> >>
>> >> Hi,
>> >>
>> >> attached is an updated patch, fixing and slightly tweaking the docs.
>> >>
>> >>
>> >> Barring objections, I'll get this committed later next week.
>> >>
>> >I was having a look at this patch, and this kept me wondering,
>> >
>> >+static void
>> >+ExplainShowSettings(ExplainState *es)
>> >+{
>> >Is there some reason for not providing any explanation above this
>> >function just like the rest of the functions in this file?
>> >
>> >Similarly, for
>> >
>> >struct config_generic **
>> >get_explain_guc_options(int *num)
>> >{
>> >
>> >/* also bail out of there are no options */
>> >+ if (!num)
>> >+ return;
>> >I think you meant 'if' instead if 'of' here.
>>
>> Thanks for the review - attached is a patch adding the missing comments,
>> and doing two additional minor improvements:
>>
>> 1) Renaming ExplainShowSettings to ExplainPrintSettings, to make it more
>> consistent with naming of the other functions in explain.c.
>>
>> 2) Actually counting GUC_EXPLAIN options, and only allocating space for
>> those in get_explain_guc_options, instead of using num_guc_variables. The
>> diffrence is quite significant (~50 vs. ~300), and considering each entry
>> is 8B it makes a difference because such large chunks tend to have higher
>> palloc overhed (due to ALLOCSET_SEPARATE_THRESHOLD).
>>
>>
> Looks like the patch is in need of a rebase.
>At commit: 1983af8e899389187026cb34c1ca9d89ea986120
>
>P.S. reject files attached.
>
D'oh! That was a stupid mistake - I apparently attched just the delta against
the previous patch version, i.e. the improvements I described. Attaches is a
correct (and complete) patch.
I planned to get this committed today, but considering this I'll wait until
early next week to allow for feedback.
The patch looks good to me.
Regards,
Rafia Sabih
Rafia Sabih
Hi, I've committed this, with some minor documentation tweaks. I've also fixed a minor bug in the last patch, where the group with settings was not properly labeled in some formats (e.g. json). Thanks to all the reviewers! regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services