Thread: compute_query_id and pg_stat_statements
When compute_query_id is not enabled (this is the default setting), pg_stat_statements doesn't track any statements. This means that we will see no entries in pg_stat_statements by default. I'm afraid that users may easily forget to enable compute_query_id when using pg_stat_statements (because this setting was not necessary in v13 or before), and finally may have noticed the mis-configuration and failure of statements tracking after many queries were executed. For example, we already have one report about this issue, in [1]. Shouldn't we do something so that users can avoid such mis-configuration? One idea is to change the default value of compute_query_id from false to true. If enabling compute_query_id doesn't incur any performance penalty, IMO this idea is very simple and enough. Another idea is to change pg_stat_statements so that it emits an error at the server startup (i.e., prevents the server from starting up) if compute_query_id is not enabled. In this case, users can easily notice the mis-configuration from the error message in the server log, enable compute_query_id, and then restart the server. IMO the former is better if there is no such performance risk. Otherwise we should adopt the latter approach. Or you have the better idea? Thought? [1] https://postgr.es/m/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Le sam. 24 avr. 2021 à 22:54, Fujii Masao <masao.fujii@oss.nttdata.com> a écrit :
For example, we already have one report about this issue, in [1].
this report was only a few days after the patch changing the behavior was committed, unless you've been following the original thread (which has been going on for 2 years), that's kind of expected. release notes for pg14 should highlight that change, so hopefully people upgrading will see it. I'll also try to write some blog article about it to add more warnings.
Shouldn't we do something so that users can avoid such mis-configuration?
One idea is to change the default value of compute_query_id from false to true.
If enabling compute_query_id doesn't incur any performance penalty,
IMO this idea is very simple and enough.
it adds some noticeable overhead in oltp style workloads. I think that I did some benchmarks in the original thread, and we decided not to enable it by default
Another idea is to change pg_stat_statements so that it emits an error
at the server startup (i.e., prevents the server from starting up)
if compute_query_id is not enabled. In this case, users can easily notice
the mis-configuration from the error message in the server log,
enable compute_query_id, and then restart the server.
that's also not an option, as one can now use pg_stat_statetements with a different queryid calculation. see for instance https://github.com/rjuju/pg_queryid for a proof a concept extension for that. I think it's clear that multiple people will want to use a different calculation as they have been asking for that for years.
IMO the former is better if there is no such performance risk. Otherwise
we should adopt the latter approach. Or you have the better idea?
I'm not sure how to address that, as temporarily disabling queryId calculation should be allowed. maybe we could raise a warning once per backend if pgss sees a dml query without queryId? but it could end up creating more problems than it solves.
for the record people have also raised bugs on the powa project because planning counters are not tracked by default, so compute_query_id will probably add a bit of traffic.
On Sat, Apr 24, 2021 at 11:54:25PM +0900, Fujii Masao wrote: > When compute_query_id is not enabled (this is the default setting), > pg_stat_statements doesn't track any statements. This means that > we will see no entries in pg_stat_statements by default. I'm afraid that > users may easily forget to enable compute_query_id > when using pg_stat_statements (because this setting was not necessary > in v13 or before), and finally may have noticed the mis-configuration > and failure of statements tracking after many queries were executed. > For example, we already have one report about this issue, in [1]. > > Shouldn't we do something so that users can avoid such mis-configuration? > > One idea is to change the default value of compute_query_id from false to true. > If enabling compute_query_id doesn't incur any performance penalty, > IMO this idea is very simple and enough. I think the query overhead was too high (2%) to enable it by default: https://www.postgresql.org/message-id/20201016160355.GA31474@alvherre.pgsql > Another idea is to change pg_stat_statements so that it emits an error > at the server startup (i.e., prevents the server from starting up) > if compute_query_id is not enabled. In this case, users can easily notice > the mis-configuration from the error message in the server log, > enable compute_query_id, and then restart the server. I think it throws an error in the server logs, but preventing server start seems extreme. Also, compute_query_id is PGC_SUSET, meaning it can be changed by the super-user, so you could enable compute_query_id without a server restart, which makes failing on start kind of odd. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Sat, Apr 24, 2021 at 5:22 PM Bruce Momjian <bruce@momjian.us> wrote: > > On Sat, Apr 24, 2021 at 11:54:25PM +0900, Fujii Masao wrote: > > When compute_query_id is not enabled (this is the default setting), > > pg_stat_statements doesn't track any statements. This means that > > we will see no entries in pg_stat_statements by default. I'm afraid that > > users may easily forget to enable compute_query_id > > when using pg_stat_statements (because this setting was not necessary > > in v13 or before), and finally may have noticed the mis-configuration > > and failure of statements tracking after many queries were executed. > > For example, we already have one report about this issue, in [1]. > > > > Shouldn't we do something so that users can avoid such mis-configuration? > > > > One idea is to change the default value of compute_query_id from false to true. > > If enabling compute_query_id doesn't incur any performance penalty, > > IMO this idea is very simple and enough. > > I think the query overhead was too high (2%) to enable it by default: > > https://www.postgresql.org/message-id/20201016160355.GA31474@alvherre.pgsql Personally I'd say 2% is not too high to turn it on by default, as it goes down when you move past trivial queries, which is what most people do. And since you can easily turn it off. > > Another idea is to change pg_stat_statements so that it emits an error > > at the server startup (i.e., prevents the server from starting up) > > if compute_query_id is not enabled. In this case, users can easily notice > > the mis-configuration from the error message in the server log, > > enable compute_query_id, and then restart the server. > > I think it throws an error in the server logs, but preventing server > start seems extreme. Also, compute_query_id is PGC_SUSET, meaning it > can be changed by the super-user, so you could enable compute_query_id > without a server restart, which makes failing on start kind of odd. How about turning it into an enum instead of a boolean, that can be: off = always off auto = pg_stat_statments turns it on when it's loaded in shared_preload_libraries. Other extensions using it can do that to. But it remains off if you haven't installed any *extension* that needs it on = always on (if you want it in pg_stat_activity regardless of extensions) The default would be "auto", which means that pg_stat_statements would work as expected, but those who haven't installed it (or another extension that changes it) would not have to pay the overhead. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Sat, Apr 24, 2021 at 06:48:53PM +0200, Magnus Hagander wrote: > > I think the query overhead was too high (2%) to enable it by default: > > > > https://www.postgresql.org/message-id/20201016160355.GA31474@alvherre.pgsql > > Personally I'd say 2% is not too high to turn it on by default, as it > goes down when you move past trivial queries, which is what most > people do. And since you can easily turn it off. We would do a lot of work to reduce overhead by 2% on every query, and to add 2% for a hash that previously was only used by pg_stat_statements seems unwise. > How about turning it into an enum instead of a boolean, that can be: > > off = always off > auto = pg_stat_statments turns it on when it's loaded in > shared_preload_libraries. Other extensions using it can do that to. > But it remains off if you haven't installed any *extension* that needs > it > on = always on (if you want it in pg_stat_activity regardless of extensions) > > The default would be "auto", which means that pg_stat_statements would > work as expected, but those who haven't installed it (or another > extension that changes it) would not have to pay the overhead. That's a pretty weird API. I think we just need people to turn it on like they are doing when the configure pg_stat_statements anyway. pg_stat_statements already requires configuration anyway. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Bruce Momjian <bruce@momjian.us> writes: > That's a pretty weird API. I think we just need people to turn it on > like they are doing when the configure pg_stat_statements anyway. > pg_stat_statements already requires configuration anyway. Agreed. If pg_stat_statements were zero-configuration today then this would be an annoying new burden, but it isn't. I haven't looked, but did we put anything into pg_stat_statements to make it easy to tell if you've messed up this setting? regards, tom lane
On Sat, Apr 24, 2021 at 01:43:51PM -0400, Tom Lane wrote: > > I haven't looked, but did we put anything into pg_stat_statements > to make it easy to tell if you've messed up this setting? You mean apart from from having pg_stat_statements' view/SRFs returning nothing? I think it's a reasonable use case to sometime disable query_id calculation, eg. if you know that it will only lead to useless bloat in the entry and that you won't need the info, so spamming warnings if there are no queryid could cause some pain.
Julien Rouhaud <rjuju123@gmail.com> writes: > On Sat, Apr 24, 2021 at 01:43:51PM -0400, Tom Lane wrote: >> I haven't looked, but did we put anything into pg_stat_statements >> to make it easy to tell if you've messed up this setting? > You mean apart from from having pg_stat_statements' view/SRFs returning > nothing? > I think it's a reasonable use case to sometime disable query_id calculation, > eg. if you know that it will only lead to useless bloat in the entry and that > you won't need the info, so spamming warnings if there are no queryid could > cause some pain. I agree repeated warnings would be bad news. I was wondering if we could arrange a single warning at the time pg_stat_statements is preloaded into the postmaster. In this way, if you tried to use a configuration file that used to work, you'd hopefully get some notice about why it no longer does what you want. Also, if you are preloading pg_stat_statements, it seems reasonable to assume that you'd like the global value of the flag to be "on", even if there are use-cases for transiently disabling it. I think the way to detect "being loaded into the postmaster" is if (IsPostmasterEnvironment && !IsUnderPostmaster) regards, tom lane
On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote: > > I agree repeated warnings would be bad news. I was wondering if we could > arrange a single warning at the time pg_stat_statements is preloaded into > the postmaster. In this way, if you tried to use a configuration file > that used to work, you'd hopefully get some notice about why it no longer > does what you want. Also, if you are preloading pg_stat_statements, it > seems reasonable to assume that you'd like the global value of the flag > to be "on", even if there are use-cases for transiently disabling it. What about people who wants to use pg_stat_statements but are not ok with our query_id heuristics and use a third-party plugin for that?
Julien Rouhaud <rjuju123@gmail.com> writes: > On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote: >> I agree repeated warnings would be bad news. I was wondering if we could >> arrange a single warning at the time pg_stat_statements is preloaded into >> the postmaster. In this way, if you tried to use a configuration file >> that used to work, you'd hopefully get some notice about why it no longer >> does what you want. Also, if you are preloading pg_stat_statements, it >> seems reasonable to assume that you'd like the global value of the flag >> to be "on", even if there are use-cases for transiently disabling it. > What about people who wants to use pg_stat_statements but are not ok with our > query_id heuristics and use a third-party plugin for that? They're still going to want the GUC set to something other than "off", no? In any case it's just a one-time log message, so it's not likely to be *that* annoying. regards, tom lane
On Sun, Apr 25, 2021 at 01:17:03PM -0400, Tom Lane wrote: > Julien Rouhaud <rjuju123@gmail.com> writes: > > On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote: > >> I agree repeated warnings would be bad news. I was wondering if we could > >> arrange a single warning at the time pg_stat_statements is preloaded into > >> the postmaster. In this way, if you tried to use a configuration file > >> that used to work, you'd hopefully get some notice about why it no longer > >> does what you want. Also, if you are preloading pg_stat_statements, it > >> seems reasonable to assume that you'd like the global value of the flag > >> to be "on", even if there are use-cases for transiently disabling it. > > > What about people who wants to use pg_stat_statements but are not ok with our > > query_id heuristics and use a third-party plugin for that? > > They're still going to want the GUC set to something other than "off", > no? They will want compute_query_id to be off. And they actually will *need* that, as we recommend third-party plugins computing alternative query_id to error out if they see a that a query_id has already been generated, to avoid any problem if compute_query_id is being temporarily toggled. That's what I did in the POC plugin for external query_id at [1]. > In any case it's just a one-time log message, so it's not likely > to be *that* annoying. In that case it should be phrased in a way that makes it clear that pg_stat_statements can work without enabling compute_query_id, something like: "compute_query_id is disabled. This module won't track any activity unless you configured a third-party extension that computes query identifiers" [1] https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L172
On 24.04.21 19:43, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: >> That's a pretty weird API. I think we just need people to turn it on >> like they are doing when the configure pg_stat_statements anyway. >> pg_stat_statements already requires configuration anyway. > > Agreed. If pg_stat_statements were zero-configuration today then > this would be an annoying new burden, but it isn't. I think people can understand "add pg_stat_statements to shared_preload_libraries" and "install the extension". You have to turn it on somehow after all. Now there is the additional burden of turning on this weird setting that no one understands. That's a 50% increase in burden. And almost no one will want to use a nondefault setting. pg_stat_statements is pretty popular. I think leaving in this requirement will lead to widespread confusion and complaints.
Re: Peter Eisentraut > > Agreed. If pg_stat_statements were zero-configuration today then > > this would be an annoying new burden, but it isn't. > > I think people can understand "add pg_stat_statements to > shared_preload_libraries" and "install the extension". You have to turn it > on somehow after all. Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today. You just have to load the module (= shared_preload_libraries), and it will start working. Later you can run CREATE EXTENSION to actually see the stats, but they are already being collected in the background. > Now there is the additional burden of turning on this weird setting that no > one understands. That's a 50% increase in burden. > > And almost no one will want to use a nondefault setting. > > pg_stat_statements is pretty popular. I think leaving in this requirement > will lead to widespread confusion and complaints. Ack, please make the default config (i.e. after setting shared_preload_libraries) do something sensible. Having to enable some "weird" internal other setting will be very hard to explain to users. Fwiw, I'd even want to have pg_stat_statements enabled in core by default. That would awesome UX. (And turning off could be as simple as setting compute_query_id=off.) Christoph
On Mon, Apr 26, 2021 at 05:34:30PM +0200, Christoph Berg wrote: > Re: Peter Eisentraut > > > Agreed. If pg_stat_statements were zero-configuration today then > > > this would be an annoying new burden, but it isn't. > > > > I think people can understand "add pg_stat_statements to > > shared_preload_libraries" and "install the extension". You have to turn it > > on somehow after all. > > Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today. > You just have to load the module (= shared_preload_libraries), and it > will start working. Later you can run CREATE EXTENSION to actually see > the stats, but they are already being collected in the background. > > > Now there is the additional burden of turning on this weird setting that no > > one understands. That's a 50% increase in burden. > > > > And almost no one will want to use a nondefault setting. > > > > pg_stat_statements is pretty popular. I think leaving in this requirement > > will lead to widespread confusion and complaints. > > Ack, please make the default config (i.e. after setting shared_preload_libraries) > do something sensible. Having to enable some "weird" internal other setting > will be very hard to explain to users. > > Fwiw, I'd even want to have pg_stat_statements enabled in core by > default. That would awesome UX. (And turning off could be as simple as > setting compute_query_id=off.) Techically, pg_stat_statements can turn on compute_query_id when it is loaded, even if it is 'off' in postgresql.conf, right? And pg_stat_statements would know if an alternate hash method is being used, right? This is closer to Magnus's idea of having a three-value compute_query_id, except is it more controlled by pg_stat_statements. Another idea would be to throw a user-visible warning if the pg_stat_statements extension is loaded and compute_query_id is off. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Greetings, * Bruce Momjian (bruce@momjian.us) wrote: > On Mon, Apr 26, 2021 at 05:34:30PM +0200, Christoph Berg wrote: > > Re: Peter Eisentraut > > > > Agreed. If pg_stat_statements were zero-configuration today then > > > > this would be an annoying new burden, but it isn't. > > > > > > I think people can understand "add pg_stat_statements to > > > shared_preload_libraries" and "install the extension". You have to turn it > > > on somehow after all. > > > > Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today. > > You just have to load the module (= shared_preload_libraries), and it > > will start working. Later you can run CREATE EXTENSION to actually see > > the stats, but they are already being collected in the background. > > > > > Now there is the additional burden of turning on this weird setting that no > > > one understands. That's a 50% increase in burden. > > > > > > And almost no one will want to use a nondefault setting. > > > > > > pg_stat_statements is pretty popular. I think leaving in this requirement > > > will lead to widespread confusion and complaints. > > > > Ack, please make the default config (i.e. after setting shared_preload_libraries) > > do something sensible. Having to enable some "weird" internal other setting > > will be very hard to explain to users. > > > > Fwiw, I'd even want to have pg_stat_statements enabled in core by > > default. That would awesome UX. (And turning off could be as simple as > > setting compute_query_id=off.) > > Techically, pg_stat_statements can turn on compute_query_id when it is > loaded, even if it is 'off' in postgresql.conf, right? And > pg_stat_statements would know if an alternate hash method is being used, > right? +1 on this approach. I agree that we should avoid having to make every new user and every user who is upgrading with pg_stat_statements installed have to go twiddle this parameter. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Bruce Momjian (bruce@momjian.us) wrote: >> Techically, pg_stat_statements can turn on compute_query_id when it is >> loaded, even if it is 'off' in postgresql.conf, right? And >> pg_stat_statements would know if an alternate hash method is being used, >> right? > +1 on this approach. That'd make it impossible to turn off or adjust afterwards, wouldn't it? I'm afraid the confusion stemming from that would outweigh any simplicity. I would be in favor of logging a message at startup to the effect of "this is misconfigured" (as per upthread discussion), although whether people would see that is uncertain. In the end, it's not like this is the first time we've ever made an incompatible change in configuration needs; and it won't be the last either. I don't buy the argument that pg_stat_statements users can't cope with adding the additional setting. (Of course, we should be careful to call it out as an incompatible change in the release notes.) regards, tom lane
On Mon, Apr 26, 2021 at 12:56:13PM -0400, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Bruce Momjian (bruce@momjian.us) wrote: > >> Techically, pg_stat_statements can turn on compute_query_id when it is > >> loaded, even if it is 'off' in postgresql.conf, right? And > >> pg_stat_statements would know if an alternate hash method is being used, > >> right? > > > +1 on this approach. > > That'd make it impossible to turn off or adjust afterwards, wouldn't it? > I'm afraid the confusion stemming from that would outweigh any simplicity. > > I would be in favor of logging a message at startup to the effect of > "this is misconfigured" (as per upthread discussion), although whether > people would see that is uncertain. I think a user-visible warning at CREATE EXNTENSION would help too. > In the end, it's not like this is the first time we've ever made an > incompatible change in configuration needs; and it won't be the last > either. I don't buy the argument that pg_stat_statements users can't > cope with adding the additional setting. (Of course, we should be > careful to call it out as an incompatible change in the release notes.) Agreed. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Mon, Apr 26, 2021 at 6:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Stephen Frost <sfrost@snowman.net> writes: > > * Bruce Momjian (bruce@momjian.us) wrote: > >> Techically, pg_stat_statements can turn on compute_query_id when it is > >> loaded, even if it is 'off' in postgresql.conf, right? And > >> pg_stat_statements would know if an alternate hash method is being used, > >> right? > > > +1 on this approach. > > That'd make it impossible to turn off or adjust afterwards, wouldn't it? > I'm afraid the confusion stemming from that would outweigh any simplicity. Thatäs why I suggested the three value one. Default to a mode where it's automatic, which is what the majority is going to want, but have a way to explicitly turn it on. > I would be in favor of logging a message at startup to the effect of > "this is misconfigured" (as per upthread discussion), although whether > people would see that is uncertain. Some people would. Many wouldn't, and sadly many hours would be spent on debugging things before they got there -- based on experience of how many people actually read the logs.. > In the end, it's not like this is the first time we've ever made an > incompatible change in configuration needs; and it won't be the last > either. I don't buy the argument that pg_stat_statements users can't > cope with adding the additional setting. (Of course, we should be > careful to call it out as an incompatible change in the release notes.) The fact that we've made changes before that complicated our users experience isn't in itself an argument for doing it again though... -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Tue, Apr 27, 2021 at 12:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Stephen Frost <sfrost@snowman.net> writes: > > * Bruce Momjian (bruce@momjian.us) wrote: > >> Techically, pg_stat_statements can turn on compute_query_id when it is > >> loaded, even if it is 'off' in postgresql.conf, right? And > >> pg_stat_statements would know if an alternate hash method is being used, > >> right? > > > +1 on this approach. > > That'd make it impossible to turn off or adjust afterwards, wouldn't it? I think so, which would also make it impossible to use an external query_id plugin. Enabling compute_query_id by default or raising a WARNING in pg_stat_statements' PG_INIT seems like the only 2 sensible options.
On Mon, Apr 26, 2021 at 7:00 PM Bruce Momjian <bruce@momjian.us> wrote: > > On Mon, Apr 26, 2021 at 12:56:13PM -0400, Tom Lane wrote: > > Stephen Frost <sfrost@snowman.net> writes: > > > * Bruce Momjian (bruce@momjian.us) wrote: > > >> Techically, pg_stat_statements can turn on compute_query_id when it is > > >> loaded, even if it is 'off' in postgresql.conf, right? And > > >> pg_stat_statements would know if an alternate hash method is being used, > > >> right? > > > > > +1 on this approach. > > > > That'd make it impossible to turn off or adjust afterwards, wouldn't it? > > I'm afraid the confusion stemming from that would outweigh any simplicity. > > > > I would be in favor of logging a message at startup to the effect of > > "this is misconfigured" (as per upthread discussion), although whether > > people would see that is uncertain. > > I think a user-visible warning at CREATE EXNTENSION would help too. It would help a bit, but actually logging it would probably help more. Most people don't run the CREATE EXTENSION commands manually, it's all done as part of either system install scripts or of application migrations. But that doesn't mean it wouldn't be useful to do it for those that *do* run things manually, it just wouldn't be sufficient in itself. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Tue, Apr 27, 2021 at 1:04 AM Magnus Hagander <magnus@hagander.net> wrote: > > Thatäs why I suggested the three value one. Default to a mode where > it's automatic, which is what the majority is going to want, but have > a way to explicitly turn it on. Agreed, that also sounds like a sensible default.
Greetings, * Magnus Hagander (magnus@hagander.net) wrote: > On Mon, Apr 26, 2021 at 6:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Stephen Frost <sfrost@snowman.net> writes: > > > * Bruce Momjian (bruce@momjian.us) wrote: > > >> Techically, pg_stat_statements can turn on compute_query_id when it is > > >> loaded, even if it is 'off' in postgresql.conf, right? And > > >> pg_stat_statements would know if an alternate hash method is being used, > > >> right? > > > > > +1 on this approach. > > > > That'd make it impossible to turn off or adjust afterwards, wouldn't it? > > I'm afraid the confusion stemming from that would outweigh any simplicity. I don't know that it actually would, but ... > Thatäs why I suggested the three value one. Default to a mode where > it's automatic, which is what the majority is going to want, but have > a way to explicitly turn it on. This is certainly fine with me too, though it seems a bit surprising to me that we couldn't just figure out what the user actually wants based on what's installed/running for any given combination. > > In the end, it's not like this is the first time we've ever made an > > incompatible change in configuration needs; and it won't be the last > > either. I don't buy the argument that pg_stat_statements users can't > > cope with adding the additional setting. (Of course, we should be > > careful to call it out as an incompatible change in the release notes.) > > The fact that we've made changes before that complicated our users > experience isn't in itself an argument for doing it again though... I'm generally a proponent of sensible changes across major versions even if it means that the user has to adjust things, but this seems like a case where we're punting on something that we really should just be able to figure out the right answer to and that seems like a step backwards. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > * Magnus Hagander (magnus@hagander.net) wrote: >> Thatäs why I suggested the three value one. Default to a mode where >> it's automatic, which is what the majority is going to want, but have >> a way to explicitly turn it on. > This is certainly fine with me too, though it seems a bit surprising to > me that we couldn't just figure out what the user actually wants based > on what's installed/running for any given combination. I'd be on board with having pg_stat_statement's pg_init function do something to adjust the setting, if we can figure out how to do that in a way that's not confusing in itself. I'm not sure though that the GUC engine offers a good way. regards, tom lane
On 2021-Apr-26, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Magnus Hagander (magnus@hagander.net) wrote: > >> Thatäs why I suggested the three value one. Default to a mode where > >> it's automatic, which is what the majority is going to want, but have > >> a way to explicitly turn it on. > > > This is certainly fine with me too, though it seems a bit surprising to > > me that we couldn't just figure out what the user actually wants based > > on what's installed/running for any given combination. > > I'd be on board with having pg_stat_statement's pg_init function do > something to adjust the setting, if we can figure out how to do that > in a way that's not confusing in itself. I'm not sure though that > the GUC engine offers a good way. I think it's straightforward, if we decouple the tri-valued enum used for guc.c purposes from a separate boolean that actually enables the feature. GUC sets the boolean to "off" initially when it sees the enum as "auto", and then pg_stat_statement's _PG_init modifies it during its own startup as needed. So the user can turn the GUC off, and then pg_stat_statement does nothing and there is no performance drawback; or leave it "auto" and it'll only compute query_id if pg_stat_statement is loaded; or leave it on if they want the query_id for other purposes. -- Álvaro Herrera 39°49'30"S 73°17'W "El miedo atento y previsor es la madre de la seguridad" (E. Burke)
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Magnus Hagander (magnus@hagander.net) wrote: > >> Thatäs why I suggested the three value one. Default to a mode where > >> it's automatic, which is what the majority is going to want, but have > >> a way to explicitly turn it on. > > > This is certainly fine with me too, though it seems a bit surprising to > > me that we couldn't just figure out what the user actually wants based > > on what's installed/running for any given combination. > > I'd be on board with having pg_stat_statement's pg_init function do > something to adjust the setting, if we can figure out how to do that > in a way that's not confusing in itself. I'm not sure though that > the GUC engine offers a good way. Both of the extensions are getting loaded via pg_stat_statements and both can have pg_init functions which work together to come up with the right answer, no? That is- can't pg_stat_statements, when it's loaded, enable compute_query_id if it's not already enabled, and then the pg_queryid module simply disable it when it gets loaded in it's pg_init()? Telling people who are using pg_queryid to have it loaded *after* pg_stat_statements certainly seems reasonable to me, but if folks don't like that then maybe have a tri-state which is 'auto', 'on', and 'off', where pg_stat_statements would set it to 'on' if it's set to 'auto', but not do anything if it starts as 'off'. pg_queryid would then set it to 'off' when it's loaded and it wouldn't matter if it's loaded before or after. Thanks, Stephen
Attachment
Greetings, * Alvaro Herrera (alvherre@alvh.no-ip.org) wrote: > On 2021-Apr-26, Tom Lane wrote: > > > Stephen Frost <sfrost@snowman.net> writes: > > > * Magnus Hagander (magnus@hagander.net) wrote: > > >> Thatäs why I suggested the three value one. Default to a mode where > > >> it's automatic, which is what the majority is going to want, but have > > >> a way to explicitly turn it on. > > > > > This is certainly fine with me too, though it seems a bit surprising to > > > me that we couldn't just figure out what the user actually wants based > > > on what's installed/running for any given combination. > > > > I'd be on board with having pg_stat_statement's pg_init function do > > something to adjust the setting, if we can figure out how to do that > > in a way that's not confusing in itself. I'm not sure though that > > the GUC engine offers a good way. > > I think it's straightforward, if we decouple the tri-valued enum used > for guc.c purposes from a separate boolean that actually enables the > feature. GUC sets the boolean to "off" initially when it sees the enum > as "auto", and then pg_stat_statement's _PG_init modifies it during its > own startup as needed. > > So the user can turn the GUC off, and then pg_stat_statement does > nothing and there is no performance drawback; or leave it "auto" and > it'll only compute query_id if pg_stat_statement is loaded; or leave it > on if they want the query_id for other purposes. Yeah, this is more-or-less the same as what I was just proposing in an email that crossed this one. Using a separate boolean would certainly be fine. Thanks, Stephen
Attachment
Hi, On 2021-04-26 13:43:31 -0400, Alvaro Herrera wrote: > I think it's straightforward, if we decouple the tri-valued enum used > for guc.c purposes from a separate boolean that actually enables the > feature. GUC sets the boolean to "off" initially when it sees the enum > as "auto", and then pg_stat_statement's _PG_init modifies it during its > own startup as needed. > So the user can turn the GUC off, and then pg_stat_statement does > nothing and there is no performance drawback; or leave it "auto" and > it'll only compute query_id if pg_stat_statement is loaded; or leave it > on if they want the query_id for other purposes. I think that's the right direction. I wonder though if we shouldn't go a bit further. Have one guc that determines the "query id provider" (NULL or a shared library), and one GUC that configures whether query-id is computed (never, on-demand/auto, always). For the provider GUC load the .so and look up a function with some well known name. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I think that's the right direction. I wonder though if we shouldn't go a > bit further. Have one guc that determines the "query id provider" (NULL > or a shared library), and one GUC that configures whether query-id is > computed (never, on-demand/auto, always). For the provider GUC load the > .so and look up a function with some well known name. That's sounding like a pretty sane design, actually. Not sure about the shared-library-name-with-fixed-function-name detail, but certainly it seems to be useful to separate "I need a query-id" from the details of the ID calculation. Rather than a GUC per se for the ID provider, maybe we could have a function hook that defaults to pointing at the in-core computation, and then a module wanting to override that just gets into the hook. regards, tom lane
Hi, On 2021-04-26 14:21:00 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > That's sounding like a pretty sane design, actually. Not sure about > the shared-library-name-with-fixed-function-name detail, but certainly > it seems to be useful to separate "I need a query-id" from the details > of the ID calculation. > > Rather than a GUC per se for the ID provider, maybe we could have a > function hook that defaults to pointing at the in-core computation, > and then a module wanting to override that just gets into the hook. I have a preference to determining the provider via GUC instead of a hook because it is both easier to introspect and easier to configure. If the provider is loaded via a hook, and the shared library is loaded via shared_preload_libraries, one can't easily just turn that off in a single session, but needs to restart or explicitly load a different library (that can't already be loaded). We also don't have any way to show what's hooking into a hook. Greetings, Andres Freund
On Mon, Apr 26, 2021 at 8:14 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-04-26 13:43:31 -0400, Alvaro Herrera wrote: > > I think it's straightforward, if we decouple the tri-valued enum used > > for guc.c purposes from a separate boolean that actually enables the > > feature. GUC sets the boolean to "off" initially when it sees the enum > > as "auto", and then pg_stat_statement's _PG_init modifies it during its > > own startup as needed. That's pretty much exactly my original suggestion, yes :) > > So the user can turn the GUC off, and then pg_stat_statement does > > nothing and there is no performance drawback; or leave it "auto" and > > it'll only compute query_id if pg_stat_statement is loaded; or leave it > > on if they want the query_id for other purposes. > > I think that's the right direction. I wonder though if we shouldn't go a > bit further. Have one guc that determines the "query id provider" (NULL > or a shared library), and one GUC that configures whether query-id is > computed (never, on-demand/auto, always). For the provider GUC load the > .so and look up a function with some well known name. On Mon, Apr 26, 2021 at 8:37 PM Andres Freund <andres@anarazel.de> wrote: > I have a preference to determining the provider via GUC instead of a > hook because it is both easier to introspect and easier to configure. +1 in general. Though we could of course also have a read-only internal GUC that would show what we ended up with, and still configure it with shared_preload_libraries, or loaded in some other way. In a way it'd be cleaner to "always load modules with shared_preload_libraries", but I can certainly see the arguments in either direction.. But whichever way it's configured, having a well exposed way to know what it actually is would be important. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote: > Hi, > > On 2021-04-26 14:21:00 -0400, Tom Lane wrote: > > Andres Freund <andres@anarazel.de> writes: > > That's sounding like a pretty sane design, actually. Not sure about > > the shared-library-name-with-fixed-function-name detail, but certainly > > it seems to be useful to separate "I need a query-id" from the details > > of the ID calculation. > > > > Rather than a GUC per se for the ID provider, maybe we could have a > > function hook that defaults to pointing at the in-core computation, > > and then a module wanting to override that just gets into the hook. > > I have a preference to determining the provider via GUC instead of a > hook because it is both easier to introspect and easier to configure. In any case, having a different provider would greatly simplify third-party queryid lib authors and users life. For now the core queryid is computed before post_parse_analyze_hook, but any third party plugin would have to do it as a post_parse_analyze_hook, so you have to make sure that the lib is at the right position in shared_preload_libraries to have it work, eg. [1], depending on how pg_stat_statements and other similar module call prev_post_parse_analyze_hook, which is a pretty bad thing. > If the provider is loaded via a hook, and the shared library is loaded > via shared_preload_libraries, one can't easily just turn that off in a > single session, but needs to restart or explicitly load a different > library (that can't already be loaded). On the other hand we *don't* want to dynamically change the provider. Temporarily enabling/disabling queryid calculation is ok, but generating different have for the same query isn't. > We also don't have any way to show what's hooking into a hook. If we had a dedicated query_id hook, then plugins should error out if users configured multiple plugins to calculate a query_id, so it should be easy to know which plugin is responsible for it without knowing who hooked into the hook. [1] https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L116-L117
On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote: > On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote: >> On 2021-04-26 14:21:00 -0400, Tom Lane wrote: >>> That's sounding like a pretty sane design, actually. Not sure about >>> the shared-library-name-with-fixed-function-name detail, but certainly >>> it seems to be useful to separate "I need a query-id" from the details >>> of the ID calculation. >>> >>> Rather than a GUC per se for the ID provider, maybe we could have a >>> function hook that defaults to pointing at the in-core computation, >>> and then a module wanting to override that just gets into the hook. >> >> I have a preference to determining the provider via GUC instead of a >> hook because it is both easier to introspect and easier to configure. So, this thread has died two weeks ago, and it is still an open item. Could it be possible to move to a resolution by beta1? The consensus I can get from the thread is that we should have a tri-value state to track an extra "auto" for the query ID computation, as proposed by Alvaro here: https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql Unfortunately, nothing has happened to be able to do something like that. -- Michael
Attachment
On 2021/05/11 15:04, Michael Paquier wrote: > On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote: >> On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote: >>> On 2021-04-26 14:21:00 -0400, Tom Lane wrote: >>>> That's sounding like a pretty sane design, actually. Not sure about >>>> the shared-library-name-with-fixed-function-name detail, but certainly >>>> it seems to be useful to separate "I need a query-id" from the details >>>> of the ID calculation. >>>> >>>> Rather than a GUC per se for the ID provider, maybe we could have a >>>> function hook that defaults to pointing at the in-core computation, >>>> and then a module wanting to override that just gets into the hook. >>> >>> I have a preference to determining the provider via GUC instead of a >>> hook because it is both easier to introspect and easier to configure. > > So, this thread has died two weeks ago, and it is still an open item. > Could it be possible to move to a resolution by beta1? The consensus > I can get from the thread is that we should have a tri-value state to > track an extra "auto" for the query ID computation, as proposed by > Alvaro here: > https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql +1 Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, May 11, 2021 at 03:04:13PM +0900, Michael Paquier wrote: > On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote: > > On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote: > >> On 2021-04-26 14:21:00 -0400, Tom Lane wrote: > >>> That's sounding like a pretty sane design, actually. Not sure about > >>> the shared-library-name-with-fixed-function-name detail, but certainly > >>> it seems to be useful to separate "I need a query-id" from the details > >>> of the ID calculation. > >>> > >>> Rather than a GUC per se for the ID provider, maybe we could have a > >>> function hook that defaults to pointing at the in-core computation, > >>> and then a module wanting to override that just gets into the hook. > >> > >> I have a preference to determining the provider via GUC instead of a > >> hook because it is both easier to introspect and easier to configure. > > So, this thread has died two weeks ago, and it is still an open item. > Could it be possible to move to a resolution by beta1? The consensus > I can get from the thread is that we should have a tri-value state to > track an extra "auto" for the query ID computation, as proposed by > Alvaro here: > https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql > > Unfortunately, nothing has happened to be able to do something like > that. My understanding was that there wasn't a consensus on how to fix the problem. Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and provides a new queryIdWanted() function to let third party plugins inform us that they want a query id if possible. As it was noted somewhere in that thread, that's a hack on top on the GUC machinery, so compute_query_id will display "on" rather than "auto" (or "auto and enabled" or whatever) since GUC isn't designed to handle that behavior. For the record I also tested the patch using pg_qualstats(), which can be loaded interactively and also benefits from a query identifier. It works as expected, as in "query idenfitier are enabled but only for the backend that loaded pg_qualstats".
Attachment
On Tue, May 11, 2021 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote: > > On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote: > >> On 2021-04-26 14:21:00 -0400, Tom Lane wrote: > >>> That's sounding like a pretty sane design, actually. Not sure about > >>> the shared-library-name-with-fixed-function-name detail, but certainly > >>> it seems to be useful to separate "I need a query-id" from the details > >>> of the ID calculation. > >>> > >>> Rather than a GUC per se for the ID provider, maybe we could have a > >>> function hook that defaults to pointing at the in-core computation, > >>> and then a module wanting to override that just gets into the hook. > >> > >> I have a preference to determining the provider via GUC instead of a > >> hook because it is both easier to introspect and easier to configure. > > So, this thread has died two weeks ago, and it is still an open item. > Could it be possible to move to a resolution by beta1? The consensus > I can get from the thread is that we should have a tri-value state to > track an extra "auto" for the query ID computation, as proposed by > Alvaro here: > https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql Technically I think that was my suggestion from earlier in that thread that Alvaro just +1ed :) That said, I sort of put that one aside when both Bruce and Tom considered it "a pretty weird API" to quote Bruce. I had missed the fact that Tom changed his mind (maybe when picking up on more of the details). And FTR, I still think this is the best way forward. I think Andres also raised a good point about the ability to actually know which one is in use. Even if we keep the current way of *setting* the hook, I think it might be worthwhile to expose a PGC_INTERNAL guc that shows *which* implementation is actually in use? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Tue, May 11, 2021 at 9:35 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, May 11, 2021 at 03:04:13PM +0900, Michael Paquier wrote: > > On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote: > > > On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote: > > >> On 2021-04-26 14:21:00 -0400, Tom Lane wrote: > > >>> That's sounding like a pretty sane design, actually. Not sure about > > >>> the shared-library-name-with-fixed-function-name detail, but certainly > > >>> it seems to be useful to separate "I need a query-id" from the details > > >>> of the ID calculation. > > >>> > > >>> Rather than a GUC per se for the ID provider, maybe we could have a > > >>> function hook that defaults to pointing at the in-core computation, > > >>> and then a module wanting to override that just gets into the hook. > > >> > > >> I have a preference to determining the provider via GUC instead of a > > >> hook because it is both easier to introspect and easier to configure. > > > > So, this thread has died two weeks ago, and it is still an open item. > > Could it be possible to move to a resolution by beta1? The consensus > > I can get from the thread is that we should have a tri-value state to > > track an extra "auto" for the query ID computation, as proposed by > > Alvaro here: > > https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql > > > > Unfortunately, nothing has happened to be able to do something like > > that. > > My understanding was that there wasn't a consensus on how to fix the problem. > > Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and > provides a new queryIdWanted() function to let third party plugins inform us > that they want a query id if possible. 30 second review -- wouldn't it be cleaner to keep a separate boolean telling the backend "include it or not", which is set to true/false in the guc assign hook and can then be flipped from false->true in queryIdWanted()? (I'd suggest a more verbose name for that function btw, something like requestQueryIdGeneration() or so). (Again, just the 30 second review between meetings, so maybe I'm completely off) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Tue, May 11, 2021 at 09:43:25AM +0200, Magnus Hagander wrote: > > 30 second review -- wouldn't it be cleaner to keep a separate boolean > telling the backend "include it or not", which is set to true/false in > the guc assign hook and can then be flipped from false->true in > queryIdWanted()? (I'd suggest a more verbose name for that function > btw, something like requestQueryIdGeneration() or so). > > (Again, just the 30 second review between meetings, so maybe I'm completely off) It it surely would, but then that variable would need to be explicitly handled as it wouldn't be automatically inherited on Windows and EXEC_BACKEND right?
On 2021/05/11 16:35, Julien Rouhaud wrote: > Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and > provides a new queryIdWanted() function to let third party plugins inform us > that they want a query id if possible. Thanks! > As it was noted somewhere in that thread, that's a hack on top on the GUC > machinery, so compute_query_id will display "on" rather than "auto" (or "auto > and enabled" or whatever) since GUC isn't designed to handle that behavior. Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of overwriting compute_query_id?If we do this, query id computation is necessary when "compute_query_id == COMPUTE_QUERY_ID_ON || (compute_query_id== COMPUTE_QUERY_ID_AUTO && query_id_wanted)". Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Tue, May 11, 2021 at 05:41:53PM +0900, Fujii Masao wrote: > > On 2021/05/11 16:35, Julien Rouhaud wrote: > > Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and > > provides a new queryIdWanted() function to let third party plugins inform us > > that they want a query id if possible. > > Thanks! > > > > As it was noted somewhere in that thread, that's a hack on top on the GUC > > machinery, so compute_query_id will display "on" rather than "auto" (or "auto > > and enabled" or whatever) since GUC isn't designed to handle that behavior. > > Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of overwritingcompute_query_id? If we do this, query id computation is necessary when "compute_query_id == COMPUTE_QUERY_ID_ON|| (compute_query_id == COMPUTE_QUERY_ID_AUTO && query_id_wanted)". That's exactly what Magnus mentioned :) It's not possible because variable aren't inherited on Windows or EXEC_BACKEND. I didn't check but I'm assuming that it could work if the other flag was an internal GUC that couldn't be set by users, but then we would have some kind of internal flag that would have to be documented as "how to check if compute_query_id" is actually enabled or not, which doesn't seem like a good idea. Another approach would be to add a new "auto (enabled)" option to the enum, and prevent users from manually setting the guc to that value. It's not perfect but maybe it would be cleaner. Overall it seems that we don't have a clear consensus on how exactly to address the problem, which is why I originally didn't sent a patch.
On Tue, May 11, 2021 at 10:51 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, May 11, 2021 at 05:41:53PM +0900, Fujii Masao wrote: > > > > On 2021/05/11 16:35, Julien Rouhaud wrote: > > > Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and > > > provides a new queryIdWanted() function to let third party plugins inform us > > > that they want a query id if possible. > > > > Thanks! > > > > > > > As it was noted somewhere in that thread, that's a hack on top on the GUC > > > machinery, so compute_query_id will display "on" rather than "auto" (or "auto > > > and enabled" or whatever) since GUC isn't designed to handle that behavior. > > > > Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of overwritingcompute_query_id? If we do this, query id computation is necessary when "compute_query_id == COMPUTE_QUERY_ID_ON|| (compute_query_id == COMPUTE_QUERY_ID_AUTO && query_id_wanted)". > > That's exactly what Magnus mentioned :) It's not possible because variable > aren't inherited on Windows or EXEC_BACKEND. I didn't check but I'm > assuming that it could work if the other flag was an internal GUC that couldn't > be set by users, but then we would have some kind of internal flag that would > have to be documented as "how to check if compute_query_id" is actually enabled > or not, which doesn't seem like a good idea. That doesn't fundamentally make it impossible, you just have to add it to the list of variables being copied over, wouldn't you? See save_backend_variables() -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote: > > That doesn't fundamentally make it impossible, you just have to add it > to the list of variables being copied over, wouldn't you? See > save_backend_variables() Yes, I agree, and that's what I meant by "explicitly handled". The thing is that I don't know if that's the best way to go, as it doesn't solve the "is it actually enabled" and/or "which implementation is used". At least the patch I sent, although it's totally a hack, let you know if compute_query_id is enabled or not. I'm fine with implementing it that way, but only if there's a consensus.
On Tue, May 11, 2021 at 05:41:06PM +0800, Julien Rouhaud wrote: > On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote: > > > > That doesn't fundamentally make it impossible, you just have to add it > > to the list of variables being copied over, wouldn't you? See > > save_backend_variables() > > Yes, I agree, and that's what I meant by "explicitly handled". The thing is > that I don't know if that's the best way to go, as it doesn't solve the "is it > actually enabled" and/or "which implementation is used". At least the patch I > sent, although it's totally a hack, let you know if compute_query_id is enabled > or not. I'm fine with implementing it that way, but only if there's a > consensus. Actually, isn't that how e.g. wal_buffers = -1 is working? The original value is lost and what you get is the computed value. This is a bit different here as the value isn't always changed, and can be changed interactively but otherwise it's the same behavior.
At Tue, 11 May 2021 18:52:49 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Tue, May 11, 2021 at 05:41:06PM +0800, Julien Rouhaud wrote: > > On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote: > > > > > > That doesn't fundamentally make it impossible, you just have to add it > > > to the list of variables being copied over, wouldn't you? See > > > save_backend_variables() > > > > Yes, I agree, and that's what I meant by "explicitly handled". The thing is > > that I don't know if that's the best way to go, as it doesn't solve the "is it > > actually enabled" and/or "which implementation is used". At least the patch I > > sent, although it's totally a hack, let you know if compute_query_id is enabled > > or not. I'm fine with implementing it that way, but only if there's a > > consensus. > > Actually, isn't that how e.g. wal_buffers = -1 is working? The original value > is lost and what you get is the computed value. This is a bit different here > as the value isn't always changed, and can be changed interactively but > otherwise it's the same behavior. If we look it in pg_settings, it shows the current value and the value at boot-time. So I'm fine with that behavior. However, IMHO, I doubt the necessity of "on". Assuming that we require any module that wants query-id to call queryIdWanted() at any time after each process startup (or it could be inherited to children), I think only "auto" and "off" are enough for the variable. Thinking in this line, the variable is a subset of a GUC variable to specify the name of a query-id provider (as Andres suggested upthread), and I think it would work better in future. So for now I propose that we have a variable query_id_provider that has only 'default' and 'none' as the domain. We can later expand it so that any other query-id provider modules can be loaded without chaning the interface. postgresql.conf # query_id_provider = 'default' # provider module for query-id. 'none' means # # disabling query-id calculation. If we want to have a direct way to know whether query-id is active or not, it'd be good to have a read-only variable "query_id_active". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > Hello Horiguchi-san, > > On Wed, May 12, 2021 at 11:08:36AM +0900, Kyotaro Horiguchi wrote: > > > > If we look it in pg_settings, it shows the current value and the value > > at boot-time. So I'm fine with that behavior. > > > > However, IMHO, I doubt the necessity of "on". Assuming that we require > > any module that wants query-id to call queryIdWanted() at any time > > after each process startup (or it could be inherited to children), I > > think only "auto" and "off" are enough for the variable. > > I don't think that this approach would cope well for people who want a queryid > without pg_stat_statements or such. Since the queryid can now be found in > pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable > to allow users to benefit from that even if they don't install additional > module. Ah, I missed that case. And we are wanting to use pg_stat_statements with (almost) zero-config? How about the following behavior? Setting query_id_provider to 'none' means we don't calculate query-id by default. However, if queryIdWante() is called, the default provider is set up and starts calculating query id. Setting query_id_provider to something else means the user wants query-id calcualted using the provider. Setting 'default' is equivalent to setting compute_query_id to 'on'. There might be a case where a user sets query_id_provider to non-'none' but don't want have query-id calculated, but it can be said a kind of mis-configuration? > > Thinking in > > this line, the variable is a subset of a GUC variable to specify the > > name of a query-id provider (as Andres suggested upthread), and I > > think it would work better in future. > > > > So for now I propose that we have a variable query_id_provider that > > has only 'default' and 'none' as the domain. > > I think this would be a mistake to do that, as it would mean that we don't > officially support alternative queryid provider. Ok, if we want to support alternative providers from the first, we need to actually write the loader code for query-id providers. It would not be so hard?, but it might not be suitable to this stage so I proposed that to get rid of needing such complexity for now. (Anyway I prefer to load query-id provider as a dynamically loadable module rather than hook-function.) > > We can later expand it > > so that any other query-id provider modules can be loaded without > > chaning the interface. > > The GUC itself may not change, but third-party queryid provider would probably > need changes as the new entry point will be dedicated to compute a queryid > only, while third-party plugins may do more than that in their > post_parse_analyze_hook. And also users will have to change their I don't think it is not that a problem. Even if any third-party extension is having query-id generator by itself, in most cases it would be a copy of JumbleQuery in case of pg_stat_statement is not loaded and now it is moved in-core as 'default' provider. What the exntension needs to be done is just ripping out the copied generator code. I guess... > configuration to use that new interface, and additionally the module may now > have to be removed from shared_preload_libraries. Overall, it doesn't seem to > me that it would make users' life easier. Why the third-party module need to be removed from shared_preload_libraries? The module can stay as a preloaded shared library but just no longer need to have its own query-id provider since it is provided in-core. If the extension required a specific provider, the developer need to make it a loadable module and users need to specify the provider module explicitly. I don't think that is not a problem but if we wanted to make it easier, we can let users free from that step by allowing 'auto' for query-id-provider to load any module by the first extension. So, for example, how about the following interface? GUC query_id_provider: - 'none' : query_id is not calculated, don't allow loading external generator module. - 'default' : use default provider and calculate query-id. - '<provider-name>' : use the provider and calculate query-id using it. - 'auto' : query_id is not calculated, but allow to load query-id provider if queryIdWanted() is called. # of course 'auto' and 'default' are inhibited as the provier name. - core function bool queryIdWanted(char *provider_name, bool use_existing) Allows extensions to request to load a provider if not yet, then start calculating query-id. Returns true if the request is accepted. provider_name : - 'default' or '<provider-name>': requests the provider to be loaded and start calculating query-id. Refuse the request if 'none' is set to query_id_provider. use_existing: Set true to allow using a provider already loaded. Otherwise refuses the request if any other provider than prvoder_name is already loaded. In most cases users set query_id_provider to 'auto' and extensions call queryIdWanted with ('default', true). regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 12 May 2021 14:33:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > Ok, if we want to support alternative providers from the first, we > need to actually write the loader code for query-id providers. It > would not be so hard?, but it might not be suitable to this stage so I > proposed that to get rid of needing such complexity for now. > > (Anyway I prefer to load query-id provider as a dynamically loadable > module rather than hook-function.) ... > So, for example, how about the following interface? > > GUC query_id_provider: > > - 'none' : query_id is not calculated, don't allow loading external > generator module. > > - 'default' : use default provider and calculate query-id. > > - '<provider-name>' : use the provider and calculate query-id using it. > > - 'auto' : query_id is not calculated, but allow to load query-id > provider if queryIdWanted() is called. > > # of course 'auto' and 'default' are inhibited as the provier name. > > - core function bool queryIdWanted(char *provider_name, bool use_existing) > > Allows extensions to request to load a provider if not yet, then > start calculating query-id. Returns true if the request is accepted. > > provider_name : > > - 'default' or '<provider-name>': requests the provider to be loaded > and start calculating query-id. Refuse the request if 'none' is > set to query_id_provider. > > use_existing: Set true to allow using a provider already loaded. > Otherwise refuses the request if any other provider than > prvoder_name is already loaded. > > In most cases users set query_id_provider to 'auto' and extensions > call queryIdWanted with ('default', true). Hmm. They are in contradiction. Based on this future picture, at this stage it can be simplified to allowing only 'default' as the provider name. If you want to support any other provider at this point,,, we need to imlement the full-spec? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi
Ah, I missed that case. And we are wanting to use pg_stat_statements
with (almost) zero-config? How about the following behavior?
Until now, the pg_stat_statements was zero-config. So the change is not user friendly.
The idea so pg_stat_statements requires enabled computed_query_id is not good. There should be dependency only on the queryid column.
Regards
Pavel
On Wed, May 12, 2021 at 07:49:13AM +0200, Pavel Stehule wrote: > > > Ah, I missed that case. And we are wanting to use pg_stat_statements > > with (almost) zero-config? How about the following behavior? > > > > > Until now, the pg_stat_statements was zero-config. So the change is not > user friendly. Apart from configuring shared_preload_libraries, but agreed. > The idea so pg_stat_statements requires enabled computed_query_id is not > good. There should be dependency only on the queryid column. I agree that requiring to change compute_query_id when you already added pg_stat_statements in shared_preload_libraries isn't good, and the patch I sent yesterday would fix that.
st 12. 5. 2021 v 8:10 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Wed, May 12, 2021 at 07:49:13AM +0200, Pavel Stehule wrote:
>
> > Ah, I missed that case. And we are wanting to use pg_stat_statements
> > with (almost) zero-config? How about the following behavior?
> >
> >
> Until now, the pg_stat_statements was zero-config. So the change is not
> user friendly.
Apart from configuring shared_preload_libraries, but agreed.
> The idea so pg_stat_statements requires enabled computed_query_id is not
> good. There should be dependency only on the queryid column.
I agree that requiring to change compute_query_id when you already added
pg_stat_statements in shared_preload_libraries isn't good, and the patch I sent
yesterday would fix that.
I don't like the idea of implicit force enabling any feature flag, but it is better than current design. But it doesn't look like a robust solution.
Does it mean that if somebody disables computed_query_id, then pg_stat_statements will not work?
Why is there the strong dependency between computed_query_id and pg_stat_statements? Can this dependency be just optional?
Regards
Pavel
On Wed, May 12, 2021 at 08:58:45AM +0200, Pavel Stehule wrote: > > I don't like the idea of implicit force enabling any feature flag, but it > is better than current design. But it doesn't look like a robust solution. > > Does it mean that if somebody disables computed_query_id, then > pg_stat_statements will not work? It depends, but if you mean "setting up pg_stat_statements, intentionally disabling in-core queryid calculation and not configuring an alternative source" then yes pg_stat_statements will not work. But I don't see any difference from "someone reduce wal_level and complain that replication does not work" or "someone disable fsync and complain that data got corrupted". We provide a sensible default configuration, you can mess it up if you don't know what you're doing. > Why is there the strong dependency between computed_query_id and > pg_stat_statements? Can this dependency be just optional? Once again no, as it otherwise would mean that postgres unilaterally decides that pg_stat_statements' approach to compute a query identifier is the one and only ultimate truth and nothing else could be useful for anyone.
st 12. 5. 2021 v 9:13 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Wed, May 12, 2021 at 08:58:45AM +0200, Pavel Stehule wrote:
>
> I don't like the idea of implicit force enabling any feature flag, but it
> is better than current design. But it doesn't look like a robust solution.
>
> Does it mean that if somebody disables computed_query_id, then
> pg_stat_statements will not work?
It depends, but if you mean "setting up pg_stat_statements, intentionally
disabling in-core queryid calculation and not configuring an alternative
source" then yes pg_stat_statements will not work. But I don't see any
difference from "someone reduce wal_level and complain that replication does
not work" or "someone disable fsync and complain that data got corrupted". We
provide a sensible default configuration, you can mess it up if you don't know
what you're doing.
> Why is there the strong dependency between computed_query_id and
> pg_stat_statements? Can this dependency be just optional?
Once again no, as it otherwise would mean that postgres unilaterally decides
that pg_stat_statements' approach to compute a query identifier is the one and
only ultimate truth and nothing else could be useful for anyone.
ok. Understand.
If I understand well, then computed_query_id does not make sense for pg_stat_statemenst, because this extension always requires it.
Cannot be better to use queryid inside pg_stat_statements every time without dependency on computed_query_id? And computed_query_id can be used only for EXPLAIN and for pg_stat_activity.
pg_stat_statements cannot work without a queryid, so is useless to speak about configuration. If you use pg_stat_statements, then the queryid will be computed every time, but the visibility will be only for pg_stat_statements.
Or a different strategy. I understand so computed_query_id should be active. But I dislike the empty result of pg_stat_statements when computed_query_id is off. Is it possible to raise an exception instead of showing an empty result?
The most correct fix from my perspective is just check in function pg_stat_statements if query id is computed or not. If not, and there is no data, then raise an exception with the hint "enable compute_query_id". When there is data, then show a warning with the mentioned hint and show data.
What do you think about it?
Pavel
On Wed, May 12, 2021 at 09:51:26AM +0200, Pavel Stehule wrote: > > If I understand well, then computed_query_id does not make sense for > pg_stat_statemenst, because this extension always requires it. No, pg_stat_statements requires *a* queryid, not specifially *our* queryid. > Cannot be better to use queryid inside pg_stat_statements every time > without dependency on computed_query_id? And computed_query_id can be used > only for EXPLAIN and for pg_stat_activity. No, because then you will have a discrepancy between those two. And if you want a different queryid approach (say based on object names rather than oid so it survives logical replication), then you also want that queryid used for pg_stat_statements. And that what happen is that you have to fork pg_stat_statements to only change the queryid implementation, which is one of the thing that the patch to move the implementation to core solves. > pg_stat_statements cannot work without a queryid, so is useless to speak > about configuration. If you use pg_stat_statements, then the queryid will > be computed every time, but the visibility will be only for > pg_stat_statements. Yes, pg_stat_statements cannot work without a queryid, but it CAN work without core queryid. > Or a different strategy. I understand so computed_query_id should be > active. But I dislike the empty result of pg_stat_statements when > computed_query_id is off. Is it possible to raise an exception instead of > showing an empty result? Yes, but I don't think that it's a good idea. For instance pg_stat_statements will behave poorly if you have to regularly evict entry. For instance: any query touching a temporary table. One way to avoid that it to avoid storing entries that you know are very likely to be eventually evicted. So to fix this problem, you have 2 ways to go: 1) fix your app and explicitly disable/enable pg_stat_statements around all those queries, and hope you won't miss any 2) write your own queryid implementation to not generate a queryid in such case. 2 seems like a reasonable scenario, and if you force pg_stat_statements to error out in that case then you would be forced to use approach 1.
st 12. 5. 2021 v 10:14 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Wed, May 12, 2021 at 09:51:26AM +0200, Pavel Stehule wrote:
>
> If I understand well, then computed_query_id does not make sense for
> pg_stat_statemenst, because this extension always requires it.
No, pg_stat_statements requires *a* queryid, not specifially *our* queryid.
> Cannot be better to use queryid inside pg_stat_statements every time
> without dependency on computed_query_id? And computed_query_id can be used
> only for EXPLAIN and for pg_stat_activity.
No, because then you will have a discrepancy between those two. And if you
want a different queryid approach (say based on object names rather than oid so
it survives logical replication), then you also want that queryid used for
pg_stat_statements. And that what happen is that you have to fork
pg_stat_statements to only change the queryid implementation, which is one of
the thing that the patch to move the implementation to core solves.
> pg_stat_statements cannot work without a queryid, so is useless to speak
> about configuration. If you use pg_stat_statements, then the queryid will
> be computed every time, but the visibility will be only for
> pg_stat_statements.
Yes, pg_stat_statements cannot work without a queryid, but it CAN work without
core queryid.
> Or a different strategy. I understand so computed_query_id should be
> active. But I dislike the empty result of pg_stat_statements when
> computed_query_id is off. Is it possible to raise an exception instead of
> showing an empty result?
Yes, but I don't think that it's a good idea. For instance pg_stat_statements
will behave poorly if you have to regularly evict entry. For instance: any
query touching a temporary table. One way to avoid that it to avoid storing
entries that you know are very likely to be eventually evicted.
So to fix this problem, you have 2 ways to go:
1) fix your app and explicitly disable/enable pg_stat_statements around all
those queries, and hope you won't miss any
2) write your own queryid implementation to not generate a queryid in such case.
2 seems like a reasonable scenario, and if you force pg_stat_statements to
error out in that case then you would be forced to use approach 1.
The custom queryid can return null, and still the queryid will be computed. Maybe the warning can be enough. Just, if somebody use pg_stat_statements function, then enforce the check if queryid is computed (compute_query_id is true || some hook is not null), and if not then raise a warning.
On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote: > > My second proposal can work for your example too. pg_stat_statements have > to require any active queryid computing. And when it is not available, then > the exception should be raised. > > The custom queryid can return null, and still the queryid will be computed. > Maybe the warning can be enough. Just, if somebody use pg_stat_statements > function, then enforce the check if queryid is computed (compute_query_id > is true || some hook is not null), and if not then raise a warning. Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning or an error when executing pg_stat_statements() SRF would help, that's a great idea! I'll wait a bit in case someone has any objection, and if not send an updated patch!
At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Wed, May 12, 2021 at 02:33:35PM +0900, Kyotaro Horiguchi wrote: > > At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > > > > > > I don't think that this approach would cope well for people who want a queryid > > > without pg_stat_statements or such. Since the queryid can now be found in > > > pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable > > > to allow users to benefit from that even if they don't install additional > > > module. > > > > Ah, I missed that case. And we are wanting to use pg_stat_statements > > with (almost) zero-config? How about the following behavior? > > > > Setting query_id_provider to 'none' means we don't calculate query-id > > by default. However, if queryIdWante() is called, the default provider > > is set up and starts calculating query id. > > Having "none" meant "not unless someone asks for it" looks like a POLA > violation. Sorry for confusion. A different behavior for "none" is proposed later in the mail. It is just an intermediate discussion. > > Setting query_id_provider to something else means the user wants > > query-id calcualted using the provider. Setting 'default' is > > equivalent to setting compute_query_id to 'on'. > > > > There might be a case where a user sets query_id_provider to > > non-'none' but don't want have query-id calculated, but it can be said > > a kind of mis-configuration? > > So if I'm understanding correctly, you're arguing for an approach different to > what Michael stated as the general consensus in [1]. I'm not saying that I > think it's a bad idea (and I actually suggested it before), but we have to > chose an approach and stick with it. I'm not sure how much room for change of the direction is left. So it was just a proposal. So if the majority still thinks that it is the way to stick to controling on/off/(auto) the in-core implement and separately allow another module to be hooked, I don't further object to that decision. > > > I think this would be a mistake to do that, as it would mean that we don't > > > officially support alternative queryid provider. > > > > Ok, if we want to support alternative providers from the first, we > > need to actually write the loader code for query-id providers. It > > would not be so hard?, but it might not be suitable to this stage so I > > proposed that to get rid of needing such complexity for now. > > I did write a POC extension [2] to demonstrate that moving pg_stat_statement's > queryid calculation in core doesn't mean that we're imposing it to everyone. > And yes this is critical and a must have in the initial implementation. Ok, understood. > > (Anyway I prefer to load query-id provider as a dynamically loadable > > module rather than hook-function.) > > I agree that having a specific API (I'm fine with a hook or a dynamically > loaded function) for that would be better, but it doesn't appear to be the > opinion of the majority. Ugg. Ok. > > > The GUC itself may not change, but third-party queryid provider would probably > > > need changes as the new entry point will be dedicated to compute a queryid > > > only, while third-party plugins may do more than that in their > > > post_parse_analyze_hook. And also users will have to change their > > > > I don't think it is not that a problem. > > Did you mean "I don't think that it's a problem"? Otherwise I don't get it. Yes, you're right. Sorry for the typo. > > Even if any third-party > > extension is having query-id generator by itself, in most cases it > > would be a copy of JumbleQuery in case of pg_stat_statement is not > > loaded and now it is moved in-core as 'default' provider. What the > > exntension needs to be done is just ripping out the copied generator > > code. I guess... > > I don't fully understand, but it seems that you're arguing that the only use > case is to have something similar to pg_stat_statements (say e.g. > pg_store_plans), that always have the same queryid implementation as > pg_stat_statements. That's not the case, as there already are "clones" of > pg_stat_statements, and the main difference is an alternative queryid > implementation. So in that case what author would do is to drop everything > *except* the queryid implementation. > > And if I'm not mistaken, pg_store_plans also wants a different queryid > implementation, but has to handle a secondary queryid on top of it > (https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855). Yeah, the extension intended to be used joining with the pg_stat_statements view. And the reason for the second query-id dates back to the era when query id was not available in the pg_stat_statements view. Now it is mere a fall-back query id when pg_stat_statments is not active. Now that the in-core query-id is available, I think there's no reason to keep that implement. > So here again what the extension want is to get rid of pg_stat_statements (and > now core) queryid implementation. So the extension might be a good reason for the discussion^^; > > > configuration to use that new interface, and additionally the module may now > > > have to be removed from shared_preload_libraries. Overall, it doesn't seem to > > > me that it would make users' life easier. > > > > Why the third-party module need to be removed from > > shared_preload_libraries? The module can stay as a preloaded shared > > library but just no longer need to have its own query-id provider > > since it is provided in-core. If the extension required a specific > > provider, the developer need to make it a loadable module and users > > need to specify the provider module explicitly. > > It's the same misunderstanding here. Basically people want to benefit from the > whole ecosystem based on a queryid (pg_stat_statements, now > pg_stat_activity.query_id and such) but with another definition of what a > queryid is. So those people will now only need to implement something like > [2], rather than forking every single extension they want to use. Hmm. I'm not sure the [2] gives sufficient reason for leaving the current interface. But will follow if it is sitll the consensus. (And it seems like true.) > [1]: https://www.postgresql.org/message-id/YJoeXcrwe1EDmqKT@paquier.xyz > [2]: https://github.com/rjuju/pg_queryid regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote: > > > > My second proposal can work for your example too. pg_stat_statements have > > to require any active queryid computing. And when it is not available, then > > the exception should be raised. > > > > The custom queryid can return null, and still the queryid will be computed. > > Maybe the warning can be enough. Just, if somebody use pg_stat_statements > > function, then enforce the check if queryid is computed (compute_query_id > > is true || some hook is not null), and if not then raise a warning. > > Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning > or an error when executing pg_stat_statements() SRF would help, that's a great > idea! > > I'll wait a bit in case someone has any objection, and if not send an updated > patch! Isn't there a case where pg_stat_statements uses an alternative query-id provider? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
st 12. 5. 2021 v 11:39 odesílatel Kyotaro Horiguchi <horikyota.ntt@gmail.com> napsal:
At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in
> On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:
> >
> > My second proposal can work for your example too. pg_stat_statements have
> > to require any active queryid computing. And when it is not available, then
> > the exception should be raised.
> >
> > The custom queryid can return null, and still the queryid will be computed.
> > Maybe the warning can be enough. Just, if somebody use pg_stat_statements
> > function, then enforce the check if queryid is computed (compute_query_id
> > is true || some hook is not null), and if not then raise a warning.
>
> Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning
> or an error when executing pg_stat_statements() SRF would help, that's a great
> idea!
>
> I'll wait a bit in case someone has any objection, and if not send an updated
> patch!
Isn't there a case where pg_stat_statements uses an alternative
query-id provider?
this check just can check if there is "any" query-id provider. In this context is not important if it is buildin or external
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
At Wed, 12 May 2021 18:39:27 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > > On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote: > > > > > > My second proposal can work for your example too. pg_stat_statements have > > > to require any active queryid computing. And when it is not available, then > > > the exception should be raised. > > > > > > The custom queryid can return null, and still the queryid will be computed. > > > Maybe the warning can be enough. Just, if somebody use pg_stat_statements > > > function, then enforce the check if queryid is computed (compute_query_id > > > is true || some hook is not null), and if not then raise a warning. > > > > Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning > > or an error when executing pg_stat_statements() SRF would help, that's a great > > idea! > > > > I'll wait a bit in case someone has any objection, and if not send an updated > > patch! > > Isn't there a case where pg_stat_statements uses an alternative > query-id provider? I don't object that if we allow false non-error when an extension that uses the hooks but doesn't compute a query id. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote: > st 12. 5. 2021 v 11:39 odesílatel Kyotaro Horiguchi <horikyota.ntt@gmail.com> > napsal: > > > At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> > > wrote in > > > On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote: > > > > > > > > My second proposal can work for your example too. pg_stat_statements > > have > > > > to require any active queryid computing. And when it is not available, > > then > > > > the exception should be raised. > > > > > > > > The custom queryid can return null, and still the queryid will be > > computed. > > > > Maybe the warning can be enough. Just, if somebody use > > pg_stat_statements > > > > function, then enforce the check if queryid is computed > > (compute_query_id > > > > is true || some hook is not null), and if not then raise a warning. > > > > > > Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a > > warning > > > or an error when executing pg_stat_statements() SRF would help, that's a > > great > > > idea! > > > > > > I'll wait a bit in case someone has any objection, and if not send an > > updated > > > patch! > > > > Isn't there a case where pg_stat_statements uses an alternative > > query-id provider? > > > > this check just can check if there is "any" query-id provider. In this > context is not important if it is buildin or external Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or similar, then if the executing query itself doesn't have a queryid then it's very likely that you didn't configure compute_query_id or an alternative query_id implementation properly. And loudly complaining seems like the right thing to do.
On Wed, May 12, 2021 at 05:30:26PM +0800, Julien Rouhaud wrote: > On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote: > > > > My second proposal can work for your example too. pg_stat_statements have > > to require any active queryid computing. And when it is not available, then > > the exception should be raised. > > > > The custom queryid can return null, and still the queryid will be computed. > > Maybe the warning can be enough. Just, if somebody use pg_stat_statements > > function, then enforce the check if queryid is computed (compute_query_id > > is true || some hook is not null), and if not then raise a warning. > > Ah I'm sorry I misunderstood your proposal. Yes, definitely adding a warning > or an error when executing pg_stat_statements() SRF would help, that's a great > idea! > > I'll wait a bit in case someone has any objection, and if not send an updated > patch! Hearing no complaint, PFA a v2 implementing such a warning. Here's an extract from the updated regression tests: -- Check that pg_stat_statements() will complain if the configuration appears -- to be broken. SET compute_query_id = off; SELECT pg_stat_statements_reset(); pg_stat_statements_reset -------------------------- (1 row) SELECT count(*) FROM pg_stat_statements; WARNING: Query identifier calculation seems to be disabled HINT: If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id count ------- 0 (1 row) I'm of course open to suggestions for some better wording.
Attachment
On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote: > On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote: > > this check just can check if there is "any" query-id provider. In this > > context is not important if it is buildin or external > > Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or > similar, then if the executing query itself doesn't have a queryid then > it's very likely that you didn't configure compute_query_id or an alternative > query_id implementation properly. And loudly complaining seems like the right > thing to do. I understand the desire to make pg_stat_statements require minimal configuration, but frankly, if the server-side variable query id API is confusing, I think we have done more harm than good. The problem with compute_query_id=auto is that there is no way to know if the query id is actually enabled, unless you guess from the installed extensions, or we add another variable to report that, and maybe another variable to control the provier, unless we require turning compute_query_id=off if you are using custom query id computation. What if it is auto, and pg_stat_statments is installed, and you want to use a custom query id computation --- what happens? As you can see, this is all becoming very complicated. I think we might be just as well to go with compute_query_id=on/off, and just complain loudly from CREATE EXTENSION, or in the server logs on server start via shared_preload_libraries, or when querying pg_stat_statements system view. We simply say to change compute_query_id=on or to provide a custom query id implementation. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Wed, May 12, 2021 at 08:36:18PM -0400, Bruce Momjian wrote: > On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote: > > On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote: > > > this check just can check if there is "any" query-id provider. In this > > > context is not important if it is buildin or external > > > > Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or > > similar, then if the executing query itself doesn't have a queryid then > > it's very likely that you didn't configure compute_query_id or an alternative > > query_id implementation properly. And loudly complaining seems like the right > > thing to do. > > I understand the desire to make pg_stat_statements require minimal > configuration, but frankly, if the server-side variable query id API is > confusing, I think we have done more harm than good. > > The problem with compute_query_id=auto is that there is no way to know > if the query id is actually enabled, unless you guess from the installed > extensions, or we add another variable to report that, and maybe another > variable to control the provier, unless we require turning > compute_query_id=off if you are using custom query id computation. What > if it is auto, and pg_stat_statments is installed, and you want to use a > custom query id computation --- what happens? As you can see, this is > all becoming very complicated. Well, as implemented you can get the value of compute_query_id, and if it's still "auto" then it's not enabled as calling queryIdWanted() would turn it to on. I agree that it's not ideal but you have a way to know. We could document that auto means that it's set to auto and no one asked to automatically enabled it. Or you can just do e.g. SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid(); and see if you have a query_id or not. If you want to use third-party modules, they you have to explicitly disable compute_query_id. If you don't, every query execution will raise an error as we documented that third-party modules should error out if they see that a query_id is already generated. Such module could also explicitly check that compute_query_id is off and also raise an error if that's not the case. > I think we might be just as well to go with compute_query_id=on/off, and > just complain loudly from CREATE EXTENSION, or in the server logs on > server start via shared_preload_libraries, or when querying > pg_stat_statements system view. We simply say to change > compute_query_id=on or to provide a custom query id implementation. I'm not opposed to that, but it was already suggested and apparently people didn't like that approach.
At Wed, 12 May 2021 18:09:30 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Wed, May 12, 2021 at 06:37:24PM +0900, Kyotaro Horiguchi wrote: > > At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > > > > > > And if I'm not mistaken, pg_store_plans also wants a different queryid > > > implementation, but has to handle a secondary queryid on top of it > > > (https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855). > > > > Yeah, the extension intended to be used joining with the > > pg_stat_statements view. And the reason for the second query-id dates > > back to the era when query id was not available in the > > pg_stat_statements view. Now it is mere a fall-back query id when > > pg_stat_statments is not active. Now that the in-core query-id is > > available, I think there's no reason to keep that implement. > > > > > So here again what the extension want is to get rid of pg_stat_statements (and > > > now core) queryid implementation. > > > > So the extension might be a good reason for the discussion^^; > > Indeed. So IIUC, what pg_store_plans wants is: Ugg. Very sorry. My brain should need more oxygen, or caffeine.. The last sentence forgetting a negation. The plugin does not need a special query-id provider so the special provider can be removed without problems if the core provides one. > - to use its own query_id implementation > - to be able to be joined to pg_stat_statements > > Is that correct? It is correct, but a bit short in detail. The query_id of its own is provided because pg_stat_statements did not expose query_id. And it has been preserved only for the case the plugin is used without pg_stat_statements activated. Now that the in-core query_id is available, the last reason for the special provider has gone. > If yes, it seems that starting with pg14, it can be easily achieved by: So, it would be a bit different. > - documenting to disable compute_query_id documenting to *not disable* compute_query_id. That is set it to on or auto. > - eventually error out at execution time if it's enabled So. the extension would check if any query_id provider *is* active. > - don't call queryIdWanted() > - expose its query_id > > It will then work just fine, and will be more efficient compared to what is > done today as only one queryid will be calculated. After reading Magnus' comment nearby, I realized that my most significant concern here is how to know any query_id provider is active. The way of setting the hook cannot enforce notifying that kind of things on plugins. For me implementing them as a dll looked as one of the most promising way of enabling that without needing any boiler-plates. Another not-prefect (in that it needs a boiler-plate) but workable way is letting query-id providers set some variable including GUC explicitly as Magnus' suggested. GUC would be better in that it is naturally observable from users. Even though there'a possibility that a developer of a query_id provider forgets to set it, but maybe it would be easily noticeable. On the other hand it gives a sure means to know any query_id provider is active. How about adding a GUC_INTERNAL "current_query_provider" or such? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 12 May 2021 20:36:18 -0400, Bruce Momjian <bruce@momjian.us> wrote in > On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote: > > On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote: > > > this check just can check if there is "any" query-id provider. In this > > > context is not important if it is buildin or external > > > > Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or > > similar, then if the executing query itself doesn't have a queryid then > > it's very likely that you didn't configure compute_query_id or an alternative > > query_id implementation properly. And loudly complaining seems like the right > > thing to do. > > I understand the desire to make pg_stat_statements require minimal > configuration, but frankly, if the server-side variable query id API is > confusing, I think we have done more harm than good. > > The problem with compute_query_id=auto is that there is no way to know > if the query id is actually enabled, unless you guess from the installed > extensions, or we add another variable to report that, and maybe another > variable to control the provier, unless we require turning > compute_query_id=off if you are using custom query id computation. What > if it is auto, and pg_stat_statments is installed, and you want to use a > custom query id computation --- what happens? As you can see, this is > all becoming very complicated. > > I think we might be just as well to go with compute_query_id=on/off, and > just complain loudly from CREATE EXTENSION, or in the server logs on > server start via shared_preload_libraries, or when querying > pg_stat_statements system view. We simply say to change > compute_query_id=on or to provide a custom query id implementation. FWIW, I personally am fine with that (ignoring details :p), that is, leaving the whole responsibility of a sane setup to users. If we are going to automate even a part of it, I think we need to make it perfect at least to a certain level. The current auery_id = auto looks like somewhat halfway, or narrow-ranged. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote: > On Wed, May 12, 2021 at 08:36:18PM -0400, Bruce Momjian wrote: > > The problem with compute_query_id=auto is that there is no way to know > > if the query id is actually enabled, unless you guess from the installed > > extensions, or we add another variable to report that, and maybe another > > variable to control the provier, unless we require turning > > compute_query_id=off if you are using custom query id computation. What > > if it is auto, and pg_stat_statments is installed, and you want to use a > > custom query id computation --- what happens? As you can see, this is > > all becoming very complicated. > > Well, as implemented you can get the value of compute_query_id, and if it's > still "auto" then it's not enabled as calling queryIdWanted() would turn it to > on. I agree that it's not ideal but you have a way to know. We could document > that auto means that it's set to auto and no one asked to automatically enabled > it. Wow, so the extension changes it? How do we record the "source" of that change? Do we have other GUCs that do that? > Or you can just do e.g. > > SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid(); > > and see if you have a query_id or not. True. > If you want to use third-party modules, they you have to explicitly disable > compute_query_id. If you don't, every query execution will raise an error as > we documented that third-party modules should error out if they see that a > query_id is already generated. Such module could also explicitly check that > compute_query_id is off and also raise an error if that's not the case. OK. > > I think we might be just as well to go with compute_query_id=on/off, and > > just complain loudly from CREATE EXTENSION, or in the server logs on > > server start via shared_preload_libraries, or when querying > > pg_stat_statements system view. We simply say to change > > compute_query_id=on or to provide a custom query id implementation. > > I'm not opposed to that, but it was already suggested and apparently people > didn't like that approach. Also probably true. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > How about adding a GUC_INTERNAL "current_query_provider" or such? On the second thought, I wonder why we don't just call JumbleQuery in pgss_post_parse_analyze when compute_query_id is "off". We can think this behavior as the following. - compute_query_id sets whether the *internal* query-id provider turn on. If it is "off", query_id in , for example, pg_stat_activity is not set. Even in that case it is set to some valid value if some alternative query-id provider is active. On the other hand pg_stat_statements looks as if providing "alternative" query-id provider, but actually it is just calling in-core JumbleQuery if not called yet. @@ -830,6 +830,10 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate) return; } + /* Call in-core JumbleQuery if it was not called in-core */ + if (!jstate) + jstate = JumbleQuery(query, pstate->p_sourcetext); + /* Any plugins that want to use its own query-id generator can WARN if jstate is not NULL, but also can proceed ignoring the exisint jstate. WARNING: the default query-id provier is active, turn off compute_query_id to avoid unnecessary calculation regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, May 12, 2021 at 09:13:25PM -0400, Bruce Momjian wrote: > On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote: > > > > Well, as implemented you can get the value of compute_query_id, and if it's > > still "auto" then it's not enabled as calling queryIdWanted() would turn it to > > on. I agree that it's not ideal but you have a way to know. We could document > > that auto means that it's set to auto and no one asked to automatically enabled > > it. > > Wow, so the extension changes it? Yes. It seemed better to go this way rather than having a secondary read-only GUC for that. > How do we record the "source" of that > change? Do we have other GUCs that do that? No, we don't. But I don't know what exactly you would like to have as a source? What if you have for instance pg_stat_statements, pg_stat_kcache, pg_store_plans and pg_wait_sampling installed? All those extensions need a query_id (or at least benefit from it for pg_wait_sampling), is there any value to give a full list of all the modules that would enable compute_query_id? I'm assuming that anyone wanting to install any of those extensions (or any similar one) is fully aware that they aggregate metrics based on at least a query_id. If they don't, well they probably never read any documentation since postgres 9.2 which introduced query normalization, and I doubt that they will be interested in having access to the information anyway.
At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote: > > At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > How about adding a GUC_INTERNAL "current_query_provider" or such? > > > > On the second thought, I wonder why we don't just call JumbleQuery in > > pgss_post_parse_analyze when compute_query_id is "off". > > Because not generating a query_id for a custom query_id implementation is a > valid use case for queries that are known to lead to huge pg_stat_statements > overhead, as I mentioned in [1]. For the record I implemented that in > pg_queryid (optionally don't generate query_id for queries referencing a temp > relation) yesterday evening as a POC for that approach. Yes, I know. So I said that "if not yet called". I believe any "real" alternative query-id provider is supposed to be hooked "before" pg_stat_statements. (It is a kind of magic to control the order of plugins, though..) When the alternative provider generated a query_id (that is, it has set jstate), pg_stat_statment doesn't call the in-core JumbleQuery and uses the givin query_id. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > > On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote: > > > At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > > How about adding a GUC_INTERNAL "current_query_provider" or such? > > > > > > On the second thought, I wonder why we don't just call JumbleQuery in > > > pgss_post_parse_analyze when compute_query_id is "off". > > > > Because not generating a query_id for a custom query_id implementation is a > > valid use case for queries that are known to lead to huge pg_stat_statements > > overhead, as I mentioned in [1]. For the record I implemented that in > > pg_queryid (optionally don't generate query_id for queries referencing a temp > > relation) yesterday evening as a POC for that approach. > > Yes, I know. So I said that "if not yet called". I believe any "real" > alternative query-id provider is supposed to be hooked "before" > pg_stat_statements. (It is a kind of magic to control the order of > plugins, though..) When the alternative provider generated a query_id > (that is, it has set jstate), pg_stat_statment doesn't call the > in-core JumbleQuery and uses the givin query_id. Forgot to mention, I think that the state "query_id provider is active but it has not assigned one to this query" can be signaled by jstate=<non-null> and query_id = 0. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Thu, 13 May 2021 10:39:20 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Thu, May 13, 2021 at 11:30:56AM +0900, Kyotaro Horiguchi wrote: > > At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > > At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > > > Yes, I know. So I said that "if not yet called". I believe any "real" > > > alternative query-id provider is supposed to be hooked "before" > > > pg_stat_statements. (It is a kind of magic to control the order of > > > plugins, though..) When the alternative provider generated a query_id > > > (that is, it has set jstate), pg_stat_statment doesn't call the > > > in-core JumbleQuery and uses the givin query_id. > > > > Forgot to mention, I think that the state "query_id provider is active > > but it has not assigned one to this query" can be signaled by > > jstate=<non-null> and query_id = 0. > > I assume that you mean "third-party query_id provider" here, as the core one > will always return a non-zero query_id? Right. > I guess it could work, but a lot of people are complaining that having > compute_query_id = [ off | on | auto ] is too confusing, so I don't see how > having "off" means "sometimes off, sometimes on" is going to be any clearer for > users. I don't get it. It read as "people are complaining the tristate is too confusing, so I made it tristate"? For the second point, so I said that the variable controls whether the "internal" query-id pvovider turn on. It is more clearer if the name were something like "use_internal_query_id_generator". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote: > On Wed, May 12, 2021 at 09:13:25PM -0400, Bruce Momjian wrote: > > On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote: > > > > > > Well, as implemented you can get the value of compute_query_id, and if it's > > > still "auto" then it's not enabled as calling queryIdWanted() would turn it to > > > on. I agree that it's not ideal but you have a way to know. We could document > > > that auto means that it's set to auto and no one asked to automatically enabled > > > it. > > > > Wow, so the extension changes it? > > Yes. It seemed better to go this way rather than having a secondary read-only > GUC for that. > > > How do we record the "source" of that > > change? Do we have other GUCs that do that? > > No, we don't. But I don't know what exactly you would like to have as a OK. > source? What if you have for instance pg_stat_statements, pg_stat_kcache, > pg_store_plans and pg_wait_sampling installed? All those extensions need a > query_id (or at least benefit from it for pg_wait_sampling), is there any value > to give a full list of all the modules that would enable compute_query_id? Well, we don't have any other cases where the source of the change is this indeterminate, so I don't really have a suggestion. I think this does highlight another case where 'auto' just isn't a good fit for our API. > I'm assuming that anyone wanting to install any of those extensions (or any > similar one) is fully aware that they aggregate metrics based on at least a > query_id. If they don't, well they probably never read any documentation since > postgres 9.2 which introduced query normalization, and I doubt that they will > be interested in having access to the information anyway. My point is that we are changing a setting from an extension, and if you look in pg_setting, it will say "default"? If the user already has to edit postgresql.conf to set shared_preload_libraries, I still don't see why having them set compute_query_id at the same time is a significant problem and a reason to distort our API to do 'auto'. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
At Thu, 13 May 2021 10:43:03 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > On Thu, May 13, 2021 at 11:26:29AM +0900, Kyotaro Horiguchi wrote: > > > > I believe any "real" > > alternative query-id provider is supposed to be hooked "before" > > pg_stat_statements. (It is a kind of magic to control the order of > > plugins, though.. > > Indeed, you have to configure shared_preload_libraries depending on whether > each module calls the previous post_parse_analyze_hook before or after its own > processing, and that's the main reason why I think a dedicated entry point > would be better. I see it as cleaner than the status-quo. (But still believing less cleaner than DLL:p, since the same problem happens if two query_id-generating modules are competing on the new hook ponit.). You told that a special query-id provider needed to be separated to another DLL, but a preload shared librarie is also a dll and I think any exported function in it can be obtained via load_external_function(). As the result, even if we take the DLL approach, still not need to split out the query-id provider part. By the following config: > query_id_provider = 'pg_stat_statements' the core can obtain the entrypoint of, say, "_PG_calculate_query_id" to call it. And it can be of another module. It seems like the only problem doing that is we don't have a means to call per-process intializer for a preload libralies. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote: > On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote: > > > source? What if you have for instance pg_stat_statements, pg_stat_kcache, > > pg_store_plans and pg_wait_sampling installed? All those extensions need a > > query_id (or at least benefit from it for pg_wait_sampling), is there any value > > to give a full list of all the modules that would enable compute_query_id? > > Well, we don't have any other cases where the source of the change is > this indeterminate, so I don't really have a suggestion. I think this > does highlight another case where 'auto' just isn't a good fit for our > API. It may depends on your next question > > I'm assuming that anyone wanting to install any of those extensions (or any > > similar one) is fully aware that they aggregate metrics based on at least a > > query_id. If they don't, well they probably never read any documentation since > > postgres 9.2 which introduced query normalization, and I doubt that they will > > be interested in having access to the information anyway. > > My point is that we are changing a setting from an extension, and if you > look in pg_setting, it will say "default"? No, it will say "on". What the patch I sent implements is: - compute_query_id = on means it was either explicitly set to on, or automatically set to on if it was allowed to (so initially set to auto). It means you know whether core query_id calculation is enabled or not, you can know looking at the reset value it it was changed by an extension, you just don't know which one(s). - compute_query_id = auto means that it can be set to on, it just wasn't yet, so it's off, and may change if you have an extension that can be dynamically loaded and request for core query_id calculation to be enabled - compute_query_id = off means it's off > If the user already has to edit postgresql.conf to set > shared_preload_libraries, I still don't see why having them set > compute_query_id at the same time is a significant problem and a reason > to distort our API to do 'auto'. Looking at the arguments until now my understanding is that it's because "no one will read the doc anyway".
On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote: > On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote: > > On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote: > > > > > source? What if you have for instance pg_stat_statements, pg_stat_kcache, > > > pg_store_plans and pg_wait_sampling installed? All those extensions need a > > > query_id (or at least benefit from it for pg_wait_sampling), is there any value > > > to give a full list of all the modules that would enable compute_query_id? > > > > Well, we don't have any other cases where the source of the change is > > this indeterminate, so I don't really have a suggestion. I think this > > does highlight another case where 'auto' just isn't a good fit for our > > API. > > It may depends on your next question > > > > I'm assuming that anyone wanting to install any of those extensions (or any > > > similar one) is fully aware that they aggregate metrics based on at least a > > > query_id. If they don't, well they probably never read any documentation since > > > postgres 9.2 which introduced query normalization, and I doubt that they will > > > be interested in having access to the information anyway. > > > > My point is that we are changing a setting from an extension, and if you > > look in pg_setting, it will say "default"? > > No, it will say "on". What the patch I sent implements is: I was asking what pg_settings.source will say: SELECT name, soource FROM pg_settings; > - compute_query_id = on means it was either explicitly set to on, or > automatically set to on if it was allowed to (so initially set to auto). It > means you know whether core query_id calculation is enabled or not, you can > know looking at the reset value it it was changed by an extension, you just > don't know which one(s). > > - compute_query_id = auto means that it can be set to on, it just wasn't yet, > so it's off, and may change if you have an extension that can be dynamically > loaded and request for core query_id calculation to be enabled So, it is 'off', but not set to 'off' in the GUC sense --- just off as in not being computed. You can see the confusion in my just reading that sentence. > - compute_query_id = off means it's off > > > If the user already has to edit postgresql.conf to set > > shared_preload_libraries, I still don't see why having them set > > compute_query_id at the same time is a significant problem and a reason > > to distort our API to do 'auto'. > > Looking at the arguments until now my understanding is that it's because "no > one will read the doc anyway". How do they know to set shared_preload_libraries then? We change the user API all the time, especially in GUCs, and even rename them, but for some reason we don't think people using pg_stat_statements can be trusted to read the release notes and change their behavior. I just don't get it. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote: > On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote: > > On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote: > > > On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote: > > > > > > > source? What if you have for instance pg_stat_statements, pg_stat_kcache, > > > > pg_store_plans and pg_wait_sampling installed? All those extensions need a > > > > query_id (or at least benefit from it for pg_wait_sampling), is there any value > > > > to give a full list of all the modules that would enable compute_query_id? > > > > > > Well, we don't have any other cases where the source of the change is > > > this indeterminate, so I don't really have a suggestion. I think this > > > does highlight another case where 'auto' just isn't a good fit for our > > > API. > > > > It may depends on your next question > > > > > > I'm assuming that anyone wanting to install any of those extensions (or any > > > > similar one) is fully aware that they aggregate metrics based on at least a > > > > query_id. If they don't, well they probably never read any documentation since > > > > postgres 9.2 which introduced query normalization, and I doubt that they will > > > > be interested in having access to the information anyway. > > > > > > My point is that we are changing a setting from an extension, and if you > > > look in pg_setting, it will say "default"? > > > > No, it will say "on". What the patch I sent implements is: > > I was asking what pg_settings.source will say: > > SELECT name, soource FROM pg_settings; Oh sorry. Here's the full output before and after a dynamic call to queryIsWanted(): name | compute_query_id setting | auto unit | <NULL> category | Statistics / Monitoring short_desc | Compute query identifiers. extra_desc | <NULL> context | superuser vartype | enum source | default min_val | <NULL> max_val | <NULL> enumvals | {auto,on,off} boot_val | auto reset_val | auto sourcefile | <NULL> sourceline | <NULL> pending_restart | f =# LOAD 'pg_qualstats'; /* will call queryIsWanted() */ WARNING: 01000: Without shared_preload_libraries, only current backend stats will be available. LOAD name | compute_query_id setting | on unit | <NULL> category | Statistics / Monitoring short_desc | Compute query identifiers. extra_desc | <NULL> context | superuser vartype | enum source | default min_val | <NULL> max_val | <NULL> enumvals | {auto,on,off} boot_val | auto reset_val | auto sourcefile | <NULL> sourceline | <NULL> pending_restart | f So yes, it says "default" (and it's the same if the change is done during postmaster init). It can be fixed if that's the only issue. > > > - compute_query_id = on means it was either explicitly set to on, or > > automatically set to on if it was allowed to (so initially set to auto). It > > means you know whether core query_id calculation is enabled or not, you can > > know looking at the reset value it it was changed by an extension, you just > > don't know which one(s). > > > > - compute_query_id = auto means that it can be set to on, it just wasn't yet, > > so it's off, and may change if you have an extension that can be dynamically > > loaded and request for core query_id calculation to be enabled > > So, it is 'off', but not set to 'off' in the GUC sense --- just off as > in not being computed. You can see the confusion in my just reading > that sentence. It's technically not "off" but "not on yet", but that's probably just making it worse :) > How do they know to set shared_preload_libraries then? We change the > user API all the time, especially in GUCs, and even rename them, but for > some reason we don't think people using pg_stat_statements can be > trusted to read the release notes and change their behavior. I just > don't get it. I don't know what to say. So here is a summary of the complaints that I'm aware of: - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru That was only a couple of days after the commit just before the feature freeze, so it may be the less relevant complaint. - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com Did a git bisect to find the commit that changed the behavior and somehow didn't notice the new setting - this thread, with Fuji-san saying: > I'm afraid that users may easily forget to enable compute_query_id when using > pg_stat_statements (because this setting was not necessary in v13 or before) - this thread, with Peter E. saying: > Now there is the additional burden of turning on this weird setting that > no one understands. That's a 50% increase in burden. And almost no one will > want to use a nondefault setting. pg_stat_statements is pretty popular. I > think leaving in this requirement will lead to widespread confusion and > complaints. - this thread, with Pavel saying: > Until now, the pg_stat_statements was zero-config. So the change is not user > friendly. So it's a mix of "it's changing something that didn't change in a long time" and "it's adding extra footgun and/or burden as it's not doing by default what the majority of users will want", with an overwhelming majority of people supporting the "we don't want that extra burden".
On 2021/05/13 13:03, Julien Rouhaud wrote: > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote: >> On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote: >>> On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote: >>>> On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote: >>>> >>>>> source? What if you have for instance pg_stat_statements, pg_stat_kcache, >>>>> pg_store_plans and pg_wait_sampling installed? All those extensions need a >>>>> query_id (or at least benefit from it for pg_wait_sampling), is there any value >>>>> to give a full list of all the modules that would enable compute_query_id? >>>> >>>> Well, we don't have any other cases where the source of the change is >>>> this indeterminate, so I don't really have a suggestion. I think this >>>> does highlight another case where 'auto' just isn't a good fit for our >>>> API. >>> >>> It may depends on your next question >>> >>>>> I'm assuming that anyone wanting to install any of those extensions (or any >>>>> similar one) is fully aware that they aggregate metrics based on at least a >>>>> query_id. If they don't, well they probably never read any documentation since >>>>> postgres 9.2 which introduced query normalization, and I doubt that they will >>>>> be interested in having access to the information anyway. >>>> >>>> My point is that we are changing a setting from an extension, and if you >>>> look in pg_setting, it will say "default"? >>> >>> No, it will say "on". What the patch I sent implements is: >> >> I was asking what pg_settings.source will say: >> >> SELECT name, soource FROM pg_settings; > > Oh sorry. Here's the full output before and after a dynamic call to > queryIsWanted(): > > name | compute_query_id > setting | auto > unit | <NULL> > category | Statistics / Monitoring > short_desc | Compute query identifiers. > extra_desc | <NULL> > context | superuser > vartype | enum > source | default > min_val | <NULL> > max_val | <NULL> > enumvals | {auto,on,off} > boot_val | auto > reset_val | auto > sourcefile | <NULL> > sourceline | <NULL> > pending_restart | f > > =# LOAD 'pg_qualstats'; /* will call queryIsWanted() */ > WARNING: 01000: Without shared_preload_libraries, only current backend stats will be available. > LOAD > > name | compute_query_id > setting | on > unit | <NULL> > category | Statistics / Monitoring > short_desc | Compute query identifiers. > extra_desc | <NULL> > context | superuser > vartype | enum > source | default > min_val | <NULL> > max_val | <NULL> > enumvals | {auto,on,off} > boot_val | auto > reset_val | auto > sourcefile | <NULL> > sourceline | <NULL> > pending_restart | f > > So yes, it says "default" (and it's the same if the change is done during > postmaster init). It can be fixed if that's the only issue. I like leaving compute_query_id=auto instead of overwriting it to "on", even when queryIsWanted() is called, as I told upthread. Then we can decide that query id computation is necessary when compute_query_id=auto and the special flag is enabled by queryIsWanted(). Of course as you and Magnus discussed upthread, the issue in EXEC_BACKEND case should be fixed, maybe by using save_backend_variables() or something, though. If we do this, compute_query_id=auto seems to be similar to huge_pages=try. When huge_pages=try, whether huge pages is actually used is defined depending outside PostgreSQL (i.e, OS setting in this case). Neither pg_settings.setting nor souce are not changed in that case. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Le jeu. 13 mai 2021 à 12:18, Fujii Masao <masao.fujii@oss.nttdata.com> a écrit :
I like leaving compute_query_id=auto instead of overwriting it to "on",
even when queryIsWanted() is called, as I told upthread. Then we can decide
that query id computation is necessary when compute_query_id=auto and
the special flag is enabled by queryIsWanted(). Of course as you and Magnus
discussed upthread, the issue in EXEC_BACKEND case should be fixed,
maybe by using save_backend_variables() or something, though.
If we do this, compute_query_id=auto seems to be similar to huge_pages=try.
When huge_pages=try, whether huge pages is actually used is defined depending
outside PostgreSQL (i.e, OS setting in this case). Neither pg_settings.setting nor
souce are not changed in that case.
I'm fine with that, but a lot of people complained that it wasn't good because you don't really know if it's actually on or not. I personally don't think that it's an issue, because what user want is to automagumically do what they want, not check how the magic happened, and if they want a third party implementation then the module can error out if the setting is on, so the burden will only be for those users, and handled by the third party module author.
At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > As the result, even if we take the DLL approach, still not need to > split out the query-id provider part. By the following config: > > > query_id_provider = 'pg_stat_statements' > > the core can obtain the entrypoint of, say, "_PG_calculate_query_id" > to call it. And it can be of another module. > > It seems like the only problem doing that is we don't have a means to > call per-process intializer for a preload libralies. So this is a crude PoC of that. pg_stat_statemnts defines its own query-id provider function in pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a log line). If server started with query_id_provider='auto' and pg_stat_statements is not loaded, pg_stat_activity.query_id is null. If query_id_provider='auto' and pg_stat_statements is loaded, pg_stat_activity.query_id is filled in with a query id. If query_id_provider='default' or 'pg_stat_statements' and pg_stat_statements is not loaded, pg_stat_activity.query_id is filled in with a query id. If query_id_provider='none' and pg_stat_statements is loaded, it complains as "query id provider is not available" and refuss to start. If showing the variable, it shows the real provider name instead of the setting in postgresql.conf. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c index a85f962801..207c4362af 100644 --- a/contrib/pg_stat_statements/pg_stat_statements.c +++ b/contrib/pg_stat_statements/pg_stat_statements.c @@ -295,6 +295,7 @@ static bool pgss_save; /* whether to save stats across shutdown */ void _PG_init(void); void _PG_fini(void); +JumbleState *_PG_calculate_query_id(Query *query, const char *querytext); PG_FUNCTION_INFO_V1(pg_stat_statements_reset); PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7); @@ -478,6 +479,13 @@ _PG_fini(void) ProcessUtility_hook = prev_ProcessUtility; } +/* Test queryid provider function */ +JumbleState *_PG_calculate_query_id(Query *query, const char *querytext) +{ + elog(LOG, "Called query id generatr of pg_stat_statements"); + return DefaultJumbleQuery(query, querytext); +} + /* * shmem_startup hook: allocate or attach to shared memory, * then load any pre-existing statistics from file. @@ -544,6 +552,11 @@ pgss_shmem_startup(void) if (!IsUnderPostmaster) on_shmem_exit(pgss_shmem_shutdown, (Datum) 0); + /* request my defalt provider, but allow exisint one */ + if (!queryIdWanted("pg_stat_statements", true)) + ereport(ERROR, + (errmsg ("query_id provider is not available"))); + /* * Done if some other process already completed our initialization. */ diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index 1202bf85a3..be00564221 100644 --- a/src/backend/commands/explain.c +++ b/src/backend/commands/explain.c @@ -245,8 +245,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt, es->summary = (summary_set) ? es->summary : es->analyze; query = castNode(Query, stmt->query); - if (compute_query_id) - jstate = JumbleQuery(query, pstate->p_sourcetext); + jstate = JumbleQuery(query, pstate->p_sourcetext); if (post_parse_analyze_hook) (*post_parse_analyze_hook) (pstate, query, jstate); diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c index 168198acd1..bdf3a5a6d1 100644 --- a/src/backend/parser/analyze.c +++ b/src/backend/parser/analyze.c @@ -124,8 +124,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText, query = transformTopLevelStmt(pstate, parseTree); - if (compute_query_id) - jstate = JumbleQuery(query, sourceText); + jstate = JumbleQuery(query, sourceText); if (post_parse_analyze_hook) (*post_parse_analyze_hook) (pstate, query, jstate); @@ -163,8 +162,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText, /* make sure all is well with parameter types */ check_variable_parameters(pstate, query); - if (compute_query_id) - jstate = JumbleQuery(query, sourceText); + jstate = JumbleQuery(query, sourceText); if (post_parse_analyze_hook) (*post_parse_analyze_hook) (pstate, query, jstate); diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index 6200699ddd..1034dfea28 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -704,8 +704,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree, query = transformTopLevelStmt(pstate, parsetree); - if (compute_query_id) - jstate = JumbleQuery(query, query_string); + jstate = JumbleQuery(query, query_string); if (post_parse_analyze_hook) (*post_parse_analyze_hook) (pstate, query, jstate); diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index eb7f7181e4..70d06b825e 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -101,6 +101,7 @@ #include "utils/plancache.h" #include "utils/portal.h" #include "utils/ps_status.h" +#include "utils/queryjumble.h" #include "utils/rls.h" #include "utils/snapmgr.h" #include "utils/tzparser.h" @@ -534,7 +535,6 @@ extern const struct config_enum_entry dynamic_shared_memory_options[]; /* * GUC option variables that are exported from this module */ -bool compute_query_id = false; bool log_duration = false; bool Debug_print_plan = false; bool Debug_print_parse = false; @@ -1441,15 +1441,6 @@ static struct config_bool ConfigureNamesBool[] = true, NULL, NULL, NULL }, - { - {"compute_query_id", PGC_SUSET, STATS_MONITORING, - gettext_noop("Compute query identifiers."), - NULL - }, - &compute_query_id, - false, - NULL, NULL, NULL - }, { {"log_parser_stats", PGC_SUSET, STATS_MONITORING, gettext_noop("Writes parser performance statistics to the server log."), @@ -4579,6 +4570,16 @@ static struct config_string ConfigureNamesString[] = check_backtrace_functions, assign_backtrace_functions, NULL }, + + { + {"query_id_provider", PGC_SUSET, CLIENT_CONN_PRELOAD, + gettext_noop("Sets the query-id provider."), + }, + &query_id_provider, + "auto", + check_query_id_provider, assign_query_id_provider, NULL + }, + /* End-of-list marker */ { {NULL, 0, 0, NULL, NULL}, NULL, NULL, NULL, NULL, NULL diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample index efde01ee56..fc31ce15c4 100644 --- a/src/backend/utils/misc/postgresql.conf.sample +++ b/src/backend/utils/misc/postgresql.conf.sample @@ -604,7 +604,7 @@ # - Monitoring - -#compute_query_id = off +#query_id_provider = 'auto' #log_statement_stats = off #log_parser_stats = off #log_planner_stats = off diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c index f004a9ce8c..709d654ea0 100644 --- a/src/backend/utils/misc/queryjumble.c +++ b/src/backend/utils/misc/queryjumble.c @@ -47,6 +47,96 @@ static void JumbleRangeTable(JumbleState *jstate, List *rtable); static void JumbleRowMarks(JumbleState *jstate, List *rowMarks); static void JumbleExpr(JumbleState *jstate, Node *node); static void RecordConstLocation(JumbleState *jstate, int location); +static JumbleState *DummyJumbleQuery(Query *query, const char *querytext); + +char *query_id_provider = NULL; +static bool lock_provider = false; + +typedef JumbleState *(*JumbleQueryType) (Query *query, const char *querytext); +JumbleQueryType JumbleQuery = NULL; + +typedef struct QueryIdProviderExtra +{ + JumbleQueryType pfunc; +} QueryIdProviderExtra; + +bool +check_query_id_provider(char **newval, void **extra, GucSource source) +{ + QueryIdProviderExtra *param = NULL; + + if (lock_provider) + return false; + + param = (QueryIdProviderExtra *)malloc(sizeof(QueryIdProviderExtra)); + if (param == NULL) + return false; + + if (strcmp(*newval, "none") == 0 || + strcmp(*newval, "auto") == 0) + param->pfunc = DummyJumbleQuery; + else if (strcmp(*newval, "default") == 0) + param->pfunc = DefaultJumbleQuery; + else + param->pfunc = + load_external_function(*newval, "_PG_calculate_query_id", + false, NULL); + + if (param->pfunc == NULL) + { + free(param); + param = NULL; + GUC_check_errdetail("failed to load query id provider"); + return false; + } + + *extra = (void *) param; + return true; +} + +void +assign_query_id_provider(const char *newval, void *extra) +{ + QueryIdProviderExtra *param = (QueryIdProviderExtra *)extra; + + JumbleQuery = param->pfunc; +} + +bool +queryIdWanted(char *provider_name, bool use_existing) +{ + JumbleQueryType func; + + Assert(query_id_provider != NULL); + Assert(JumbleQuery != NULL); + + if (lock_provider || strcmp(query_id_provider, "none") == 0) + return false; + + /* use existing provider when use_existing */ + if (strcmp(query_id_provider, "auto") != 0 && use_existing) + return true; + + if (strcmp(provider_name, "default") == 0) + func = DefaultJumbleQuery; + else + func = load_external_function(provider_name, "_PG_calculate_query_id", + false, NULL); + + if (func == NULL) + return false; + + elog(LOG, "query-id provider \"%s\" loaded", provider_name); + JumbleQuery = func; + + /* expose real provider name */ + SetConfigOption("query_id_provider", provider_name, + PGC_POSTMASTER, PGC_S_OVERRIDE); + + lock_provider = true; + + return true; +} /* * Given a possibly multi-statement source string, confine our attention to the @@ -92,7 +182,7 @@ CleanQuerytext(const char *query, int *location, int *len) } JumbleState * -JumbleQuery(Query *query, const char *querytext) +DefaultJumbleQuery(Query *query, const char *querytext) { JumbleState *jstate = NULL; @@ -132,6 +222,12 @@ JumbleQuery(Query *query, const char *querytext) return jstate; } +static JumbleState * +DummyJumbleQuery(Query *query, const char *querytext) +{ + return NULL; +} + /* * Compute a query identifier for the given utility query string. */ diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h index 24a5d9d3fb..a7c3a4958e 100644 --- a/src/include/utils/guc.h +++ b/src/include/utils/guc.h @@ -247,7 +247,6 @@ extern bool log_btree_build_stats; extern PGDLLIMPORT bool check_function_bodies; extern bool session_auth_is_superuser; -extern bool compute_query_id; extern bool log_duration; extern int log_parameter_max_length; extern int log_parameter_max_length_on_error; diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h index 83ba7339fa..682d687b79 100644 --- a/src/include/utils/queryjumble.h +++ b/src/include/utils/queryjumble.h @@ -15,6 +15,7 @@ #define QUERYJUBLE_H #include "nodes/parsenodes.h" +#include "utils/guc.h" #define JUMBLE_SIZE 1024 /* query serialization buffer size */ @@ -52,7 +53,12 @@ typedef struct JumbleState int highest_extern_param_id; } JumbleState; +extern char *query_id_provider; +extern JumbleState *(*JumbleQuery)(Query *query, const char *querytext); + +JumbleState *DefaultJumbleQuery(Query *query, const char *querytext); +bool queryIdWanted(char *provider_name, bool use_existing); +bool check_query_id_provider(char **newval, void **extra, GucSource source); +void assign_query_id_provider(const char *newval, void *extra); const char *CleanQuerytext(const char *query, int *location, int *len); -JumbleState *JumbleQuery(Query *query, const char *querytext); - #endif /* QUERYJUMBLE_H */ diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out index cda28098ba..16375d5596 100644 --- a/src/test/regress/expected/explain.out +++ b/src/test/regress/expected/explain.out @@ -477,7 +477,7 @@ select jsonb_pretty( (1 row) rollback; -set compute_query_id = on; +set compute_query_id = 'default'; select explain_filter('explain (verbose) select * from int8_tbl i8'); explain_filter ----------------------------------------------------------------
At Thu, 13 May 2021 13:26:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > > As the result, even if we take the DLL approach, still not need to > > split out the query-id provider part. By the following config: > > > > > query_id_provider = 'pg_stat_statements' > > > > the core can obtain the entrypoint of, say, "_PG_calculate_query_id" > > to call it. And it can be of another module. > > > > It seems like the only problem doing that is we don't have a means to > > call per-process intializer for a preload libralies. > > So this is a crude PoC of that. > > pg_stat_statemnts defines its own query-id provider function in > pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a > log line). > > If server started with query_id_provider='auto' and pg_stat_statements > is not loaded, pg_stat_activity.query_id is null. > > If query_id_provider='auto' and pg_stat_statements is loaded, > pg_stat_activity.query_id is filled in with a query id. > > If query_id_provider='default' or 'pg_stat_statements' and > pg_stat_statements is not loaded, pg_stat_activity.query_id is filled > in with a query id. > > If query_id_provider='none' and pg_stat_statements is loaded, it > complains as "query id provider is not available" and refuss to start. > > If showing the variable, it shows the real provider name instead of > the setting in postgresql.conf. The change contains needless things that tries to handle per-backend change case, so it would be simpler assuming we don't want on-the-fly change of provider (and I believe so since that change surely causes inconsistency) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, May 12, 2021 at 9:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote: > > How do they know to set shared_preload_libraries then? We change the > > user API all the time, especially in GUCs, and even rename them, but for > > some reason we don't think people using pg_stat_statements can be > > trusted to read the release notes and change their behavior. I just > > don't get it. > > I don't know what to say. So here is a summary of the complaints that I'm > aware of: > > - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru > That was only a couple of days after the commit just before the feature freeze, > so it may be the less relevant complaint. > > - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com > Did a git bisect to find the commit that changed the behavior and somehow > didn't notice the new setting > > - this thread, with Fuji-san saying: > > > I'm afraid that users may easily forget to enable compute_query_id when using > > pg_stat_statements (because this setting was not necessary in v13 or before) > > - this thread, with Peter E. saying: > > > Now there is the additional burden of turning on this weird setting that > > no one understands. That's a 50% increase in burden. And almost no one will > > want to use a nondefault setting. pg_stat_statements is pretty popular. I > > think leaving in this requirement will lead to widespread confusion and > > complaints. > > - this thread, with Pavel saying: > > > Until now, the pg_stat_statements was zero-config. So the change is not user > > friendly. > > So it's a mix of "it's changing something that didn't change in a long time" > and "it's adding extra footgun and/or burden as it's not doing by default what > the majority of users will want", with an overwhelming majority of people > supporting the "we don't want that extra burden". For what it's worth, I don't think the actual changing of an extra setting is that big a burden: it's the figuring out that you need to change it, and how you should configure it, that is the problem. Especially since all major search engines still seem to return 9.4 (!) documentation as the first hit for a "pg_stat_statements" search. The common case (installing pg_stat_statements but not tweaking query id generation) should be simple.
Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a écrit :
For what it's worth, I don't think the actual changing of an extra
setting is that big a burden: it's the figuring out that you need to
change it, and how you should configure it, that is the problem.
Especially since all major search engines still seem to return 9.4 (!)
documentation as the first hit for a "pg_stat_statements" search. The
common case (installing pg_stat_statements but not tweaking query id
generation) should be simple.
the v2 patch I sent should address both your requirements.
On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a écrit : >> >> For what it's worth, I don't think the actual changing of an extra >> setting is that big a burden: it's the figuring out that you need to >> change it, and how you should configure it, that is the problem. >> Especially since all major search engines still seem to return 9.4 (!) >> documentation as the first hit for a "pg_stat_statements" search. The >> common case (installing pg_stat_statements but not tweaking query id >> generation) should be simple. > > > the v2 patch I sent should address both your requirements. Yes, thanks--I just tried it and this is great. I just wanted to argue against reversing course here.
On Wed, May 12, 2021 at 10:31:01PM -0700, Maciek Sakrejda wrote: > On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a écrit : > >> > >> For what it's worth, I don't think the actual changing of an extra > >> setting is that big a burden: it's the figuring out that you need to > >> change it, and how you should configure it, that is the problem. > >> Especially since all major search engines still seem to return 9.4 (!) > >> documentation as the first hit for a "pg_stat_statements" search. The > >> common case (installing pg_stat_statements but not tweaking query id > >> generation) should be simple. > > > > > > the v2 patch I sent should address both your requirements. > > Yes, thanks--I just tried it and this is great. I just wanted to argue > against reversing course here. Oh ok. Good news then, thanks!
At Thu, 13 May 2021 12:33:47 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in > Le jeu. 13 mai 2021 à 12:26, Kyotaro Horiguchi <horikyota.ntt@gmail.com> a > écrit : > > > At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi < > > horikyota.ntt@gmail.com> wrote in > > pg_stat_statemnts defines its own query-id provider function in > > pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a > > log line). > > > > If server started with query_id_provider='auto' and pg_stat_statements > > is not loaded, pg_stat_activity.query_id is null. > > > > If query_id_provider='auto' and pg_stat_statements is loaded, > > pg_stat_activity.query_id is filled in with a query id. > > > > If query_id_provider='default' or 'pg_stat_statements' and > > pg_stat_statements is not loaded, pg_stat_activity.query_id is filled > > in with a query id. > > > > If query_id_provider='none' and pg_stat_statements is loaded, it > > complains as "query id provider is not available" and refuss to start. > > > > If showing the variable, it shows the real provider name instead of > > the setting in postgresql.conf. > > > > what if you want to have some other extensions like pg_stat_kcache or > pg_store_plans that need a query_id but don't really care if > pg_stat_statements is enabled or not? should they all declare their own Thanks for looking it. The addtional provider function in pg_stat_statements is just an example to show what if it needs its own query-id provider, which is useless in reality. In reality pg_stat_statements just calls "queryIdWanted("default", true)" to use any query-id provider and use the in-core one as the fallback implement, and no need to define its own one. Any extension can use the in-core provider and accepting any other ones by calling queryIdWanted("default", true) then get what they want regardless of existence of pg_stat_statements. > wrapper too? should the system complain or silently ignore when they all > try to install their provider function? Of course if two extensions require diffrent query-id providers, they just cannot coexist (that is, server refuses to start). It is quite sane behavior in the standpoint of safety. I think almost all query-id users don't insist on a specific implmentation. (So the second parameter to queryIdWanted() could be omtted and assumed to be true.) reagrds. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote: > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote: > I don't know what to say. So here is a summary of the complaints that I'm > aware of: > > - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru > That was only a couple of days after the commit just before the feature freeze, > so it may be the less relevant complaint. > > - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com > Did a git bisect to find the commit that changed the behavior and somehow > didn't notice the new setting > > - this thread, with Fuji-san saying: > > > I'm afraid that users may easily forget to enable compute_query_id when using > > pg_stat_statements (because this setting was not necessary in v13 or before) > > - this thread, with Peter E. saying: > > > Now there is the additional burden of turning on this weird setting that > > no one understands. That's a 50% increase in burden. And almost no one will > > want to use a nondefault setting. pg_stat_statements is pretty popular. I > > think leaving in this requirement will lead to widespread confusion and > > complaints. > > - this thread, with Pavel saying: > > > Until now, the pg_stat_statements was zero-config. So the change is not user > > friendly. > > So it's a mix of "it's changing something that didn't change in a long time" > and "it's adding extra footgun and/or burden as it's not doing by default what > the majority of users will want", with an overwhelming majority of people > supporting the "we don't want that extra burden". Well, now that we have clear warnings when it is misconfigured, especially when querying the pg_stat_statements view, are these complaints still valid? Also, how is modifying shared_preload_libraries zero-config, but modifying shared_preload_libraries and compute_query_id a huge burden? I am personally not comfortable committing a patch to add an auto option the way it is implemented, so another committer will need to take ownership of this, or the entire feature can be removed. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, May 13, 2021 at 7:42 AM Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote: > > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote: > > I don't know what to say. So here is a summary of the complaints that I'm > > aware of: > > > > - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru > > That was only a couple of days after the commit just before the feature freeze, > > so it may be the less relevant complaint. > > > > - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com > > Did a git bisect to find the commit that changed the behavior and somehow > > didn't notice the new setting > > > > - this thread, with Fuji-san saying: > > > > > I'm afraid that users may easily forget to enable compute_query_id when using > > > pg_stat_statements (because this setting was not necessary in v13 or before) > > > > - this thread, with Peter E. saying: > > > > > Now there is the additional burden of turning on this weird setting that > > > no one understands. That's a 50% increase in burden. And almost no one will > > > want to use a nondefault setting. pg_stat_statements is pretty popular. I > > > think leaving in this requirement will lead to widespread confusion and > > > complaints. > > > > - this thread, with Pavel saying: > > > > > Until now, the pg_stat_statements was zero-config. So the change is not user > > > friendly. > > > > So it's a mix of "it's changing something that didn't change in a long time" > > and "it's adding extra footgun and/or burden as it's not doing by default what > > the majority of users will want", with an overwhelming majority of people > > supporting the "we don't want that extra burden". > > Well, now that we have clear warnings when it is misconfigured, > especially when querying the pg_stat_statements view, are these > complaints still valid? Also, how is modifying > shared_preload_libraries zero-config, but modifying > shared_preload_libraries and compute_query_id a huge burden? The warning makes it clear that there's something wrong, but not how to fix it (as I noted in another message in this thread, a web search for pg_stat_statements docs still leads to an obsolete version). I don't think anyone is arguing that this is insurmountable for all users, but it is additional friction, and every bit of friction makes Postgres harder to use. Users don't read documentation, or misread documentation, or just are not sure what the documentation or the warning is telling them, in spite of our best efforts. And you're right, modifying shared_preload_libraries is not zero-config--I would love it if we could drop that requirement ;). Still, adding another setting is clearly one more thing to get wrong. > I am personally not comfortable committing a patch to add an auto option > the way it is implemented, so another committer will need to take > ownership of this, or the entire feature can be removed. Assuming we do want to avoid additional configuration requirements for pg_stat_statements, is there another mechanism you feel would work better? Or is that constraint incompatible with sane behavior for this feature?
On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote: > > Well, now that we have clear warnings when it is misconfigured, > especially when querying the pg_stat_statements view, are these > complaints still valid? I'm personally fine with it, and I can send a new version with just the warning when calling pg_stat_statements() or one of the view(s). Or was there other warnings that you were referring too? > I am personally not comfortable committing a patch to add an auto option > the way it is implemented, so another committer will need to take > ownership of this, or the entire feature can be removed. That's fair. Just to be clear, I'm assuming that you also don't like Horigushi-san approach either?
On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote: > > The warning makes it clear that there's something wrong, but not how > to fix it I'm confused, are we talking about the new warning in v2 as suggested by Pavel? As it should make things quite clear: +SELECT count(*) FROM pg_stat_statements; +WARNING: Query identifier calculation seems to be disabled +HINT: If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id The wording can of course be improved.
On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote: > > Well, now that we have clear warnings when it is misconfigured, > > especially when querying the pg_stat_statements view, are these > > complaints still valid? Also, how is modifying > > shared_preload_libraries zero-config, but modifying > > shared_preload_libraries and compute_query_id a huge burden? > > The warning makes it clear that there's something wrong, but not how > to fix it (as I noted in another message in this thread, a web search > for pg_stat_statements docs still leads to an obsolete version). I > don't think anyone is arguing that this is insurmountable for all > users, but it is additional friction, and every bit of friction makes > Postgres harder to use. Users don't read documentation, or misread > documentation, or just are not sure what the documentation or the > warning is telling them, in spite of our best efforts. Well, then let's have the error message tell them what is wrong and how to fix it. My issue is that 'auto' spreads confusion around the entire API, as you can see from the discussion in this thread. > And you're right, modifying shared_preload_libraries is not > zero-config--I would love it if we could drop that requirement ;). > Still, adding another setting is clearly one more thing to get wrong. > > > I am personally not comfortable committing a patch to add an auto option > > the way it is implemented, so another committer will need to take > > ownership of this, or the entire feature can be removed. > > Assuming we do want to avoid additional configuration requirements for > pg_stat_statements, is there another mechanism you feel would work > better? Or is that constraint incompatible with sane behavior for this > feature? I think we just need to leave it is on/off, and then help people find the way to fix it if the misconfigure it, which I think is already been shown to be possible. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, May 13, 2021 at 11:35:13PM +0800, Julien Rouhaud wrote: > On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote: > > > > Well, now that we have clear warnings when it is misconfigured, > > especially when querying the pg_stat_statements view, are these > > complaints still valid? > > I'm personally fine with it, and I can send a new version with just the > warning when calling pg_stat_statements() or one of the view(s). Or was there > other warnings that you were referring too? No, that was the big fix that made misconfiguration very clear to users who didn't see the change before. > > I am personally not comfortable committing a patch to add an auto option > > the way it is implemented, so another committer will need to take > > ownership of this, or the entire feature can be removed. > > That's fair. Just to be clear, I'm assuming that you also don't like > Horigushi-san approach either? Uh, anything with 'auto', I don't like. I am afraid if I commit it, I would feel responsible for the long tail of confusion this will cause users, which is why I was saying I would rather remove it than be responsible for causing such confusion. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote: > > > > The warning makes it clear that there's something wrong, but not how > > to fix it > > I'm confused, are we talking about the new warning in v2 as suggested by Pavel? > As it should make things quite clear: > > +SELECT count(*) FROM pg_stat_statements; > +WARNING: Query identifier calculation seems to be disabled > +HINT: If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id > > The wording can of course be improved. I meant that no warning can be as clear as things just working, but I do have feedback on the specific message here: * "seems to" be disabled? Is it? Any reason not to be more definitive here? * On reading the beginning of the hint, I can see users asking themselves, "Do I want to use a third-party module to compute query identifiers?" * "may want to enable"? Are there any situations where I don't want to use a third-party module *and* I don't want to enable compute_query_id? So maybe something like > +SELECT count(*) FROM pg_stat_statements; > +WARNING: Query identifier calculation is disabled > +HINT: You must enable compute_query_id or configure a third-party module to compute query identifiers in order to usepg_stat_statements. (I admit "configure a third-party module" is pretty vague, but I think that suggests it's only an option to consider if you know what you're doing.) Also, if you're configuring this for usage with a tool like pganalyze, and neglect to run a manual query (we guide users to do that, but they may skip that step), the warnings may not even be visible (the Go driver we are using does not surface client warnings). Should this be an error instead of a warning? Is it ever useful to get an empty result set from querying pg_stat_statements? Using an error here would parallel the behavior of shared_preload_libraries not including pg_stat_statements.
On Thu, May 13, 2021 at 09:30:55AM -0700, Maciek Sakrejda wrote: > On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote: > > > > > > The warning makes it clear that there's something wrong, but not how > > > to fix it > > > > I'm confused, are we talking about the new warning in v2 as suggested by Pavel? > > As it should make things quite clear: > > > > +SELECT count(*) FROM pg_stat_statements; > > +WARNING: Query identifier calculation seems to be disabled > > +HINT: If you don't want to use a third-party module to compute query identifiers, you may want to enable compute_query_id > > > > The wording can of course be improved. > > I meant that no warning can be as clear as things just working, but I > do have feedback on the specific message here: > > * "seems to" be disabled? Is it? Any reason not to be more definitive here? > * On reading the beginning of the hint, I can see users asking > themselves, "Do I want to use a third-party module to compute query > identifiers?" > * "may want to enable"? Are there any situations where I don't want > to use a third-party module *and* I don't want to enable > compute_query_id? > > So maybe something like > > > +SELECT count(*) FROM pg_stat_statements; > > +WARNING: Query identifier calculation is disabled > > +HINT: You must enable compute_query_id or configure a third-party module to compute query identifiers in order to usepg_stat_statements. Yes, I like this. The reason the old message was so vague is that 'auto', the default some people wanted, didn't issue that error, only 'off' did, so there was an assumption you wanted a custom module since you changed the value to off. If we are going with just on/off, no auto, the message you suggest, leading with compute_query_id, is the right approach. > (I admit "configure a third-party module" is pretty vague, but I think > that suggests it's only an option to consider if you know what you're > doing.) Seems fine to me. > Also, if you're configuring this for usage with a tool like pganalyze, > and neglect to run a manual query (we guide users to do that, but they > may skip that step), the warnings may not even be visible (the Go > driver we are using does not surface client warnings). Should this be > an error instead of a warning? Is it ever useful to get an empty > result set from querying pg_stat_statements? Using an error here would > parallel the behavior of shared_preload_libraries not including > pg_stat_statements. Good question. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 5/13/21 12:18 AM, Fujii Masao wrote: > > > > If we do this, compute_query_id=auto seems to be similar to > huge_pages=try. > When huge_pages=try, whether huge pages is actually used is defined > depending > outside PostgreSQL (i.e, OS setting in this case). Neither > pg_settings.setting nor > souce are not changed in that case. > > Not a bad analogy, showing there's some precedent for this sort of thing. The only thing that bugs me is that we're pretty damn late in the process to be engaging in this amount of design. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, May 13, 2021 at 12:45:07PM -0400, Andrew Dunstan wrote: > > On 5/13/21 12:18 AM, Fujii Masao wrote: > > > > > > > > If we do this, compute_query_id=auto seems to be similar to > > huge_pages=try. > > When huge_pages=try, whether huge pages is actually used is defined > > depending > > outside PostgreSQL (i.e, OS setting in this case). Neither > > pg_settings.setting nor > > souce are not changed in that case. > > > > > > Not a bad analogy, showing there's some precedent for this sort of thing. > > > The only thing that bugs me is that we're pretty damn late in the > process to be engaging in this amount of design. The issue is that there is no external way to check what query id computation is being used, unlike huge pages, which can be queried from the operating system. I also agree it is late, and discussion of auto continues to show cases where this makes later improvements more complex. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Andrew Dunstan <andrew@dunslane.net> writes: > The only thing that bugs me is that we're pretty damn late in the > process to be engaging in this amount of design. Indeed. I feel that this feature was forced in before it was really ready. regards, tom lane
On Thu, May 13, 2021 at 01:17:16PM -0400, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > The only thing that bugs me is that we're pretty damn late in the > > process to be engaging in this amount of design. > > Indeed. I feel that this feature was forced in before it was really > ready. The user API has always been a challenge for this feature but I thought we had it ironed out. What I didn't anticipate were the configuration complaints, and if those are valid, the feature should be removed since we can't improve it at this point, nor do I have any idea if that is even possible without unacceptable negatives. If the configuration complaints are invalid, what we have now is very good, I think, though adding more warnings about misconfiguration would be wise. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Andrew Dunstan <andrew@dunslane.net> writes: > > The only thing that bugs me is that we're pretty damn late in the > > process to be engaging in this amount of design. > > Indeed. I feel that this feature was forced in before it was really > ready. I'm coming around to have a similar feeling. While having an alternative query ID might be useful, I have a hard time seeing it as likely to be a hugely popular feature that is worth a lot of users complaining (as we've seen already, multiple times, before even getting to beta...) that things aren't working anymore. That we can't figure out which libraries to load automatically based on what extensions have been installed and therefore make everyone have to change shared_preload_libraries isn't a good thing and requiring additional configuration for extremely common extensions like pg_stat_statements is making it worse. Thanks, Stephen
Attachment
Re: Bruce Momjian > Well, now that we have clear warnings when it is misconfigured, > especially when querying the pg_stat_statements view, are these > complaints still valid? Also, how is modifying > shared_preload_libraries zero-config, but modifying > shared_preload_libraries and compute_query_id a huge burden? It's zero-config in the sense that if you want to have pg_stat_statements, loading that module via shared_preload_libraries is just natural. Having to set compute_query_id isn't natural. It's a setting with a completely different name, and the connection of pg_stat_statements to compute_query_id isn't obvious. The reasoning with "we have warnings and stuff" might be ok if pg_stat_statements were a new thing, but it has worked via shared_preload_libraries only for the last decade, and requiring something extra will confuse probably every single user of pg_stat_statements out there. Perhaps worse, note that these warnings will likely first be seen by the end users of databases, not by the admin performing the initial setup or upgrade, who will not be able to fix the problem themselves. Christoph
On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote: > I'm coming around to have a similar feeling. While having an > alternative query ID might be useful, I have a hard time seeing it as > likely to be a hugely popular feature that is worth a lot of users > complaining (as we've seen already, multiple times, before even getting > to beta...) that things aren't working anymore. That we can't figure > out which libraries to load automatically based on what extensions have > been installed and therefore make everyone have to change > shared_preload_libraries isn't a good thing and requiring additional > configuration for extremely common extensions like pg_stat_statements is > making it worse. Would someone please explain what is wrong with what is in the tree now, except that it needs additional warnings about misconfiguration? Requiring two postgresql.conf changes instead of one doesn't seem like a valid complaint to me, especially if the warnings are in place and the release notes mention it. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Greetings, * Bruce Momjian (bruce@momjian.us) wrote: > On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote: > > I'm coming around to have a similar feeling. While having an > > alternative query ID might be useful, I have a hard time seeing it as > > likely to be a hugely popular feature that is worth a lot of users > > complaining (as we've seen already, multiple times, before even getting > > to beta...) that things aren't working anymore. That we can't figure > > out which libraries to load automatically based on what extensions have > > been installed and therefore make everyone have to change > > shared_preload_libraries isn't a good thing and requiring additional > > configuration for extremely common extensions like pg_stat_statements is > > making it worse. > > Would someone please explain what is wrong with what is in the tree > now, except that it needs additional warnings about misconfiguration? > Requiring two postgresql.conf changes instead of one doesn't seem like a > valid complaint to me, especially if the warnings are in place and the > release notes mention it. Will you be updating pg_upgrade to detect and throw a warning during check in the event that it discovers a broken config? If not, then I don't think you're correct in arguing that this need for additional configuration isn't a valid complaint. Thanks, Stephen
Attachment
On Thu, May 13, 2021 at 01:51:07PM -0400, Stephen Frost wrote: > Greetings, > > * Bruce Momjian (bruce@momjian.us) wrote: > > On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote: > > > I'm coming around to have a similar feeling. While having an > > > alternative query ID might be useful, I have a hard time seeing it as > > > likely to be a hugely popular feature that is worth a lot of users > > > complaining (as we've seen already, multiple times, before even getting > > > to beta...) that things aren't working anymore. That we can't figure > > > out which libraries to load automatically based on what extensions have > > > been installed and therefore make everyone have to change > > > shared_preload_libraries isn't a good thing and requiring additional > > > configuration for extremely common extensions like pg_stat_statements is > > > making it worse. > > > > Would someone please explain what is wrong with what is in the tree > > now, except that it needs additional warnings about misconfiguration? > > Requiring two postgresql.conf changes instead of one doesn't seem like a > > valid complaint to me, especially if the warnings are in place and the > > release notes mention it. > > Will you be updating pg_upgrade to detect and throw a warning during > check in the event that it discovers a broken config? Uh, how does this relate to pg_upgrade? Are you saying someone misconfigures the new system with pg_stat_statements but not query id? The server would still start and upgrade, no? How is this different from any other GUC we rename? I am not following much of the logic in this discussion, frankly. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote: > Re: Bruce Momjian > > Well, now that we have clear warnings when it is misconfigured, > > especially when querying the pg_stat_statements view, are these > > complaints still valid? Also, how is modifying > > shared_preload_libraries zero-config, but modifying > > shared_preload_libraries and compute_query_id a huge burden? > > It's zero-config in the sense that if you want to have > pg_stat_statements, loading that module via shared_preload_libraries > is just natural. > > Having to set compute_query_id isn't natural. It's a setting with a > completely different name, and the connection of pg_stat_statements to > compute_query_id isn't obvious. > > The reasoning with "we have warnings and stuff" might be ok if > pg_stat_statements were a new thing, but it has worked via > shared_preload_libraries only for the last decade, and requiring > something extra will confuse probably every single user of > pg_stat_statements out there. > > Perhaps worse, note that these warnings will likely first be seen by > the end users of databases, not by the admin performing the initial > setup or upgrade, who will not be able to fix the problem themselves. Well, but doing this extra configuration, the query id shows up in a lot more places. I can't imagine how this could be done cleanly without requiring extra configuration, unless the query_id computation was cheaper to compute and we could enable it by default. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Greetings, * Bruce Momjian (bruce@momjian.us) wrote: > On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote: > > Re: Bruce Momjian > > > Well, now that we have clear warnings when it is misconfigured, > > > especially when querying the pg_stat_statements view, are these > > > complaints still valid? Also, how is modifying > > > shared_preload_libraries zero-config, but modifying > > > shared_preload_libraries and compute_query_id a huge burden? > > > > It's zero-config in the sense that if you want to have > > pg_stat_statements, loading that module via shared_preload_libraries > > is just natural. Not sure about natural but it's certainly what folks have at least become used to. We should be working to eliminate it though. > > Having to set compute_query_id isn't natural. It's a setting with a > > completely different name, and the connection of pg_stat_statements to > > compute_query_id isn't obvious. > > > > The reasoning with "we have warnings and stuff" might be ok if > > pg_stat_statements were a new thing, but it has worked via > > shared_preload_libraries only for the last decade, and requiring > > something extra will confuse probably every single user of > > pg_stat_statements out there. As we keep seeing, over and over. The ongoing comments claiming that it's "just" a minor additional configuration tweak fall pretty flat when you consider the number of times it's already been brought up, and who it has been brought up by. > > Perhaps worse, note that these warnings will likely first be seen by > > the end users of databases, not by the admin performing the initial > > setup or upgrade, who will not be able to fix the problem themselves. I don't think this is appreciated anywhere near well enough. This takes existing perfectly working configurations and actively breaks them in a manner that isn't obvious and isn't something that an admin would have any idea about until after they've upgraded and then started trying to query the view. That's pretty terrible. > Well, but doing this extra configuration, the query id shows up in a lot > more places. I can't imagine how this could be done cleanly without > requiring extra configuration, unless the query_id computation was > cheaper to compute and we could enable it by default. There's a ridiculously simple option here which is: drop the idea that we support an extension redefining the query id and then just make it on/off with the default to be 'on'. If people actually have a problem with it being on and they don't want to use pg_stat_statements then they can turn it off. This won't break any existing configs that are out there in the field and avoids the complexity of having some kind of 'auto' config. I do agree with the general idea of wanting to be extensible but I'm not convinced that, in this particular case, it's worth all of this. I'm somewhat convinced that having a way to disable the query id is useful in limited cases and if people want a way to do that, then we can give that to them in a straightfoward way that doens't break things. Thanks, Stephen
Attachment
Stephen Frost <sfrost@snowman.net> writes: > There's a ridiculously simple option here which is: drop the idea that > we support an extension redefining the query id and then just make it > on/off with the default to be 'on'. I do not think that defaulting it to 'on' is acceptable unless you can show that the added overhead is negligible. IIUC the measurements that have been done show the opposite. Maybe we should revert this thing pending somebody doing the work to make a version of queryid labeling that actually is negligibly cheap. It certainly seems like that could be done; one more traversal of the parse tree can't be that expensive in itself. I suspect that the performance problem is with the particular hashing mechanism that was used, which looks mighty ad-hoc anyway. regards, tom lane
Greetings, * Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > There's a ridiculously simple option here which is: drop the idea that > > we support an extension redefining the query id and then just make it > > on/off with the default to be 'on'. > > I do not think that defaulting it to 'on' is acceptable unless you can > show that the added overhead is negligible. IIUC the measurements that > have been done show the opposite. Ah, right, it had only been done before when pg_stat_statements was loaded.. In which case, it seems like we should: a) go back to that b) if someone wants an alternative query ID, tell them to add it to pg_stat_statements and make it configurable *there* c) Have pg_stat_statements provide another function/view/etc that folks can use to get a queryid for an ongoing query ..? d) Maybe come up with a way for extensions, generically, to make a value available to log_line_prefix? That could be pretty interesting. Or just accept that this is a bit hokey with the 'auto' approach. I get Bruce has concerns about it but I'm not convinced that it's actually all that bad to go with that. Thanks, Stephen
Attachment
On 2021-May-13, Stephen Frost wrote: > Or just accept that this is a bit hokey with the 'auto' approach. I get > Bruce has concerns about it but I'm not convinced that it's actually all > that bad to go with that. Yeah, I think the alleged confusion there is overstated. I'm happy to act as committer for that if he wants to step away from it. I'm already used to being lapidated at every corner anyway. -- Álvaro Herrera Valdivia, Chile "E pur si muove" (Galileo Galilei)
On Thu, May 13, 2021 at 02:29:09PM -0400, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: > > There's a ridiculously simple option here which is: drop the idea that > > we support an extension redefining the query id and then just make it > > on/off with the default to be 'on'. > > I do not think that defaulting it to 'on' is acceptable unless you can > show that the added overhead is negligible. IIUC the measurements that > have been done show the opposite. Agreed. > Maybe we should revert this thing pending somebody doing the work to > make a version of queryid labeling that actually is negligibly cheap. > It certainly seems like that could be done; one more traversal of the > parse tree can't be that expensive in itself. I suspect that the > performance problem is with the particular hashing mechanism that > was used, which looks mighty ad-hoc anyway. I was surprised it was ~2%. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, May 13, 2021 at 03:04:30PM -0400, Álvaro Herrera wrote: > On 2021-May-13, Stephen Frost wrote: > > > Or just accept that this is a bit hokey with the 'auto' approach. I get > > Bruce has concerns about it but I'm not convinced that it's actually all > > that bad to go with that. > > Yeah, I think the alleged confusion there is overstated. > > I'm happy to act as committer for that if he wants to step away from it. > I'm already used to being lapidated at every corner anyway. OK, feel free to take ownership of it, thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 5/13/21 3:04 PM, Alvaro Herrera wrote: > On 2021-May-13, Stephen Frost wrote: > >> Or just accept that this is a bit hokey with the 'auto' approach. I get >> Bruce has concerns about it but I'm not convinced that it's actually all >> that bad to go with that. > Yeah, I think the alleged confusion there is overstated. > > I'm happy to act as committer for that if he wants to step away from it. > I'm already used to being lapidated at every corner anyway. > Many thanks Alvaro, among other things for teaching me a new word. cheers (delapidated) andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote: > > On 5/13/21 3:04 PM, Alvaro Herrera wrote: > > On 2021-May-13, Stephen Frost wrote: > > > >> Or just accept that this is a bit hokey with the 'auto' approach. I get > >> Bruce has concerns about it but I'm not convinced that it's actually all > >> that bad to go with that. > > Yeah, I think the alleged confusion there is overstated. > > > > I'm happy to act as committer for that if he wants to step away from it. > > I'm already used to being lapidated at every corner anyway. > > > > > Many thanks Alvaro, among other things for teaching me a new word. > > (delapidated) andrew Yes, I had to look it up too. :-) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote: > On 5/13/21 3:04 PM, Alvaro Herrera wrote: >> I'm happy to act as committer for that if he wants to step away from it. >> I'm already used to being lapidated at every corner anyway. > > Many thanks Alvaro, among other things for teaching me a new word. +1. Thanks, Alvaro. -- Michael
Attachment
Here's a first attempt at what was suggested. If you say "auto" it remains auto in SHOW, but it gets enabled if a module asks for it. Not final yet, but I thought I'd throw it out for early commentary ... -- Álvaro Herrera Valdivia, Chile
Attachment
On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote: > Here's a first attempt at what was suggested. If you say "auto" it > remains auto in SHOW, but it gets enabled if a module asks for it. > > Not final yet, but I thought I'd throw it out for early commentary ... I certainly like this idea better than having the extension change the output of the GUC. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote: > > > Maybe we should revert this thing pending somebody doing the work to > > make a version of queryid labeling that actually is negligibly cheap. > > It certainly seems like that could be done; one more traversal of the > > parse tree can't be that expensive in itself. I suspect that the > > performance problem is with the particular hashing mechanism that > > was used, which looks mighty ad-hoc anyway. > > I was surprised it was ~2%. Just to be clear, the 2% was a worst case scenario, ie. a very fast read-only query on small data returning a single row. As soon as you get something more realistic / expensive the overhead goes away. For reference here is the detail: https://www.postgresql.org/message-id/CAOBaU_ZVmGPfKTwZ6cM_qdzaF2E1gMkrLDMwwLy4Z1JxQ6=CZg@mail.gmail.com
On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote: > > Here's a first attempt at what was suggested. If you say "auto" it > > remains auto in SHOW, but it gets enabled if a module asks for it. > > > > Not final yet, but I thought I'd throw it out for early commentary ... > > I certainly like this idea better than having the extension change the > output of the GUC. Oh, I didn't understand that it was the major blocker. I'm fine with it too.
On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote: > On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote: > > > > On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote: > > > Here's a first attempt at what was suggested. If you say "auto" it > > > remains auto in SHOW, but it gets enabled if a module asks for it. > > > > > > Not final yet, but I thought I'd throw it out for early commentary ... > > > > I certainly like this idea better than having the extension change the > > output of the GUC. > > Oh, I didn't understand that it was the major blocker. I'm fine with it too. I think if we keep the output as 'auto', and document that you check pg_stat_activity for a hash to see if it is enabled, that gets us pretty far. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Julien Rouhaud <rjuju123@gmail.com> writes: > On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote: >> I was surprised it was ~2%. > Just to be clear, the 2% was a worst case scenario, ie. a very fast > read-only query on small data returning a single row. As soon as you > get something more realistic / expensive the overhead goes away. Of course, for plenty of people that IS the realistic scenario that they care about max performance for. regards, tom lane
On Thu, May 13, 2021 at 09:47:02PM -0400, Tom Lane wrote: > Julien Rouhaud <rjuju123@gmail.com> writes: > > On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote: > >> I was surprised it was ~2%. > > > Just to be clear, the 2% was a worst case scenario, ie. a very fast > > read-only query on small data returning a single row. As soon as you > > get something more realistic / expensive the overhead goes away. > > Of course, for plenty of people that IS the realistic scenario that > they care about max performance for. I'm not arguing that the scenario is unrealistic. I'm arguing that retrieving the first row of a join between pg_class and pg_attribute on an otherwise vanilla database may not be the most representative workload, especially when you take into account that it was done on hardware that still took 3 ms to do that. Unfortunately my laptop is pretty old and has already proven multiple time to give unreliable benchmark results, so I'm not confident at all that those 2% are even real outside of my machine.
On 2021/05/14 9:04, Alvaro Herrera wrote: > Here's a first attempt at what was suggested. If you say "auto" it > remains auto in SHOW, but it gets enabled if a module asks for it. > > Not final yet, but I thought I'd throw it out for early commentary ... Many thanks! The patch basically looks good to me. +void +EnableQueryId(void) +{ + if (compute_query_id == COMPUTE_QUERY_ID_AUTO) + auto_query_id_enabled = true; Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is? Otherwise, for example, the following scenario can happen and it's a bit strange. 1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on 2. compute_query_id is set to auto and the configuration file is reloaded Then, even though compute_query_id is auto and pg_stat_statements is loaded, query ids are not computed and no queries are tracked by pg_stat_statements. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Fri, May 14, 2021 at 12:20:00PM +0900, Fujii Masao wrote: > > > On 2021/05/14 9:04, Alvaro Herrera wrote: > > Here's a first attempt at what was suggested. If you say "auto" it > > remains auto in SHOW, but it gets enabled if a module asks for it. > > > > Not final yet, but I thought I'd throw it out for early commentary ... > > Many thanks! The patch basically looks good to me. > > +void > +EnableQueryId(void) > +{ > + if (compute_query_id == COMPUTE_QUERY_ID_AUTO) > + auto_query_id_enabled = true; > > Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is? > Otherwise, for example, the following scenario can happen and it's a bit strange. > > 1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on > 2. compute_query_id is set to auto and the configuration file is reloaded > Then, even though compute_query_id is auto and pg_stat_statements is loaded, > query ids are not computed and no queries are tracked by pg_stat_statements. +1. Note that if you switch from compute_query_id = off + custom query_id + pg_stat_statements to compute_query_id = auto then thing will immediately break (as we instruct third-party plugins authors to error out in that case), which is way better than breaking at the next random service restart.
I wrote: > Maybe we should revert this thing pending somebody doing the work to > make a version of queryid labeling that actually is negligibly cheap. > It certainly seems like that could be done; one more traversal of the > parse tree can't be that expensive in itself. I suspect that the > performance problem is with the particular hashing mechanism that > was used, which looks mighty ad-hoc anyway. To put a little bit of meat on that idea, I experimented with jacking up the "jumble" calculation and driving some other implementations underneath. I thought that Julien's "worst case" scenario was pretty far from worst case, since it involved a join which a lot of simple queries don't. I tested this scenario instead: $ cat naive.sql SELECT * FROM pg_class c ORDER BY oid DESC LIMIT 1; $ pgbench -n -f naive.sql -T 60 postgres which is still complicated enough that there's work for the query fingerprinter to do, but not so much for planning and execution. I confirm that on HEAD, there's a noticeable TPS penalty from turning on compute_query_id: about 3.2% on my machine. The first patch attached replaces the "jumble" calculation with two CRC32s (two so that we still get 64 bits out at the end). I see 2.7% penalty with this version. Now, I'm using an Intel machine with #define USE_SSE42_CRC32C_WITH_RUNTIME_CHECK 1 so on machines without any hardware CRC support, this'd likely be a loss. But it still proves the point that the existing implementation is just not very speedy. I then tried a really dumb xor'ing implementation, and that gives me a slowdown of 2.2%. This could undoubtedly be improved on further, say by unrolling the loop or processing multiple bytes at once. One problem with it is that I suspect it will tend to concentrate the entropy into the third/fourth and seventh/eighth bytes of the accumulator, since so many of the fields being jumbled are 4-byte or 8-byte fields with most of the entropy in their low-order bits. Probably that could be improved with a bit more thought -- say, an extra bump of the nextbyte pointer after each field. Anyway, I think that what we have here is quite an inferior implementation, and we can do better with some more thought. regards, tom lane diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c index f004a9ce8c..74ed555ed7 100644 --- a/src/backend/utils/misc/queryjumble.c +++ b/src/backend/utils/misc/queryjumble.c @@ -41,7 +41,7 @@ static uint64 compute_utility_query_id(const char *str, int query_location, int query_len); static void AppendJumble(JumbleState *jstate, - const unsigned char *item, Size size); + const void *item, Size size); static void JumbleQueryInternal(JumbleState *jstate, Query *query); static void JumbleRangeTable(JumbleState *jstate, List *rtable); static void JumbleRowMarks(JumbleState *jstate, List *rowMarks); @@ -106,9 +106,11 @@ JumbleQuery(Query *query, const char *querytext) { jstate = (JumbleState *) palloc(sizeof(JumbleState)); - /* Set up workspace for query jumbling */ - jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE); - jstate->jumble_len = 0; + /* Initialize CRCs for query jumbling */ + INIT_CRC32C(jstate->crc0); + INIT_CRC32C(jstate->crc1); + jstate->whichcrc = 0; + /* Initialize state for tracking where constants are */ jstate->clocations_buf_size = 32; jstate->clocations = (LocationLen *) palloc(jstate->clocations_buf_size * sizeof(LocationLen)); @@ -117,9 +119,11 @@ JumbleQuery(Query *query, const char *querytext) /* Compute query ID and mark the Query node with it */ JumbleQueryInternal(jstate, query); - query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble, - jstate->jumble_len, - 0)); + + FIN_CRC32C(jstate->crc0); + FIN_CRC32C(jstate->crc1); + query->queryId = (((uint64) jstate->crc0) << 32) | + ((uint64) jstate->crc1); /* * If we are unlucky enough to get a hash of zero, use 1 instead, to @@ -165,36 +169,18 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l * the current jumble. */ static void -AppendJumble(JumbleState *jstate, const unsigned char *item, Size size) +AppendJumble(JumbleState *jstate, const void *item, Size size) { - unsigned char *jumble = jstate->jumble; - Size jumble_len = jstate->jumble_len; - - /* - * Whenever the jumble buffer is full, we hash the current contents and - * reset the buffer to contain just that hash value, thus relying on the - * hash to summarize everything so far. - */ - while (size > 0) + if (jstate->whichcrc) { - Size part_size; - - if (jumble_len >= JUMBLE_SIZE) - { - uint64 start_hash; - - start_hash = DatumGetUInt64(hash_any_extended(jumble, - JUMBLE_SIZE, 0)); - memcpy(jumble, &start_hash, sizeof(start_hash)); - jumble_len = sizeof(start_hash); - } - part_size = Min(size, JUMBLE_SIZE - jumble_len); - memcpy(jumble + jumble_len, item, part_size); - jumble_len += part_size; - item += part_size; - size -= part_size; + COMP_CRC32C(jstate->crc1, item, size); + jstate->whichcrc = 0; + } + else + { + COMP_CRC32C(jstate->crc0, item, size); + jstate->whichcrc = 1; } - jstate->jumble_len = jumble_len; } /* @@ -202,9 +188,9 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size) * of individual local variable elements. */ #define APP_JUMB(item) \ - AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item)) + AppendJumble(jstate, (const void *) &(item), sizeof(item)) #define APP_JUMB_STRING(str) \ - AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1) + AppendJumble(jstate, (const void *) (str), strlen(str) + 1) /* * JumbleQueryInternal: Selectively serialize the query tree, appending diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h index 83ba7339fa..3a09c555fa 100644 --- a/src/include/utils/queryjumble.h +++ b/src/include/utils/queryjumble.h @@ -15,8 +15,7 @@ #define QUERYJUBLE_H #include "nodes/parsenodes.h" - -#define JUMBLE_SIZE 1024 /* query serialization buffer size */ +#include "port/pg_crc32c.h" /* * Struct for tracking locations/lengths of constants during normalization @@ -33,11 +32,13 @@ typedef struct LocationLen */ typedef struct JumbleState { - /* Jumble of current query tree */ - unsigned char *jumble; - - /* Number of bytes used in jumble[] */ - Size jumble_len; + /* + * Since we'd like a 64-bit-wide query ID, but we're using 32-bit CRC + * technology, we combine two 32-bit CRCs. + */ + pg_crc32c crc0; /* some fields go into here ... */ + pg_crc32c crc1; /* ... and others into here */ + int whichcrc; /* 0 or 1, says which CRC accum to use next */ /* Array of locations of constants that should be removed */ LocationLen *clocations; diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c index f004a9ce8c..225940d40c 100644 --- a/src/backend/utils/misc/queryjumble.c +++ b/src/backend/utils/misc/queryjumble.c @@ -106,9 +106,10 @@ JumbleQuery(Query *query, const char *querytext) { jstate = (JumbleState *) palloc(sizeof(JumbleState)); - /* Set up workspace for query jumbling */ - jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE); - jstate->jumble_len = 0; + /* Initialize state for query jumbling */ + jstate->j.id = 0; + jstate->nextbyte = 0; + /* Initialize state for tracking where constants are */ jstate->clocations_buf_size = 32; jstate->clocations = (LocationLen *) palloc(jstate->clocations_buf_size * sizeof(LocationLen)); @@ -117,9 +118,7 @@ JumbleQuery(Query *query, const char *querytext) /* Compute query ID and mark the Query node with it */ JumbleQueryInternal(jstate, query); - query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble, - jstate->jumble_len, - 0)); + query->queryId = jstate->j.id; /* * If we are unlucky enough to get a hash of zero, use 1 instead, to @@ -167,34 +166,17 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l static void AppendJumble(JumbleState *jstate, const unsigned char *item, Size size) { - unsigned char *jumble = jstate->jumble; - Size jumble_len = jstate->jumble_len; + int nextbyte = jstate->nextbyte; - /* - * Whenever the jumble buffer is full, we hash the current contents and - * reset the buffer to contain just that hash value, thus relying on the - * hash to summarize everything so far. - */ while (size > 0) { - Size part_size; - - if (jumble_len >= JUMBLE_SIZE) - { - uint64 start_hash; - - start_hash = DatumGetUInt64(hash_any_extended(jumble, - JUMBLE_SIZE, 0)); - memcpy(jumble, &start_hash, sizeof(start_hash)); - jumble_len = sizeof(start_hash); - } - part_size = Min(size, JUMBLE_SIZE - jumble_len); - memcpy(jumble + jumble_len, item, part_size); - jumble_len += part_size; - item += part_size; - size -= part_size; + jstate->j.bytes[nextbyte] ^= *item++; + nextbyte++; + if (nextbyte >= sizeof(uint64)) + nextbyte = 0; + size--; } - jstate->jumble_len = jumble_len; + jstate->nextbyte = nextbyte; } /* diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h index 83ba7339fa..d01712af4d 100644 --- a/src/include/utils/queryjumble.h +++ b/src/include/utils/queryjumble.h @@ -16,8 +16,6 @@ #include "nodes/parsenodes.h" -#define JUMBLE_SIZE 1024 /* query serialization buffer size */ - /* * Struct for tracking locations/lengths of constants during normalization */ @@ -33,11 +31,13 @@ typedef struct LocationLen */ typedef struct JumbleState { - /* Jumble of current query tree */ - unsigned char *jumble; - - /* Number of bytes used in jumble[] */ - Size jumble_len; + /* Working state for accumulating a 64-bit query ID */ + union + { + unsigned char bytes[sizeof(uint64)]; + uint64 id; + } j; + int nextbyte; /* index of next byte to update in j.bytes[] */ /* Array of locations of constants that should be removed */ LocationLen *clocations;
On Fri, May 14, 2021 at 12:26:23AM -0400, Tom Lane wrote: > I then tried a really dumb xor'ing implementation, and > that gives me a slowdown of 2.2%. This could undoubtedly > be improved on further, say by unrolling the loop or > processing multiple bytes at once. One problem with it > is that I suspect it will tend to concentrate the entropy > into the third/fourth and seventh/eighth bytes of the > accumulator, since so many of the fields being jumbled > are 4-byte or 8-byte fields with most of the entropy in > their low-order bits. Probably that could be improved > with a bit more thought -- say, an extra bump of the > nextbyte pointer after each field. > > Anyway, I think that what we have here is quite an inferior > implementation, and we can do better with some more thought. Considering what even a simple query has to do, I am still baffled why such a computation takes ~2%, though it obviously does since you have confirmed that figure. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote: > On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote: > > On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote: > > > > > > On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote: > > > > Here's a first attempt at what was suggested. If you say "auto" it > > > > remains auto in SHOW, but it gets enabled if a module asks for it. > > > > > > > > Not final yet, but I thought I'd throw it out for early commentary ... > > > > > > I certainly like this idea better than having the extension change the > > > output of the GUC. > > > > Oh, I didn't understand that it was the major blocker. I'm fine with it too. > > I think if we keep the output as 'auto', and document that you check > pg_stat_activity for a hash to see if it is enabled, that gets us pretty > far. I think keeping the output as 'auto', and documenting that this query must be run to determine if the query id is being computed: SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid(); is the right approach. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Fri, May 14, 2021 at 08:35:14AM -0400, Bruce Momjian wrote: > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote: > > I think if we keep the output as 'auto', and document that you check > > pg_stat_activity for a hash to see if it is enabled, that gets us pretty > > far. > > I think keeping the output as 'auto', and documenting that this query > must be run to determine if the query id is being computed: > > SELECT query_id > FROM pg_stat_activity > WHERE pid = pg_backend_pid(); > > is the right approach. Actually, we talked about huge_pages = try needing to use OS commands to see if huge pages are being used, so requiring an SQL query to see if query id is being computed seems reasonable. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Fri, May 14, 2021 at 08:57:41AM -0400, Bruce Momjian wrote: > On Fri, May 14, 2021 at 08:35:14AM -0400, Bruce Momjian wrote: > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote: > > > I think if we keep the output as 'auto', and document that you check > > > pg_stat_activity for a hash to see if it is enabled, that gets us pretty > > > far. > > > > I think keeping the output as 'auto', and documenting that this query > > must be run to determine if the query id is being computed: > > > > SELECT query_id > > FROM pg_stat_activity > > WHERE pid = pg_backend_pid(); > > > > is the right approach. > > Actually, we talked about huge_pages = try needing to use OS commands to > see if huge pages are being used, so requiring an SQL query to see if > query id is being computed seems reasonable. I totally agree.
Sent from my iPhone > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote: >>> On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote: >>> On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote: >>>> >>>> On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote: >>>>> Here's a first attempt at what was suggested. If you say "auto" it >>>>> remains auto in SHOW, but it gets enabled if a module asks for it. >>>>> >>>>> Not final yet, but I thought I'd throw it out for early commentary ... >>>> >>>> I certainly like this idea better than having the extension change the >>>> output of the GUC. >>> >>> Oh, I didn't understand that it was the major blocker. I'm fine with it too. >> >> I think if we keep the output as 'auto', and document that you check >> pg_stat_activity for a hash to see if it is enabled, that gets us pretty >> far. > > I think keeping the output as 'auto', and documenting that this query > must be run to determine if the query id is being computed: > > SELECT query_id > FROM pg_stat_activity > WHERE pid = pg_backend_pid(); > > is the right approach. I’d rather we added a specific function. This is not really obvious. Cheers Andrew
On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote: > > > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote: > >>> On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote: > >>> On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote: > >>>> > >>>> On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote: > >>>>> Here's a first attempt at what was suggested. If you say "auto" it > >>>>> remains auto in SHOW, but it gets enabled if a module asks for it. > >>>>> > >>>>> Not final yet, but I thought I'd throw it out for early commentary ... > >>>> > >>>> I certainly like this idea better than having the extension change the > >>>> output of the GUC. > >>> > >>> Oh, I didn't understand that it was the major blocker. I'm fine with it too. > >> > >> I think if we keep the output as 'auto', and document that you check > >> pg_stat_activity for a hash to see if it is enabled, that gets us pretty > >> far. > > > > I think keeping the output as 'auto', and documenting that this query > > must be run to determine if the query id is being computed: > > > > SELECT query_id > > FROM pg_stat_activity > > WHERE pid = pg_backend_pid(); > > > > is the right approach. > > I’d rather we added a specific function. This is not really obvious. We could, but I don't know how much this will be used in practice. The only way someone would try to know if "auto" means that query_id are computed is if she has an extension like pg_stat_statements, and she will probably just check that anyway, and will get a warning if query_id are *not* computed. That being said no objection to an SQL wrapper around a query like it.
On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote: > > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote: > > I think keeping the output as 'auto', and documenting that this query > > must be run to determine if the query id is being computed: > > > > SELECT query_id > > FROM pg_stat_activity > > WHERE pid = pg_backend_pid(); > > > > is the right approach. > > I’d rather we added a specific function. This is not really obvious. Well, we can document this query, add a function, or add a read-only GUC. I am not sure how we decide which one to use. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
pá 14. 5. 2021 v 18:21 odesílatel Bruce Momjian <bruce@momjian.us> napsal:
On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:
> > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > I think keeping the output as 'auto', and documenting that this query
> > must be run to determine if the query id is being computed:
> >
> > SELECT query_id
> > FROM pg_stat_activity
> > WHERE pid = pg_backend_pid();
> >
> > is the right approach.
>
> I’d rather we added a specific function. This is not really obvious.
Well, we can document this query, add a function, or add a read-only
GUC. I am not sure how we decide which one to use.
I though and I prefer read only GUC
It is easy to write "show all"
Pavel
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On Fri, May 14, 2021 at 12:21:23PM -0400, Bruce Momjian wrote: > On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote: > > > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote: > > > > > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote: > > > I think keeping the output as 'auto', and documenting that this query > > > must be run to determine if the query id is being computed: > > > > > > SELECT query_id > > > FROM pg_stat_activity > > > WHERE pid = pg_backend_pid(); > > > > > > is the right approach. > > > > I’d rather we added a specific function. This is not really obvious. > > Well, we can document this query, add a function, or add a read-only > GUC. I am not sure how we decide which one to use. I wonder if we should go with an SQL query now (no new API needed) and then add a GUC once we decide on how extensions can register that they are generating the query id, so the GUC can report the generating source, not just a boolean. The core server can also register as the source. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
pá 14. 5. 2021 v 19:28 odesílatel Bruce Momjian <bruce@momjian.us> napsal:
On Fri, May 14, 2021 at 12:21:23PM -0400, Bruce Momjian wrote:
> On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:
> > > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > > I think keeping the output as 'auto', and documenting that this query
> > > must be run to determine if the query id is being computed:
> > >
> > > SELECT query_id
> > > FROM pg_stat_activity
> > > WHERE pid = pg_backend_pid();
> > >
> > > is the right approach.
> >
> > I’d rather we added a specific function. This is not really obvious.
>
> Well, we can document this query, add a function, or add a read-only
> GUC. I am not sure how we decide which one to use.
I wonder if we should go with an SQL query now (no new API needed) and
then add a GUC once we decide on how extensions can register that they
are generating the query id, so the GUC can report the generating
source, not just a boolean. The core server can also register as the
source.
I have no problem with it. This is an internal feature and can be enhanced (fixed) in time without problems.
Pavel
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
On 2021-May-14, Julien Rouhaud wrote: > On Fri, May 14, 2021 at 12:20:00PM +0900, Fujii Masao wrote: > > +void > > +EnableQueryId(void) > > +{ > > + if (compute_query_id == COMPUTE_QUERY_ID_AUTO) > > + auto_query_id_enabled = true; > > > > Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is? > > Otherwise, for example, the following scenario can happen and it's a bit strange. > > > > 1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on > > 2. compute_query_id is set to auto and the configuration file is reloaded > > Then, even though compute_query_id is auto and pg_stat_statements is loaded, > > query ids are not computed and no queries are tracked by pg_stat_statements. > > +1. That makes sense. Done in this version. I took out the new WARNING in pg_stat_statements. It's not clear to me that that's terribly useful (it stops working as soon as you have one query in the pg_stat_statements stash and later disable everything). Maybe there is an useful warning to add, but I think it's independent of changing the GUC behabior. I also made IsQueryIdEnabled() a static inline in queryjumble.h, to avoid a function call at every site where we need that. Also did some light doc editing. I think I should aim at pushing this tomorrow morning. > Note that if you switch from compute_query_id = off + custom > query_id + pg_stat_statements to compute_query_id = auto then thing will > immediately break (as we instruct third-party plugins authors to error out in > that case), which is way better than breaking at the next random service > restart. Hmm, ok. I tested pg_queryid and that behavior of throwing an error seems quite unhelpful -- it basically makes every single query fail if you set things wrong. I think that instruction is bogus and should be reconsidered. Maybe pg_queryid could use a custom Boolean GUC that tells it to overwrite the core query_id if that is enabled, or to just silently do nothing in that case. While thinking about this patch it occurred to that an useful gadget might be to let pg_stat_statements be loaded always, but with compute_query_id=false; so it's never active; except if you ALTER {DATABASE, USER} foo SET (compute_query_id=on); so that it's possible to enable it selectively. I think this doesn't currently achieve anything because pgss_store is always called regardless of query ID being active (so you'd always have at least one function call as performance penalty, only that the work would be for naught), but that seems a simple change to make. I didn't look closely to see what other things would need patched. -- Álvaro Herrera 39°49'30"S 73°17'W Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh)
Attachment
On Fri, May 14, 2021 at 07:50:13PM -0400, Alvaro Herrera wrote: > +++ b/doc/src/sgml/config.sgml > @@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv; > identifier to be computed. Note that an external module can > alternatively be used if the in-core query identifier computation > method is not acceptable. In this case, in-core computation > - must be disabled. The default is <literal>off</literal>. > + must be always disabled. > + Valid values are <literal>off</literal> (always disabled), > + <literal>on</literal> (always enabled) and <literal>auto</literal>, > + which let modules such as <xref linkend="pgstatstatements"/> > + automatically enable it. > + The default is <literal>auto</literal>. which lets > +/* True when a module requests query IDs and they're set auto */ > +bool query_id_enabled = false; Does "they're" mean the GUC compute_query_id ? > +/* > + * This should only be called if IsQueryIdEnabled() > + * return true. > + */ > JumbleState * > JumbleQuery(Query *query, const char *querytext) Should it Assert() that ? Maybe you should update this too ? doc/src/sgml/release-14.sgml
On Sat, May 15, 2021 at 7:50 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > I took out the new WARNING in pg_stat_statements. It's not clear to me > that that's terribly useful (it stops working as soon as you have one > query in the pg_stat_statements stash and later disable everything). If no query_id is calculated and you have entries in pg_stat_statements, it means someone deliberately deactivated compute_query_id. In that case it's clear that they know the GUC exists, so there's no much point in warning them that they deactivated it I think. > Maybe there is an useful warning to add, but I think it's independent of > changing the GUC behabior. I'm fine with it. > > Note that if you switch from compute_query_id = off + custom > > query_id + pg_stat_statements to compute_query_id = auto then thing will > > immediately break (as we instruct third-party plugins authors to error out in > > that case), which is way better than breaking at the next random service > > restart. > > Hmm, ok. I tested pg_queryid and that behavior of throwing an error > seems quite unhelpful -- it basically makes every single query fail if > you set things wrong. I think that instruction is bogus and should be > reconsidered. Maybe pg_queryid could use a custom Boolean GUC that > tells it to overwrite the core query_id if that is enabled, or to just > silently do nothing in that case. Unless I'm missing something, if we remove that instruction it means that we encourage users to dynamically change the query_id source without any safeguard, which will in the majority of case result in unwanted behavior, going from duplicated entries, poor performance in pg_stat_statements if that leads to more evictions, or even totally bogus metrics if that leads to hash collision. > While thinking about this patch it occurred to that an useful gadget > might be to let pg_stat_statements be loaded always, but with > compute_query_id=false; so it's never active; except if you > ALTER {DATABASE, USER} foo SET (compute_query_id=on); > so that it's possible to enable it selectively. I think this doesn't > currently achieve anything because pgss_store is always called > regardless of query ID being active (so you'd always have at least one > function call as performance penalty, only that the work would be for > naught), but that seems a simple change to make. I didn't look closely > to see what other things would need patched. Couldn't it already be achieved with ALTER [ DATABASE | USER ] foo SET pg_stat_statements.track = [ none | top | all ] ?
On Sat, May 15, 2021 at 04:09:32PM +0800, Julien Rouhaud wrote: > > While thinking about this patch it occurred to that an useful gadget > > might be to let pg_stat_statements be loaded always, but with > > compute_query_id=false; so it's never active; except if you > > ALTER {DATABASE, USER} foo SET (compute_query_id=on); > > so that it's possible to enable it selectively. I think this doesn't > > currently achieve anything because pgss_store is always called > > regardless of query ID being active (so you'd always have at least one > > function call as performance penalty, only that the work would be for > > naught), but that seems a simple change to make. I didn't look closely > > to see what other things would need patched. > > Couldn't it already be achieved with ALTER [ DATABASE | USER ] foo SET > pg_stat_statements.track = [ none | top | all ] ? I am no longer the committer in charge of this feature, but I would like to remind the group that beta1 will be wrapped on Monday, and it is hard to change non-read-only GUCs after beta since the settings are embedded in postgresql.conf. There is also a release notes item that probably will need to be adjusted. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Sat, May 15, 2021 at 10:00 PM Bruce Momjian <bruce@momjian.us> wrote: > > I am no longer the committer in charge of this feature, but I would like > to remind the group that beta1 will be wrapped on Monday, and it is hard > to change non-read-only GUCs after beta since the settings are embedded > in postgresql.conf. There is also a release notes item that probably > will need to be adjusted. It seems that everyone agrees on the definition of compute_query_id in Álvaro's v4 patch (module Justin's comments) so this could be committed before the beta1. If the safeguards for custom query_id or GUC misconfiguration have to be tweaked it shouldn't impact the GUC in any way.
On 2021-May-16, Julien Rouhaud wrote: > On Sat, May 15, 2021 at 10:00 PM Bruce Momjian <bruce@momjian.us> wrote: > > > > I am no longer the committer in charge of this feature, but I would like > > to remind the group that beta1 will be wrapped on Monday, and it is hard > > to change non-read-only GUCs after beta since the settings are embedded > > in postgresql.conf. There is also a release notes item that probably > > will need to be adjusted. > > It seems that everyone agrees on the definition of compute_query_id in > Álvaro's v4 patch (module Justin's comments) so this could be > committed before the beta1. If the safeguards for custom query_id or > GUC misconfiguration have to be tweaked it shouldn't impact the GUC in > any way. Pushed after adding the fixes from Justin. Note I didn't include the WARNING in pg_stat_statements when this is disabled; if anybody wants to argue for that, let's add it separately. I commented out the release notes para that is now wrong. What remains is this: Move query hash computation from pg_stat_statements to the core server (Julien Rouhaud) We could perhaps add something like Extension pg_stat_statements continues to work without requiring any configuration changes. but that seems a bit pointless. Or maybe Extension pg_stat_statements automatically enables query identifier computation if compute_query_id is set to auto. Third-party modules to compute query identifiers can be installed and used if this is set to off. I wonder why the initial line says "query hash" instead of "query identifier". Do we want to say "hash" everywhere? Why didn't we name the GUC "compute_query_hash" in that case? Anyway, let me remind you that it is pretty common to require initdb during the beta period. -- Álvaro Herrera Valdivia, Chile
On Sat, May 15, 2021 at 02:21:59PM -0400, Álvaro Herrera wrote: > I commented out the release notes para that is now wrong. What remains > is this: > > Move query hash computation from pg_stat_statements to the core server (Julien Rouhaud) > > We could perhaps add something like > > Extension pg_stat_statements continues to work without requiring any > configuration changes. > > but that seems a bit pointless. Or maybe > > Extension pg_stat_statements automatically enables query identifier > computation if compute_query_id is set to auto. Third-party modules > to compute query identifiers can be installed and used if this is set > to off. > OK, new text is: <listitem> <!-- Author: Bruce Momjian <bruce@momjian.us> 2021-04-07 [5fd9dfa5f] Move pg_stat_statements query jumbling to core. --> <para> Move query hash computation from pg_stat_statements to the core server (Julien Rouhaud) </para> <para> The new server variable compute_query_id's default of 'auto' will automatically enable query id computation when this extension is loaded. </para> </listitem> I also added Alvaro as an author of the compute_query_id item. > I wonder why the initial line says "query hash" instead of "query > identifier". Do we want to say "hash" everywhere? Why didn't we name > the GUC "compute_query_hash" in that case? It is queryid (no underscore) in pg_stat_statements, which was a whole different discussion. ;-) > Anyway, let me remind you that it is pretty common to require initdb > during the beta period. True. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 2021-May-15, Bruce Momjian wrote: > On Sat, May 15, 2021 at 02:21:59PM -0400, Álvaro Herrera wrote: > > I wonder why the initial line says "query hash" instead of "query > > identifier". Do we want to say "hash" everywhere? Why didn't we name > > the GUC "compute_query_hash" in that case? > > It is queryid (no underscore) in pg_stat_statements, which was a whole > different discussion. ;-) Yeah, I realize that, but I wonder if we shouldn't use the term "query identifier" instead of "query hash" in that paragraph. > I also added Alvaro as an author of the compute_query_id item. I've been wondering if I should ask to stick my name in other features I helped get committed -- specifically the PQtrace() item and autovacuum for partitioned tables. I'll go comment in the release notes thread. -- Álvaro Herrera Valdivia, Chile "Digital and video cameras have this adjustment and film cameras don't for the same reason dogs and cats lick themselves: because they can." (Ken Rockwell)
On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote: > OK, new text is: > > <listitem> > <!-- > Author: Bruce Momjian <bruce@momjian.us> > 2021-04-07 [5fd9dfa5f] Move pg_stat_statements query jumbling to core. > --> > > <para> > Move query hash computation from pg_stat_statements to the core > server (Julien Rouhaud) > </para> > > <para> > The new server variable compute_query_id's default of 'auto' will > automatically enable query id computation when this extension > is loaded. > </para> > </listitem> > > I also added Alvaro as an author of the compute_query_id item. -------------------------------------------------------------- Based on the commit message, adding Alvaro was incorrect, so I will revert this change. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 2021-May-15, Bruce Momjian wrote: > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote: > > I also added Alvaro as an author of the compute_query_id item. > -------------------------------------------------------------- > > Based on the commit message, adding Alvaro was incorrect, so I will > revert this change. Agreed. My work on this one was janitorial. -- Álvaro Herrera 39°49'30"S 73°17'W "Hay que recordar que la existencia en el cosmos, y particularmente la elaboración de civilizaciones dentro de él no son, por desgracia, nada idílicas" (Ijon Tichy)
Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
On 2021-May-15, Bruce Momjian wrote:
> On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
> > I also added Alvaro as an author of the compute_query_id item.
> --------------------------------------------------------------
>
> Based on the commit message, adding Alvaro was incorrect, so I will
> revert this change.
Agreed. My work on this one was janitorial.
Thanks a lot Alvaro and Bruce!
On Sat, May 15, 2021 at 07:01:25PM -0400, Álvaro Herrera wrote: > On 2021-May-15, Bruce Momjian wrote: > > > On Sat, May 15, 2021 at 02:21:59PM -0400, Álvaro Herrera wrote: > > > > I wonder why the initial line says "query hash" instead of "query > > > identifier". Do we want to say "hash" everywhere? Why didn't we name > > > the GUC "compute_query_hash" in that case? > > > > It is queryid (no underscore) in pg_stat_statements, which was a whole > > different discussion. ;-) > > Yeah, I realize that, but I wonder if we shouldn't use the term "query > identifier" instead of "query hash" in that paragraph. Yes, of course, you are right --- updated text: <listitem> <!-- Author: Bruce Momjian <bruce@momjian.us> 2021-04-07 [4f0b0966c] Make use of in-core query id added by commit 5fd9dfa5f5 Author: Bruce Momjian <bruce@momjian.us> 2021-04-07 [f57a2f5e0] Add csvlog output for the new query_id value Author: Bruce Momjian <bruce@momjian.us> 2021-04-20 [9660834dd] adjust query id feature to use pg_stat_activity.query_id Author: Bruce Momjian <bruce@momjian.us> 2021-05-03 [f7a97b6ec] Update query_id computation Author: Alvaro Herrera <alvherre@alvh.no-ip.org> 2021-05-15 [cafde58b3] Allow compute_query_id to be set to 'auto' and make it d --> <para> If server variable compute_query_id is enabled, display the query id in pg_stat_activity, EXPLAIN VERBOSE, csvlog, and optionally in log_line_prefix (Julien Rouhaud) </para> <para> A query id computed by an extension will also be displayed. </para> </listitem> > > > I also added Alvaro as an author of the compute_query_id item. > > I've been wondering if I should ask to stick my name in other features I > helped get committed -- specifically the PQtrace() item and autovacuum > for partitioned tables. I'll go comment in the release notes thread. Yes, done. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Sat, May 15, 2021 at 11:23:25PM -0400, Álvaro Herrera wrote: > On 2021-May-15, Bruce Momjian wrote: > > > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote: > > > > I also added Alvaro as an author of the compute_query_id item. > > -------------------------------------------------------------- > > > > Based on the commit message, adding Alvaro was incorrect, so I will > > revert this change. > > Agreed. My work on this one was janitorial. OK, removed, thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote: > Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit : > > On 2021-May-15, Bruce Momjian wrote: > > > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote: > > > > I also added Alvaro as an author of the compute_query_id item. > > -------------------------------------------------------------- > > > > Based on the commit message, adding Alvaro was incorrect, so I will > > revert this change. > > Agreed. My work on this one was janitorial. > > > Thanks a lot Alvaro and Bruce! We are going to get to the goal line, one way or the other! ;-) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
On 5/16/21 11:13 PM, Bruce Momjian wrote: > On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote: >> Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit : >> >> On 2021-May-15, Bruce Momjian wrote: >> >> > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote: >> >> > > I also added Alvaro as an author of the compute_query_id item. >> > -------------------------------------------------------------- >> > >> > Based on the commit message, adding Alvaro was incorrect, so I will >> > revert this change. >> >> Agreed. My work on this one was janitorial. >> >> >> Thanks a lot Alvaro and Bruce! > We are going to get to the goal line, one way or the other! ;-) I've discussed this with Alvaro. He's not planning to do anything more regarding this and I think we can close the open item. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
On Fri, May 21, 2021 at 02:19:13PM -0400, Andrew Dunstan wrote: > > On 5/16/21 11:13 PM, Bruce Momjian wrote: > > On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote: > >> Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit : > >> > >> On 2021-May-15, Bruce Momjian wrote: > >> > >> > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote: > >> > >> > > I also added Alvaro as an author of the compute_query_id item. > >> > -------------------------------------------------------------- > >> > > >> > Based on the commit message, adding Alvaro was incorrect, so I will > >> > revert this change. > >> > >> Agreed. My work on this one was janitorial. > >> > >> > >> Thanks a lot Alvaro and Bruce! > > We are going to get to the goal line, one way or the other! ;-) > > > > I've discussed this with Alvaro. He's not planning to do anything more > regarding this and I think we can close the open item. Works for me. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Le sam. 22 mai 2021 à 02:27, Bruce Momjian <bruce@momjian.us> a écrit :
On Fri, May 21, 2021 at 02:19:13PM -0400, Andrew Dunstan wrote:
>
> On 5/16/21 11:13 PM, Bruce Momjian wrote:
> > On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:
> >> Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
> >>
> >> On 2021-May-15, Bruce Momjian wrote:
> >>
> >> > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
> >>
> >> > > I also added Alvaro as an author of the compute_query_id item.
> >> > --------------------------------------------------------------
> >> >
> >> > Based on the commit message, adding Alvaro was incorrect, so I will
> >> > revert this change.
> >>
> >> Agreed. My work on this one was janitorial.
> >>
> >>
> >> Thanks a lot Alvaro and Bruce!
> > We are going to get to the goal line, one way or the other! ;-)
>
>
>
> I've discussed this with Alvaro. He's not planning to do anything more
> regarding this and I think we can close the open item.
Works for me.
works for me too.