Thread: compute_query_id and pg_stat_statements

compute_query_id and pg_stat_statements

From
Fujii Masao
Date:
When compute_query_id is not enabled (this is the default setting),
pg_stat_statements doesn't track any statements. This means that
we will see no entries in pg_stat_statements by default. I'm afraid that
users may easily forget to enable compute_query_id
when using pg_stat_statements (because this setting was not necessary
in v13 or before), and finally may have noticed the mis-configuration
and failure of statements tracking after many queries were executed.
For example, we already have one report about this issue, in [1].

Shouldn't we do something so that users can avoid such mis-configuration?

One idea is to change the default value of compute_query_id from false to true.
If enabling compute_query_id doesn't incur any performance penalty,
IMO this idea is very simple and enough.

Another idea is to change pg_stat_statements so that it emits an error
at the server startup (i.e., prevents the server from starting up)
if compute_query_id is not enabled. In this case, users can easily notice
the mis-configuration from the error message in the server log,
enable compute_query_id, and then restart the server.

IMO the former is better if there is no such performance risk. Otherwise
we should adopt the latter approach. Or you have the better idea?

Thought?

[1]
https://postgr.es/m/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
Le sam. 24 avr. 2021 à 22:54, Fujii Masao <masao.fujii@oss.nttdata.com> a écrit :
For example, we already have one report about this issue, in [1].

this report was only a few days after the patch changing the behavior was committed, unless you've been following the original thread (which has been going on for 2 years), that's kind of expected. release notes for pg14 should highlight that change, so hopefully people upgrading will see it. I'll also try to write some blog article about it to add more warnings. 

Shouldn't we do something so that users can avoid such mis-configuration?

One idea is to change the default value of compute_query_id from false to true.
If enabling compute_query_id doesn't incur any performance penalty,
IMO this idea is very simple and enough.

it adds some noticeable overhead in oltp style workloads. I think that I did some benchmarks in the original thread, and we decided not to enable it by default 

Another idea is to change pg_stat_statements so that it emits an error
at the server startup (i.e., prevents the server from starting up)
if compute_query_id is not enabled. In this case, users can easily notice
the mis-configuration from the error message in the server log,
enable compute_query_id, and then restart the server.

that's also not an option, as one can now use pg_stat_statetements with a different queryid calculation. see for instance https://github.com/rjuju/pg_queryid for a proof a concept extension for that. I think it's clear that multiple people will want to use a different calculation as they have been asking for that for years. 

IMO the former is better if there is no such performance risk. Otherwise
we should adopt the latter approach. Or you have the better idea?

I'm not sure how to address that, as temporarily disabling queryId calculation should be allowed. maybe we could raise a warning once per backend if pgss sees a dml query without queryId? but it could end up creating more problems than it solves. 

for the record people have also raised bugs on the powa project because planning counters are not tracked by default, so compute_query_id will probably add a bit of traffic. 

Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sat, Apr 24, 2021 at 11:54:25PM +0900, Fujii Masao wrote:
> When compute_query_id is not enabled (this is the default setting),
> pg_stat_statements doesn't track any statements. This means that
> we will see no entries in pg_stat_statements by default. I'm afraid that
> users may easily forget to enable compute_query_id
> when using pg_stat_statements (because this setting was not necessary
> in v13 or before), and finally may have noticed the mis-configuration
> and failure of statements tracking after many queries were executed.
> For example, we already have one report about this issue, in [1].
> 
> Shouldn't we do something so that users can avoid such mis-configuration?
> 
> One idea is to change the default value of compute_query_id from false to true.
> If enabling compute_query_id doesn't incur any performance penalty,
> IMO this idea is very simple and enough.

I think the query overhead was too high (2%) to enable it by default:

    https://www.postgresql.org/message-id/20201016160355.GA31474@alvherre.pgsql

> Another idea is to change pg_stat_statements so that it emits an error
> at the server startup (i.e., prevents the server from starting up)
> if compute_query_id is not enabled. In this case, users can easily notice
> the mis-configuration from the error message in the server log,
> enable compute_query_id, and then restart the server.

I think it throws an error in the server logs, but preventing server
start seems extreme.  Also, compute_query_id is PGC_SUSET, meaning it
can be changed by the super-user, so you could enable compute_query_id
without a server restart, which makes failing on start kind of odd.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Magnus Hagander
Date:
On Sat, Apr 24, 2021 at 5:22 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Sat, Apr 24, 2021 at 11:54:25PM +0900, Fujii Masao wrote:
> > When compute_query_id is not enabled (this is the default setting),
> > pg_stat_statements doesn't track any statements. This means that
> > we will see no entries in pg_stat_statements by default. I'm afraid that
> > users may easily forget to enable compute_query_id
> > when using pg_stat_statements (because this setting was not necessary
> > in v13 or before), and finally may have noticed the mis-configuration
> > and failure of statements tracking after many queries were executed.
> > For example, we already have one report about this issue, in [1].
> >
> > Shouldn't we do something so that users can avoid such mis-configuration?
> >
> > One idea is to change the default value of compute_query_id from false to true.
> > If enabling compute_query_id doesn't incur any performance penalty,
> > IMO this idea is very simple and enough.
>
> I think the query overhead was too high (2%) to enable it by default:
>
>         https://www.postgresql.org/message-id/20201016160355.GA31474@alvherre.pgsql

Personally I'd say 2% is not too high to turn it on by default, as it
goes down when you move past trivial queries, which is what most
people do. And since you can easily turn it off.


> > Another idea is to change pg_stat_statements so that it emits an error
> > at the server startup (i.e., prevents the server from starting up)
> > if compute_query_id is not enabled. In this case, users can easily notice
> > the mis-configuration from the error message in the server log,
> > enable compute_query_id, and then restart the server.
>
> I think it throws an error in the server logs, but preventing server
> start seems extreme.  Also, compute_query_id is PGC_SUSET, meaning it
> can be changed by the super-user, so you could enable compute_query_id
> without a server restart, which makes failing on start kind of odd.

How about turning it into an enum instead of a boolean, that can be:

off = always off
auto = pg_stat_statments turns it on when it's loaded in
shared_preload_libraries. Other extensions using it can do that to.
But it remains off if you haven't installed any *extension* that needs
it
on = always on (if you want it in pg_stat_activity regardless of extensions)

The default would be "auto", which means that pg_stat_statements would
work as expected, but those who haven't installed it (or another
extension that changes it) would not have to pay the overhead.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sat, Apr 24, 2021 at 06:48:53PM +0200, Magnus Hagander wrote:
> > I think the query overhead was too high (2%) to enable it by default:
> >
> >         https://www.postgresql.org/message-id/20201016160355.GA31474@alvherre.pgsql
> 
> Personally I'd say 2% is not too high to turn it on by default, as it
> goes down when you move past trivial queries, which is what most
> people do. And since you can easily turn it off.

We would do a lot of work to reduce overhead by 2% on every query, and
to add 2% for a hash that previously was only used by pg_stat_statements
seems unwise.

> How about turning it into an enum instead of a boolean, that can be:
> 
> off = always off
> auto = pg_stat_statments turns it on when it's loaded in
> shared_preload_libraries. Other extensions using it can do that to.
> But it remains off if you haven't installed any *extension* that needs
> it
> on = always on (if you want it in pg_stat_activity regardless of extensions)
> 
> The default would be "auto", which means that pg_stat_statements would
> work as expected, but those who haven't installed it (or another
> extension that changes it) would not have to pay the overhead.

That's a pretty weird API.  I think we just need people to turn it on
like they are doing when the configure pg_stat_statements anyway. 
pg_stat_statements already requires configuration anyway.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> That's a pretty weird API.  I think we just need people to turn it on
> like they are doing when the configure pg_stat_statements anyway. 
> pg_stat_statements already requires configuration anyway.

Agreed.  If pg_stat_statements were zero-configuration today then
this would be an annoying new burden, but it isn't.

I haven't looked, but did we put anything into pg_stat_statements
to make it easy to tell if you've messed up this setting?

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Sat, Apr 24, 2021 at 01:43:51PM -0400, Tom Lane wrote:
> 
> I haven't looked, but did we put anything into pg_stat_statements
> to make it easy to tell if you've messed up this setting?

You mean apart from from having pg_stat_statements' view/SRFs returning
nothing?

I think it's a reasonable use case to sometime disable query_id calculation,
eg. if you know that it will only lead to useless bloat in the entry and that
you won't need the info, so spamming warnings if there are no queryid could
cause some pain.



Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sat, Apr 24, 2021 at 01:43:51PM -0400, Tom Lane wrote:
>> I haven't looked, but did we put anything into pg_stat_statements
>> to make it easy to tell if you've messed up this setting?

> You mean apart from from having pg_stat_statements' view/SRFs returning
> nothing?

> I think it's a reasonable use case to sometime disable query_id calculation,
> eg. if you know that it will only lead to useless bloat in the entry and that
> you won't need the info, so spamming warnings if there are no queryid could
> cause some pain.

I agree repeated warnings would be bad news.  I was wondering if we could
arrange a single warning at the time pg_stat_statements is preloaded into
the postmaster.  In this way, if you tried to use a configuration file
that used to work, you'd hopefully get some notice about why it no longer
does what you want.  Also, if you are preloading pg_stat_statements, it
seems reasonable to assume that you'd like the global value of the flag
to be "on", even if there are use-cases for transiently disabling it.

I think the way to detect "being loaded into the postmaster" is
    if (IsPostmasterEnvironment && !IsUnderPostmaster)

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote:
> 
> I agree repeated warnings would be bad news.  I was wondering if we could
> arrange a single warning at the time pg_stat_statements is preloaded into
> the postmaster.  In this way, if you tried to use a configuration file
> that used to work, you'd hopefully get some notice about why it no longer
> does what you want.  Also, if you are preloading pg_stat_statements, it
> seems reasonable to assume that you'd like the global value of the flag
> to be "on", even if there are use-cases for transiently disabling it.

What about people who wants to use pg_stat_statements but are not ok with our
query_id heuristics and use a third-party plugin for that?



Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote:
>> I agree repeated warnings would be bad news.  I was wondering if we could
>> arrange a single warning at the time pg_stat_statements is preloaded into
>> the postmaster.  In this way, if you tried to use a configuration file
>> that used to work, you'd hopefully get some notice about why it no longer
>> does what you want.  Also, if you are preloading pg_stat_statements, it
>> seems reasonable to assume that you'd like the global value of the flag
>> to be "on", even if there are use-cases for transiently disabling it.

> What about people who wants to use pg_stat_statements but are not ok with our
> query_id heuristics and use a third-party plugin for that?

They're still going to want the GUC set to something other than "off",
no?  In any case it's just a one-time log message, so it's not likely
to be *that* annoying.

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Sun, Apr 25, 2021 at 01:17:03PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Sun, Apr 25, 2021 at 11:39:55AM -0400, Tom Lane wrote:
> >> I agree repeated warnings would be bad news.  I was wondering if we could
> >> arrange a single warning at the time pg_stat_statements is preloaded into
> >> the postmaster.  In this way, if you tried to use a configuration file
> >> that used to work, you'd hopefully get some notice about why it no longer
> >> does what you want.  Also, if you are preloading pg_stat_statements, it
> >> seems reasonable to assume that you'd like the global value of the flag
> >> to be "on", even if there are use-cases for transiently disabling it.
> 
> > What about people who wants to use pg_stat_statements but are not ok with our
> > query_id heuristics and use a third-party plugin for that?
> 
> They're still going to want the GUC set to something other than "off",
> no?

They will want compute_query_id to be off.  And they actually will *need* that,
as we recommend third-party plugins computing alternative query_id to error out
if they see a that a query_id has already been generated, to avoid any problem
if compute_query_id is being temporarily toggled.  That's what I did in the POC
plugin for external query_id at [1].

> In any case it's just a one-time log message, so it's not likely
> to be *that* annoying.

In that case it should be phrased in a way that makes it clear that
pg_stat_statements can work without enabling compute_query_id, something like:

"compute_query_id is disabled.  This module won't track any activity unless you
configured a third-party extension that computes query identifiers"

[1] https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L172



Re: compute_query_id and pg_stat_statements

From
Peter Eisentraut
Date:
On 24.04.21 19:43, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
>> That's a pretty weird API.  I think we just need people to turn it on
>> like they are doing when the configure pg_stat_statements anyway.
>> pg_stat_statements already requires configuration anyway.
> 
> Agreed.  If pg_stat_statements were zero-configuration today then
> this would be an annoying new burden, but it isn't.

I think people can understand "add pg_stat_statements to 
shared_preload_libraries" and "install the extension".  You have to turn 
it on somehow after all.

Now there is the additional burden of turning on this weird setting that 
no one understands.  That's a 50% increase in burden.

And almost no one will want to use a nondefault setting.

pg_stat_statements is pretty popular.  I think leaving in this 
requirement will lead to widespread confusion and complaints.



Re: compute_query_id and pg_stat_statements

From
Christoph Berg
Date:
Re: Peter Eisentraut
> > Agreed.  If pg_stat_statements were zero-configuration today then
> > this would be an annoying new burden, but it isn't.
> 
> I think people can understand "add pg_stat_statements to
> shared_preload_libraries" and "install the extension".  You have to turn it
> on somehow after all.

Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today.
You just have to load the module (= shared_preload_libraries), and it
will start working. Later you can run CREATE EXTENSION to actually see
the stats, but they are already being collected in the background.

> Now there is the additional burden of turning on this weird setting that no
> one understands.  That's a 50% increase in burden.
> 
> And almost no one will want to use a nondefault setting.
> 
> pg_stat_statements is pretty popular.  I think leaving in this requirement
> will lead to widespread confusion and complaints.

Ack, please make the default config (i.e. after setting shared_preload_libraries)
do something sensible. Having to enable some "weird" internal other setting
will be very hard to explain to users.

Fwiw, I'd even want to have pg_stat_statements enabled in core by
default. That would awesome UX. (And turning off could be as simple as
setting compute_query_id=off.)

Christoph



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Mon, Apr 26, 2021 at 05:34:30PM +0200, Christoph Berg wrote:
> Re: Peter Eisentraut
> > > Agreed.  If pg_stat_statements were zero-configuration today then
> > > this would be an annoying new burden, but it isn't.
> > 
> > I think people can understand "add pg_stat_statements to
> > shared_preload_libraries" and "install the extension".  You have to turn it
> > on somehow after all.
> 
> Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today.
> You just have to load the module (= shared_preload_libraries), and it
> will start working. Later you can run CREATE EXTENSION to actually see
> the stats, but they are already being collected in the background.
> 
> > Now there is the additional burden of turning on this weird setting that no
> > one understands.  That's a 50% increase in burden.
> > 
> > And almost no one will want to use a nondefault setting.
> > 
> > pg_stat_statements is pretty popular.  I think leaving in this requirement
> > will lead to widespread confusion and complaints.
> 
> Ack, please make the default config (i.e. after setting shared_preload_libraries)
> do something sensible. Having to enable some "weird" internal other setting
> will be very hard to explain to users.
> 
> Fwiw, I'd even want to have pg_stat_statements enabled in core by
> default. That would awesome UX. (And turning off could be as simple as
> setting compute_query_id=off.)

Techically, pg_stat_statements can turn on compute_query_id when it is
loaded, even if it is 'off' in postgresql.conf, right?  And
pg_stat_statements would know if an alternate hash method is being used,
right?

This is closer to Magnus's idea of having a three-value
compute_query_id, except is it more controlled by pg_stat_statements. 
Another idea would be to throw a user-visible warning if the
pg_stat_statements extension is loaded and compute_query_id is off.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Mon, Apr 26, 2021 at 05:34:30PM +0200, Christoph Berg wrote:
> > Re: Peter Eisentraut
> > > > Agreed.  If pg_stat_statements were zero-configuration today then
> > > > this would be an annoying new burden, but it isn't.
> > >
> > > I think people can understand "add pg_stat_statements to
> > > shared_preload_libraries" and "install the extension".  You have to turn it
> > > on somehow after all.
> >
> > Fwiw, I'd claim that pg_stat_statements *is* zero-configuration today.
> > You just have to load the module (= shared_preload_libraries), and it
> > will start working. Later you can run CREATE EXTENSION to actually see
> > the stats, but they are already being collected in the background.
> >
> > > Now there is the additional burden of turning on this weird setting that no
> > > one understands.  That's a 50% increase in burden.
> > >
> > > And almost no one will want to use a nondefault setting.
> > >
> > > pg_stat_statements is pretty popular.  I think leaving in this requirement
> > > will lead to widespread confusion and complaints.
> >
> > Ack, please make the default config (i.e. after setting shared_preload_libraries)
> > do something sensible. Having to enable some "weird" internal other setting
> > will be very hard to explain to users.
> >
> > Fwiw, I'd even want to have pg_stat_statements enabled in core by
> > default. That would awesome UX. (And turning off could be as simple as
> > setting compute_query_id=off.)
>
> Techically, pg_stat_statements can turn on compute_query_id when it is
> loaded, even if it is 'off' in postgresql.conf, right?  And
> pg_stat_statements would know if an alternate hash method is being used,
> right?

+1 on this approach.  I agree that we should avoid having to make every
new user and every user who is upgrading with pg_stat_statements
installed have to go twiddle this parameter.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Bruce Momjian (bruce@momjian.us) wrote:
>> Techically, pg_stat_statements can turn on compute_query_id when it is
>> loaded, even if it is 'off' in postgresql.conf, right?  And
>> pg_stat_statements would know if an alternate hash method is being used,
>> right?

> +1 on this approach.

That'd make it impossible to turn off or adjust afterwards, wouldn't it?
I'm afraid the confusion stemming from that would outweigh any simplicity.

I would be in favor of logging a message at startup to the effect of
"this is misconfigured" (as per upthread discussion), although whether
people would see that is uncertain.

In the end, it's not like this is the first time we've ever made an
incompatible change in configuration needs; and it won't be the last
either.  I don't buy the argument that pg_stat_statements users can't
cope with adding the additional setting.  (Of course, we should be
careful to call it out as an incompatible change in the release notes.)

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Mon, Apr 26, 2021 at 12:56:13PM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> >> Techically, pg_stat_statements can turn on compute_query_id when it is
> >> loaded, even if it is 'off' in postgresql.conf, right?  And
> >> pg_stat_statements would know if an alternate hash method is being used,
> >> right?
> 
> > +1 on this approach.
> 
> That'd make it impossible to turn off or adjust afterwards, wouldn't it?
> I'm afraid the confusion stemming from that would outweigh any simplicity.
> 
> I would be in favor of logging a message at startup to the effect of
> "this is misconfigured" (as per upthread discussion), although whether
> people would see that is uncertain.

I think a user-visible warning at CREATE EXNTENSION would help too.

> In the end, it's not like this is the first time we've ever made an
> incompatible change in configuration needs; and it won't be the last
> either.  I don't buy the argument that pg_stat_statements users can't
> cope with adding the additional setting.  (Of course, we should be
> careful to call it out as an incompatible change in the release notes.)

Agreed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Magnus Hagander
Date:
On Mon, Apr 26, 2021 at 6:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Stephen Frost <sfrost@snowman.net> writes:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> >> Techically, pg_stat_statements can turn on compute_query_id when it is
> >> loaded, even if it is 'off' in postgresql.conf, right?  And
> >> pg_stat_statements would know if an alternate hash method is being used,
> >> right?
>
> > +1 on this approach.
>
> That'd make it impossible to turn off or adjust afterwards, wouldn't it?
> I'm afraid the confusion stemming from that would outweigh any simplicity.

Thatäs why I suggested the three value one. Default to a mode where
it's automatic, which is what the majority is going to want, but have
a way to explicitly turn it on.


> I would be in favor of logging a message at startup to the effect of
> "this is misconfigured" (as per upthread discussion), although whether
> people would see that is uncertain.

Some people would. Many wouldn't, and sadly many hours would be spent
on debugging things before they got there -- based on experience of
how many people actually read the logs..

> In the end, it's not like this is the first time we've ever made an
> incompatible change in configuration needs; and it won't be the last
> either.  I don't buy the argument that pg_stat_statements users can't
> cope with adding the additional setting.  (Of course, we should be
> careful to call it out as an incompatible change in the release notes.)

The fact that we've made changes before that complicated our users
experience isn't in itself an argument for doing it again though...

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, Apr 27, 2021 at 12:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Stephen Frost <sfrost@snowman.net> writes:
> > * Bruce Momjian (bruce@momjian.us) wrote:
> >> Techically, pg_stat_statements can turn on compute_query_id when it is
> >> loaded, even if it is 'off' in postgresql.conf, right?  And
> >> pg_stat_statements would know if an alternate hash method is being used,
> >> right?
>
> > +1 on this approach.
>
> That'd make it impossible to turn off or adjust afterwards, wouldn't it?

I think so, which would also make it impossible to use an external
query_id plugin.

Enabling compute_query_id by default or raising a WARNING in
pg_stat_statements' PG_INIT seems like the only 2 sensible options.



Re: compute_query_id and pg_stat_statements

From
Magnus Hagander
Date:
On Mon, Apr 26, 2021 at 7:00 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Apr 26, 2021 at 12:56:13PM -0400, Tom Lane wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > >> Techically, pg_stat_statements can turn on compute_query_id when it is
> > >> loaded, even if it is 'off' in postgresql.conf, right?  And
> > >> pg_stat_statements would know if an alternate hash method is being used,
> > >> right?
> >
> > > +1 on this approach.
> >
> > That'd make it impossible to turn off or adjust afterwards, wouldn't it?
> > I'm afraid the confusion stemming from that would outweigh any simplicity.
> >
> > I would be in favor of logging a message at startup to the effect of
> > "this is misconfigured" (as per upthread discussion), although whether
> > people would see that is uncertain.
>
> I think a user-visible warning at CREATE EXNTENSION would help too.

It would help a bit, but actually logging it would probably help more.
Most people don't run the CREATE EXTENSION commands manually, it's all
done as part of either system install scripts or of application
migrations.

But that doesn't mean it wouldn't be useful to do it for those that
*do* run things manually, it just wouldn't be sufficient in itself.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, Apr 27, 2021 at 1:04 AM Magnus Hagander <magnus@hagander.net> wrote:
>
> Thatäs why I suggested the three value one. Default to a mode where
> it's automatic, which is what the majority is going to want, but have
> a way to explicitly turn it on.

Agreed, that also sounds like a sensible default.



Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> On Mon, Apr 26, 2021 at 6:56 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> > > * Bruce Momjian (bruce@momjian.us) wrote:
> > >> Techically, pg_stat_statements can turn on compute_query_id when it is
> > >> loaded, even if it is 'off' in postgresql.conf, right?  And
> > >> pg_stat_statements would know if an alternate hash method is being used,
> > >> right?
> >
> > > +1 on this approach.
> >
> > That'd make it impossible to turn off or adjust afterwards, wouldn't it?
> > I'm afraid the confusion stemming from that would outweigh any simplicity.

I don't know that it actually would, but ...

> Thatäs why I suggested the three value one. Default to a mode where
> it's automatic, which is what the majority is going to want, but have
> a way to explicitly turn it on.

This is certainly fine with me too, though it seems a bit surprising to
me that we couldn't just figure out what the user actually wants based
on what's installed/running for any given combination.

> > In the end, it's not like this is the first time we've ever made an
> > incompatible change in configuration needs; and it won't be the last
> > either.  I don't buy the argument that pg_stat_statements users can't
> > cope with adding the additional setting.  (Of course, we should be
> > careful to call it out as an incompatible change in the release notes.)
>
> The fact that we've made changes before that complicated our users
> experience isn't in itself an argument for doing it again though...

I'm generally a proponent of sensible changes across major versions even
if it means that the user has to adjust things, but this seems like a
case where we're punting on something that we really should just be able
to figure out the right answer to and that seems like a step backwards.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Magnus Hagander (magnus@hagander.net) wrote:
>> Thatäs why I suggested the three value one. Default to a mode where
>> it's automatic, which is what the majority is going to want, but have
>> a way to explicitly turn it on.

> This is certainly fine with me too, though it seems a bit surprising to
> me that we couldn't just figure out what the user actually wants based
> on what's installed/running for any given combination.

I'd be on board with having pg_stat_statement's pg_init function do
something to adjust the setting, if we can figure out how to do that
in a way that's not confusing in itself.  I'm not sure though that
the GUC engine offers a good way.

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Alvaro Herrera
Date:
On 2021-Apr-26, Tom Lane wrote:

> Stephen Frost <sfrost@snowman.net> writes:
> > * Magnus Hagander (magnus@hagander.net) wrote:
> >> Thatäs why I suggested the three value one. Default to a mode where
> >> it's automatic, which is what the majority is going to want, but have
> >> a way to explicitly turn it on.
> 
> > This is certainly fine with me too, though it seems a bit surprising to
> > me that we couldn't just figure out what the user actually wants based
> > on what's installed/running for any given combination.
> 
> I'd be on board with having pg_stat_statement's pg_init function do
> something to adjust the setting, if we can figure out how to do that
> in a way that's not confusing in itself.  I'm not sure though that
> the GUC engine offers a good way.

I think it's straightforward, if we decouple the tri-valued enum used
for guc.c purposes from a separate boolean that actually enables the
feature.  GUC sets the boolean to "off" initially when it sees the enum
as "auto", and then pg_stat_statement's _PG_init modifies it during its
own startup as needed.

So the user can turn the GUC off, and then pg_stat_statement does
nothing and there is no performance drawback; or leave it "auto" and
it'll only compute query_id if pg_stat_statement is loaded; or leave it
on if they want the query_id for other purposes.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"El miedo atento y previsor es la madre de la seguridad" (E. Burke)



Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Magnus Hagander (magnus@hagander.net) wrote:
> >> Thatäs why I suggested the three value one. Default to a mode where
> >> it's automatic, which is what the majority is going to want, but have
> >> a way to explicitly turn it on.
>
> > This is certainly fine with me too, though it seems a bit surprising to
> > me that we couldn't just figure out what the user actually wants based
> > on what's installed/running for any given combination.
>
> I'd be on board with having pg_stat_statement's pg_init function do
> something to adjust the setting, if we can figure out how to do that
> in a way that's not confusing in itself.  I'm not sure though that
> the GUC engine offers a good way.

Both of the extensions are getting loaded via pg_stat_statements and
both can have pg_init functions which work together to come up with the
right answer, no?

That is- can't pg_stat_statements, when it's loaded, enable
compute_query_id if it's not already enabled, and then the pg_queryid
module simply disable it when it gets loaded in it's pg_init()?  Telling
people who are using pg_queryid to have it loaded *after*
pg_stat_statements certainly seems reasonable to me, but if folks don't
like that then maybe have a tri-state which is 'auto', 'on', and 'off',
where pg_stat_statements would set it to 'on' if it's set to 'auto', but
not do anything if it starts as 'off'.  pg_queryid would then set it to
'off' when it's loaded and it wouldn't matter if it's loaded before or
after.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Alvaro Herrera (alvherre@alvh.no-ip.org) wrote:
> On 2021-Apr-26, Tom Lane wrote:
>
> > Stephen Frost <sfrost@snowman.net> writes:
> > > * Magnus Hagander (magnus@hagander.net) wrote:
> > >> Thatäs why I suggested the three value one. Default to a mode where
> > >> it's automatic, which is what the majority is going to want, but have
> > >> a way to explicitly turn it on.
> >
> > > This is certainly fine with me too, though it seems a bit surprising to
> > > me that we couldn't just figure out what the user actually wants based
> > > on what's installed/running for any given combination.
> >
> > I'd be on board with having pg_stat_statement's pg_init function do
> > something to adjust the setting, if we can figure out how to do that
> > in a way that's not confusing in itself.  I'm not sure though that
> > the GUC engine offers a good way.
>
> I think it's straightforward, if we decouple the tri-valued enum used
> for guc.c purposes from a separate boolean that actually enables the
> feature.  GUC sets the boolean to "off" initially when it sees the enum
> as "auto", and then pg_stat_statement's _PG_init modifies it during its
> own startup as needed.
>
> So the user can turn the GUC off, and then pg_stat_statement does
> nothing and there is no performance drawback; or leave it "auto" and
> it'll only compute query_id if pg_stat_statement is loaded; or leave it
> on if they want the query_id for other purposes.

Yeah, this is more-or-less the same as what I was just proposing in an
email that crossed this one.  Using a separate boolean would certainly
be fine.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Andres Freund
Date:
Hi,

On 2021-04-26 13:43:31 -0400, Alvaro Herrera wrote:
> I think it's straightforward, if we decouple the tri-valued enum used
> for guc.c purposes from a separate boolean that actually enables the
> feature.  GUC sets the boolean to "off" initially when it sees the enum
> as "auto", and then pg_stat_statement's _PG_init modifies it during its
> own startup as needed.

> So the user can turn the GUC off, and then pg_stat_statement does
> nothing and there is no performance drawback; or leave it "auto" and
> it'll only compute query_id if pg_stat_statement is loaded; or leave it
> on if they want the query_id for other purposes.

I think that's the right direction. I wonder though if we shouldn't go a
bit further. Have one guc that determines the "query id provider" (NULL
or a shared library), and one GUC that configures whether query-id is
computed (never, on-demand/auto, always). For the provider GUC load the
.so and look up a function with some well known name.

Greetings,

Andres Freund



Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I think that's the right direction. I wonder though if we shouldn't go a
> bit further. Have one guc that determines the "query id provider" (NULL
> or a shared library), and one GUC that configures whether query-id is
> computed (never, on-demand/auto, always). For the provider GUC load the
> .so and look up a function with some well known name.

That's sounding like a pretty sane design, actually.  Not sure about
the shared-library-name-with-fixed-function-name detail, but certainly
it seems to be useful to separate "I need a query-id" from the details
of the ID calculation.

Rather than a GUC per se for the ID provider, maybe we could have a
function hook that defaults to pointing at the in-core computation,
and then a module wanting to override that just gets into the hook.

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Andres Freund
Date:
Hi,

On 2021-04-26 14:21:00 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> That's sounding like a pretty sane design, actually.  Not sure about
> the shared-library-name-with-fixed-function-name detail, but certainly
> it seems to be useful to separate "I need a query-id" from the details
> of the ID calculation.
> 
> Rather than a GUC per se for the ID provider, maybe we could have a
> function hook that defaults to pointing at the in-core computation,
> and then a module wanting to override that just gets into the hook.

I have a preference to determining the provider via GUC instead of a
hook because it is both easier to introspect and easier to configure.

If the provider is loaded via a hook, and the shared library is loaded
via shared_preload_libraries, one can't easily just turn that off in a
single session, but needs to restart or explicitly load a different
library (that can't already be loaded).

We also don't have any way to show what's hooking into a hook.

Greetings,

Andres Freund



Re: compute_query_id and pg_stat_statements

From
Magnus Hagander
Date:
On Mon, Apr 26, 2021 at 8:14 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-04-26 13:43:31 -0400, Alvaro Herrera wrote:
> > I think it's straightforward, if we decouple the tri-valued enum used
> > for guc.c purposes from a separate boolean that actually enables the
> > feature.  GUC sets the boolean to "off" initially when it sees the enum
> > as "auto", and then pg_stat_statement's _PG_init modifies it during its
> > own startup as needed.

That's pretty much exactly my original suggestion, yes :)


> > So the user can turn the GUC off, and then pg_stat_statement does
> > nothing and there is no performance drawback; or leave it "auto" and
> > it'll only compute query_id if pg_stat_statement is loaded; or leave it
> > on if they want the query_id for other purposes.
>
> I think that's the right direction. I wonder though if we shouldn't go a
> bit further. Have one guc that determines the "query id provider" (NULL
> or a shared library), and one GUC that configures whether query-id is
> computed (never, on-demand/auto, always). For the provider GUC load the
> .so and look up a function with some well known name.

On Mon, Apr 26, 2021 at 8:37 PM Andres Freund <andres@anarazel.de> wrote:
> I have a preference to determining the provider via GUC instead of a
> hook because it is both easier to introspect and easier to configure.

+1 in general. Though we could of course also have a read-only
internal GUC that would show what we ended up with, and still
configure it with shared_preload_libraries, or loaded in some other
way. In a way it'd be cleaner to "always load modules with
shared_preload_libraries", but I can certainly see the arguments in
either direction..

But whichever way it's configured, having a well exposed way to know
what it actually is would be important.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2021-04-26 14:21:00 -0400, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > That's sounding like a pretty sane design, actually.  Not sure about
> > the shared-library-name-with-fixed-function-name detail, but certainly
> > it seems to be useful to separate "I need a query-id" from the details
> > of the ID calculation.
> > 
> > Rather than a GUC per se for the ID provider, maybe we could have a
> > function hook that defaults to pointing at the in-core computation,
> > and then a module wanting to override that just gets into the hook.
> 
> I have a preference to determining the provider via GUC instead of a
> hook because it is both easier to introspect and easier to configure.

In any case, having a different provider would greatly simplify third-party
queryid lib authors and users life.  For now the core queryid is computed
before post_parse_analyze_hook, but any third party plugin would have to do it
as a post_parse_analyze_hook, so you have to make sure that the lib is at the
right position in shared_preload_libraries to have it work, eg. [1], depending
on how pg_stat_statements and other similar module call
prev_post_parse_analyze_hook, which is a pretty bad thing.

> If the provider is loaded via a hook, and the shared library is loaded
> via shared_preload_libraries, one can't easily just turn that off in a
> single session, but needs to restart or explicitly load a different
> library (that can't already be loaded).

On the other hand we *don't* want to dynamically change the provider.
Temporarily enabling/disabling queryid calculation is ok, but generating
different have for the same query isn't.

> We also don't have any way to show what's hooking into a hook.

If we had a dedicated query_id hook, then plugins should error out if users
configured multiple plugins to calculate a query_id, so it should be easy to
know which plugin is responsible for it without knowing who hooked into the
hook.

[1] https://github.com/rjuju/pg_queryid/blob/master/pg_queryid.c#L116-L117



Re: compute_query_id and pg_stat_statements

From
Michael Paquier
Date:
On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:
>> On 2021-04-26 14:21:00 -0400, Tom Lane wrote:
>>> That's sounding like a pretty sane design, actually.  Not sure about
>>> the shared-library-name-with-fixed-function-name detail, but certainly
>>> it seems to be useful to separate "I need a query-id" from the details
>>> of the ID calculation.
>>>
>>> Rather than a GUC per se for the ID provider, maybe we could have a
>>> function hook that defaults to pointing at the in-core computation,
>>> and then a module wanting to override that just gets into the hook.
>>
>> I have a preference to determining the provider via GUC instead of a
>> hook because it is both easier to introspect and easier to configure.

So, this thread has died two weeks ago, and it is still an open item.
Could it be possible to move to a resolution by beta1?  The consensus
I can get from the thread is that we should have a tri-value state to
track an extra "auto" for the query ID computation, as proposed by
Alvaro here:
https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql

Unfortunately, nothing has happened to be able to do something like
that.
--
Michael

Attachment

Re: compute_query_id and pg_stat_statements

From
Fujii Masao
Date:

On 2021/05/11 15:04, Michael Paquier wrote:
> On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:
>> On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:
>>> On 2021-04-26 14:21:00 -0400, Tom Lane wrote:
>>>> That's sounding like a pretty sane design, actually.  Not sure about
>>>> the shared-library-name-with-fixed-function-name detail, but certainly
>>>> it seems to be useful to separate "I need a query-id" from the details
>>>> of the ID calculation.
>>>>
>>>> Rather than a GUC per se for the ID provider, maybe we could have a
>>>> function hook that defaults to pointing at the in-core computation,
>>>> and then a module wanting to override that just gets into the hook.
>>>
>>> I have a preference to determining the provider via GUC instead of a
>>> hook because it is both easier to introspect and easier to configure.
> 
> So, this thread has died two weeks ago, and it is still an open item.
> Could it be possible to move to a resolution by beta1?  The consensus
> I can get from the thread is that we should have a tri-value state to
> track an extra "auto" for the query ID computation, as proposed by
> Alvaro here:
> https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql

+1

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, May 11, 2021 at 03:04:13PM +0900, Michael Paquier wrote:
> On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:
> > On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:
> >> On 2021-04-26 14:21:00 -0400, Tom Lane wrote:
> >>> That's sounding like a pretty sane design, actually.  Not sure about
> >>> the shared-library-name-with-fixed-function-name detail, but certainly
> >>> it seems to be useful to separate "I need a query-id" from the details
> >>> of the ID calculation.
> >>> 
> >>> Rather than a GUC per se for the ID provider, maybe we could have a
> >>> function hook that defaults to pointing at the in-core computation,
> >>> and then a module wanting to override that just gets into the hook.
> >> 
> >> I have a preference to determining the provider via GUC instead of a
> >> hook because it is both easier to introspect and easier to configure.
> 
> So, this thread has died two weeks ago, and it is still an open item.
> Could it be possible to move to a resolution by beta1?  The consensus
> I can get from the thread is that we should have a tri-value state to
> track an extra "auto" for the query ID computation, as proposed by
> Alvaro here:
> https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql
> 
> Unfortunately, nothing has happened to be able to do something like
> that.

My understanding was that there wasn't a consensus on how to fix the problem.

Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
provides a new queryIdWanted() function to let third party plugins inform us
that they want a query id if possible.

As it was noted somewhere in that thread, that's a hack on top on the GUC
machinery, so compute_query_id will display "on" rather than "auto" (or "auto
and enabled" or whatever) since GUC isn't designed to handle that behavior.

For the record I also tested the patch using pg_qualstats(), which can be
loaded interactively and also benefits from a query identifier.  It works as
expected, as in "query idenfitier are enabled but only for the backend that
loaded pg_qualstats".

Attachment

Re: compute_query_id and pg_stat_statements

From
Magnus Hagander
Date:
On Tue, May 11, 2021 at 8:04 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:
> > On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:
> >> On 2021-04-26 14:21:00 -0400, Tom Lane wrote:
> >>> That's sounding like a pretty sane design, actually.  Not sure about
> >>> the shared-library-name-with-fixed-function-name detail, but certainly
> >>> it seems to be useful to separate "I need a query-id" from the details
> >>> of the ID calculation.
> >>>
> >>> Rather than a GUC per se for the ID provider, maybe we could have a
> >>> function hook that defaults to pointing at the in-core computation,
> >>> and then a module wanting to override that just gets into the hook.
> >>
> >> I have a preference to determining the provider via GUC instead of a
> >> hook because it is both easier to introspect and easier to configure.
>
> So, this thread has died two weeks ago, and it is still an open item.
> Could it be possible to move to a resolution by beta1?  The consensus
> I can get from the thread is that we should have a tri-value state to
> track an extra "auto" for the query ID computation, as proposed by
> Alvaro here:
> https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql


Technically I think that was my suggestion from earlier in that thread
that Alvaro just +1ed :)

That said, I sort of put that one aside when both Bruce and Tom
considered it "a pretty weird API" to quote Bruce. I had missed the
fact that Tom changed his mind (maybe when picking up on more of the
details).

And FTR, I still think this is the best way forward.

I think Andres also raised a good point about the ability to actually
know which one is in use.

Even if we keep the current way of *setting* the hook, I think it
might be worthwhile to expose a PGC_INTERNAL guc that shows *which*
implementation is actually in use?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: compute_query_id and pg_stat_statements

From
Magnus Hagander
Date:
On Tue, May 11, 2021 at 9:35 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, May 11, 2021 at 03:04:13PM +0900, Michael Paquier wrote:
> > On Tue, Apr 27, 2021 at 02:25:04PM +0800, Julien Rouhaud wrote:
> > > On Mon, Apr 26, 2021 at 11:37:45AM -0700, Andres Freund wrote:
> > >> On 2021-04-26 14:21:00 -0400, Tom Lane wrote:
> > >>> That's sounding like a pretty sane design, actually.  Not sure about
> > >>> the shared-library-name-with-fixed-function-name detail, but certainly
> > >>> it seems to be useful to separate "I need a query-id" from the details
> > >>> of the ID calculation.
> > >>>
> > >>> Rather than a GUC per se for the ID provider, maybe we could have a
> > >>> function hook that defaults to pointing at the in-core computation,
> > >>> and then a module wanting to override that just gets into the hook.
> > >>
> > >> I have a preference to determining the provider via GUC instead of a
> > >> hook because it is both easier to introspect and easier to configure.
> >
> > So, this thread has died two weeks ago, and it is still an open item.
> > Could it be possible to move to a resolution by beta1?  The consensus
> > I can get from the thread is that we should have a tri-value state to
> > track an extra "auto" for the query ID computation, as proposed by
> > Alvaro here:
> > https://www.postgresql.org/message-id/20210426174331.GA19401@alvherre.pgsql
> >
> > Unfortunately, nothing has happened to be able to do something like
> > that.
>
> My understanding was that there wasn't a consensus on how to fix the problem.
>
> Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
> provides a new queryIdWanted() function to let third party plugins inform us
> that they want a query id if possible.

30 second review -- wouldn't it be cleaner to keep a separate boolean
telling the backend "include it or not", which is set to true/false in
the guc assign hook and can then be flipped from false->true in
queryIdWanted()? (I'd suggest a more verbose name for that function
btw, something like requestQueryIdGeneration() or so).

(Again, just the 30 second review between meetings, so maybe I'm completely off)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, May 11, 2021 at 09:43:25AM +0200, Magnus Hagander wrote:
> 
> 30 second review -- wouldn't it be cleaner to keep a separate boolean
> telling the backend "include it or not", which is set to true/false in
> the guc assign hook and can then be flipped from false->true in
> queryIdWanted()? (I'd suggest a more verbose name for that function
> btw, something like requestQueryIdGeneration() or so).
> 
> (Again, just the 30 second review between meetings, so maybe I'm completely off)

It it surely would, but then that variable would need to be explicitly handled
as it wouldn't be automatically inherited on Windows and EXEC_BACKEND right?



Re: compute_query_id and pg_stat_statements

From
Fujii Masao
Date:

On 2021/05/11 16:35, Julien Rouhaud wrote:
> Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
> provides a new queryIdWanted() function to let third party plugins inform us
> that they want a query id if possible.

Thanks!


> As it was noted somewhere in that thread, that's a hack on top on the GUC
> machinery, so compute_query_id will display "on" rather than "auto" (or "auto
> and enabled" or whatever) since GUC isn't designed to handle that behavior.

Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of overwriting
compute_query_id?If we do this, query id computation is necessary when "compute_query_id == COMPUTE_QUERY_ID_ON ||
(compute_query_id== COMPUTE_QUERY_ID_AUTO && query_id_wanted)".
 

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, May 11, 2021 at 05:41:53PM +0900, Fujii Masao wrote:
> 
> On 2021/05/11 16:35, Julien Rouhaud wrote:
> > Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
> > provides a new queryIdWanted() function to let third party plugins inform us
> > that they want a query id if possible.
> 
> Thanks!
> 
> 
> > As it was noted somewhere in that thread, that's a hack on top on the GUC
> > machinery, so compute_query_id will display "on" rather than "auto" (or "auto
> > and enabled" or whatever) since GUC isn't designed to handle that behavior.
> 
> Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of
overwritingcompute_query_id? If we do this, query id computation is necessary when "compute_query_id ==
COMPUTE_QUERY_ID_ON|| (compute_query_id == COMPUTE_QUERY_ID_AUTO && query_id_wanted)".
 

That's exactly what Magnus mentioned :)  It's not possible because variable
aren't inherited on Windows or EXEC_BACKEND.  I didn't check but I'm
assuming that it could work if the other flag was an internal GUC that couldn't
be set by users, but then we would have some kind of internal flag that would
have to be documented as "how to check if compute_query_id" is actually enabled
or not, which doesn't seem like a good idea.

Another approach would be to add a new "auto (enabled)" option to the enum, and
prevent users from manually setting the guc to that value.  It's not perfect
but maybe it would be cleaner.

Overall it seems that we don't have a clear consensus on how exactly to address
the problem, which is why I originally didn't sent a patch.



Re: compute_query_id and pg_stat_statements

From
Magnus Hagander
Date:
On Tue, May 11, 2021 at 10:51 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, May 11, 2021 at 05:41:53PM +0900, Fujii Masao wrote:
> >
> > On 2021/05/11 16:35, Julien Rouhaud wrote:
> > > Anyway, PFA a patch that implement a [off | on | auto] compute_query_id, and
> > > provides a new queryIdWanted() function to let third party plugins inform us
> > > that they want a query id if possible.
> >
> > Thanks!
> >
> >
> > > As it was noted somewhere in that thread, that's a hack on top on the GUC
> > > machinery, so compute_query_id will display "on" rather than "auto" (or "auto
> > > and enabled" or whatever) since GUC isn't designed to handle that behavior.
> >
> > Can't we work around this issue by making queryIdWanted() set another flag like query_id_wanted instead of
overwritingcompute_query_id? If we do this, query id computation is necessary when "compute_query_id ==
COMPUTE_QUERY_ID_ON|| (compute_query_id == COMPUTE_QUERY_ID_AUTO && query_id_wanted)". 
>
> That's exactly what Magnus mentioned :)  It's not possible because variable
> aren't inherited on Windows or EXEC_BACKEND.  I didn't check but I'm
> assuming that it could work if the other flag was an internal GUC that couldn't
> be set by users, but then we would have some kind of internal flag that would
> have to be documented as "how to check if compute_query_id" is actually enabled
> or not, which doesn't seem like a good idea.

That doesn't fundamentally make it impossible, you just have to add it
to the list of variables being copied over, wouldn't you? See
save_backend_variables()

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote:
> 
> That doesn't fundamentally make it impossible, you just have to add it
> to the list of variables being copied over, wouldn't you? See
> save_backend_variables()

Yes, I agree, and that's what I meant by "explicitly handled".  The thing is
that I don't know if that's the best way to go, as it doesn't solve the "is it
actually enabled" and/or "which implementation is used".  At least the patch I
sent, although it's totally a hack, let you know if compute_query_id is enabled
or not.  I'm fine with implementing it that way, but only if there's a
consensus.



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Tue, May 11, 2021 at 05:41:06PM +0800, Julien Rouhaud wrote:
> On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote:
> > 
> > That doesn't fundamentally make it impossible, you just have to add it
> > to the list of variables being copied over, wouldn't you? See
> > save_backend_variables()
> 
> Yes, I agree, and that's what I meant by "explicitly handled".  The thing is
> that I don't know if that's the best way to go, as it doesn't solve the "is it
> actually enabled" and/or "which implementation is used".  At least the patch I
> sent, although it's totally a hack, let you know if compute_query_id is enabled
> or not.  I'm fine with implementing it that way, but only if there's a
> consensus.

Actually, isn't that how e.g. wal_buffers = -1 is working?  The original value
is lost and what you get is the computed value.  This is a bit different here
as the value isn't always changed, and can be changed interactively but
otherwise it's the same behavior.



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Tue, 11 May 2021 18:52:49 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Tue, May 11, 2021 at 05:41:06PM +0800, Julien Rouhaud wrote:
> > On Tue, May 11, 2021 at 10:59:51AM +0200, Magnus Hagander wrote:
> > > 
> > > That doesn't fundamentally make it impossible, you just have to add it
> > > to the list of variables being copied over, wouldn't you? See
> > > save_backend_variables()
> > 
> > Yes, I agree, and that's what I meant by "explicitly handled".  The thing is
> > that I don't know if that's the best way to go, as it doesn't solve the "is it
> > actually enabled" and/or "which implementation is used".  At least the patch I
> > sent, although it's totally a hack, let you know if compute_query_id is enabled
> > or not.  I'm fine with implementing it that way, but only if there's a
> > consensus.
> 
> Actually, isn't that how e.g. wal_buffers = -1 is working?  The original value
> is lost and what you get is the computed value.  This is a bit different here
> as the value isn't always changed, and can be changed interactively but
> otherwise it's the same behavior.

If we look it in pg_settings, it shows the current value and the value
at boot-time.  So I'm fine with that behavior.

However, IMHO, I doubt the necessity of "on". Assuming that we require
any module that wants query-id to call queryIdWanted() at any time
after each process startup (or it could be inherited to children), I
think only "auto" and "off" are enough for the variable.  Thinking in
this line, the variable is a subset of a GUC variable to specify the
name of a query-id provider (as Andres suggested upthread), and I
think it would work better in future.

So for now I propose that we have a variable query_id_provider that
has only 'default' and 'none' as the domain.  We can later expand it
so that any other query-id provider modules can be loaded without
chaning the interface.

postgresql.conf
# query_id_provider = 'default'  # provider module for query-id. 'none' means
#                                # disabling query-id calculation.

If we want to have a direct way to know whether query-id is active or
not, it'd be good to have a read-only variable "query_id_active".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
 



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> Hello Horiguchi-san,
> 
> On Wed, May 12, 2021 at 11:08:36AM +0900, Kyotaro Horiguchi wrote:
> > 
> > If we look it in pg_settings, it shows the current value and the value
> > at boot-time.  So I'm fine with that behavior.
> > 
> > However, IMHO, I doubt the necessity of "on". Assuming that we require
> > any module that wants query-id to call queryIdWanted() at any time
> > after each process startup (or it could be inherited to children), I
> > think only "auto" and "off" are enough for the variable.
> 
> I don't think that this approach would cope well for people who want a queryid
> without pg_stat_statements or such.  Since the queryid can now be found in
> pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
> to allow users to benefit from that even if they don't install additional
> module.

Ah, I missed that case.  And we are wanting to use pg_stat_statements
with (almost) zero-config?  How about the following behavior?

Setting query_id_provider to 'none' means we don't calculate query-id
by default. However, if queryIdWante() is called, the default provider
is set up and starts calculating query id.

Setting query_id_provider to something else means the user wants
query-id calcualted using the provider. Setting 'default' is
equivalent to setting compute_query_id to 'on'.

There might be a case where a user sets query_id_provider to
non-'none' but don't want have query-id calculated, but it can be said
a kind of mis-configuration?

> > Thinking in
> > this line, the variable is a subset of a GUC variable to specify the
> > name of a query-id provider (as Andres suggested upthread), and I
> > think it would work better in future.
> > 
> > So for now I propose that we have a variable query_id_provider that
> > has only 'default' and 'none' as the domain.
> 
> I think this would be a mistake to do that, as it would mean that we don't
> officially support alternative queryid provider.

Ok, if we want to support alternative providers from the first, we
need to actually write the loader code for query-id providers.  It
would not be so hard?, but it might not be suitable to this stage so I
proposed that to get rid of needing such complexity for now.

(Anyway I prefer to load query-id provider as a dynamically loadable
 module rather than hook-function.)

> > We can later expand it
> > so that any other query-id provider modules can be loaded without
> > chaning the interface.
> 
> The GUC itself may not change, but third-party queryid provider would probably
> need changes as the new entry point will be dedicated to compute a queryid
> only, while third-party plugins may do more than that in their
> post_parse_analyze_hook.  And also users will have to change their

I don't think it is not that a problem. Even if any third-party
extension is having query-id generator by itself, in most cases it
would be a copy of JumbleQuery in case of pg_stat_statement is not
loaded and now it is moved in-core as 'default' provider. What the
exntension needs to be done is just ripping out the copied generator
code.  I guess...

> configuration to use that new interface, and additionally the module may now
> have to be removed from shared_preload_libraries.  Overall, it doesn't seem to
> me that it would make users' life easier.

Why the third-party module need to be removed from
shared_preload_libraries?  The module can stay as a preloaded shared
library but just no longer need to have its own query-id provider
since it is provided in-core.  If the extension required a specific
provider, the developer need to make it a loadable module and users
need to specify the provider module explicitly.  I don't think that is
not a problem but if we wanted to make it easier, we can let users
free from that step by allowing 'auto' for query-id-provider to load
any module by the first extension.

So, for example, how about the following interface?

GUC query_id_provider:

- 'none' : query_id is not calculated, don't allow loading external
           generator module.

- 'default' : use default provider and calculate query-id.

- '<provider-name>' : use the provider and calculate query-id using it.

- 'auto' : query_id is not calculated, but allow to load query-id
           provider if queryIdWanted() is called.

# of course 'auto' and 'default' are inhibited as the provier name.

- core function bool queryIdWanted(char *provider_name, bool use_existing)

 Allows extensions to request to load a provider if not yet, then
 start calculating query-id.  Returns true if the request is accepted.

 provider_name :

  - 'default' or '<provider-name>': requests the provider to be loaded
    and start calculating query-id. Refuse the request if 'none' is
    set to query_id_provider.

 use_existing: Set true to allow using a provider already loaded.
    Otherwise refuses the request if any other provider than
    prvoder_name is already loaded.

In most cases users set query_id_provider to 'auto' and extensions
call queryIdWanted with ('default', true).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Wed, 12 May 2021 14:33:35 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> Ok, if we want to support alternative providers from the first, we
> need to actually write the loader code for query-id providers.  It
> would not be so hard?, but it might not be suitable to this stage so I
> proposed that to get rid of needing such complexity for now.
> 
> (Anyway I prefer to load query-id provider as a dynamically loadable
>  module rather than hook-function.)
...
> So, for example, how about the following interface?
> 
> GUC query_id_provider:
> 
> - 'none' : query_id is not calculated, don't allow loading external
>            generator module.
> 
> - 'default' : use default provider and calculate query-id.
> 
> - '<provider-name>' : use the provider and calculate query-id using it.
> 
> - 'auto' : query_id is not calculated, but allow to load query-id
>            provider if queryIdWanted() is called.
> 
> # of course 'auto' and 'default' are inhibited as the provier name.
> 
> - core function bool queryIdWanted(char *provider_name, bool use_existing)
> 
>  Allows extensions to request to load a provider if not yet, then
>  start calculating query-id.  Returns true if the request is accepted.
> 
>  provider_name :
> 
>   - 'default' or '<provider-name>': requests the provider to be loaded
>     and start calculating query-id. Refuse the request if 'none' is
>     set to query_id_provider.
> 
>  use_existing: Set true to allow using a provider already loaded.
>     Otherwise refuses the request if any other provider than
>     prvoder_name is already loaded.
> 
> In most cases users set query_id_provider to 'auto' and extensions
> call queryIdWanted with ('default', true).

Hmm. They are in contradiction.  Based on this future picture, at this
stage it can be simplified to allowing only 'default' as the provider
name.

If you want to support any other provider at this point,,, we need to
imlement the full-spec?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Pavel Stehule
Date:
Hi


Ah, I missed that case.  And we are wanting to use pg_stat_statements
with (almost) zero-config?  How about the following behavior?


Until now, the pg_stat_statements was zero-config. So the change is not user friendly.

The idea so pg_stat_statements requires enabled computed_query_id is not good. There should be dependency only on the queryid column.

Regards

Pavel

Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 07:49:13AM +0200, Pavel Stehule wrote:
> 
> > Ah, I missed that case.  And we are wanting to use pg_stat_statements
> > with (almost) zero-config?  How about the following behavior?
> >
> >
> Until now, the pg_stat_statements was zero-config. So the change is not
> user friendly.

Apart from configuring shared_preload_libraries, but agreed.

> The idea so pg_stat_statements requires enabled computed_query_id is not
> good. There should be dependency only on the queryid column.

I agree that requiring to change compute_query_id when you already added
pg_stat_statements in shared_preload_libraries isn't good, and the patch I sent
yesterday would fix that.



Re: compute_query_id and pg_stat_statements

From
Pavel Stehule
Date:


st 12. 5. 2021 v 8:10 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Wed, May 12, 2021 at 07:49:13AM +0200, Pavel Stehule wrote:
>
> > Ah, I missed that case.  And we are wanting to use pg_stat_statements
> > with (almost) zero-config?  How about the following behavior?
> >
> >
> Until now, the pg_stat_statements was zero-config. So the change is not
> user friendly.

Apart from configuring shared_preload_libraries, but agreed.

> The idea so pg_stat_statements requires enabled computed_query_id is not
> good. There should be dependency only on the queryid column.

I agree that requiring to change compute_query_id when you already added
pg_stat_statements in shared_preload_libraries isn't good, and the patch I sent
yesterday would fix that.

I don't like the idea of implicit force enabling any feature flag, but it is better than current design. But it doesn't look like a robust solution.

Does it mean that if somebody disables computed_query_id, then pg_stat_statements will not work?

Why is there the strong dependency between computed_query_id and pg_stat_statements? Can this dependency be just optional?

Regards

Pavel

Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 08:58:45AM +0200, Pavel Stehule wrote:
> 
> I don't like the idea of implicit force enabling any feature flag, but it
> is better than current design. But it doesn't look like a robust solution.
> 
> Does it mean that if somebody disables computed_query_id, then
> pg_stat_statements will not work?

It depends, but if you mean "setting up pg_stat_statements, intentionally
disabling in-core queryid calculation and not configuring an alternative
source" then yes pg_stat_statements will not work.  But I don't see any
difference from "someone reduce wal_level and complain that replication does
not work" or "someone disable fsync and complain that data got corrupted".  We
provide a sensible default configuration, you can mess it up if you don't know
what you're doing.

> Why is there the strong dependency between computed_query_id and
> pg_stat_statements? Can this dependency be just optional?

Once again no, as it otherwise would mean that postgres unilaterally decides
that pg_stat_statements' approach to compute a query identifier is the one and
only ultimate truth and nothing else could be useful for anyone.



Re: compute_query_id and pg_stat_statements

From
Pavel Stehule
Date:


st 12. 5. 2021 v 9:13 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Wed, May 12, 2021 at 08:58:45AM +0200, Pavel Stehule wrote:
>
> I don't like the idea of implicit force enabling any feature flag, but it
> is better than current design. But it doesn't look like a robust solution.
>
> Does it mean that if somebody disables computed_query_id, then
> pg_stat_statements will not work?

It depends, but if you mean "setting up pg_stat_statements, intentionally
disabling in-core queryid calculation and not configuring an alternative
source" then yes pg_stat_statements will not work.  But I don't see any
difference from "someone reduce wal_level and complain that replication does
not work" or "someone disable fsync and complain that data got corrupted".  We
provide a sensible default configuration, you can mess it up if you don't know
what you're doing.

> Why is there the strong dependency between computed_query_id and
> pg_stat_statements? Can this dependency be just optional?

Once again no, as it otherwise would mean that postgres unilaterally decides
that pg_stat_statements' approach to compute a query identifier is the one and
only ultimate truth and nothing else could be useful for anyone.

ok. Understand.

If I understand well, then computed_query_id does not make sense for pg_stat_statemenst, because this extension always requires it.

Cannot be better to use queryid inside pg_stat_statements every time without dependency on computed_query_id? And computed_query_id can be used only for EXPLAIN and for pg_stat_activity.

pg_stat_statements cannot work without a queryid, so is useless to speak about configuration. If you use pg_stat_statements, then the queryid will be computed every time, but the visibility will be only for pg_stat_statements.

Or a different strategy. I understand so computed_query_id should be active. But I dislike the empty result of pg_stat_statements when computed_query_id is off. Is it possible to raise an exception instead of showing an empty result?

The most correct fix from my perspective is just check in function pg_stat_statements if query id is computed or not. If not, and there is no data, then raise an exception with the hint "enable compute_query_id". When there is data, then show a warning with the mentioned hint and show data.

What do you think about it?

Pavel






Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 09:51:26AM +0200, Pavel Stehule wrote:
> 
> If I understand well, then computed_query_id does not make sense for
> pg_stat_statemenst, because this extension always requires it.

No, pg_stat_statements requires *a* queryid, not specifially *our* queryid.

> Cannot be better to use queryid inside pg_stat_statements every time
> without dependency on computed_query_id? And computed_query_id can be used
> only for EXPLAIN and for pg_stat_activity.

No, because then you will have a discrepancy between those two.  And if you
want a different queryid approach (say based on object names rather than oid so
it survives logical replication), then you also want that queryid used for
pg_stat_statements.  And that what happen is that you have to fork
pg_stat_statements to only change the queryid implementation, which is one of
the thing that the patch to move the implementation to core solves.

> pg_stat_statements cannot work without a queryid, so is useless to speak
> about configuration. If you use pg_stat_statements, then the queryid will
> be computed every time, but the visibility will be only for
> pg_stat_statements.

Yes, pg_stat_statements cannot work without a queryid, but it CAN work without
core queryid.

> Or a different strategy. I understand so computed_query_id should be
> active. But I dislike the empty result of pg_stat_statements when
> computed_query_id is off. Is it possible to raise an exception instead of
> showing an empty result?

Yes, but I don't think that it's a good idea.  For instance pg_stat_statements
will behave poorly if you have to regularly evict entry.  For instance: any
query touching a temporary table.  One way to avoid that it to avoid storing
entries that you know are very likely to be eventually evicted.

So to fix this problem, you have 2 ways to go:

1) fix your app and explicitly disable/enable pg_stat_statements around all
  those queries, and hope you won't miss any

2) write your own queryid implementation to not generate a queryid in such case.

2 seems like a reasonable scenario, and if you force pg_stat_statements to
error out in that case then you would be forced to use approach 1.



Re: compute_query_id and pg_stat_statements

From
Pavel Stehule
Date:


st 12. 5. 2021 v 10:14 odesílatel Julien Rouhaud <rjuju123@gmail.com> napsal:
On Wed, May 12, 2021 at 09:51:26AM +0200, Pavel Stehule wrote:
>
> If I understand well, then computed_query_id does not make sense for
> pg_stat_statemenst, because this extension always requires it.

No, pg_stat_statements requires *a* queryid, not specifially *our* queryid.

> Cannot be better to use queryid inside pg_stat_statements every time
> without dependency on computed_query_id? And computed_query_id can be used
> only for EXPLAIN and for pg_stat_activity.

No, because then you will have a discrepancy between those two.  And if you
want a different queryid approach (say based on object names rather than oid so
it survives logical replication), then you also want that queryid used for
pg_stat_statements.  And that what happen is that you have to fork
pg_stat_statements to only change the queryid implementation, which is one of
the thing that the patch to move the implementation to core solves.

> pg_stat_statements cannot work without a queryid, so is useless to speak
> about configuration. If you use pg_stat_statements, then the queryid will
> be computed every time, but the visibility will be only for
> pg_stat_statements.

Yes, pg_stat_statements cannot work without a queryid, but it CAN work without
core queryid.

 

> Or a different strategy. I understand so computed_query_id should be
> active. But I dislike the empty result of pg_stat_statements when
> computed_query_id is off. Is it possible to raise an exception instead of
> showing an empty result?

Yes, but I don't think that it's a good idea.  For instance pg_stat_statements
will behave poorly if you have to regularly evict entry.  For instance: any
query touching a temporary table.  One way to avoid that it to avoid storing
entries that you know are very likely to be eventually evicted.

So to fix this problem, you have 2 ways to go:

1) fix your app and explicitly disable/enable pg_stat_statements around all
  those queries, and hope you won't miss any

2) write your own queryid implementation to not generate a queryid in such case.

2 seems like a reasonable scenario, and if you force pg_stat_statements to
error out in that case then you would be forced to use approach 1.

My second proposal can work for your example too. pg_stat_statements have to require any active queryid computing. And when it is not available, then the exception should be raised.

The custom queryid can return null, and still the queryid will be computed. Maybe the warning can be enough. Just, if somebody use pg_stat_statements function, then enforce the check if queryid is computed (compute_query_id is true || some hook is not null), and if not then raise a warning.




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:
> 
> My second proposal can work for your example too. pg_stat_statements have
> to require any active queryid computing. And when it is not available, then
> the exception should be raised.
> 
> The custom queryid can return null, and still the queryid will be computed.
> Maybe the warning can be enough. Just, if somebody use pg_stat_statements
> function, then enforce the check if queryid is computed (compute_query_id
> is true || some hook is not null), and if not then raise a warning.

Ah I'm sorry I misunderstood your proposal.  Yes, definitely adding a warning
or an error when executing pg_stat_statements() SRF would help, that's a great
idea!

I'll wait a bit in case someone has any objection, and if not send an updated
patch!



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Wed, May 12, 2021 at 02:33:35PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> > > 
> > > I don't think that this approach would cope well for people who want a queryid
> > > without pg_stat_statements or such.  Since the queryid can now be found in
> > > pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
> > > to allow users to benefit from that even if they don't install additional
> > > module.
> > 
> > Ah, I missed that case.  And we are wanting to use pg_stat_statements
> > with (almost) zero-config?  How about the following behavior?
> > 
> > Setting query_id_provider to 'none' means we don't calculate query-id
> > by default. However, if queryIdWante() is called, the default provider
> > is set up and starts calculating query id.
> 
> Having "none" meant "not unless someone asks for it" looks like a POLA
> violation.

Sorry for confusion. A different behavior for "none" is proposed later
in the mail. It is just an intermediate discussion.

> > Setting query_id_provider to something else means the user wants
> > query-id calcualted using the provider. Setting 'default' is
> > equivalent to setting compute_query_id to 'on'.
> > 
> > There might be a case where a user sets query_id_provider to
> > non-'none' but don't want have query-id calculated, but it can be said
> > a kind of mis-configuration?
> 
> So if I'm understanding correctly, you're arguing for an approach different to
> what Michael stated as the general consensus in [1].  I'm not saying that I
> think it's a bad idea (and I actually suggested it before), but we have to
> chose an approach and stick with it.

I'm not sure how much room for change of the direction is left. So it
was just a proposal. So if the majority still thinks that it is the
way to stick to controling on/off/(auto) the in-core implement and
separately allow another module to be hooked, I don't further object
to that decision.

> > > I think this would be a mistake to do that, as it would mean that we don't
> > > officially support alternative queryid provider.
> > 
> > Ok, if we want to support alternative providers from the first, we
> > need to actually write the loader code for query-id providers.  It
> > would not be so hard?, but it might not be suitable to this stage so I
> > proposed that to get rid of needing such complexity for now.
> 
> I did write a POC extension [2] to demonstrate that moving pg_stat_statement's
> queryid calculation in core doesn't mean that we're imposing it to everyone.
> And yes this is critical and a must have in the initial implementation.

Ok, understood.

> > (Anyway I prefer to load query-id provider as a dynamically loadable
> >  module rather than hook-function.)
> 
> I agree that having a specific API (I'm fine with a hook or a dynamically
> loaded function) for that would be better, but it doesn't appear to be the
> opinion of the majority.

Ugg. Ok.

> > > The GUC itself may not change, but third-party queryid provider would probably
> > > need changes as the new entry point will be dedicated to compute a queryid
> > > only, while third-party plugins may do more than that in their
> > > post_parse_analyze_hook.  And also users will have to change their
> > 
> > I don't think it is not that a problem.
> 
> Did you mean "I don't think that it's a problem"?  Otherwise I don't get it.

Yes, you're right.  Sorry for the typo.

> > Even if any third-party
> > extension is having query-id generator by itself, in most cases it
> > would be a copy of JumbleQuery in case of pg_stat_statement is not
> > loaded and now it is moved in-core as 'default' provider. What the
> > exntension needs to be done is just ripping out the copied generator
> > code.  I guess...
> 
> I don't fully understand, but it seems that you're arguing that the only use
> case is to have something similar to pg_stat_statements (say e.g.
> pg_store_plans), that always have the same queryid implementation as
> pg_stat_statements.  That's not the case, as there already are "clones" of
> pg_stat_statements, and the main difference is an alternative queryid
> implementation.  So in that case what author would do is to drop everything
> *except* the queryid implementation.
> 
> And if I'm not mistaken, pg_store_plans also wants a different queryid
> implementation, but has to handle a secondary queryid on top of it
> (https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855).

Yeah, the extension intended to be used joining with the
pg_stat_statements view. And the reason for the second query-id dates
back to the era when query id was not available in the
pg_stat_statements view. Now it is mere a fall-back query id when
pg_stat_statments is not active.  Now that the in-core query-id is
available, I think there's no reason to keep that implement.

> So here again what the extension want is to get rid of pg_stat_statements (and
> now core) queryid implementation.

So the extension might be a good reason for the discussion^^;

> > > configuration to use that new interface, and additionally the module may now
> > > have to be removed from shared_preload_libraries.  Overall, it doesn't seem to
> > > me that it would make users' life easier.
> > 
> > Why the third-party module need to be removed from
> > shared_preload_libraries?  The module can stay as a preloaded shared
> > library but just no longer need to have its own query-id provider
> > since it is provided in-core.  If the extension required a specific
> > provider, the developer need to make it a loadable module and users
> > need to specify the provider module explicitly.
> 
> It's the same misunderstanding here.  Basically people want to benefit from the
> whole ecosystem based on a queryid (pg_stat_statements, now
> pg_stat_activity.query_id and such) but with another definition of what a
> queryid is.  So those people will now only need to implement something like
> [2], rather than forking every single extension they want to use.

Hmm. I'm not sure the [2] gives sufficient reason for leaving the
current interface.  But will follow if it is sitll the consensus. (And
it seems like true.)

> [1]: https://www.postgresql.org/message-id/YJoeXcrwe1EDmqKT@paquier.xyz
> [2]: https://github.com/rjuju/pg_queryid

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:
> > 
> > My second proposal can work for your example too. pg_stat_statements have
> > to require any active queryid computing. And when it is not available, then
> > the exception should be raised.
> > 
> > The custom queryid can return null, and still the queryid will be computed.
> > Maybe the warning can be enough. Just, if somebody use pg_stat_statements
> > function, then enforce the check if queryid is computed (compute_query_id
> > is true || some hook is not null), and if not then raise a warning.
> 
> Ah I'm sorry I misunderstood your proposal.  Yes, definitely adding a warning
> or an error when executing pg_stat_statements() SRF would help, that's a great
> idea!
> 
> I'll wait a bit in case someone has any objection, and if not send an updated
> patch!

Isn't there a case where pg_stat_statements uses an alternative
query-id provider?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Pavel Stehule
Date:


st 12. 5. 2021 v 11:39 odesílatel Kyotaro Horiguchi <horikyota.ntt@gmail.com> napsal:
At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in
> On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:
> >
> > My second proposal can work for your example too. pg_stat_statements have
> > to require any active queryid computing. And when it is not available, then
> > the exception should be raised.
> >
> > The custom queryid can return null, and still the queryid will be computed.
> > Maybe the warning can be enough. Just, if somebody use pg_stat_statements
> > function, then enforce the check if queryid is computed (compute_query_id
> > is true || some hook is not null), and if not then raise a warning.
>
> Ah I'm sorry I misunderstood your proposal.  Yes, definitely adding a warning
> or an error when executing pg_stat_statements() SRF would help, that's a great
> idea!
>
> I'll wait a bit in case someone has any objection, and if not send an updated
> patch!

Isn't there a case where pg_stat_statements uses an alternative
query-id provider?

this check just can check if there is "any" query-id provider. In this context is not important if it is buildin or external


regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Wed, 12 May 2021 18:39:27 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> > On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:
> > > 
> > > My second proposal can work for your example too. pg_stat_statements have
> > > to require any active queryid computing. And when it is not available, then
> > > the exception should be raised.
> > > 
> > > The custom queryid can return null, and still the queryid will be computed.
> > > Maybe the warning can be enough. Just, if somebody use pg_stat_statements
> > > function, then enforce the check if queryid is computed (compute_query_id
> > > is true || some hook is not null), and if not then raise a warning.
> > 
> > Ah I'm sorry I misunderstood your proposal.  Yes, definitely adding a warning
> > or an error when executing pg_stat_statements() SRF would help, that's a great
> > idea!
> > 
> > I'll wait a bit in case someone has any objection, and if not send an updated
> > patch!
> 
> Isn't there a case where pg_stat_statements uses an alternative
> query-id provider?

I don't object that if we allow false non-error when an extension that
uses the hooks but doesn't compute a query id.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:
> st 12. 5. 2021 v 11:39 odesílatel Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> napsal:
> 
> > At Wed, 12 May 2021 17:30:26 +0800, Julien Rouhaud <rjuju123@gmail.com>
> > wrote in
> > > On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:
> > > >
> > > > My second proposal can work for your example too. pg_stat_statements
> > have
> > > > to require any active queryid computing. And when it is not available,
> > then
> > > > the exception should be raised.
> > > >
> > > > The custom queryid can return null, and still the queryid will be
> > computed.
> > > > Maybe the warning can be enough. Just, if somebody use
> > pg_stat_statements
> > > > function, then enforce the check if queryid is computed
> > (compute_query_id
> > > > is true || some hook is not null), and if not then raise a warning.
> > >
> > > Ah I'm sorry I misunderstood your proposal.  Yes, definitely adding a
> > warning
> > > or an error when executing pg_stat_statements() SRF would help, that's a
> > great
> > > idea!
> > >
> > > I'll wait a bit in case someone has any objection, and if not send an
> > updated
> > > patch!
> >
> > Isn't there a case where pg_stat_statements uses an alternative
> > query-id provider?
> >
> 
> this check just can check if there is "any" query-id provider. In this
> context is not important if it is buildin or external

Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
similar, then if the executing query itself doesn't have a queryid then
it's very likely that you didn't configure compute_query_id or an alternative
query_id implementation properly.  And loudly complaining seems like the right
thing to do.



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 05:30:26PM +0800, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 10:57:25AM +0200, Pavel Stehule wrote:
> > 
> > My second proposal can work for your example too. pg_stat_statements have
> > to require any active queryid computing. And when it is not available, then
> > the exception should be raised.
> > 
> > The custom queryid can return null, and still the queryid will be computed.
> > Maybe the warning can be enough. Just, if somebody use pg_stat_statements
> > function, then enforce the check if queryid is computed (compute_query_id
> > is true || some hook is not null), and if not then raise a warning.
> 
> Ah I'm sorry I misunderstood your proposal.  Yes, definitely adding a warning
> or an error when executing pg_stat_statements() SRF would help, that's a great
> idea!
> 
> I'll wait a bit in case someone has any objection, and if not send an updated
> patch!

Hearing no complaint, PFA a v2 implementing such a warning.  Here's an
extract from the updated regression tests:

-- Check that pg_stat_statements() will complain if the configuration appears
-- to be broken.
SET compute_query_id = off;
SELECT pg_stat_statements_reset();
 pg_stat_statements_reset 
--------------------------
 
(1 row)

SELECT count(*) FROM pg_stat_statements;
WARNING:  Query identifier calculation seems to be disabled
HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable
compute_query_id
 count 
-------
     0
(1 row)


I'm of course open to suggestions for some better wording.

Attachment

Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:
> > this check just can check if there is "any" query-id provider. In this
> > context is not important if it is buildin or external
> 
> Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
> similar, then if the executing query itself doesn't have a queryid then
> it's very likely that you didn't configure compute_query_id or an alternative
> query_id implementation properly.  And loudly complaining seems like the right
> thing to do.

I understand the desire to make pg_stat_statements require minimal
configuration, but frankly, if the server-side variable query id API is
confusing, I think we have done more harm than good.

The problem with compute_query_id=auto is that there is no way to know
if the query id is actually enabled, unless you guess from the installed
extensions, or we add another variable to report that, and maybe another
variable to control the provier, unless we require turning
compute_query_id=off if you are using custom query id computation.  What
if it is auto, and pg_stat_statments is installed, and you want to use a
custom query id computation --- what happens?  As you can see, this is
all becoming very complicated.

I think we might be just as well to go with compute_query_id=on/off, and
just complain loudly from CREATE EXTENSION, or in the server logs on
server start via shared_preload_libraries, or when querying
pg_stat_statements system view.  We simply say to change
compute_query_id=on or to provide a custom query id implementation.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 08:36:18PM -0400, Bruce Momjian wrote:
> On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote:
> > On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:
> > > this check just can check if there is "any" query-id provider. In this
> > > context is not important if it is buildin or external
> > 
> > Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
> > similar, then if the executing query itself doesn't have a queryid then
> > it's very likely that you didn't configure compute_query_id or an alternative
> > query_id implementation properly.  And loudly complaining seems like the right
> > thing to do.
> 
> I understand the desire to make pg_stat_statements require minimal
> configuration, but frankly, if the server-side variable query id API is
> confusing, I think we have done more harm than good.
> 
> The problem with compute_query_id=auto is that there is no way to know
> if the query id is actually enabled, unless you guess from the installed
> extensions, or we add another variable to report that, and maybe another
> variable to control the provier, unless we require turning
> compute_query_id=off if you are using custom query id computation.  What
> if it is auto, and pg_stat_statments is installed, and you want to use a
> custom query id computation --- what happens?  As you can see, this is
> all becoming very complicated.

Well, as implemented you can get the value of compute_query_id, and if it's
still "auto" then it's not enabled as calling queryIdWanted() would turn it to
on.  I agree that it's not ideal but you have a way to know.  We could document
that auto means that it's set to auto and no one asked to automatically enabled
it.

Or you can just do e.g.

SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid();

and see if you have a query_id or not.

If you want to use third-party modules, they you have to explicitly disable
compute_query_id.  If you don't, every query execution will raise an error as
we documented that third-party modules should error out if they see that a
query_id is already generated.  Such module could also explicitly check that
compute_query_id is off and also raise an error if that's not the case.

> I think we might be just as well to go with compute_query_id=on/off, and
> just complain loudly from CREATE EXTENSION, or in the server logs on
> server start via shared_preload_libraries, or when querying
> pg_stat_statements system view.  We simply say to change
> compute_query_id=on or to provide a custom query id implementation.

I'm not opposed to that, but it was already suggested and apparently people
didn't like that approach.



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Wed, 12 May 2021 18:09:30 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Wed, May 12, 2021 at 06:37:24PM +0900, Kyotaro Horiguchi wrote:
> > At Wed, 12 May 2021 14:05:16 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> > > 
> > > And if I'm not mistaken, pg_store_plans also wants a different queryid
> > > implementation, but has to handle a secondary queryid on top of it
> > > (https://github.com/ossc-db/pg_store_plans/blob/master/pg_store_plans.c#L843-L855).
> > 
> > Yeah, the extension intended to be used joining with the
> > pg_stat_statements view. And the reason for the second query-id dates
> > back to the era when query id was not available in the
> > pg_stat_statements view. Now it is mere a fall-back query id when
> > pg_stat_statments is not active.  Now that the in-core query-id is
> > available, I think there's no reason to keep that implement.
> > 
> > > So here again what the extension want is to get rid of pg_stat_statements (and
> > > now core) queryid implementation.
> > 
> > So the extension might be a good reason for the discussion^^;
> 
> Indeed.  So IIUC, what pg_store_plans wants is:

Ugg. Very sorry. My brain should need more oxygen, or caffeine..  The
last sentence forgetting a negation. The plugin does not need a
special query-id provider so the special provider can be removed
without problems if the core provides one.

> - to use its own query_id implementation
> - to be able to be joined to pg_stat_statements
> 
> Is that correct?

It is correct, but a bit short in detail.

The query_id of its own is provided because pg_stat_statements did not
expose query_id. And it has been preserved only for the case the
plugin is used without pg_stat_statements activated.  Now that the
in-core query_id is available, the last reason for the special
provider has gone.

> If yes, it seems that starting with pg14, it can be easily achieved by:

So, it would be a bit different.

> - documenting to disable compute_query_id

 documenting to *not disable* compute_query_id. That is set it to on
 or auto.

> - eventually error out at execution time if it's enabled

So. the extension would check if any query_id provider *is* active.

> - don't call queryIdWanted()
> - expose its query_id
> 
> It will then work just fine, and will be more efficient compared to what is
> done today as only one queryid will be calculated.

After reading Magnus' comment nearby, I realized that my most
significant concern here is how to know any query_id provider is
active.  The way of setting the hook cannot enforce notifying that
kind of things on plugins. For me implementing them as a dll looked as
one of the most promising way of enabling that without needing any
boiler-plates.

Another not-prefect (in that it needs a boiler-plate) but workable way
is letting query-id providers set some variable including GUC
explicitly as Magnus' suggested.  GUC would be better in that it is
naturally observable from users.

Even though there'a possibility that a developer of a query_id
provider forgets to set it, but maybe it would be easily
noticeable. On the other hand it gives a sure means to know any
query_id provider is active.

How about adding a GUC_INTERNAL "current_query_provider" or such?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Wed, 12 May 2021 20:36:18 -0400, Bruce Momjian <bruce@momjian.us> wrote in 
> On Wed, May 12, 2021 at 05:51:49PM +0800, Julien Rouhaud wrote:
> > On Wed, May 12, 2021 at 11:42:12AM +0200, Pavel Stehule wrote:
> > > this check just can check if there is "any" query-id provider. In this
> > > context is not important if it is buildin or external
> > 
> > Yes, the idea is that if you execute "SELECT * FROM pg_stat_statements" or
> > similar, then if the executing query itself doesn't have a queryid then
> > it's very likely that you didn't configure compute_query_id or an alternative
> > query_id implementation properly.  And loudly complaining seems like the right
> > thing to do.
> 
> I understand the desire to make pg_stat_statements require minimal
> configuration, but frankly, if the server-side variable query id API is
> confusing, I think we have done more harm than good.
> 
> The problem with compute_query_id=auto is that there is no way to know
> if the query id is actually enabled, unless you guess from the installed
> extensions, or we add another variable to report that, and maybe another
> variable to control the provier, unless we require turning
> compute_query_id=off if you are using custom query id computation.  What
> if it is auto, and pg_stat_statments is installed, and you want to use a
> custom query id computation --- what happens?  As you can see, this is
> all becoming very complicated.
> 
> I think we might be just as well to go with compute_query_id=on/off, and
> just complain loudly from CREATE EXTENSION, or in the server logs on
> server start via shared_preload_libraries, or when querying
> pg_stat_statements system view.  We simply say to change
> compute_query_id=on or to provide a custom query id implementation.

FWIW, I personally am fine with that (ignoring details :p), that is,
leaving the whole responsibility of a sane setup to users.  If we are
going to automate even a part of it, I think we need to make it
perfect at least to a certain level.  The current auery_id = auto
looks like somewhat halfway, or narrow-ranged.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 08:36:18PM -0400, Bruce Momjian wrote:
> > The problem with compute_query_id=auto is that there is no way to know
> > if the query id is actually enabled, unless you guess from the installed
> > extensions, or we add another variable to report that, and maybe another
> > variable to control the provier, unless we require turning
> > compute_query_id=off if you are using custom query id computation.  What
> > if it is auto, and pg_stat_statments is installed, and you want to use a
> > custom query id computation --- what happens?  As you can see, this is
> > all becoming very complicated.
> 
> Well, as implemented you can get the value of compute_query_id, and if it's
> still "auto" then it's not enabled as calling queryIdWanted() would turn it to
> on.  I agree that it's not ideal but you have a way to know.  We could document
> that auto means that it's set to auto and no one asked to automatically enabled
> it.

Wow, so the extension changes it?  How do we record the "source" of that
change?  Do we have other GUCs that do that?

> Or you can just do e.g.
> 
> SELECT query_id FROM pg_stat_activity WHERE pid = pg_backend_pid();
> 
> and see if you have a query_id or not.

True.

> If you want to use third-party modules, they you have to explicitly disable
> compute_query_id.  If you don't, every query execution will raise an error as
> we documented that third-party modules should error out if they see that a
> query_id is already generated.  Such module could also explicitly check that
> compute_query_id is off and also raise an error if that's not the case.

OK.

> > I think we might be just as well to go with compute_query_id=on/off, and
> > just complain loudly from CREATE EXTENSION, or in the server logs on
> > server start via shared_preload_libraries, or when querying
> > pg_stat_statements system view.  We simply say to change
> > compute_query_id=on or to provide a custom query id implementation.
> 
> I'm not opposed to that, but it was already suggested and apparently people
> didn't like that approach.

Also probably true.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> How about adding a GUC_INTERNAL "current_query_provider" or such?

On the second thought, I wonder why we don't just call JumbleQuery in
pgss_post_parse_analyze when compute_query_id is "off".

We can think this behavior as the following.

- compute_query_id sets whether the *internal* query-id provider turn
  on. If it is "off", query_id in , for example, pg_stat_activity is
  not set.  Even in that case it is set to some valid value if some
  alternative query-id provider is active.

On the other hand pg_stat_statements looks as if providing
"alternative" query-id provider, but actually it is just calling
in-core JumbleQuery if not called yet.


@@ -830,6 +830,10 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query, JumbleState *jstate)
         return;
     }
 
+    /* Call in-core JumbleQuery if it was not called in-core */
+    if (!jstate)
+        jstate = JumbleQuery(query, pstate->p_sourcetext);
+
     /*

Any plugins that want to use its own query-id generator can WARN if
jstate is not NULL, but also can proceed ignoring the exisint jstate.

WARNING:  the default query-id provier is active, turn off compute_query_id to avoid unnecessary calculation

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 09:13:25PM -0400, Bruce Momjian wrote:
> On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote:
> > 
> > Well, as implemented you can get the value of compute_query_id, and if it's
> > still "auto" then it's not enabled as calling queryIdWanted() would turn it to
> > on.  I agree that it's not ideal but you have a way to know.  We could document
> > that auto means that it's set to auto and no one asked to automatically enabled
> > it.
> 
> Wow, so the extension changes it?

Yes.  It seemed better to go this way rather than having a secondary read-only
GUC for that.

> How do we record the "source" of that
> change?  Do we have other GUCs that do that?

No, we don't.  But I don't know what exactly you would like to have as a
source?  What if you have for instance pg_stat_statements, pg_stat_kcache,
pg_store_plans and pg_wait_sampling installed?  All those extensions need a
query_id (or at least benefit from it for pg_wait_sampling), is there any value
to give a full list of all the modules that would enable compute_query_id?

I'm assuming that anyone wanting to install any of those extensions (or any
similar one) is fully aware that they aggregate metrics based on at least a
query_id.  If they don't, well they probably never read any documentation since
postgres 9.2 which introduced query normalization, and I doubt that they will
be interested in having access to the information anyway.



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote:
> > At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > > How about adding a GUC_INTERNAL "current_query_provider" or such?
> > 
> > On the second thought, I wonder why we don't just call JumbleQuery in
> > pgss_post_parse_analyze when compute_query_id is "off".
> 
> Because not generating a query_id for a custom query_id implementation is a
> valid use case for queries that are known to lead to huge pg_stat_statements
> overhead, as I mentioned in [1].  For the record I implemented that in
> pg_queryid (optionally don't generate query_id for queries referencing a temp
> relation) yesterday evening as a POC for that approach.

Yes, I know. So I said that "if not yet called".  I believe any "real"
alternative query-id provider is supposed to be hooked "before"
pg_stat_statements. (It is a kind of magic to control the order of
plugins, though..)  When the alternative provider generated a query_id
(that is, it has set jstate), pg_stat_statment doesn't call the
in-core JumbleQuery and uses the givin query_id.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> > On Thu, May 13, 2021 at 10:51:52AM +0900, Kyotaro Horiguchi wrote:
> > > At Thu, 13 May 2021 09:59:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > > > How about adding a GUC_INTERNAL "current_query_provider" or such?
> > > 
> > > On the second thought, I wonder why we don't just call JumbleQuery in
> > > pgss_post_parse_analyze when compute_query_id is "off".
> > 
> > Because not generating a query_id for a custom query_id implementation is a
> > valid use case for queries that are known to lead to huge pg_stat_statements
> > overhead, as I mentioned in [1].  For the record I implemented that in
> > pg_queryid (optionally don't generate query_id for queries referencing a temp
> > relation) yesterday evening as a POC for that approach.
> 
> Yes, I know. So I said that "if not yet called".  I believe any "real"
> alternative query-id provider is supposed to be hooked "before"
> pg_stat_statements. (It is a kind of magic to control the order of
> plugins, though..)  When the alternative provider generated a query_id
> (that is, it has set jstate), pg_stat_statment doesn't call the
> in-core JumbleQuery and uses the givin query_id.

Forgot to mention, I think that the state "query_id provider is active
but it has not assigned one to this query" can be signaled by
jstate=<non-null> and query_id = 0.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 10:39:20 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Thu, May 13, 2021 at 11:30:56AM +0900, Kyotaro Horiguchi wrote:
> > At Thu, 13 May 2021 11:26:29 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > > At Thu, 13 May 2021 10:02:45 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> > > Yes, I know. So I said that "if not yet called".  I believe any "real"
> > > alternative query-id provider is supposed to be hooked "before"
> > > pg_stat_statements. (It is a kind of magic to control the order of
> > > plugins, though..)  When the alternative provider generated a query_id
> > > (that is, it has set jstate), pg_stat_statment doesn't call the
> > > in-core JumbleQuery and uses the givin query_id.
> > 
> > Forgot to mention, I think that the state "query_id provider is active
> > but it has not assigned one to this query" can be signaled by
> > jstate=<non-null> and query_id = 0.
> 
> I assume that you mean "third-party query_id provider" here, as the core one
> will always return a non-zero query_id?

Right.

> I guess it could work, but a lot of people are complaining that having
> compute_query_id = [ off | on | auto ] is too confusing, so I don't see how
> having "off" means "sometimes off, sometimes on" is going to be any clearer for
> users.

I don't get it. It read as "people are complaining the tristate is too
confusing, so I made it tristate"?

For the second point, so I said that the variable controls whether the
"internal" query-id pvovider turn on.  It is more clearer if the name
were something like "use_internal_query_id_generator".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 09:13:25PM -0400, Bruce Momjian wrote:
> > On Thu, May 13, 2021 at 08:52:36AM +0800, Julien Rouhaud wrote:
> > > 
> > > Well, as implemented you can get the value of compute_query_id, and if it's
> > > still "auto" then it's not enabled as calling queryIdWanted() would turn it to
> > > on.  I agree that it's not ideal but you have a way to know.  We could document
> > > that auto means that it's set to auto and no one asked to automatically enabled
> > > it.
> > 
> > Wow, so the extension changes it?
> 
> Yes.  It seemed better to go this way rather than having a secondary read-only
> GUC for that.
> 
> > How do we record the "source" of that
> > change?  Do we have other GUCs that do that?
> 
> No, we don't.  But I don't know what exactly you would like to have as a

OK.

> source?  What if you have for instance pg_stat_statements, pg_stat_kcache,
> pg_store_plans and pg_wait_sampling installed?  All those extensions need a
> query_id (or at least benefit from it for pg_wait_sampling), is there any value
> to give a full list of all the modules that would enable compute_query_id?

Well, we don't have any other cases where the source of the change is
this indeterminate, so I don't really have a suggestion.  I think this
does highlight another case where 'auto' just isn't a good fit for our
API.

> I'm assuming that anyone wanting to install any of those extensions (or any
> similar one) is fully aware that they aggregate metrics based on at least a
> query_id.  If they don't, well they probably never read any documentation since
> postgres 9.2 which introduced query normalization, and I doubt that they will
> be interested in having access to the information anyway.

My point is that we are changing a setting from an extension, and if you
look in pg_setting, it will say "default"?

If the user already has to edit postgresql.conf to set
shared_preload_libraries, I still don't see why having them set
compute_query_id at the same time is a significant problem and a reason
to distort our API to do 'auto'.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 10:43:03 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> On Thu, May 13, 2021 at 11:26:29AM +0900, Kyotaro Horiguchi wrote:
> > 
> > I believe any "real"
> > alternative query-id provider is supposed to be hooked "before"
> > pg_stat_statements. (It is a kind of magic to control the order of
> > plugins, though..
> 
> Indeed, you have to configure shared_preload_libraries depending on whether
> each module calls the previous post_parse_analyze_hook before or after its own
> processing, and that's the main reason why I think a dedicated entry point
> would be better.

I see it as cleaner than the status-quo. (But still believing less
cleaner than DLL:p, since the same problem happens if two
query_id-generating modules are competing on the new hook ponit.).

You told that a special query-id provider needed to be separated to
another DLL, but a preload shared librarie is also a dll and I think
any exported function in it can be obtained via
load_external_function().

As the result, even if we take the DLL approach, still not need to
split out the query-id provider part. By the following config:

> query_id_provider = 'pg_stat_statements'

the core can obtain the entrypoint of, say, "_PG_calculate_query_id"
to call it. And it can be of another module.

It seems like the only problem doing that is we don't have a means to
call per-process intializer for a preload libralies.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:
> On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:
> 
> > source?  What if you have for instance pg_stat_statements, pg_stat_kcache,
> > pg_store_plans and pg_wait_sampling installed?  All those extensions need a
> > query_id (or at least benefit from it for pg_wait_sampling), is there any value
> > to give a full list of all the modules that would enable compute_query_id?
> 
> Well, we don't have any other cases where the source of the change is
> this indeterminate, so I don't really have a suggestion.  I think this
> does highlight another case where 'auto' just isn't a good fit for our
> API.

It may depends on your next question

> > I'm assuming that anyone wanting to install any of those extensions (or any
> > similar one) is fully aware that they aggregate metrics based on at least a
> > query_id.  If they don't, well they probably never read any documentation since
> > postgres 9.2 which introduced query normalization, and I doubt that they will
> > be interested in having access to the information anyway.
> 
> My point is that we are changing a setting from an extension, and if you
> look in pg_setting, it will say "default"?

No, it will say "on".  What the patch I sent implements is:

- compute_query_id = on means it was either explicitly set to on, or
  automatically set to on if it was allowed to (so initially set to auto).  It
  means you know whether core query_id calculation is enabled or not, you can
  know looking at the reset value it it was changed by an extension, you just
  don't know which one(s).

- compute_query_id = auto means that it can be set to on, it just wasn't yet,
  so it's off, and may change if you have an extension that can be dynamically
  loaded and request for core query_id calculation to be enabled

- compute_query_id = off means it's off

> If the user already has to edit postgresql.conf to set
> shared_preload_libraries, I still don't see why having them set
> compute_query_id at the same time is a significant problem and a reason
> to distort our API to do 'auto'.

Looking at the arguments until now my understanding is that it's because "no
one will read the doc anyway".



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:
> > On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:
> > 
> > > source?  What if you have for instance pg_stat_statements, pg_stat_kcache,
> > > pg_store_plans and pg_wait_sampling installed?  All those extensions need a
> > > query_id (or at least benefit from it for pg_wait_sampling), is there any value
> > > to give a full list of all the modules that would enable compute_query_id?
> > 
> > Well, we don't have any other cases where the source of the change is
> > this indeterminate, so I don't really have a suggestion.  I think this
> > does highlight another case where 'auto' just isn't a good fit for our
> > API.
> 
> It may depends on your next question
> 
> > > I'm assuming that anyone wanting to install any of those extensions (or any
> > > similar one) is fully aware that they aggregate metrics based on at least a
> > > query_id.  If they don't, well they probably never read any documentation since
> > > postgres 9.2 which introduced query normalization, and I doubt that they will
> > > be interested in having access to the information anyway.
> > 
> > My point is that we are changing a setting from an extension, and if you
> > look in pg_setting, it will say "default"?
> 
> No, it will say "on".  What the patch I sent implements is:

I was asking what pg_settings.source will say:

    SELECT name, soource FROM pg_settings;

> - compute_query_id = on means it was either explicitly set to on, or
>   automatically set to on if it was allowed to (so initially set to auto).  It
>   means you know whether core query_id calculation is enabled or not, you can
>   know looking at the reset value it it was changed by an extension, you just
>   don't know which one(s).
> 
> - compute_query_id = auto means that it can be set to on, it just wasn't yet,
>   so it's off, and may change if you have an extension that can be dynamically
>   loaded and request for core query_id calculation to be enabled

So, it is 'off', but not set to 'off' in the GUC sense --- just off as
in not being computed.  You can see the confusion in my just reading
that sentence.

> - compute_query_id = off means it's off
> 
> > If the user already has to edit postgresql.conf to set
> > shared_preload_libraries, I still don't see why having them set
> > compute_query_id at the same time is a significant problem and a reason
> > to distort our API to do 'auto'.
> 
> Looking at the arguments until now my understanding is that it's because "no
> one will read the doc anyway".

How do they know to set shared_preload_libraries then?  We change the
user API all the time, especially in GUCs, and even rename them, but for
some reason we don't think people using pg_stat_statements can be
trusted to read the release notes and change their behavior.  I just
don't get it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote:
> > On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:
> > > On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:
> > > 
> > > > source?  What if you have for instance pg_stat_statements, pg_stat_kcache,
> > > > pg_store_plans and pg_wait_sampling installed?  All those extensions need a
> > > > query_id (or at least benefit from it for pg_wait_sampling), is there any value
> > > > to give a full list of all the modules that would enable compute_query_id?
> > > 
> > > Well, we don't have any other cases where the source of the change is
> > > this indeterminate, so I don't really have a suggestion.  I think this
> > > does highlight another case where 'auto' just isn't a good fit for our
> > > API.
> > 
> > It may depends on your next question
> > 
> > > > I'm assuming that anyone wanting to install any of those extensions (or any
> > > > similar one) is fully aware that they aggregate metrics based on at least a
> > > > query_id.  If they don't, well they probably never read any documentation since
> > > > postgres 9.2 which introduced query normalization, and I doubt that they will
> > > > be interested in having access to the information anyway.
> > > 
> > > My point is that we are changing a setting from an extension, and if you
> > > look in pg_setting, it will say "default"?
> > 
> > No, it will say "on".  What the patch I sent implements is:
> 
> I was asking what pg_settings.source will say:
> 
>     SELECT name, soource FROM pg_settings;

Oh sorry.  Here's the full output before and after a dynamic call to
queryIsWanted():

name            | compute_query_id
setting         | auto
unit            | <NULL>
category        | Statistics / Monitoring
short_desc      | Compute query identifiers.
extra_desc      | <NULL>
context         | superuser
vartype         | enum
source          | default
min_val         | <NULL>
max_val         | <NULL>
enumvals        | {auto,on,off}
boot_val        | auto
reset_val       | auto
sourcefile      | <NULL>
sourceline      | <NULL>
pending_restart | f

=# LOAD 'pg_qualstats'; /* will call queryIsWanted() */
WARNING:  01000: Without shared_preload_libraries, only current backend stats will be available.
LOAD

name            | compute_query_id
setting         | on
unit            | <NULL>
category        | Statistics / Monitoring
short_desc      | Compute query identifiers.
extra_desc      | <NULL>
context         | superuser
vartype         | enum
source          | default
min_val         | <NULL>
max_val         | <NULL>
enumvals        | {auto,on,off}
boot_val        | auto
reset_val       | auto
sourcefile      | <NULL>
sourceline      | <NULL>
pending_restart | f

So yes, it says "default" (and it's the same if the change is done during
postmaster init).  It can be fixed if that's the only issue.

> 
> > - compute_query_id = on means it was either explicitly set to on, or
> >   automatically set to on if it was allowed to (so initially set to auto).  It
> >   means you know whether core query_id calculation is enabled or not, you can
> >   know looking at the reset value it it was changed by an extension, you just
> >   don't know which one(s).
> > 
> > - compute_query_id = auto means that it can be set to on, it just wasn't yet,
> >   so it's off, and may change if you have an extension that can be dynamically
> >   loaded and request for core query_id calculation to be enabled
> 
> So, it is 'off', but not set to 'off' in the GUC sense --- just off as
> in not being computed.  You can see the confusion in my just reading
> that sentence.

It's technically not "off" but "not on yet", but that's probably just making it
worse :)

> How do they know to set shared_preload_libraries then?  We change the
> user API all the time, especially in GUCs, and even rename them, but for
> some reason we don't think people using pg_stat_statements can be
> trusted to read the release notes and change their behavior.  I just
> don't get it.

I don't know what to say.  So here is a summary of the complaints that I'm
aware of:

- https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
That was only a couple of days after the commit just before the feature freeze,
so it may be the less relevant complaint.

- https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
Did a git bisect to find the commit that changed the behavior and somehow
didn't notice the new setting

- this thread, with Fuji-san saying:

> I'm afraid that users may easily forget to enable compute_query_id when using
> pg_stat_statements (because this setting was not necessary in v13 or before)

- this thread, with Peter E. saying:

> Now there is the additional burden of turning on this weird setting that
> no one understands.  That's a 50% increase in burden.  And almost no one will
> want to use a nondefault setting.  pg_stat_statements is pretty popular.  I
> think leaving in this requirement will lead to widespread confusion and
> complaints.

- this thread, with Pavel saying:

> Until now, the pg_stat_statements was zero-config. So the change is not user
> friendly.

So it's a mix of "it's changing something that didn't change in a long time"
and "it's adding extra footgun and/or burden as it's not doing by default what
the majority of users will want", with an overwhelming majority of people
supporting the "we don't want that extra burden".



Re: compute_query_id and pg_stat_statements

From
Fujii Masao
Date:

On 2021/05/13 13:03, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
>> On Thu, May 13, 2021 at 11:16:13AM +0800, Julien Rouhaud wrote:
>>> On Wed, May 12, 2021 at 11:06:52PM -0400, Bruce Momjian wrote:
>>>> On Thu, May 13, 2021 at 09:57:00AM +0800, Julien Rouhaud wrote:
>>>>
>>>>> source?  What if you have for instance pg_stat_statements, pg_stat_kcache,
>>>>> pg_store_plans and pg_wait_sampling installed?  All those extensions need a
>>>>> query_id (or at least benefit from it for pg_wait_sampling), is there any value
>>>>> to give a full list of all the modules that would enable compute_query_id?
>>>>
>>>> Well, we don't have any other cases where the source of the change is
>>>> this indeterminate, so I don't really have a suggestion.  I think this
>>>> does highlight another case where 'auto' just isn't a good fit for our
>>>> API.
>>>
>>> It may depends on your next question
>>>
>>>>> I'm assuming that anyone wanting to install any of those extensions (or any
>>>>> similar one) is fully aware that they aggregate metrics based on at least a
>>>>> query_id.  If they don't, well they probably never read any documentation since
>>>>> postgres 9.2 which introduced query normalization, and I doubt that they will
>>>>> be interested in having access to the information anyway.
>>>>
>>>> My point is that we are changing a setting from an extension, and if you
>>>> look in pg_setting, it will say "default"?
>>>
>>> No, it will say "on".  What the patch I sent implements is:
>>
>> I was asking what pg_settings.source will say:
>>
>>     SELECT name, soource FROM pg_settings;
> 
> Oh sorry.  Here's the full output before and after a dynamic call to
> queryIsWanted():
> 
> name            | compute_query_id
> setting         | auto
> unit            | <NULL>
> category        | Statistics / Monitoring
> short_desc      | Compute query identifiers.
> extra_desc      | <NULL>
> context         | superuser
> vartype         | enum
> source          | default
> min_val         | <NULL>
> max_val         | <NULL>
> enumvals        | {auto,on,off}
> boot_val        | auto
> reset_val       | auto
> sourcefile      | <NULL>
> sourceline      | <NULL>
> pending_restart | f
> 
> =# LOAD 'pg_qualstats'; /* will call queryIsWanted() */
> WARNING:  01000: Without shared_preload_libraries, only current backend stats will be available.
> LOAD
> 
> name            | compute_query_id
> setting         | on
> unit            | <NULL>
> category        | Statistics / Monitoring
> short_desc      | Compute query identifiers.
> extra_desc      | <NULL>
> context         | superuser
> vartype         | enum
> source          | default
> min_val         | <NULL>
> max_val         | <NULL>
> enumvals        | {auto,on,off}
> boot_val        | auto
> reset_val       | auto
> sourcefile      | <NULL>
> sourceline      | <NULL>
> pending_restart | f
> 
> So yes, it says "default" (and it's the same if the change is done during
> postmaster init).  It can be fixed if that's the only issue.

I like leaving compute_query_id=auto instead of overwriting it to "on",
even when queryIsWanted() is called, as I told upthread. Then we can decide
that query id computation is necessary when compute_query_id=auto and
the special flag is enabled by queryIsWanted(). Of course as you and Magnus
discussed upthread, the issue in EXEC_BACKEND case should be fixed,
maybe by using save_backend_variables() or something, though.

If we do this, compute_query_id=auto seems to be similar to huge_pages=try.
When huge_pages=try, whether huge pages is actually used is defined depending
outside PostgreSQL (i.e, OS setting in this case). Neither pg_settings.setting nor
souce are not changed in that case.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
Le jeu. 13 mai 2021 à 12:18, Fujii Masao <masao.fujii@oss.nttdata.com> a écrit :

I like leaving compute_query_id=auto instead of overwriting it to "on",
even when queryIsWanted() is called, as I told upthread. Then we can decide
that query id computation is necessary when compute_query_id=auto and
the special flag is enabled by queryIsWanted(). Of course as you and Magnus
discussed upthread, the issue in EXEC_BACKEND case should be fixed,
maybe by using save_backend_variables() or something, though.

If we do this, compute_query_id=auto seems to be similar to huge_pages=try.
When huge_pages=try, whether huge pages is actually used is defined depending
outside PostgreSQL (i.e, OS setting in this case). Neither pg_settings.setting nor
souce are not changed in that case.

I'm fine with that, but a lot of people complained that it wasn't good because you don't really know if it's actually on or not. I personally don't think that it's an issue, because what user want is to automagumically do what they want, not check how the magic happened, and if they want a third party implementation then the module can error out if the setting is on, so the burden will only be for those users, and handled by the third party module author. 

Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> As the result, even if we take the DLL approach, still not need to
> split out the query-id provider part. By the following config:
> 
> > query_id_provider = 'pg_stat_statements'
> 
> the core can obtain the entrypoint of, say, "_PG_calculate_query_id"
> to call it. And it can be of another module.
> 
> It seems like the only problem doing that is we don't have a means to
> call per-process intializer for a preload libralies.

So this is a crude PoC of that.

pg_stat_statemnts defines its own query-id provider function in
pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a
log line).

If server started with query_id_provider='auto' and pg_stat_statements
is not loaded, pg_stat_activity.query_id is null.

If query_id_provider='auto' and pg_stat_statements is loaded,
pg_stat_activity.query_id is filled in with a query id.

If query_id_provider='default' or 'pg_stat_statements' and
pg_stat_statements is not loaded, pg_stat_activity.query_id is filled
in with a query id.

If query_id_provider='none' and pg_stat_statements is loaded, it
complains as "query id provider is not available" and refuss to start.

If showing the variable, it shows the real provider name instead of
the setting in postgresql.conf.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index a85f962801..207c4362af 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -295,6 +295,7 @@ static bool pgss_save;            /* whether to save stats across shutdown */
 
 void        _PG_init(void);
 void        _PG_fini(void);
+JumbleState *_PG_calculate_query_id(Query *query, const char *querytext);
 
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset);
 PG_FUNCTION_INFO_V1(pg_stat_statements_reset_1_7);
@@ -478,6 +479,13 @@ _PG_fini(void)
     ProcessUtility_hook = prev_ProcessUtility;
 }
 
+/* Test queryid provider function */
+JumbleState *_PG_calculate_query_id(Query *query, const char *querytext)
+{
+    elog(LOG, "Called query id generatr of pg_stat_statements");
+    return DefaultJumbleQuery(query, querytext);
+}
+
 /*
  * shmem_startup hook: allocate or attach to shared memory,
  * then load any pre-existing statistics from file.
@@ -544,6 +552,11 @@ pgss_shmem_startup(void)
     if (!IsUnderPostmaster)
         on_shmem_exit(pgss_shmem_shutdown, (Datum) 0);
 
+    /* request my defalt provider, but allow exisint one */
+    if (!queryIdWanted("pg_stat_statements", true))
+        ereport(ERROR,
+                (errmsg ("query_id provider is not available")));
+
     /*
      * Done if some other process already completed our initialization.
      */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 1202bf85a3..be00564221 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -245,8 +245,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
     es->summary = (summary_set) ? es->summary : es->analyze;
 
     query = castNode(Query, stmt->query);
-    if (compute_query_id)
-        jstate = JumbleQuery(query, pstate->p_sourcetext);
+    jstate = JumbleQuery(query, pstate->p_sourcetext);
 
     if (post_parse_analyze_hook)
         (*post_parse_analyze_hook) (pstate, query, jstate);
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 168198acd1..bdf3a5a6d1 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -124,8 +124,7 @@ parse_analyze(RawStmt *parseTree, const char *sourceText,
 
     query = transformTopLevelStmt(pstate, parseTree);
 
-    if (compute_query_id)
-        jstate = JumbleQuery(query, sourceText);
+    jstate = JumbleQuery(query, sourceText);
 
     if (post_parse_analyze_hook)
         (*post_parse_analyze_hook) (pstate, query, jstate);
@@ -163,8 +162,7 @@ parse_analyze_varparams(RawStmt *parseTree, const char *sourceText,
     /* make sure all is well with parameter types */
     check_variable_parameters(pstate, query);
 
-    if (compute_query_id)
-        jstate = JumbleQuery(query, sourceText);
+    jstate = JumbleQuery(query, sourceText);
 
     if (post_parse_analyze_hook)
         (*post_parse_analyze_hook) (pstate, query, jstate);
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6200699ddd..1034dfea28 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -704,8 +704,7 @@ pg_analyze_and_rewrite_params(RawStmt *parsetree,
 
     query = transformTopLevelStmt(pstate, parsetree);
 
-    if (compute_query_id)
-        jstate = JumbleQuery(query, query_string);
+    jstate = JumbleQuery(query, query_string);
 
     if (post_parse_analyze_hook)
         (*post_parse_analyze_hook) (pstate, query, jstate);
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index eb7f7181e4..70d06b825e 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -101,6 +101,7 @@
 #include "utils/plancache.h"
 #include "utils/portal.h"
 #include "utils/ps_status.h"
+#include "utils/queryjumble.h"
 #include "utils/rls.h"
 #include "utils/snapmgr.h"
 #include "utils/tzparser.h"
@@ -534,7 +535,6 @@ extern const struct config_enum_entry dynamic_shared_memory_options[];
 /*
  * GUC option variables that are exported from this module
  */
-bool        compute_query_id = false;
 bool        log_duration = false;
 bool        Debug_print_plan = false;
 bool        Debug_print_parse = false;
@@ -1441,15 +1441,6 @@ static struct config_bool ConfigureNamesBool[] =
         true,
         NULL, NULL, NULL
     },
-    {
-        {"compute_query_id", PGC_SUSET, STATS_MONITORING,
-            gettext_noop("Compute query identifiers."),
-            NULL
-        },
-        &compute_query_id,
-        false,
-        NULL, NULL, NULL
-    },
     {
         {"log_parser_stats", PGC_SUSET, STATS_MONITORING,
             gettext_noop("Writes parser performance statistics to the server log."),
@@ -4579,6 +4570,16 @@ static struct config_string ConfigureNamesString[] =
         check_backtrace_functions, assign_backtrace_functions, NULL
     },
 
+
+    {
+        {"query_id_provider", PGC_SUSET, CLIENT_CONN_PRELOAD,
+            gettext_noop("Sets the query-id provider."),
+        },
+        &query_id_provider,
+        "auto",
+        check_query_id_provider, assign_query_id_provider, NULL
+    },
+
     /* End-of-list marker */
     {
         {NULL, 0, 0, NULL, NULL}, NULL, NULL, NULL, NULL, NULL
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index efde01ee56..fc31ce15c4 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -604,7 +604,7 @@
 
 # - Monitoring -
 
-#compute_query_id = off
+#query_id_provider = 'auto'
 #log_statement_stats = off
 #log_parser_stats = off
 #log_planner_stats = off
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..709d654ea0 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -47,6 +47,96 @@ static void JumbleRangeTable(JumbleState *jstate, List *rtable);
 static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
 static void JumbleExpr(JumbleState *jstate, Node *node);
 static void RecordConstLocation(JumbleState *jstate, int location);
+static JumbleState *DummyJumbleQuery(Query *query, const char *querytext);
+
+char *query_id_provider = NULL;
+static bool lock_provider = false;
+
+typedef JumbleState *(*JumbleQueryType) (Query *query, const char *querytext);
+JumbleQueryType JumbleQuery = NULL;
+
+typedef struct QueryIdProviderExtra
+{
+    JumbleQueryType pfunc;
+} QueryIdProviderExtra;
+
+bool
+check_query_id_provider(char **newval, void **extra, GucSource source)
+{
+    QueryIdProviderExtra *param = NULL;
+
+    if (lock_provider)
+        return false;
+
+    param = (QueryIdProviderExtra *)malloc(sizeof(QueryIdProviderExtra));
+    if (param == NULL)
+        return false;
+
+    if (strcmp(*newval, "none") == 0 ||
+        strcmp(*newval, "auto") == 0)
+        param->pfunc = DummyJumbleQuery;
+    else if (strcmp(*newval, "default") == 0)
+        param->pfunc = DefaultJumbleQuery;
+    else
+        param->pfunc =
+            load_external_function(*newval, "_PG_calculate_query_id",
+                                   false, NULL);
+
+    if (param->pfunc == NULL)
+    {
+        free(param);
+        param = NULL;
+        GUC_check_errdetail("failed to load query id provider");
+        return false;
+    }
+
+    *extra = (void *) param;
+    return true;
+}
+
+void
+assign_query_id_provider(const char *newval, void *extra)
+{
+    QueryIdProviderExtra *param = (QueryIdProviderExtra *)extra;
+
+    JumbleQuery = param->pfunc;
+}
+
+bool
+queryIdWanted(char *provider_name, bool use_existing)
+{
+    JumbleQueryType func;
+
+    Assert(query_id_provider != NULL);
+    Assert(JumbleQuery != NULL);
+
+    if (lock_provider || strcmp(query_id_provider, "none") == 0)
+        return false;
+
+    /* use existing provider when use_existing */
+    if (strcmp(query_id_provider, "auto") != 0 && use_existing)
+        return true;
+
+    if (strcmp(provider_name, "default") == 0)
+        func = DefaultJumbleQuery;
+    else
+        func = load_external_function(provider_name, "_PG_calculate_query_id",
+                                      false, NULL);
+
+    if (func == NULL)
+        return false;
+
+    elog(LOG, "query-id provider \"%s\" loaded", provider_name);
+    JumbleQuery = func;
+
+    /* expose real provider name */
+    SetConfigOption("query_id_provider", provider_name,
+                    PGC_POSTMASTER, PGC_S_OVERRIDE);
+
+    lock_provider = true;
+
+    return true;
+}
 
 /*
  * Given a possibly multi-statement source string, confine our attention to the
@@ -92,7 +182,7 @@ CleanQuerytext(const char *query, int *location, int *len)
 }
 
 JumbleState *
-JumbleQuery(Query *query, const char *querytext)
+DefaultJumbleQuery(Query *query, const char *querytext)
 {
     JumbleState *jstate = NULL;
 
@@ -132,6 +222,12 @@ JumbleQuery(Query *query, const char *querytext)
     return jstate;
 }
 
+static JumbleState *
+DummyJumbleQuery(Query *query, const char *querytext)
+{
+    return NULL;
+}
+
 /*
  * Compute a query identifier for the given utility query string.
  */
diff --git a/src/include/utils/guc.h b/src/include/utils/guc.h
index 24a5d9d3fb..a7c3a4958e 100644
--- a/src/include/utils/guc.h
+++ b/src/include/utils/guc.h
@@ -247,7 +247,6 @@ extern bool log_btree_build_stats;
 extern PGDLLIMPORT bool check_function_bodies;
 extern bool session_auth_is_superuser;
 
-extern bool compute_query_id;
 extern bool log_duration;
 extern int    log_parameter_max_length;
 extern int    log_parameter_max_length_on_error;
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..682d687b79 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -15,6 +15,7 @@
 #define QUERYJUBLE_H
 
 #include "nodes/parsenodes.h"
+#include "utils/guc.h"
 
 #define JUMBLE_SIZE                1024    /* query serialization buffer size */
 
@@ -52,7 +53,12 @@ typedef struct JumbleState
     int            highest_extern_param_id;
 } JumbleState;
 
+extern char *query_id_provider;
+extern JumbleState *(*JumbleQuery)(Query *query, const char *querytext);
+
+JumbleState *DefaultJumbleQuery(Query *query, const char *querytext);
+bool queryIdWanted(char *provider_name, bool use_existing);
+bool check_query_id_provider(char **newval, void **extra, GucSource source);
+void assign_query_id_provider(const char *newval, void *extra);
 const char *CleanQuerytext(const char *query, int *location, int *len);
-JumbleState *JumbleQuery(Query *query, const char *querytext);
-
 #endif                            /* QUERYJUMBLE_H */
diff --git a/src/test/regress/expected/explain.out b/src/test/regress/expected/explain.out
index cda28098ba..16375d5596 100644
--- a/src/test/regress/expected/explain.out
+++ b/src/test/regress/expected/explain.out
@@ -477,7 +477,7 @@ select jsonb_pretty(
 (1 row)
 
 rollback;
-set compute_query_id = on;
+set compute_query_id = 'default';
 select explain_filter('explain (verbose) select * from int8_tbl i8');
                          explain_filter                         
 ----------------------------------------------------------------

Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 13:26:37 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> > As the result, even if we take the DLL approach, still not need to
> > split out the query-id provider part. By the following config:
> > 
> > > query_id_provider = 'pg_stat_statements'
> > 
> > the core can obtain the entrypoint of, say, "_PG_calculate_query_id"
> > to call it. And it can be of another module.
> > 
> > It seems like the only problem doing that is we don't have a means to
> > call per-process intializer for a preload libralies.
> 
> So this is a crude PoC of that.
> 
> pg_stat_statemnts defines its own query-id provider function in
> pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a
> log line).
> 
> If server started with query_id_provider='auto' and pg_stat_statements
> is not loaded, pg_stat_activity.query_id is null.
> 
> If query_id_provider='auto' and pg_stat_statements is loaded,
> pg_stat_activity.query_id is filled in with a query id.
> 
> If query_id_provider='default' or 'pg_stat_statements' and
> pg_stat_statements is not loaded, pg_stat_activity.query_id is filled
> in with a query id.
> 
> If query_id_provider='none' and pg_stat_statements is loaded, it
> complains as "query id provider is not available" and refuss to start.
> 
> If showing the variable, it shows the real provider name instead of
> the setting in postgresql.conf.

The change contains needless things that tries to handle per-backend
change case, so it would be simpler assuming we don't want on-the-fly
change of provider (and I believe so since that change surely causes
inconsistency)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Maciek Sakrejda
Date:
On Wed, May 12, 2021 at 9:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> > How do they know to set shared_preload_libraries then?  We change the
> > user API all the time, especially in GUCs, and even rename them, but for
> > some reason we don't think people using pg_stat_statements can be
> > trusted to read the release notes and change their behavior.  I just
> > don't get it.
>
> I don't know what to say.  So here is a summary of the complaints that I'm
> aware of:
>
> - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
> That was only a couple of days after the commit just before the feature freeze,
> so it may be the less relevant complaint.
>
> - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
> Did a git bisect to find the commit that changed the behavior and somehow
> didn't notice the new setting
>
> - this thread, with Fuji-san saying:
>
> > I'm afraid that users may easily forget to enable compute_query_id when using
> > pg_stat_statements (because this setting was not necessary in v13 or before)
>
> - this thread, with Peter E. saying:
>
> > Now there is the additional burden of turning on this weird setting that
> > no one understands.  That's a 50% increase in burden.  And almost no one will
> > want to use a nondefault setting.  pg_stat_statements is pretty popular.  I
> > think leaving in this requirement will lead to widespread confusion and
> > complaints.
>
> - this thread, with Pavel saying:
>
> > Until now, the pg_stat_statements was zero-config. So the change is not user
> > friendly.
>
> So it's a mix of "it's changing something that didn't change in a long time"
> and "it's adding extra footgun and/or burden as it's not doing by default what
> the majority of users will want", with an overwhelming majority of people
> supporting the "we don't want that extra burden".

For what it's worth, I don't think the actual changing of an extra
setting is that big a burden: it's the figuring out that you need to
change it, and how you should configure it, that is the problem.
Especially since all major search engines still seem to return 9.4 (!)
documentation as the first hit for a "pg_stat_statements" search. The
common case (installing pg_stat_statements but not tweaking query id
generation) should be simple.



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a écrit :

For what it's worth, I don't think the actual changing of an extra
setting is that big a burden: it's the figuring out that you need to
change it, and how you should configure it, that is the problem.
Especially since all major search engines still seem to return 9.4 (!)
documentation as the first hit for a "pg_stat_statements" search. The
common case (installing pg_stat_statements but not tweaking query id
generation) should be simple.

the v2 patch I sent should address both your requirements. 

Re: compute_query_id and pg_stat_statements

From
Maciek Sakrejda
Date:
On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a écrit :
>>
>> For what it's worth, I don't think the actual changing of an extra
>> setting is that big a burden: it's the figuring out that you need to
>> change it, and how you should configure it, that is the problem.
>> Especially since all major search engines still seem to return 9.4 (!)
>> documentation as the first hit for a "pg_stat_statements" search. The
>> common case (installing pg_stat_statements but not tweaking query id
>> generation) should be simple.
>
>
> the v2 patch I sent should address both your requirements.

Yes, thanks--I just tried it and this is great. I just wanted to argue
against reversing course here.



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Wed, May 12, 2021 at 10:31:01PM -0700, Maciek Sakrejda wrote:
> On Wed, May 12, 2021 at 9:58 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Le jeu. 13 mai 2021 à 12:52, Maciek Sakrejda <m.sakrejda@gmail.com> a écrit :
> >>
> >> For what it's worth, I don't think the actual changing of an extra
> >> setting is that big a burden: it's the figuring out that you need to
> >> change it, and how you should configure it, that is the problem.
> >> Especially since all major search engines still seem to return 9.4 (!)
> >> documentation as the first hit for a "pg_stat_statements" search. The
> >> common case (installing pg_stat_statements but not tweaking query id
> >> generation) should be simple.
> >
> >
> > the v2 patch I sent should address both your requirements.
> 
> Yes, thanks--I just tried it and this is great. I just wanted to argue
> against reversing course here.

Oh ok.  Good news then, thanks!



Re: compute_query_id and pg_stat_statements

From
Kyotaro Horiguchi
Date:
At Thu, 13 May 2021 12:33:47 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in
> Le jeu. 13 mai 2021 à 12:26, Kyotaro Horiguchi <horikyota.ntt@gmail.com> a
> écrit :
>
> > At Thu, 13 May 2021 12:11:12 +0900 (JST), Kyotaro Horiguchi <
> > horikyota.ntt@gmail.com> wrote in
> > pg_stat_statemnts defines its own query-id provider function in
> > pg_stat_statements which calls in-core DefaultJumbeQuery (end emits a
> > log line).
> >
> > If server started with query_id_provider='auto' and pg_stat_statements
> > is not loaded, pg_stat_activity.query_id is null.
> >
> > If query_id_provider='auto' and pg_stat_statements is loaded,
> > pg_stat_activity.query_id is filled in with a query id.
> >
> > If query_id_provider='default' or 'pg_stat_statements' and
> > pg_stat_statements is not loaded, pg_stat_activity.query_id is filled
> > in with a query id.
> >
> > If query_id_provider='none' and pg_stat_statements is loaded, it
> > complains as "query id provider is not available" and refuss to start.
> >
> > If showing the variable, it shows the real provider name instead of
> > the setting in postgresql.conf.
> >
>
> what if you want to have some other extensions like pg_stat_kcache or
> pg_store_plans that need a query_id but don't really care if
> pg_stat_statements is enabled or not? should they all declare their own

Thanks for looking it.

The addtional provider function in pg_stat_statements is just an
example to show what if it needs its own query-id provider, which is
useless in reality. In reality pg_stat_statements just calls
"queryIdWanted("default", true)" to use any query-id provider and use
the in-core one as the fallback implement, and no need to define its
own one.

Any extension can use the in-core provider and accepting any other
ones by calling queryIdWanted("default", true) then get what they want
regardless of existence of pg_stat_statements.

> wrapper too? should the system complain or silently ignore when they all
> try to install their provider function?

Of course if two extensions require diffrent query-id providers, they
just cannot coexist (that is, server refuses to start). It is quite
sane behavior in the standpoint of safety.  I think almost all
query-id users don't insist on a specific implmentation.  (So the
second parameter to queryIdWanted() could be omtted and assumed to be
true.)

reagrds.

--
Kyotaro Horiguchi
NTT Open Source Software Center



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:
> On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> I don't know what to say.  So here is a summary of the complaints that I'm
> aware of:
> 
> - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
> That was only a couple of days after the commit just before the feature freeze,
> so it may be the less relevant complaint.
> 
> - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
> Did a git bisect to find the commit that changed the behavior and somehow
> didn't notice the new setting
> 
> - this thread, with Fuji-san saying:
> 
> > I'm afraid that users may easily forget to enable compute_query_id when using
> > pg_stat_statements (because this setting was not necessary in v13 or before)
> 
> - this thread, with Peter E. saying:
> 
> > Now there is the additional burden of turning on this weird setting that
> > no one understands.  That's a 50% increase in burden.  And almost no one will
> > want to use a nondefault setting.  pg_stat_statements is pretty popular.  I
> > think leaving in this requirement will lead to widespread confusion and
> > complaints.
> 
> - this thread, with Pavel saying:
> 
> > Until now, the pg_stat_statements was zero-config. So the change is not user
> > friendly.
> 
> So it's a mix of "it's changing something that didn't change in a long time"
> and "it's adding extra footgun and/or burden as it's not doing by default what
> the majority of users will want", with an overwhelming majority of people
> supporting the "we don't want that extra burden".

Well, now that we have clear warnings when it is misconfigured,
especially when querying the pg_stat_statements view, are these
complaints still valid?   Also, how is modifying
shared_preload_libraries zero-config, but modifying
shared_preload_libraries and compute_query_id a huge burden?

I am personally not comfortable committing a patch to add an auto option
the way it is implemented, so another committer will need to take
ownership of this, or the entire feature can be removed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Maciek Sakrejda
Date:
On Thu, May 13, 2021 at 7:42 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, May 13, 2021 at 12:03:42PM +0800, Julien Rouhaud wrote:
> > On Wed, May 12, 2021 at 11:33:32PM -0400, Bruce Momjian wrote:
> > I don't know what to say.  So here is a summary of the complaints that I'm
> > aware of:
> >
> > - https://www.postgresql.org/message-id/1953aec168224b95b0c962a622bef0794da6ff40.camel@moonset.ru
> > That was only a couple of days after the commit just before the feature freeze,
> > so it may be the less relevant complaint.
> >
> > - https://www.postgresql.org/message-id/CAOxo6XJEYunL71g0yD-zRzNRRqBG0Ssw-ARygy5pGRdSjK5YLQ%40mail.gmail.com
> > Did a git bisect to find the commit that changed the behavior and somehow
> > didn't notice the new setting
> >
> > - this thread, with Fuji-san saying:
> >
> > > I'm afraid that users may easily forget to enable compute_query_id when using
> > > pg_stat_statements (because this setting was not necessary in v13 or before)
> >
> > - this thread, with Peter E. saying:
> >
> > > Now there is the additional burden of turning on this weird setting that
> > > no one understands.  That's a 50% increase in burden.  And almost no one will
> > > want to use a nondefault setting.  pg_stat_statements is pretty popular.  I
> > > think leaving in this requirement will lead to widespread confusion and
> > > complaints.
> >
> > - this thread, with Pavel saying:
> >
> > > Until now, the pg_stat_statements was zero-config. So the change is not user
> > > friendly.
> >
> > So it's a mix of "it's changing something that didn't change in a long time"
> > and "it's adding extra footgun and/or burden as it's not doing by default what
> > the majority of users will want", with an overwhelming majority of people
> > supporting the "we don't want that extra burden".
>
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?   Also, how is modifying
> shared_preload_libraries zero-config, but modifying
> shared_preload_libraries and compute_query_id a huge burden?

The warning makes it clear that there's something wrong, but not how
to fix it (as I noted in another message in this thread, a web search
for pg_stat_statements docs still leads to an obsolete version). I
don't think anyone is arguing that this is insurmountable for all
users, but it is additional friction, and every bit of friction makes
Postgres harder to use. Users don't read documentation, or misread
documentation, or just are not sure what the documentation or the
warning is telling them, in spite of our best efforts.

And you're right, modifying shared_preload_libraries is not
zero-config--I would love it if we could drop that requirement ;).
Still, adding another setting is clearly one more thing to get wrong.

> I am personally not comfortable committing a patch to add an auto option
> the way it is implemented, so another committer will need to take
> ownership of this, or the entire feature can be removed.

Assuming we do want to avoid additional configuration requirements for
pg_stat_statements, is there another mechanism you feel would work
better? Or is that constraint incompatible with sane behavior for this
feature?



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote:
> 
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?

I'm personally fine with it, and I can send a new version with just the
warning when calling pg_stat_statements() or one of the view(s).  Or was there
other warnings that you were referring too?

> I am personally not comfortable committing a patch to add an auto option
> the way it is implemented, so another committer will need to take
> ownership of this, or the entire feature can be removed.

That's fair.  Just to be clear, I'm assuming that you also don't like
Horigushi-san approach either?



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> 
> The warning makes it clear that there's something wrong, but not how
> to fix it

I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
As it should make things quite clear:

+SELECT count(*) FROM pg_stat_statements;
+WARNING:  Query identifier calculation seems to be disabled
+HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable
compute_query_id

The wording can of course be improved.



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> > Well, now that we have clear warnings when it is misconfigured,
> > especially when querying the pg_stat_statements view, are these
> > complaints still valid?   Also, how is modifying
> > shared_preload_libraries zero-config, but modifying
> > shared_preload_libraries and compute_query_id a huge burden?
> 
> The warning makes it clear that there's something wrong, but not how
> to fix it (as I noted in another message in this thread, a web search
> for pg_stat_statements docs still leads to an obsolete version). I
> don't think anyone is arguing that this is insurmountable for all
> users, but it is additional friction, and every bit of friction makes
> Postgres harder to use. Users don't read documentation, or misread
> documentation, or just are not sure what the documentation or the
> warning is telling them, in spite of our best efforts.

Well, then let's have the error message tell them what is wrong and how
to fix it.  My issue is that 'auto' spreads confusion around the entire
API, as you can see from the discussion in this thread.

> And you're right, modifying shared_preload_libraries is not
> zero-config--I would love it if we could drop that requirement ;).
> Still, adding another setting is clearly one more thing to get wrong.
> 
> > I am personally not comfortable committing a patch to add an auto option
> > the way it is implemented, so another committer will need to take
> > ownership of this, or the entire feature can be removed.
> 
> Assuming we do want to avoid additional configuration requirements for
> pg_stat_statements, is there another mechanism you feel would work
> better? Or is that constraint incompatible with sane behavior for this
> feature?

I think we just need to leave it is on/off, and then help people find
the way to fix it if the misconfigure it, which I think is already been
shown to be possible.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 11:35:13PM +0800, Julien Rouhaud wrote:
> On Thu, May 13, 2021 at 10:41:43AM -0400, Bruce Momjian wrote:
> > 
> > Well, now that we have clear warnings when it is misconfigured,
> > especially when querying the pg_stat_statements view, are these
> > complaints still valid?
> 
> I'm personally fine with it, and I can send a new version with just the
> warning when calling pg_stat_statements() or one of the view(s).  Or was there
> other warnings that you were referring too?

No, that was the big fix that made misconfiguration very clear to users
who didn't see the change before.

> > I am personally not comfortable committing a patch to add an auto option
> > the way it is implemented, so another committer will need to take
> > ownership of this, or the entire feature can be removed.
> 
> That's fair.  Just to be clear, I'm assuming that you also don't like
> Horigushi-san approach either?

Uh, anything with 'auto', I don't like.  I am afraid if I commit it, I
would feel responsible for the long tail of confusion this will cause
users, which is why I was saying I would rather remove it than be
responsible for causing such confusion.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Maciek Sakrejda
Date:
On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> >
> > The warning makes it clear that there's something wrong, but not how
> > to fix it
>
> I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
> As it should make things quite clear:
>
> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation seems to be disabled
> +HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable
compute_query_id
>
> The wording can of course be improved.

I meant that no warning can be as clear as things just working, but I
do have feedback on the specific message here:

 * "seems to" be disabled? Is it? Any reason not to be more definitive here?
 * On reading the beginning of the hint, I can see users asking
themselves, "Do I want to use a third-party module to compute query
identifiers?"
 * "may want to enable"? Are there any situations where I don't want
to use a third-party module *and* I don't want to enable
compute_query_id?

So maybe something like

> +SELECT count(*) FROM pg_stat_statements;
> +WARNING:  Query identifier calculation is disabled
> +HINT:  You must enable compute_query_id or configure a third-party module to compute query identifiers in order to
usepg_stat_statements.
 

(I admit "configure a third-party module" is pretty vague, but I think
that suggests it's only an option to consider if you know what you're
doing.)

Also, if you're configuring this for usage with a tool like pganalyze,
and neglect to run a manual query (we guide users to do that, but they
may skip that step), the warnings may not even be visible (the Go
driver we are using does not surface client warnings). Should this be
an error instead of a warning? Is it ever useful to get an empty
result set from querying pg_stat_statements? Using an error here would
parallel the behavior of shared_preload_libraries not including
pg_stat_statements.



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 09:30:55AM -0700, Maciek Sakrejda wrote:
> On Thu, May 13, 2021 at 8:38 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Thu, May 13, 2021 at 08:32:50AM -0700, Maciek Sakrejda wrote:
> > >
> > > The warning makes it clear that there's something wrong, but not how
> > > to fix it
> >
> > I'm confused, are we talking about the new warning in v2 as suggested by Pavel?
> > As it should make things quite clear:
> >
> > +SELECT count(*) FROM pg_stat_statements;
> > +WARNING:  Query identifier calculation seems to be disabled
> > +HINT:  If you don't want to use a third-party module to compute query identifiers, you may want to enable
compute_query_id
> >
> > The wording can of course be improved.
> 
> I meant that no warning can be as clear as things just working, but I
> do have feedback on the specific message here:
> 
>  * "seems to" be disabled? Is it? Any reason not to be more definitive here?
>  * On reading the beginning of the hint, I can see users asking
> themselves, "Do I want to use a third-party module to compute query
> identifiers?"
>  * "may want to enable"? Are there any situations where I don't want
> to use a third-party module *and* I don't want to enable
> compute_query_id?
> 
> So maybe something like
> 
> > +SELECT count(*) FROM pg_stat_statements;
> > +WARNING:  Query identifier calculation is disabled
> > +HINT:  You must enable compute_query_id or configure a third-party module to compute query identifiers in order to
usepg_stat_statements.
 

Yes, I like this.  The reason the old message was so vague is that
'auto', the default some people wanted, didn't issue that error, only
'off' did, so there was an assumption you wanted a custom module since
you changed the value to off.  If we are going with just on/off, no
auto, the message you suggest, leading with compute_query_id, is the
right approach.

> (I admit "configure a third-party module" is pretty vague, but I think
> that suggests it's only an option to consider if you know what you're
> doing.)

Seems fine to me.

> Also, if you're configuring this for usage with a tool like pganalyze,
> and neglect to run a manual query (we guide users to do that, but they
> may skip that step), the warnings may not even be visible (the Go
> driver we are using does not surface client warnings). Should this be
> an error instead of a warning? Is it ever useful to get an empty
> result set from querying pg_stat_statements? Using an error here would
> parallel the behavior of shared_preload_libraries not including
> pg_stat_statements.

Good question.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Andrew Dunstan
Date:
On 5/13/21 12:18 AM, Fujii Masao wrote:
>
>
>
> If we do this, compute_query_id=auto seems to be similar to
> huge_pages=try.
> When huge_pages=try, whether huge pages is actually used is defined
> depending
> outside PostgreSQL (i.e, OS setting in this case). Neither
> pg_settings.setting nor
> souce are not changed in that case.
>
>

Not a bad analogy, showing there's some precedent for this sort of thing.


The only thing that bugs me is that we're pretty damn late in the
process to be engaging in this amount of design.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 12:45:07PM -0400, Andrew Dunstan wrote:
> 
> On 5/13/21 12:18 AM, Fujii Masao wrote:
> >
> >
> >
> > If we do this, compute_query_id=auto seems to be similar to
> > huge_pages=try.
> > When huge_pages=try, whether huge pages is actually used is defined
> > depending
> > outside PostgreSQL (i.e, OS setting in this case). Neither
> > pg_settings.setting nor
> > souce are not changed in that case.
> >
> >
> 
> Not a bad analogy, showing there's some precedent for this sort of thing.
> 
> 
> The only thing that bugs me is that we're pretty damn late in the
> process to be engaging in this amount of design.

The issue is that there is no external way to check what query id
computation is being used, unlike huge pages, which can be queried from
the operating system.  I also agree it is late, and discussion of auto
continues to show cases where this makes later improvements more
complex.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The only thing that bugs me is that we're pretty damn late in the
> process to be engaging in this amount of design.

Indeed.  I feel that this feature was forced in before it was really
ready.

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 01:17:16PM -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > The only thing that bugs me is that we're pretty damn late in the
> > process to be engaging in this amount of design.
> 
> Indeed.  I feel that this feature was forced in before it was really
> ready.

The user API has always been a challenge for this feature but I thought
we had it ironed out.  What I didn't anticipate were the configuration
complaints, and if those are valid, the feature should be removed since
we can't improve it at this point, nor do I have any idea if that is
even possible without unacceptable negatives.  If the configuration
complaints are invalid, what we have now is very good, I think, though
adding more warnings about misconfiguration would be wise.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
> > The only thing that bugs me is that we're pretty damn late in the
> > process to be engaging in this amount of design.
>
> Indeed.  I feel that this feature was forced in before it was really
> ready.

I'm coming around to have a similar feeling.  While having an
alternative query ID might be useful, I have a hard time seeing it as
likely to be a hugely popular feature that is worth a lot of users
complaining (as we've seen already, multiple times, before even getting
to beta...) that things aren't working anymore.  That we can't figure
out which libraries to load automatically based on what extensions have
been installed and therefore make everyone have to change
shared_preload_libraries isn't a good thing and requiring additional
configuration for extremely common extensions like pg_stat_statements is
making it worse.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Christoph Berg
Date:
Re: Bruce Momjian
> Well, now that we have clear warnings when it is misconfigured,
> especially when querying the pg_stat_statements view, are these
> complaints still valid?   Also, how is modifying
> shared_preload_libraries zero-config, but modifying
> shared_preload_libraries and compute_query_id a huge burden?

It's zero-config in the sense that if you want to have
pg_stat_statements, loading that module via shared_preload_libraries
is just natural.

Having to set compute_query_id isn't natural. It's a setting with a
completely different name, and the connection of pg_stat_statements to
compute_query_id isn't obvious.

The reasoning with "we have warnings and stuff" might be ok if
pg_stat_statements were a new thing, but it has worked via
shared_preload_libraries only for the last decade, and requiring
something extra will confuse probably every single user of
pg_stat_statements out there.

Perhaps worse, note that these warnings will likely first be seen by
the end users of databases, not by the admin performing the initial
setup or upgrade, who will not be able to fix the problem themselves.

Christoph



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:
> I'm coming around to have a similar feeling.  While having an
> alternative query ID might be useful, I have a hard time seeing it as
> likely to be a hugely popular feature that is worth a lot of users
> complaining (as we've seen already, multiple times, before even getting
> to beta...) that things aren't working anymore.  That we can't figure
> out which libraries to load automatically based on what extensions have
> been installed and therefore make everyone have to change
> shared_preload_libraries isn't a good thing and requiring additional
> configuration for extremely common extensions like pg_stat_statements is
> making it worse.

Would someone please explain what is wrong with what is in the tree
now, except that it needs additional warnings about misconfiguration? 
Requiring two postgresql.conf changes instead of one doesn't seem like a
valid complaint to me, especially if the warnings are in place and the
release notes mention it.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:
> > I'm coming around to have a similar feeling.  While having an
> > alternative query ID might be useful, I have a hard time seeing it as
> > likely to be a hugely popular feature that is worth a lot of users
> > complaining (as we've seen already, multiple times, before even getting
> > to beta...) that things aren't working anymore.  That we can't figure
> > out which libraries to load automatically based on what extensions have
> > been installed and therefore make everyone have to change
> > shared_preload_libraries isn't a good thing and requiring additional
> > configuration for extremely common extensions like pg_stat_statements is
> > making it worse.
>
> Would someone please explain what is wrong with what is in the tree
> now, except that it needs additional warnings about misconfiguration?
> Requiring two postgresql.conf changes instead of one doesn't seem like a
> valid complaint to me, especially if the warnings are in place and the
> release notes mention it.

Will you be updating pg_upgrade to detect and throw a warning during
check in the event that it discovers a broken config?

If not, then I don't think you're correct in arguing that this need for
additional configuration isn't a valid complaint.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 01:51:07PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > On Thu, May 13, 2021 at 01:33:27PM -0400, Stephen Frost wrote:
> > > I'm coming around to have a similar feeling.  While having an
> > > alternative query ID might be useful, I have a hard time seeing it as
> > > likely to be a hugely popular feature that is worth a lot of users
> > > complaining (as we've seen already, multiple times, before even getting
> > > to beta...) that things aren't working anymore.  That we can't figure
> > > out which libraries to load automatically based on what extensions have
> > > been installed and therefore make everyone have to change
> > > shared_preload_libraries isn't a good thing and requiring additional
> > > configuration for extremely common extensions like pg_stat_statements is
> > > making it worse.
> > 
> > Would someone please explain what is wrong with what is in the tree
> > now, except that it needs additional warnings about misconfiguration? 
> > Requiring two postgresql.conf changes instead of one doesn't seem like a
> > valid complaint to me, especially if the warnings are in place and the
> > release notes mention it.
> 
> Will you be updating pg_upgrade to detect and throw a warning during
> check in the event that it discovers a broken config?

Uh, how does this relate to pg_upgrade?  Are you saying someone
misconfigures the new system with pg_stat_statements but not query id? 
The server would still start and upgrade, no?  How is this different
from any other GUC we rename?  I am not following much of the logic in
this discussion, frankly.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote:
> Re: Bruce Momjian
> > Well, now that we have clear warnings when it is misconfigured,
> > especially when querying the pg_stat_statements view, are these
> > complaints still valid?   Also, how is modifying
> > shared_preload_libraries zero-config, but modifying
> > shared_preload_libraries and compute_query_id a huge burden?
> 
> It's zero-config in the sense that if you want to have
> pg_stat_statements, loading that module via shared_preload_libraries
> is just natural.
> 
> Having to set compute_query_id isn't natural. It's a setting with a
> completely different name, and the connection of pg_stat_statements to
> compute_query_id isn't obvious.
> 
> The reasoning with "we have warnings and stuff" might be ok if
> pg_stat_statements were a new thing, but it has worked via
> shared_preload_libraries only for the last decade, and requiring
> something extra will confuse probably every single user of
> pg_stat_statements out there.
> 
> Perhaps worse, note that these warnings will likely first be seen by
> the end users of databases, not by the admin performing the initial
> setup or upgrade, who will not be able to fix the problem themselves.

Well, but doing this extra configuration, the query id shows up in a lot
more places.  I can't imagine how this could be done cleanly without
requiring extra configuration, unless the query_id computation was
cheaper to compute and we could enable it by default.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Bruce Momjian (bruce@momjian.us) wrote:
> On Thu, May 13, 2021 at 07:39:45PM +0200, Christoph Berg wrote:
> > Re: Bruce Momjian
> > > Well, now that we have clear warnings when it is misconfigured,
> > > especially when querying the pg_stat_statements view, are these
> > > complaints still valid?   Also, how is modifying
> > > shared_preload_libraries zero-config, but modifying
> > > shared_preload_libraries and compute_query_id a huge burden?
> >
> > It's zero-config in the sense that if you want to have
> > pg_stat_statements, loading that module via shared_preload_libraries
> > is just natural.

Not sure about natural but it's certainly what folks have at least
become used to.  We should be working to eliminate it though.

> > Having to set compute_query_id isn't natural. It's a setting with a
> > completely different name, and the connection of pg_stat_statements to
> > compute_query_id isn't obvious.
> >
> > The reasoning with "we have warnings and stuff" might be ok if
> > pg_stat_statements were a new thing, but it has worked via
> > shared_preload_libraries only for the last decade, and requiring
> > something extra will confuse probably every single user of
> > pg_stat_statements out there.

As we keep seeing, over and over.  The ongoing comments claiming that
it's "just" a minor additional configuration tweak fall pretty flat when
you consider the number of times it's already been brought up, and who
it has been brought up by.

> > Perhaps worse, note that these warnings will likely first be seen by
> > the end users of databases, not by the admin performing the initial
> > setup or upgrade, who will not be able to fix the problem themselves.

I don't think this is appreciated anywhere near well enough.  This takes
existing perfectly working configurations and actively breaks them in a
manner that isn't obvious and isn't something that an admin would have
any idea about until after they've upgraded and then started trying to
query the view.  That's pretty terrible.

> Well, but doing this extra configuration, the query id shows up in a lot
> more places.  I can't imagine how this could be done cleanly without
> requiring extra configuration, unless the query_id computation was
> cheaper to compute and we could enable it by default.

There's a ridiculously simple option here which is: drop the idea that
we support an extension redefining the query id and then just make it
on/off with the default to be 'on'.  If people actually have a problem
with it being on and they don't want to use pg_stat_statements then they
can turn it off.  This won't break any existing configs that are out
there in the field and avoids the complexity of having some kind of
'auto' config.  I do agree with the general idea of wanting to be
extensible but I'm not convinced that, in this particular case, it's
worth all of this.  I'm somewhat convinced that having a way to disable
the query id is useful in limited cases and if people want a way to do
that, then we can give that to them in a straightfoward way that doens't
break things.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> There's a ridiculously simple option here which is: drop the idea that
> we support an extension redefining the query id and then just make it
> on/off with the default to be 'on'.

I do not think that defaulting it to 'on' is acceptable unless you can
show that the added overhead is negligible.  IIUC the measurements that
have been done show the opposite.

Maybe we should revert this thing pending somebody doing the work to
make a version of queryid labeling that actually is negligibly cheap.
It certainly seems like that could be done; one more traversal of the
parse tree can't be that expensive in itself.  I suspect that the
performance problem is with the particular hashing mechanism that
was used, which looks mighty ad-hoc anyway.

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Stephen Frost
Date:
Greetings,

* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > There's a ridiculously simple option here which is: drop the idea that
> > we support an extension redefining the query id and then just make it
> > on/off with the default to be 'on'.
>
> I do not think that defaulting it to 'on' is acceptable unless you can
> show that the added overhead is negligible.  IIUC the measurements that
> have been done show the opposite.

Ah, right, it had only been done before when pg_stat_statements was
loaded..  In which case, it seems like we should:

a) go back to that

b) if someone wants an alternative query ID, tell them to add it to
   pg_stat_statements and make it configurable *there*

c) Have pg_stat_statements provide another function/view/etc that folks
   can use to get a queryid for an ongoing query ..?

d) Maybe come up with a way for extensions, generically, to make a value
   available to log_line_prefix?  That could be pretty interesting.

Or just accept that this is a bit hokey with the 'auto' approach.  I get
Bruce has concerns about it but I'm not convinced that it's actually all
that bad to go with that.

Thanks,

Stephen

Attachment

Re: compute_query_id and pg_stat_statements

From
Alvaro Herrera
Date:
On 2021-May-13, Stephen Frost wrote:

> Or just accept that this is a bit hokey with the 'auto' approach.  I get
> Bruce has concerns about it but I'm not convinced that it's actually all
> that bad to go with that.

Yeah, I think the alleged confusion there is overstated.

I'm happy to act as committer for that if he wants to step away from it.
I'm already used to being lapidated at every corner anyway.

-- 
Álvaro Herrera       Valdivia, Chile
"E pur si muove" (Galileo Galilei)



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 02:29:09PM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > There's a ridiculously simple option here which is: drop the idea that
> > we support an extension redefining the query id and then just make it
> > on/off with the default to be 'on'.
> 
> I do not think that defaulting it to 'on' is acceptable unless you can
> show that the added overhead is negligible.  IIUC the measurements that
> have been done show the opposite.

Agreed.

> Maybe we should revert this thing pending somebody doing the work to
> make a version of queryid labeling that actually is negligibly cheap.
> It certainly seems like that could be done; one more traversal of the
> parse tree can't be that expensive in itself.  I suspect that the
> performance problem is with the particular hashing mechanism that
> was used, which looks mighty ad-hoc anyway.

I was surprised it was ~2%.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 03:04:30PM -0400, Álvaro Herrera wrote:
> On 2021-May-13, Stephen Frost wrote:
> 
> > Or just accept that this is a bit hokey with the 'auto' approach.  I get
> > Bruce has concerns about it but I'm not convinced that it's actually all
> > that bad to go with that.
> 
> Yeah, I think the alleged confusion there is overstated.
> 
> I'm happy to act as committer for that if he wants to step away from it.
> I'm already used to being lapidated at every corner anyway.

OK, feel free to take ownership of it, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Andrew Dunstan
Date:
On 5/13/21 3:04 PM, Alvaro Herrera wrote:
> On 2021-May-13, Stephen Frost wrote:
>
>> Or just accept that this is a bit hokey with the 'auto' approach.  I get
>> Bruce has concerns about it but I'm not convinced that it's actually all
>> that bad to go with that.
> Yeah, I think the alleged confusion there is overstated.
>
> I'm happy to act as committer for that if he wants to step away from it.
> I'm already used to being lapidated at every corner anyway.
>


Many thanks Alvaro, among other things for teaching me a new word.


cheers


(delapidated) andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote:
> 
> On 5/13/21 3:04 PM, Alvaro Herrera wrote:
> > On 2021-May-13, Stephen Frost wrote:
> >
> >> Or just accept that this is a bit hokey with the 'auto' approach.  I get
> >> Bruce has concerns about it but I'm not convinced that it's actually all
> >> that bad to go with that.
> > Yeah, I think the alleged confusion there is overstated.
> >
> > I'm happy to act as committer for that if he wants to step away from it.
> > I'm already used to being lapidated at every corner anyway.
> >
> 
> 
> Many thanks Alvaro, among other things for teaching me a new word.
> 
> (delapidated) andrew

Yes, I had to look it up too.  :-)

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Michael Paquier
Date:
On Thu, May 13, 2021 at 07:19:11PM -0400, Andrew Dunstan wrote:
> On 5/13/21 3:04 PM, Alvaro Herrera wrote:
>> I'm happy to act as committer for that if he wants to step away from it.
>> I'm already used to being lapidated at every corner anyway.
>
> Many thanks Alvaro, among other things for teaching me a new word.

+1.  Thanks, Alvaro.
--
Michael

Attachment

Re: compute_query_id and pg_stat_statements

From
Alvaro Herrera
Date:
Here's a first attempt at what was suggested.  If you say "auto" it
remains auto in SHOW, but it gets enabled if a module asks for it.

Not final yet, but I thought I'd throw it out for early commentary ...

-- 
Álvaro Herrera       Valdivia, Chile

Attachment

Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> Here's a first attempt at what was suggested.  If you say "auto" it
> remains auto in SHOW, but it gets enabled if a module asks for it.
> 
> Not final yet, but I thought I'd throw it out for early commentary ...

I certainly like this idea better than having the extension change the
output of the GUC.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> > Maybe we should revert this thing pending somebody doing the work to
> > make a version of queryid labeling that actually is negligibly cheap.
> > It certainly seems like that could be done; one more traversal of the
> > parse tree can't be that expensive in itself.  I suspect that the
> > performance problem is with the particular hashing mechanism that
> > was used, which looks mighty ad-hoc anyway.
>
> I was surprised it was ~2%.

Just to be clear, the 2% was a worst case scenario, ie. a very fast
read-only query on small data returning a single row.  As soon as you
get something more realistic / expensive the overhead goes away.  For
reference here is the detail:
https://www.postgresql.org/message-id/CAOBaU_ZVmGPfKTwZ6cM_qdzaF2E1gMkrLDMwwLy4Z1JxQ6=CZg@mail.gmail.com



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> > Here's a first attempt at what was suggested.  If you say "auto" it
> > remains auto in SHOW, but it gets enabled if a module asks for it.
> >
> > Not final yet, but I thought I'd throw it out for early commentary ...
>
> I certainly like this idea better than having the extension change the
> output of the GUC.

Oh, I didn't understand that it was the major blocker.  I'm fine with it too.



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:
> On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> > > Here's a first attempt at what was suggested.  If you say "auto" it
> > > remains auto in SHOW, but it gets enabled if a module asks for it.
> > >
> > > Not final yet, but I thought I'd throw it out for early commentary ...
> >
> > I certainly like this idea better than having the extension change the
> > output of the GUC.
> 
> Oh, I didn't understand that it was the major blocker.  I'm fine with it too.

I think if we keep the output as 'auto', and document that you check
pg_stat_activity for a hash to see if it is enabled, that gets us pretty
far.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote:
>> I was surprised it was ~2%.

> Just to be clear, the 2% was a worst case scenario, ie. a very fast
> read-only query on small data returning a single row.  As soon as you
> get something more realistic / expensive the overhead goes away.

Of course, for plenty of people that IS the realistic scenario that
they care about max performance for.

            regards, tom lane



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Thu, May 13, 2021 at 09:47:02PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Fri, May 14, 2021 at 3:12 AM Bruce Momjian <bruce@momjian.us> wrote:
> >> I was surprised it was ~2%.
> 
> > Just to be clear, the 2% was a worst case scenario, ie. a very fast
> > read-only query on small data returning a single row.  As soon as you
> > get something more realistic / expensive the overhead goes away.
> 
> Of course, for plenty of people that IS the realistic scenario that
> they care about max performance for.

I'm not arguing that the scenario is unrealistic.  I'm arguing that retrieving
the first row of a join between pg_class and pg_attribute on an otherwise
vanilla database may not be the most representative workload, especially when
you take into account that it was done on hardware that still took 3 ms to do
that.

Unfortunately my laptop is pretty old and has already proven multiple time to
give unreliable benchmark results, so I'm not confident at all that those 2%
are even real outside of my machine.



Re: compute_query_id and pg_stat_statements

From
Fujii Masao
Date:

On 2021/05/14 9:04, Alvaro Herrera wrote:
> Here's a first attempt at what was suggested.  If you say "auto" it
> remains auto in SHOW, but it gets enabled if a module asks for it.
> 
> Not final yet, but I thought I'd throw it out for early commentary ...

Many thanks! The patch basically looks good to me.

+void
+EnableQueryId(void)
+{
+    if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
+        auto_query_id_enabled = true;

Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is?
Otherwise, for example, the following scenario can happen and it's a bit strange.

1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on
2. compute_query_id is set to auto and the configuration file is reloaded
Then, even though compute_query_id is auto and pg_stat_statements is loaded,
query ids are not computed and no queries are tracked by pg_stat_statements.


Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Fri, May 14, 2021 at 12:20:00PM +0900, Fujii Masao wrote:
> 
> 
> On 2021/05/14 9:04, Alvaro Herrera wrote:
> > Here's a first attempt at what was suggested.  If you say "auto" it
> > remains auto in SHOW, but it gets enabled if a module asks for it.
> > 
> > Not final yet, but I thought I'd throw it out for early commentary ...
> 
> Many thanks! The patch basically looks good to me.
> 
> +void
> +EnableQueryId(void)
> +{
> +    if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
> +        auto_query_id_enabled = true;
> 
> Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is?
> Otherwise, for example, the following scenario can happen and it's a bit strange.
> 
> 1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on
> 2. compute_query_id is set to auto and the configuration file is reloaded
> Then, even though compute_query_id is auto and pg_stat_statements is loaded,
> query ids are not computed and no queries are tracked by pg_stat_statements.

+1.  Note that if you switch from compute_query_id = off + custom
query_id + pg_stat_statements to compute_query_id = auto then thing will
immediately break (as we instruct third-party plugins authors to error out in
that case), which is way better than breaking at the next random service
restart.



Re: compute_query_id and pg_stat_statements

From
Tom Lane
Date:
I wrote:
> Maybe we should revert this thing pending somebody doing the work to
> make a version of queryid labeling that actually is negligibly cheap.
> It certainly seems like that could be done; one more traversal of the
> parse tree can't be that expensive in itself.  I suspect that the
> performance problem is with the particular hashing mechanism that
> was used, which looks mighty ad-hoc anyway.

To put a little bit of meat on that idea, I experimented with jacking
up the "jumble" calculation and driving some other implementations
underneath.

I thought that Julien's "worst case" scenario was pretty far from
worst case, since it involved a join which a lot of simple queries
don't.  I tested this scenario instead:

$ cat naive.sql
SELECT * FROM pg_class c ORDER BY oid DESC LIMIT 1;
$ pgbench -n -f naive.sql -T 60 postgres

which is still complicated enough that there's work for the
query fingerprinter to do, but not so much for planning and
execution.

I confirm that on HEAD, there's a noticeable TPS penalty from
turning on compute_query_id: about 3.2% on my machine.

The first patch attached replaces the "jumble" calculation
with two CRC32s (two so that we still get 64 bits out at
the end).  I see 2.7% penalty with this version.  Now,
I'm using an Intel machine with
#define USE_SSE42_CRC32C_WITH_RUNTIME_CHECK 1
so on machines without any hardware CRC support, this'd
likely be a loss.  But it still proves the point that the
existing implementation is just not very speedy.

I then tried a really dumb xor'ing implementation, and
that gives me a slowdown of 2.2%.  This could undoubtedly
be improved on further, say by unrolling the loop or
processing multiple bytes at once.  One problem with it
is that I suspect it will tend to concentrate the entropy
into the third/fourth and seventh/eighth bytes of the
accumulator, since so many of the fields being jumbled
are 4-byte or 8-byte fields with most of the entropy in
their low-order bits.  Probably that could be improved
with a bit more thought -- say, an extra bump of the
nextbyte pointer after each field.

Anyway, I think that what we have here is quite an inferior
implementation, and we can do better with some more thought.

            regards, tom lane

diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..74ed555ed7 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -41,7 +41,7 @@

 static uint64 compute_utility_query_id(const char *str, int query_location, int query_len);
 static void AppendJumble(JumbleState *jstate,
-                         const unsigned char *item, Size size);
+                         const void *item, Size size);
 static void JumbleQueryInternal(JumbleState *jstate, Query *query);
 static void JumbleRangeTable(JumbleState *jstate, List *rtable);
 static void JumbleRowMarks(JumbleState *jstate, List *rowMarks);
@@ -106,9 +106,11 @@ JumbleQuery(Query *query, const char *querytext)
     {
         jstate = (JumbleState *) palloc(sizeof(JumbleState));

-        /* Set up workspace for query jumbling */
-        jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-        jstate->jumble_len = 0;
+        /* Initialize CRCs for query jumbling */
+        INIT_CRC32C(jstate->crc0);
+        INIT_CRC32C(jstate->crc1);
+        jstate->whichcrc = 0;
+        /* Initialize state for tracking where constants are */
         jstate->clocations_buf_size = 32;
         jstate->clocations = (LocationLen *)
             palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -117,9 +119,11 @@ JumbleQuery(Query *query, const char *querytext)

         /* Compute query ID and mark the Query node with it */
         JumbleQueryInternal(jstate, query);
-        query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
-                                                          jstate->jumble_len,
-                                                          0));
+
+        FIN_CRC32C(jstate->crc0);
+        FIN_CRC32C(jstate->crc1);
+        query->queryId = (((uint64) jstate->crc0) << 32) |
+            ((uint64) jstate->crc1);

         /*
          * If we are unlucky enough to get a hash of zero, use 1 instead, to
@@ -165,36 +169,18 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l
  * the current jumble.
  */
 static void
-AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
+AppendJumble(JumbleState *jstate, const void *item, Size size)
 {
-    unsigned char *jumble = jstate->jumble;
-    Size        jumble_len = jstate->jumble_len;
-
-    /*
-     * Whenever the jumble buffer is full, we hash the current contents and
-     * reset the buffer to contain just that hash value, thus relying on the
-     * hash to summarize everything so far.
-     */
-    while (size > 0)
+    if (jstate->whichcrc)
     {
-        Size        part_size;
-
-        if (jumble_len >= JUMBLE_SIZE)
-        {
-            uint64        start_hash;
-
-            start_hash = DatumGetUInt64(hash_any_extended(jumble,
-                                                          JUMBLE_SIZE, 0));
-            memcpy(jumble, &start_hash, sizeof(start_hash));
-            jumble_len = sizeof(start_hash);
-        }
-        part_size = Min(size, JUMBLE_SIZE - jumble_len);
-        memcpy(jumble + jumble_len, item, part_size);
-        jumble_len += part_size;
-        item += part_size;
-        size -= part_size;
+        COMP_CRC32C(jstate->crc1, item, size);
+        jstate->whichcrc = 0;
+    }
+    else
+    {
+        COMP_CRC32C(jstate->crc0, item, size);
+        jstate->whichcrc = 1;
     }
-    jstate->jumble_len = jumble_len;
 }

 /*
@@ -202,9 +188,9 @@ AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
  * of individual local variable elements.
  */
 #define APP_JUMB(item) \
-    AppendJumble(jstate, (const unsigned char *) &(item), sizeof(item))
+    AppendJumble(jstate, (const void *) &(item), sizeof(item))
 #define APP_JUMB_STRING(str) \
-    AppendJumble(jstate, (const unsigned char *) (str), strlen(str) + 1)
+    AppendJumble(jstate, (const void *) (str), strlen(str) + 1)

 /*
  * JumbleQueryInternal: Selectively serialize the query tree, appending
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..3a09c555fa 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -15,8 +15,7 @@
 #define QUERYJUBLE_H

 #include "nodes/parsenodes.h"
-
-#define JUMBLE_SIZE                1024    /* query serialization buffer size */
+#include "port/pg_crc32c.h"

 /*
  * Struct for tracking locations/lengths of constants during normalization
@@ -33,11 +32,13 @@ typedef struct LocationLen
  */
 typedef struct JumbleState
 {
-    /* Jumble of current query tree */
-    unsigned char *jumble;
-
-    /* Number of bytes used in jumble[] */
-    Size        jumble_len;
+    /*
+     * Since we'd like a 64-bit-wide query ID, but we're using 32-bit CRC
+     * technology, we combine two 32-bit CRCs.
+     */
+    pg_crc32c    crc0;            /* some fields go into here ... */
+    pg_crc32c    crc1;            /* ... and others into here */
+    int            whichcrc;        /* 0 or 1, says which CRC accum to use next */

     /* Array of locations of constants that should be removed */
     LocationLen *clocations;
diff --git a/src/backend/utils/misc/queryjumble.c b/src/backend/utils/misc/queryjumble.c
index f004a9ce8c..225940d40c 100644
--- a/src/backend/utils/misc/queryjumble.c
+++ b/src/backend/utils/misc/queryjumble.c
@@ -106,9 +106,10 @@ JumbleQuery(Query *query, const char *querytext)
     {
         jstate = (JumbleState *) palloc(sizeof(JumbleState));

-        /* Set up workspace for query jumbling */
-        jstate->jumble = (unsigned char *) palloc(JUMBLE_SIZE);
-        jstate->jumble_len = 0;
+        /* Initialize state for query jumbling */
+        jstate->j.id = 0;
+        jstate->nextbyte = 0;
+        /* Initialize state for tracking where constants are */
         jstate->clocations_buf_size = 32;
         jstate->clocations = (LocationLen *)
             palloc(jstate->clocations_buf_size * sizeof(LocationLen));
@@ -117,9 +118,7 @@ JumbleQuery(Query *query, const char *querytext)

         /* Compute query ID and mark the Query node with it */
         JumbleQueryInternal(jstate, query);
-        query->queryId = DatumGetUInt64(hash_any_extended(jstate->jumble,
-                                                          jstate->jumble_len,
-                                                          0));
+        query->queryId = jstate->j.id;

         /*
          * If we are unlucky enough to get a hash of zero, use 1 instead, to
@@ -167,34 +166,17 @@ compute_utility_query_id(const char *query_text, int query_location, int query_l
 static void
 AppendJumble(JumbleState *jstate, const unsigned char *item, Size size)
 {
-    unsigned char *jumble = jstate->jumble;
-    Size        jumble_len = jstate->jumble_len;
+    int            nextbyte = jstate->nextbyte;

-    /*
-     * Whenever the jumble buffer is full, we hash the current contents and
-     * reset the buffer to contain just that hash value, thus relying on the
-     * hash to summarize everything so far.
-     */
     while (size > 0)
     {
-        Size        part_size;
-
-        if (jumble_len >= JUMBLE_SIZE)
-        {
-            uint64        start_hash;
-
-            start_hash = DatumGetUInt64(hash_any_extended(jumble,
-                                                          JUMBLE_SIZE, 0));
-            memcpy(jumble, &start_hash, sizeof(start_hash));
-            jumble_len = sizeof(start_hash);
-        }
-        part_size = Min(size, JUMBLE_SIZE - jumble_len);
-        memcpy(jumble + jumble_len, item, part_size);
-        jumble_len += part_size;
-        item += part_size;
-        size -= part_size;
+        jstate->j.bytes[nextbyte] ^= *item++;
+        nextbyte++;
+        if (nextbyte >= sizeof(uint64))
+            nextbyte = 0;
+        size--;
     }
-    jstate->jumble_len = jumble_len;
+    jstate->nextbyte = nextbyte;
 }

 /*
diff --git a/src/include/utils/queryjumble.h b/src/include/utils/queryjumble.h
index 83ba7339fa..d01712af4d 100644
--- a/src/include/utils/queryjumble.h
+++ b/src/include/utils/queryjumble.h
@@ -16,8 +16,6 @@

 #include "nodes/parsenodes.h"

-#define JUMBLE_SIZE                1024    /* query serialization buffer size */
-
 /*
  * Struct for tracking locations/lengths of constants during normalization
  */
@@ -33,11 +31,13 @@ typedef struct LocationLen
  */
 typedef struct JumbleState
 {
-    /* Jumble of current query tree */
-    unsigned char *jumble;
-
-    /* Number of bytes used in jumble[] */
-    Size        jumble_len;
+    /* Working state for accumulating a 64-bit query ID */
+    union
+    {
+        unsigned char bytes[sizeof(uint64)];
+        uint64        id;
+    }            j;
+    int            nextbyte;        /* index of next byte to update in j.bytes[] */

     /* Array of locations of constants that should be removed */
     LocationLen *clocations;

Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Fri, May 14, 2021 at 12:26:23AM -0400, Tom Lane wrote:
> I then tried a really dumb xor'ing implementation, and
> that gives me a slowdown of 2.2%.  This could undoubtedly
> be improved on further, say by unrolling the loop or
> processing multiple bytes at once.  One problem with it
> is that I suspect it will tend to concentrate the entropy
> into the third/fourth and seventh/eighth bytes of the
> accumulator, since so many of the fields being jumbled
> are 4-byte or 8-byte fields with most of the entropy in
> their low-order bits.  Probably that could be improved
> with a bit more thought -- say, an extra bump of the
> nextbyte pointer after each field.
> 
> Anyway, I think that what we have here is quite an inferior
> implementation, and we can do better with some more thought.

Considering what even a simple query has to do, I am still baffled why
such a computation takes ~2%, though it obviously does since you have
confirmed that figure.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:
> > On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> > > > Here's a first attempt at what was suggested.  If you say "auto" it
> > > > remains auto in SHOW, but it gets enabled if a module asks for it.
> > > >
> > > > Not final yet, but I thought I'd throw it out for early commentary ...
> > >
> > > I certainly like this idea better than having the extension change the
> > > output of the GUC.
> > 
> > Oh, I didn't understand that it was the major blocker.  I'm fine with it too.
> 
> I think if we keep the output as 'auto', and document that you check
> pg_stat_activity for a hash to see if it is enabled, that gets us pretty
> far.

I think keeping the output as 'auto', and documenting that this query
must be run to determine if the query id is being computed:

    SELECT query_id
    FROM pg_stat_activity
    WHERE pid = pg_backend_pid();

is the right approach.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Fri, May 14, 2021 at 08:35:14AM -0400, Bruce Momjian wrote:
> On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > I think if we keep the output as 'auto', and document that you check
> > pg_stat_activity for a hash to see if it is enabled, that gets us pretty
> > far.
> 
> I think keeping the output as 'auto', and documenting that this query
> must be run to determine if the query id is being computed:
> 
>     SELECT query_id
>     FROM pg_stat_activity
>     WHERE pid = pg_backend_pid();
> 
> is the right approach.

Actually, we talked about huge_pages = try needing to use OS commands to
see if huge pages are being used, so requiring an SQL query to see if
query id is being computed seems reasonable.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Fri, May 14, 2021 at 08:57:41AM -0400, Bruce Momjian wrote:
> On Fri, May 14, 2021 at 08:35:14AM -0400, Bruce Momjian wrote:
> > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > > I think if we keep the output as 'auto', and document that you check
> > > pg_stat_activity for a hash to see if it is enabled, that gets us pretty
> > > far.
> > 
> > I think keeping the output as 'auto', and documenting that this query
> > must be run to determine if the query id is being computed:
> > 
> >     SELECT query_id
> >     FROM pg_stat_activity
> >     WHERE pid = pg_backend_pid();
> > 
> > is the right approach.
> 
> Actually, we talked about huge_pages = try needing to use OS commands to
> see if huge pages are being used, so requiring an SQL query to see if
> query id is being computed seems reasonable.

I totally agree.



Re: compute_query_id and pg_stat_statements

From
Andrew Dunstan
Date:

Sent from my iPhone

> On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
>>> On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:
>>> On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:
>>>>
>>>> On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
>>>>> Here's a first attempt at what was suggested.  If you say "auto" it
>>>>> remains auto in SHOW, but it gets enabled if a module asks for it.
>>>>>
>>>>> Not final yet, but I thought I'd throw it out for early commentary ...
>>>>
>>>> I certainly like this idea better than having the extension change the
>>>> output of the GUC.
>>>
>>> Oh, I didn't understand that it was the major blocker.  I'm fine with it too.
>>
>> I think if we keep the output as 'auto', and document that you check
>> pg_stat_activity for a hash to see if it is enabled, that gets us pretty
>> far.
>
> I think keeping the output as 'auto', and documenting that this query
> must be run to determine if the query id is being computed:
>
>    SELECT query_id
>    FROM pg_stat_activity
>    WHERE pid = pg_backend_pid();
>
> is the right approach.

I’d rather we added a specific function. This is not really obvious.

Cheers

Andrew




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:
> 
> > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > 
> > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> >>> On Fri, May 14, 2021 at 09:40:15AM +0800, Julien Rouhaud wrote:
> >>> On Fri, May 14, 2021 at 8:13 AM Bruce Momjian <bruce@momjian.us> wrote:
> >>>> 
> >>>> On Thu, May 13, 2021 at 08:04:37PM -0400, Álvaro Herrera wrote:
> >>>>> Here's a first attempt at what was suggested.  If you say "auto" it
> >>>>> remains auto in SHOW, but it gets enabled if a module asks for it.
> >>>>> 
> >>>>> Not final yet, but I thought I'd throw it out for early commentary ...
> >>>> 
> >>>> I certainly like this idea better than having the extension change the
> >>>> output of the GUC.
> >>> 
> >>> Oh, I didn't understand that it was the major blocker.  I'm fine with it too.
> >> 
> >> I think if we keep the output as 'auto', and document that you check
> >> pg_stat_activity for a hash to see if it is enabled, that gets us pretty
> >> far.
> > 
> > I think keeping the output as 'auto', and documenting that this query
> > must be run to determine if the query id is being computed:
> > 
> >    SELECT query_id
> >    FROM pg_stat_activity
> >    WHERE pid = pg_backend_pid();
> > 
> > is the right approach.
> 
> I’d rather we added a specific function. This is not really obvious.

We could, but I don't know how much this will be used in practice.  The only
way someone would try to know if "auto" means that query_id are computed is if
she has an extension like pg_stat_statements, and she will probably just check
that anyway, and will get a warning if query_id are *not* computed.

That being said no objection to an SQL wrapper around a query like it.



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:
> > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > 
> > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > I think keeping the output as 'auto', and documenting that this query
> > must be run to determine if the query id is being computed:
> > 
> >    SELECT query_id
> >    FROM pg_stat_activity
> >    WHERE pid = pg_backend_pid();
> > 
> > is the right approach.
> 
> I’d rather we added a specific function. This is not really obvious.

Well, we can document this query, add a function, or add a read-only
GUC.  I am not sure how we decide which one to use.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Pavel Stehule
Date:


pá 14. 5. 2021 v 18:21 odesílatel Bruce Momjian <bruce@momjian.us> napsal:
On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:
> > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > I think keeping the output as 'auto', and documenting that this query
> > must be run to determine if the query id is being computed:
> >
> >    SELECT query_id
> >    FROM pg_stat_activity
> >    WHERE pid = pg_backend_pid();
> >
> > is the right approach.
>
> I’d rather we added a specific function. This is not really obvious.

Well, we can document this query, add a function, or add a read-only
GUC.  I am not sure how we decide which one to use.

I though and I prefer read only GUC

It is easy to write "show all"

Pavel



--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Fri, May 14, 2021 at 12:21:23PM -0400, Bruce Momjian wrote:
> On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:
> > > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > > 
> > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > > I think keeping the output as 'auto', and documenting that this query
> > > must be run to determine if the query id is being computed:
> > > 
> > >    SELECT query_id
> > >    FROM pg_stat_activity
> > >    WHERE pid = pg_backend_pid();
> > > 
> > > is the right approach.
> > 
> > I’d rather we added a specific function. This is not really obvious.
> 
> Well, we can document this query, add a function, or add a read-only
> GUC.  I am not sure how we decide which one to use.

I wonder if we should go with an SQL query now (no new API needed) and
then add a GUC once we decide on how extensions can register that they
are generating the query id, so the GUC can report the generating
source, not just a boolean.  The core server can also register as the
source.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Pavel Stehule
Date:


pá 14. 5. 2021 v 19:28 odesílatel Bruce Momjian <bruce@momjian.us> napsal:
On Fri, May 14, 2021 at 12:21:23PM -0400, Bruce Momjian wrote:
> On Fri, May 14, 2021 at 12:04:05PM -0400, Andrew Dunstan wrote:
> > > On May 14, 2021, at 8:35 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > >
> > > On Thu, May 13, 2021 at 09:41:42PM -0400, Bruce Momjian wrote:
> > > I think keeping the output as 'auto', and documenting that this query
> > > must be run to determine if the query id is being computed:
> > >
> > >    SELECT query_id
> > >    FROM pg_stat_activity
> > >    WHERE pid = pg_backend_pid();
> > >
> > > is the right approach.
> >
> > I’d rather we added a specific function. This is not really obvious.
>
> Well, we can document this query, add a function, or add a read-only
> GUC.  I am not sure how we decide which one to use.

I wonder if we should go with an SQL query now (no new API needed) and
then add a GUC once we decide on how extensions can register that they
are generating the query id, so the GUC can report the generating
source, not just a boolean.  The core server can also register as the
source.

I have no problem with it. This is an internal feature and can be enhanced (fixed) in time without problems.

Pavel



--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: compute_query_id and pg_stat_statements

From
Alvaro Herrera
Date:
On 2021-May-14, Julien Rouhaud wrote:

> On Fri, May 14, 2021 at 12:20:00PM +0900, Fujii Masao wrote:

> > +void
> > +EnableQueryId(void)
> > +{
> > +    if (compute_query_id == COMPUTE_QUERY_ID_AUTO)
> > +        auto_query_id_enabled = true;
> > 
> > Shouldn't EnableQueryId() enable auto_query_id_enabled whatever compute_query_id is?
> > Otherwise, for example, the following scenario can happen and it's a bit strange.
> > 
> > 1. The server starts up with shared_preload_libraries=pg_stat_statements and compute_query_id=on
> > 2. compute_query_id is set to auto and the configuration file is reloaded
> > Then, even though compute_query_id is auto and pg_stat_statements is loaded,
> > query ids are not computed and no queries are tracked by pg_stat_statements.
> 
> +1.

That makes sense.  Done in this version.

I took out the new WARNING in pg_stat_statements.  It's not clear to me
that that's terribly useful (it stops working as soon as you have one
query in the pg_stat_statements stash and later disable everything).
Maybe there is an useful warning to add, but I think it's independent of
changing the GUC behabior.

I also made IsQueryIdEnabled() a static inline in queryjumble.h, to
avoid a function call at every site where we need that.  Also did some
light doc editing.

I think I should aim at pushing this tomorrow morning.

> Note that if you switch from compute_query_id = off + custom
> query_id + pg_stat_statements to compute_query_id = auto then thing will
> immediately break (as we instruct third-party plugins authors to error out in
> that case), which is way better than breaking at the next random service
> restart.

Hmm, ok.  I tested pg_queryid and that behavior of throwing an error
seems quite unhelpful -- it basically makes every single query fail if
you set things wrong.  I think that instruction is bogus and should be
reconsidered.  Maybe pg_queryid could use a custom Boolean GUC that
tells it to overwrite the core query_id if that is enabled, or to just
silently do nothing in that case.



While thinking about this patch it occurred to that an useful gadget
might be to let pg_stat_statements be loaded always, but with
compute_query_id=false; so it's never active; except if you
  ALTER {DATABASE, USER} foo SET (compute_query_id=on);
so that it's possible to enable it selectively.  I think this doesn't
currently achieve anything because pgss_store is always called
regardless of query ID being active (so you'd always have at least one
function call as performance penalty, only that the work would be for
naught), but that seems a simple change to make.  I didn't look closely
to see what other things would need patched.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

Attachment

Re: compute_query_id and pg_stat_statements

From
Justin Pryzby
Date:
On Fri, May 14, 2021 at 07:50:13PM -0400, Alvaro Herrera wrote:
> +++ b/doc/src/sgml/config.sgml
> @@ -7643,7 +7643,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
>          identifier to be computed.  Note that an external module can
>          alternatively be used if the in-core query identifier computation
>          method is not acceptable.  In this case, in-core computation
> -        must be disabled.  The default is <literal>off</literal>.
> +        must be always disabled.
> +        Valid values are <literal>off</literal> (always disabled),
> +        <literal>on</literal> (always enabled) and <literal>auto</literal>,
> +        which let modules such as <xref linkend="pgstatstatements"/>
> +        automatically enable it.
> +        The default is <literal>auto</literal>.

which lets

> +/* True when a module requests query IDs and they're set auto */
> +bool        query_id_enabled = false;

Does "they're" mean the GUC compute_query_id ?

> +/*
> + * This should only be called if IsQueryIdEnabled()
> + * return true.
> + */
>  JumbleState *
>  JumbleQuery(Query *query, const char *querytext)

Should it Assert() that ?

Maybe you should update this too ?
doc/src/sgml/release-14.sgml



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Sat, May 15, 2021 at 7:50 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> I took out the new WARNING in pg_stat_statements.  It's not clear to me
> that that's terribly useful (it stops working as soon as you have one
> query in the pg_stat_statements stash and later disable everything).

If no query_id is calculated and you have entries in
pg_stat_statements, it means someone deliberately deactivated
compute_query_id.  In that case it's clear that they know the GUC
exists, so there's no much point in warning them that they deactivated
it I think.

> Maybe there is an useful warning to add, but I think it's independent of
> changing the GUC behabior.

I'm fine with it.

> > Note that if you switch from compute_query_id = off + custom
> > query_id + pg_stat_statements to compute_query_id = auto then thing will
> > immediately break (as we instruct third-party plugins authors to error out in
> > that case), which is way better than breaking at the next random service
> > restart.
>
> Hmm, ok.  I tested pg_queryid and that behavior of throwing an error
> seems quite unhelpful -- it basically makes every single query fail if
> you set things wrong.  I think that instruction is bogus and should be
> reconsidered.  Maybe pg_queryid could use a custom Boolean GUC that
> tells it to overwrite the core query_id if that is enabled, or to just
> silently do nothing in that case.

Unless I'm missing something, if we remove that instruction it means
that we encourage users to dynamically change the query_id source
without any safeguard, which will in the majority of case result in
unwanted behavior, going from duplicated entries, poor performance in
pg_stat_statements if that leads to more evictions, or even totally
bogus metrics if that leads to hash collision.

> While thinking about this patch it occurred to that an useful gadget
> might be to let pg_stat_statements be loaded always, but with
> compute_query_id=false; so it's never active; except if you
>   ALTER {DATABASE, USER} foo SET (compute_query_id=on);
> so that it's possible to enable it selectively.  I think this doesn't
> currently achieve anything because pgss_store is always called
> regardless of query ID being active (so you'd always have at least one
> function call as performance penalty, only that the work would be for
> naught), but that seems a simple change to make.  I didn't look closely
> to see what other things would need patched.

Couldn't it already be achieved with ALTER [ DATABASE | USER ] foo SET
pg_stat_statements.track = [ none | top | all ] ?



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sat, May 15, 2021 at 04:09:32PM +0800, Julien Rouhaud wrote:
> > While thinking about this patch it occurred to that an useful gadget
> > might be to let pg_stat_statements be loaded always, but with
> > compute_query_id=false; so it's never active; except if you
> >   ALTER {DATABASE, USER} foo SET (compute_query_id=on);
> > so that it's possible to enable it selectively.  I think this doesn't
> > currently achieve anything because pgss_store is always called
> > regardless of query ID being active (so you'd always have at least one
> > function call as performance penalty, only that the work would be for
> > naught), but that seems a simple change to make.  I didn't look closely
> > to see what other things would need patched.
> 
> Couldn't it already be achieved with ALTER [ DATABASE | USER ] foo SET
> pg_stat_statements.track = [ none | top | all ] ?

I am no longer the committer in charge of this feature, but I would like
to remind the group that beta1 will be wrapped on Monday, and it is hard
to change non-read-only GUCs after beta since the settings are embedded
in postgresql.conf.  There is also a release notes item that probably
will need to be adjusted.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
On Sat, May 15, 2021 at 10:00 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> I am no longer the committer in charge of this feature, but I would like
> to remind the group that beta1 will be wrapped on Monday, and it is hard
> to change non-read-only GUCs after beta since the settings are embedded
> in postgresql.conf.  There is also a release notes item that probably
> will need to be adjusted.

It seems that everyone agrees on the definition of compute_query_id in
Álvaro's v4 patch (module Justin's comments) so this could be
committed before the beta1.  If the safeguards for custom query_id or
GUC misconfiguration have to be tweaked it shouldn't impact the GUC in
any way.



Re: compute_query_id and pg_stat_statements

From
Alvaro Herrera
Date:
On 2021-May-16, Julien Rouhaud wrote:

> On Sat, May 15, 2021 at 10:00 PM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > I am no longer the committer in charge of this feature, but I would like
> > to remind the group that beta1 will be wrapped on Monday, and it is hard
> > to change non-read-only GUCs after beta since the settings are embedded
> > in postgresql.conf.  There is also a release notes item that probably
> > will need to be adjusted.
> 
> It seems that everyone agrees on the definition of compute_query_id in
> Álvaro's v4 patch (module Justin's comments) so this could be
> committed before the beta1.  If the safeguards for custom query_id or
> GUC misconfiguration have to be tweaked it shouldn't impact the GUC in
> any way.

Pushed after adding the fixes from Justin.  Note I didn't include the
WARNING in pg_stat_statements when this is disabled; if anybody wants to
argue for that, let's add it separately.

I commented out the release notes para that is now wrong.  What remains
is this:

  Move query hash computation from pg_stat_statements to the core server (Julien Rouhaud)

We could perhaps add something like

  Extension pg_stat_statements continues to work without requiring any
  configuration changes.

but that seems a bit pointless.  Or maybe

  Extension pg_stat_statements automatically enables query identifier
  computation if compute_query_id is set to auto.  Third-party modules
  to compute query identifiers can be installed and used if this is set
  to off.


I wonder why the initial line says "query hash" instead of "query
identifier".  Do we want to say "hash" everywhere?  Why didn't we name
the GUC "compute_query_hash" in that case?


Anyway, let me remind you that it is pretty common to require initdb
during the beta period.

-- 
Álvaro Herrera       Valdivia, Chile



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sat, May 15, 2021 at 02:21:59PM -0400, Álvaro Herrera wrote:
> I commented out the release notes para that is now wrong.  What remains
> is this:
> 
>   Move query hash computation from pg_stat_statements to the core server (Julien Rouhaud)
> 
> We could perhaps add something like
> 
>   Extension pg_stat_statements continues to work without requiring any
>   configuration changes.
> 
> but that seems a bit pointless.  Or maybe
> 
>   Extension pg_stat_statements automatically enables query identifier
>   computation if compute_query_id is set to auto.  Third-party modules
>   to compute query identifiers can be installed and used if this is set
>   to off.
> 

OK, new text is:

    <listitem>
    <!--
    Author: Bruce Momjian <bruce@momjian.us>
    2021-04-07 [5fd9dfa5f] Move pg_stat_statements query jumbling to core.
    -->
    
    <para>
    Move query hash computation from pg_stat_statements to the core
    server (Julien Rouhaud)
    </para>
    
    <para>
    The new server variable compute_query_id's default of 'auto' will
    automatically enable query id computation when this extension
    is loaded.
    </para>
    </listitem>

I also added Alvaro as an author of the compute_query_id item.

> I wonder why the initial line says "query hash" instead of "query
> identifier".  Do we want to say "hash" everywhere?  Why didn't we name
> the GUC "compute_query_hash" in that case?

It is queryid (no underscore) in pg_stat_statements, which was a whole
different discussion.  ;-)

> Anyway, let me remind you that it is pretty common to require initdb
> during the beta period.

True.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Alvaro Herrera
Date:
On 2021-May-15, Bruce Momjian wrote:

> On Sat, May 15, 2021 at 02:21:59PM -0400, Álvaro Herrera wrote:

> > I wonder why the initial line says "query hash" instead of "query
> > identifier".  Do we want to say "hash" everywhere?  Why didn't we name
> > the GUC "compute_query_hash" in that case?
> 
> It is queryid (no underscore) in pg_stat_statements, which was a whole
> different discussion.  ;-)

Yeah, I realize that, but I wonder if we shouldn't use the term "query
identifier" instead of "query hash" in that paragraph.

> I also added Alvaro as an author of the compute_query_id item.

I've been wondering if I should ask to stick my name in other features I
helped get committed -- specifically the PQtrace() item and autovacuum
for partitioned tables.  I'll go comment in the release notes thread.

-- 
Álvaro Herrera       Valdivia, Chile
"Digital and video cameras have this adjustment and film cameras don't for the
same reason dogs and cats lick themselves: because they can."   (Ken Rockwell)



Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
> OK, new text is:
> 
>     <listitem>
>     <!--
>     Author: Bruce Momjian <bruce@momjian.us>
>     2021-04-07 [5fd9dfa5f] Move pg_stat_statements query jumbling to core.
>     -->
>     
>     <para>
>     Move query hash computation from pg_stat_statements to the core
>     server (Julien Rouhaud)
>     </para>
>     
>     <para>
>     The new server variable compute_query_id's default of 'auto' will
>     automatically enable query id computation when this extension
>     is loaded.
>     </para>
>     </listitem>
> 
> I also added Alvaro as an author of the compute_query_id item.
  --------------------------------------------------------------

Based on the commit message, adding Alvaro was incorrect, so I will
revert this change.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Alvaro Herrera
Date:
On 2021-May-15, Bruce Momjian wrote:

> On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

> > I also added Alvaro as an author of the compute_query_id item.
>   --------------------------------------------------------------
> 
> Based on the commit message, adding Alvaro was incorrect, so I will
> revert this change.

Agreed.  My work on this one was janitorial.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Hay que recordar que la existencia en el cosmos, y particularmente la
elaboración de civilizaciones dentro de él no son, por desgracia,
nada idílicas" (Ijon Tichy)



Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
On 2021-May-15, Bruce Momjian wrote:

> On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:

> > I also added Alvaro as an author of the compute_query_id item.
>   --------------------------------------------------------------
>
> Based on the commit message, adding Alvaro was incorrect, so I will
> revert this change.

Agreed.  My work on this one was janitorial.

Thanks a lot Alvaro and Bruce! 

Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sat, May 15, 2021 at 07:01:25PM -0400, Álvaro Herrera wrote:
> On 2021-May-15, Bruce Momjian wrote:
> 
> > On Sat, May 15, 2021 at 02:21:59PM -0400, Álvaro Herrera wrote:
> 
> > > I wonder why the initial line says "query hash" instead of "query
> > > identifier".  Do we want to say "hash" everywhere?  Why didn't we name
> > > the GUC "compute_query_hash" in that case?
> > 
> > It is queryid (no underscore) in pg_stat_statements, which was a whole
> > different discussion.  ;-)
> 
> Yeah, I realize that, but I wonder if we shouldn't use the term "query
> identifier" instead of "query hash" in that paragraph.

Yes, of course, you are right --- updated text:

    <listitem>
    <!--
    Author: Bruce Momjian <bruce@momjian.us>
    2021-04-07 [4f0b0966c] Make use of in-core query id added by commit 5fd9dfa5f5
    Author: Bruce Momjian <bruce@momjian.us>
    2021-04-07 [f57a2f5e0] Add csvlog output for the new query_id value
    Author: Bruce Momjian <bruce@momjian.us>
    2021-04-20 [9660834dd] adjust query id feature to use pg_stat_activity.query_id
    Author: Bruce Momjian <bruce@momjian.us>
    2021-05-03 [f7a97b6ec] Update query_id computation
    Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
    2021-05-15 [cafde58b3] Allow compute_query_id to be set to 'auto' and make it d
    -->
    
    <para>
    If server variable compute_query_id is enabled, display the query
    id in pg_stat_activity, EXPLAIN VERBOSE, csvlog, and optionally
    in log_line_prefix (Julien Rouhaud)
    </para>
    
    <para>
    A query id computed by an extension will also be displayed.
    </para>
    </listitem>

> 
> > I also added Alvaro as an author of the compute_query_id item.
> 
> I've been wondering if I should ask to stick my name in other features I
> helped get committed -- specifically the PQtrace() item and autovacuum
> for partitioned tables.  I'll go comment in the release notes thread.

Yes, done.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sat, May 15, 2021 at 11:23:25PM -0400, Álvaro Herrera wrote:
> On 2021-May-15, Bruce Momjian wrote:
> 
> > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
> 
> > > I also added Alvaro as an author of the compute_query_id item.
> >   --------------------------------------------------------------
> > 
> > Based on the commit message, adding Alvaro was incorrect, so I will
> > revert this change.
> 
> Agreed.  My work on this one was janitorial.

OK, removed, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:
> Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
> 
>     On 2021-May-15, Bruce Momjian wrote:
> 
>     > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
> 
>     > > I also added Alvaro as an author of the compute_query_id item.
>     >   --------------------------------------------------------------
>     >
>     > Based on the commit message, adding Alvaro was incorrect, so I will
>     > revert this change.
> 
>     Agreed.  My work on this one was janitorial.
> 
> 
> Thanks a lot Alvaro and Bruce! 

We are going to get to the goal line, one way or the other!  ;-)

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Andrew Dunstan
Date:
On 5/16/21 11:13 PM, Bruce Momjian wrote:
> On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:
>> Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
>>
>>     On 2021-May-15, Bruce Momjian wrote:
>>
>>     > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
>>
>>     > > I also added Alvaro as an author of the compute_query_id item.
>>     >   --------------------------------------------------------------
>>     >
>>     > Based on the commit message, adding Alvaro was incorrect, so I will
>>     > revert this change.
>>
>>     Agreed.  My work on this one was janitorial.
>>
>>
>> Thanks a lot Alvaro and Bruce! 
> We are going to get to the goal line, one way or the other!  ;-)



I've discussed this with Alvaro. He's not planning to do anything more
regarding this and I think we can close the open item.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: compute_query_id and pg_stat_statements

From
Bruce Momjian
Date:
On Fri, May 21, 2021 at 02:19:13PM -0400, Andrew Dunstan wrote:
> 
> On 5/16/21 11:13 PM, Bruce Momjian wrote:
> > On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:
> >> Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
> >>
> >>     On 2021-May-15, Bruce Momjian wrote:
> >>
> >>     > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
> >>
> >>     > > I also added Alvaro as an author of the compute_query_id item.
> >>     >   --------------------------------------------------------------
> >>     >
> >>     > Based on the commit message, adding Alvaro was incorrect, so I will
> >>     > revert this change.
> >>
> >>     Agreed.  My work on this one was janitorial.
> >>
> >>
> >> Thanks a lot Alvaro and Bruce! 
> > We are going to get to the goal line, one way or the other!  ;-)
> 
> 
> 
> I've discussed this with Alvaro. He's not planning to do anything more
> regarding this and I think we can close the open item.

Works for me.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.




Re: compute_query_id and pg_stat_statements

From
Julien Rouhaud
Date:
Le sam. 22 mai 2021 à 02:27, Bruce Momjian <bruce@momjian.us> a écrit :
On Fri, May 21, 2021 at 02:19:13PM -0400, Andrew Dunstan wrote:
>
> On 5/16/21 11:13 PM, Bruce Momjian wrote:
> > On Sun, May 16, 2021 at 08:39:33PM +0800, Julien Rouhaud wrote:
> >> Le dim. 16 mai 2021 à 11:23, Alvaro Herrera <alvherre@alvh.no-ip.org> a écrit :
> >>
> >>     On 2021-May-15, Bruce Momjian wrote:
> >>
> >>     > On Sat, May 15, 2021 at 05:32:58PM -0400, Bruce Momjian wrote:
> >>
> >>     > > I also added Alvaro as an author of the compute_query_id item.
> >>     >   --------------------------------------------------------------
> >>     >
> >>     > Based on the commit message, adding Alvaro was incorrect, so I will
> >>     > revert this change.
> >>
> >>     Agreed.  My work on this one was janitorial.
> >>
> >>
> >> Thanks a lot Alvaro and Bruce! 
> > We are going to get to the goal line, one way or the other!  ;-)
>
>
>
> I've discussed this with Alvaro. He's not planning to do anything more
> regarding this and I think we can close the open item.

Works for me.

works for me too.