Re: compute_query_id and pg_stat_statements - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: compute_query_id and pg_stat_statements
Date
Msg-id 20210512.143335.1053382893089496881.horikyota.ntt@gmail.com
Whole thread Raw
In response to compute_query_id and pg_stat_statements  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: compute_query_id and pg_stat_statements
Re: compute_query_id and pg_stat_statements
List pgsql-hackers
At Wed, 12 May 2021 10:42:01 +0800, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> Hello Horiguchi-san,
> 
> On Wed, May 12, 2021 at 11:08:36AM +0900, Kyotaro Horiguchi wrote:
> > 
> > If we look it in pg_settings, it shows the current value and the value
> > at boot-time.  So I'm fine with that behavior.
> > 
> > However, IMHO, I doubt the necessity of "on". Assuming that we require
> > any module that wants query-id to call queryIdWanted() at any time
> > after each process startup (or it could be inherited to children), I
> > think only "auto" and "off" are enough for the variable.
> 
> I don't think that this approach would cope well for people who want a queryid
> without pg_stat_statements or such.  Since the queryid can now be found in
> pg_stat_activity, EXPLAIN output or the logs I think it's entirely reasonable
> to allow users to benefit from that even if they don't install additional
> module.

Ah, I missed that case.  And we are wanting to use pg_stat_statements
with (almost) zero-config?  How about the following behavior?

Setting query_id_provider to 'none' means we don't calculate query-id
by default. However, if queryIdWante() is called, the default provider
is set up and starts calculating query id.

Setting query_id_provider to something else means the user wants
query-id calcualted using the provider. Setting 'default' is
equivalent to setting compute_query_id to 'on'.

There might be a case where a user sets query_id_provider to
non-'none' but don't want have query-id calculated, but it can be said
a kind of mis-configuration?

> > Thinking in
> > this line, the variable is a subset of a GUC variable to specify the
> > name of a query-id provider (as Andres suggested upthread), and I
> > think it would work better in future.
> > 
> > So for now I propose that we have a variable query_id_provider that
> > has only 'default' and 'none' as the domain.
> 
> I think this would be a mistake to do that, as it would mean that we don't
> officially support alternative queryid provider.

Ok, if we want to support alternative providers from the first, we
need to actually write the loader code for query-id providers.  It
would not be so hard?, but it might not be suitable to this stage so I
proposed that to get rid of needing such complexity for now.

(Anyway I prefer to load query-id provider as a dynamically loadable
 module rather than hook-function.)

> > We can later expand it
> > so that any other query-id provider modules can be loaded without
> > chaning the interface.
> 
> The GUC itself may not change, but third-party queryid provider would probably
> need changes as the new entry point will be dedicated to compute a queryid
> only, while third-party plugins may do more than that in their
> post_parse_analyze_hook.  And also users will have to change their

I don't think it is not that a problem. Even if any third-party
extension is having query-id generator by itself, in most cases it
would be a copy of JumbleQuery in case of pg_stat_statement is not
loaded and now it is moved in-core as 'default' provider. What the
exntension needs to be done is just ripping out the copied generator
code.  I guess...

> configuration to use that new interface, and additionally the module may now
> have to be removed from shared_preload_libraries.  Overall, it doesn't seem to
> me that it would make users' life easier.

Why the third-party module need to be removed from
shared_preload_libraries?  The module can stay as a preloaded shared
library but just no longer need to have its own query-id provider
since it is provided in-core.  If the extension required a specific
provider, the developer need to make it a loadable module and users
need to specify the provider module explicitly.  I don't think that is
not a problem but if we wanted to make it easier, we can let users
free from that step by allowing 'auto' for query-id-provider to load
any module by the first extension.

So, for example, how about the following interface?

GUC query_id_provider:

- 'none' : query_id is not calculated, don't allow loading external
           generator module.

- 'default' : use default provider and calculate query-id.

- '<provider-name>' : use the provider and calculate query-id using it.

- 'auto' : query_id is not calculated, but allow to load query-id
           provider if queryIdWanted() is called.

# of course 'auto' and 'default' are inhibited as the provier name.

- core function bool queryIdWanted(char *provider_name, bool use_existing)

 Allows extensions to request to load a provider if not yet, then
 start calculating query-id.  Returns true if the request is accepted.

 provider_name :

  - 'default' or '<provider-name>': requests the provider to be loaded
    and start calculating query-id. Refuse the request if 'none' is
    set to query_id_provider.

 use_existing: Set true to allow using a provider already loaded.
    Otherwise refuses the request if any other provider than
    prvoder_name is already loaded.

In most cases users set query_id_provider to 'auto' and extensions
call queryIdWanted with ('default', true).

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: parallel vacuum - few questions on docs, comments and code
Next
From: Pavel Stehule
Date:
Subject: Re: Schema variables - new implementation for Postgres 15