Thread: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Yun Li

Date:

15 March 2019, 21:54:19

Hey pg developers,

Do you think if we can add queryId into the pg_stat_get_activity function and ultimatly expose it in the view? It would be easier to track "similar" query's performance over time easier.

Thanks a lot!
Yun

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tom Lane

Date:

16 March 2019, 01:50:08

Yun Li <liyunjuanyong@gmail.com> writes:
> Do you think if we can add queryId into the pg_stat_get_activity function
> and ultimatly expose it in the view? It would be easier to track "similar"
> query's performance over time easier.

No, we're not likely to do that, because it would mean (1) baking one
single definition of "query ID" into the core system and (2) paying
the cost to calculate that ID all the time.

pg_stat_statements has a notion of query ID, but that notion might be
quite inappropriate for other usages, which is why it's an extension
and not core.

            regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Robert Haas

Date:

16 March 2019, 14:32:16

On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Yun Li <liyunjuanyong@gmail.com> writes:
> > Do you think if we can add queryId into the pg_stat_get_activity function
> > and ultimatly expose it in the view? It would be easier to track "similar"
> > query's performance over time easier.
>
> No, we're not likely to do that, because it would mean (1) baking one
> single definition of "query ID" into the core system and (2) paying
> the cost to calculate that ID all the time.
>
> pg_stat_statements has a notion of query ID, but that notion might be
> quite inappropriate for other usages, which is why it's an extension
> and not core.

Having written an extension that also wanted a query ID, I disagree
with this position.  There's only one query ID field available, and
you can't use two extensions that care about query ID unless they
compute it the same way, and replicating all the code that computes
the query ID into each new extension that wants one sucks.  I think we
should actually bite the bullet and move all of that code into core,
and then just let extensions say whether they care about it getting
set.

Also, I think this is now the third independent request to expose
query ID in pg_stat_statements.  I think we should give the people
what they want.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tom Lane

Date:

16 March 2019, 16:20:01

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> pg_stat_statements has a notion of query ID, but that notion might be
>> quite inappropriate for other usages, which is why it's an extension
>> and not core.

> Having written an extension that also wanted a query ID, I disagree
> with this position.

[ shrug... ]  The fact remains that pg_stat_statements's definition is
pretty lame.  There's a lot of judgment calls in which query fields
it chooses to examine or ignore, and there's been no attempt at all
to make the ID PG-version-independent, and I rather doubt that it's
platform-independent either.  Nor will the IDs survive a dump/reload
even on the same server, since object OIDs will likely change.

These things are OK, or at least mostly tolerable, for pg_stat_statements'
usage ... but I don't think it's a good idea to have the core code
dictating that definition to all extensions.  Right now, if you have
an extension that needs some other query-ID definition, you can do it,
you just can't run that extension alongside pg_stat_statements.
But you'll be out of luck if the core code starts filling that field.

I'd be happier about having the core code compute a query ID if we
had a definition that was not so obviously slapped together.

            regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

16 March 2019, 18:02:52

On Sat, Mar 16, 2019 at 5:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> pg_stat_statements has a notion of query ID, but that notion might be
> >> quite inappropriate for other usages, which is why it's an extension
> >> and not core.
>
> > Having written an extension that also wanted a query ID, I disagree
> > with this position.
>
> [ shrug... ]  The fact remains that pg_stat_statements's definition is
> pretty lame.  There's a lot of judgment calls in which query fields
> it chooses to examine or ignore, and there's been no attempt at all
> to make the ID PG-version-independent, and I rather doubt that it's
> platform-independent either.  Nor will the IDs survive a dump/reload
> even on the same server, since object OIDs will likely change.
>
> These things are OK, or at least mostly tolerable, for pg_stat_statements'
> usage ... but I don't think it's a good idea to have the core code
> dictating that definition to all extensions.  Right now, if you have
> an extension that needs some other query-ID definition, you can do it,
> you just can't run that extension alongside pg_stat_statements.
> But you'll be out of luck if the core code starts filling that field.
>
> I'd be happier about having the core code compute a query ID if we
> had a definition that was not so obviously slapped together.

But the queryId itself is stored in core.  Exposing it in
pg_stat_activity or log_line_prefix would still allow users to choose
the implementation of their choice, or none.  That seems like a
different complaint from asking pgss integration in core to have all
its metrics available by default (or at least without a restart).

Maybe we could add a GUC for pg_stat_statements to choose whether it
should set the queryid itself and not, if anyone wants to have its
metrics but with different queryid semantics?

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

legrand legrand

Date:

16 March 2019, 18:21:13

Hello,

This is available in https://github.com/legrandlegrand/pg_stat_sql_plans
extension with a specific function
pgssp_backend_queryid(pid) that permits to join pg_stat_activity with
pg_stat_sql_plans (that is similar to pg_stat_statements) and also permits
to collect samples of wait events per query id.

This extension computes its own queryid based on a normalized query text
(that doesn't change after table
drop/create).

Maybe that queryid calculation should stay in a dedicated extension,
permiting to users to choose their queryid definition.

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Yun Li

Date:

18 March 2019, 17:23:43

Thanks a lot for really good points!! I did not expected I will get this many points of view. :P

I have identical experience with Robert when other extension calculate the id different as PGSS, PGSS will overwritten that id when it is on. But Tom got a point that if we centralize the logic that pgss has, then other extension will have no way to change it unless we have some new config to toggle pointed out by Julien. Also Tom got the concern about the current PGSS jumble query logic is not bullet proof and may take time then impact the perf.

Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to the pg_stat_get_activity and ultimately exposed in pg_stat_activity view? Then no matter whether PGSS is on or off, or however the customer extensions are updating that filed, we expose that field in that view then enable user to leverage that id to join with pgss or their extension. Will this sounds a good idea?

Thanks again,

Yun

On Sat, Mar 16, 2019 at 11:01 AM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Sat, Mar 16, 2019 at 5:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> pg_stat_statements has a notion of query ID, but that notion might be
> >> quite inappropriate for other usages, which is why it's an extension
> >> and not core.
>
> > Having written an extension that also wanted a query ID, I disagree
> > with this position.
>
> [ shrug... ] The fact remains that pg_stat_statements's definition is
> pretty lame. There's a lot of judgment calls in which query fields
> it chooses to examine or ignore, and there's been no attempt at all
> to make the ID PG-version-independent, and I rather doubt that it's
> platform-independent either. Nor will the IDs survive a dump/reload
> even on the same server, since object OIDs will likely change.
>
> These things are OK, or at least mostly tolerable, for pg_stat_statements'
> usage ... but I don't think it's a good idea to have the core code
> dictating that definition to all extensions. Right now, if you have
> an extension that needs some other query-ID definition, you can do it,
> you just can't run that extension alongside pg_stat_statements.
> But you'll be out of luck if the core code starts filling that field.
>
> I'd be happier about having the core code compute a query ID if we
> had a definition that was not so obviously slapped together.

But the queryId itself is stored in core. Exposing it in
pg_stat_activity or log_line_prefix would still allow users to choose
the implementation of their choice, or none. That seems like a
different complaint from asking pgss integration in core to have all
its metrics available by default (or at least without a restart).

Maybe we could add a GUC for pg_stat_statements to choose whether it
should set the queryid itself and not, if anyone wants to have its
metrics but with different queryid semantics?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Nikolay Samokhvalov

Date:

18 March 2019, 18:24:19

Hello

On Sat, Mar 16, 2019 at 7:32 AM Robert Haas <robertmhaas@gmail.com> wrote:

Also, I think this is now the third independent request to expose
query ID in pg_stat_statements. I think we should give the people
what they want.

Count me as the 4th.

This would be a very important feature for automated query analysis.

pg_stat_statements lacks query examples, and the only way to get them is from the logs.

Where we don't have queryid as well. So people end up either doing it manually or writing

yet another set of nasty regular expressions.

Routing query analysis s a crucial for any large project. If there are chances to implement

queryid for pg_stat_activity (or anything that will allow to automate query analysis)

in Postgres 12 or later -- this would be a great news and huge support for engineers.

Same level as recently implemented sampling for statement logging.

By the way, if queryid goes to the core someday, I'm sure it is worth to consider using

it in logs as well.

Thanks,

Nik

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

18 March 2019, 18:33:42

On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:
>
> Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to the
pg_stat_get_activityand ultimately exposed in pg_stat_activity view? Then no matter whether PGSS  is on or off, or
howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage
thatid to join with pgss or their extension. Will this sounds a good idea? 

I'd greatly welcome expose queryid exposure in pg_stat_activity, and
also in log_line_prefix.  I'm afraid that it's too late for pg12
inclusion, but I'll be happy to provide a patch for that for pg13.

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Maksim Milyutin

Date:

19 March 2019, 13:45:17

On 3/16/19 5:32 PM, Robert Haas wrote:

> There's only one query ID field available, and
> you can't use two extensions that care about query ID unless they
> compute it the same way, and replicating all the code that computes
> the query ID into each new extension that wants one sucks.  I think we
> should actually bite the bullet and move all of that code into core,
> and then just let extensions say whether they care about it getting
> set.

+1.

But I think that enough to integrate into core the query normalization 
routine and store generalized query strings (from which the queryId is 
produced) in shared memory (for example, hashtable that maps queryId to 
the text representation of generalized query). And activate 
normalization routine and filling the table of generalized queries by 
specified GUC.

This allows to unbind extensions that require queryId from using 
pg_stat_statements and consider such computing of queryId as canonical.

-- 
Regards,
Maksim Milyutin

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

19 March 2019, 14:43:31

On Tue, Mar 19, 2019 at 2:45 PM Maksim Milyutin <milyutinma@gmail.com> wrote:
>
> But I think that enough to integrate into core the query normalization
> routine and store generalized query strings (from which the queryId is
> produced) in shared memory (for example, hashtable that maps queryId to
> the text representation of generalized query).

That's more or less how pg_stat_statements was previously behaving,
and it had too many problems.  Current implementation, with an
external file, is a better alternative.

> And activate
> normalization routine and filling the table of generalized queries by
> specified GUC.
>
> This allows to unbind extensions that require queryId from using
> pg_stat_statements and consider such computing of queryId as canonical.

The problem I see with this approach is that if you want a different
implementation, you'll have to reimplement the in-core normalised
queries saving and retrieval, but with a different set of SQL-visible
functions.  I don't think that's it's acceptable, unless we add a
specific hook for query normalisation and queryid computing.  But it
isn't ideal either, as it would be a total mess if someone changes the
implementation without resetting the previously saved normalised
queries.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

19 March 2019, 14:51:37

On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:
> >
> > Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to the
pg_stat_get_activityand ultimately exposed in pg_stat_activity view? Then no matter whether PGSS  is on or off, or
howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage
thatid to join with pgss or their extension. Will this sounds a good idea? 
>
> I'd greatly welcome expose queryid exposure in pg_stat_activity, and
> also in log_line_prefix.  I'm afraid that it's too late for pg12
> inclusion, but I'll be happy to provide a patch for that for pg13.

Here's a prototype patch for queryid exposure in pg_stat_activity and
log_line prefix.

Attachment

queryid_exposure-v1.diff

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Jim Finnerty

Date:

19 March 2019, 17:23:55

The queryId depends on oids, so it is not stable enough for some purposes. 
For example, to create a SQL identifier that survives across a server
upgrade, or that can be shipped to another database, the queryId isn't
usable. 

The apg_plan_mgmt extensions keeps both its own stable SQL identifier as
well as the queryId, so it can be used to join to pg_stat_statements if
desired.  If we were to standardize on one SQL identifier, it should be
stable enough to survive a major version upgrade or to be the same in
different databases.





-----
Jim Finnerty, AWS, Amazon Aurora PostgreSQL
--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Robert Haas

Date:

19 March 2019, 19:00:32

On Tue, Mar 19, 2019 at 1:24 PM Jim Finnerty <jfinnert@amazon.com> wrote:
> The queryId depends on oids, so it is not stable enough for some purposes.
> For example, to create a SQL identifier that survives across a server
> upgrade, or that can be shipped to another database, the queryId isn't
> usable.
>
> The apg_plan_mgmt extensions keeps both its own stable SQL identifier as
> well as the queryId, so it can be used to join to pg_stat_statements if
> desired.  If we were to standardize on one SQL identifier, it should be
> stable enough to survive a major version upgrade or to be the same in
> different databases.

If Amazon would like to open-source its (AIUI) proprietary technology
for computing query IDs and propose it for inclusion in PostgreSQL,
cool, but I think that is a separate question from whether people
would like more convenient access to the query ID technology that we
have today.  I think it's 100% clear that they would like that, even
as things stand, and therefore it does not make sense to block that
behind Amazon deciding to share what it already has or somebody else
trying to reimplement it.

If we need to have a space for both a core-standard query ID and
another query ID that is available for extension use, adding one more
field to struct Query, so we can have both coreQueryId and
extensionQueryId or whatever, would be easy to do.  It appears that
there's more use case than I would have guessed for custom query IDs.
On the other hand, it also appears that a lot of people would be very,
very happy to just be able to see the query ID field that already
exists, both in pg_stat_statements in pg_stat_activity, and we
shouldn't throw up unnecessary impediments in the way of making that
happen, at least IMHO.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

legrand legrand

Date:

19 March 2019, 19:38:05

Great, 
thank you Julien !

Would it make sense to add it in auto explain ?
I don't know for explain itself, but maybe ...

Regards
PAscal




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

19 March 2019, 23:52:44

On Tue, Mar 19, 2019 at 8:38 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> Would it make sense to add it in auto explain ?
> I don't know for explain itself, but maybe ...

I'd think that people interested in getting the queryid in the logs
would configure the log_line_prefix to display it consistently rather
than having it in only a subset of cases, so that's probably not
really needed.

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

legrand legrand

Date:

20 March 2019, 19:21:30

Hi Jim, Robert,

As this is a distinct subject from adding QueryId to pg_stat_activity,
would it be possible to continue the discussion "new QueryId definition" 
(for postgres open source software) here:

https://www.postgresql.org/message-id/1553029215728-0.post@n3.nabble.com

Thanks in advance.
Regards
PAscal




--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

RE: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

legrand legrand

Date:

25 March 2019, 11:36:48

>> Would it make sense to add it in auto explain ?
>> I don't know for explain itself, but maybe ...

> I'd think that people interested in getting the queryid in the logs
> would configure the log_line_prefix to display it consistently rather
> than having it in only a subset of cases, so that's probably not
> really needed.

Ok.
Shoudn't you add this to commitfest ?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

25 March 2019, 11:43:13

On Mon, Mar 25, 2019 at 12:36 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
>
> >> Would it make sense to add it in auto explain ?
> >> I don't know for explain itself, but maybe ...
>
> > I'd think that people interested in getting the queryid in the logs
> > would configure the log_line_prefix to display it consistently rather
> > than having it in only a subset of cases, so that's probably not
> > really needed.
>
> Ok.
> Shoudn't you add this to commitfest ?

I added it last week, see https://commitfest.postgresql.org/23/2069/

RE: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

legrand legrand

Date:

25 March 2019, 11:49:42

>> Shoudn't you add this to commitfest ?

> I added it last week, see https://commitfest.postgresql.org/23/2069/

Oups, sorry for the noise

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

28 June 2019, 14:39:15

On Tue, Mar 19, 2019 at 3:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:
> > >
> > > Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to
thepg_stat_get_activity and ultimately exposed in pg_stat_activity view? Then no matter whether PGSS  is on or off, or
howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage
thatid to join with pgss or their extension. Will this sounds a good idea? 
> >
> > I'd greatly welcome expose queryid exposure in pg_stat_activity, and
> > also in log_line_prefix.  I'm afraid that it's too late for pg12
> > inclusion, but I'll be happy to provide a patch for that for pg13.
>
> Here's a prototype patch for queryid exposure in pg_stat_activity and
> log_line prefix.

Patch doesn't apply anymore, PFA rebased v2.

Attachment

queryid_exposure-v2.diff

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

28 June 2019, 15:46:24

On Fri, Jun 28, 2019 at 4:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Mar 19, 2019 at 3:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote:
> > > >
> > > > Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to
thepg_stat_get_activity and ultimately exposed in pg_stat_activity view? Then no matter whether PGSS  is on or off, or
howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage
thatid to join with pgss or their extension. Will this sounds a good idea? 
> > >
> > > I'd greatly welcome expose queryid exposure in pg_stat_activity, and
> > > also in log_line_prefix.  I'm afraid that it's too late for pg12
> > > inclusion, but I'll be happy to provide a patch for that for pg13.
> >
> > Here's a prototype patch for queryid exposure in pg_stat_activity and
> > log_line prefix.
>
> Patch doesn't apply anymore, PFA rebased v2.

Sorry, I missed the new pg_stat_gssapi view.

Attachment

queryid_exposure-v3.diff

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Peter Geoghegan

Date:

28 June 2019, 18:46:14

On Tue, Mar 19, 2019 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On the other hand, it also appears that a lot of people would be very,
> very happy to just be able to see the query ID field that already
> exists, both in pg_stat_statements in pg_stat_activity, and we
> shouldn't throw up unnecessary impediments in the way of making that
> happen, at least IMHO.

+1.

pg_stat_statements will already lose all the statistics that it
aggregated in the event of a hard crash. The trade-off that the query
jumbling logic makes is not a bad one, all things considered.

-- 
Peter Geoghegan

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Peter Geoghegan

Date:

28 June 2019, 18:49:53

On Tue, Mar 19, 2019 at 12:38 PM legrand legrand
<legrand_legrand@hotmail.com> wrote:
> Would it make sense to add it in auto explain ?
> I don't know for explain itself, but maybe ...

I think that it should appear in EXPLAIN. pg_stat_statements already
cannot have a query hash of zero, so it might be okay to display it
only when its value is non-zero.

-- 
Peter Geoghegan

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Evgeny Efimkin

Date:

31 July 2019, 08:54:18

What reason to use pg_atomic_uint64?
In docs:
occured - > occurred

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

31 July 2019, 21:51:40

Hello,

On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote:
>
> What reason to use pg_atomic_uint64?

The queryid is read and written without holding any lock on the PGPROC
entry, so the pg_atomic_uint64 will guarantee that we get a consistent
value in pg_stat_get_activity().  Other reads shouldn't be a problem
as far as I remember.

> In docs:
> occured - > occurred

Thanks!  I fixed it on my local branch.

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Andres Freund

Date:

31 July 2019, 21:59:09

Hi,

On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote:
> On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote:
> > What reason to use pg_atomic_uint64?
> 
> The queryid is read and written without holding any lock on the PGPROC
> entry, so the pg_atomic_uint64 will guarantee that we get a consistent
> value in pg_stat_get_activity().  Other reads shouldn't be a problem
> as far as I remember.

Hm, I don't think that's necessary in this case. That's what the
st_changecount protocol is trying to ensure, no?

    /*
     * To avoid locking overhead, we use the following protocol: a backend
     * increments st_changecount before modifying its entry, and again after
     * finishing a modification.  A would-be reader should note the value of
     * st_changecount, copy the entry into private memory, then check
     * st_changecount again.  If the value hasn't changed, and if it's even,
     * the copy is valid; otherwise start over.  This makes updates cheap
     * while reads are potentially expensive, but that's the tradeoff we want.
     *
     * The above protocol needs memory barriers to ensure that the apparent
     * order of execution is as it desires.  Otherwise, for example, the CPU
     * might rearrange the code so that st_changecount is incremented twice
     * before the modification on a machine with weak memory ordering.  Hence,
     * use the macros defined below for manipulating st_changecount, rather
     * than touching it directly.
     */
    int            st_changecount;


And if it were necessary, why wouldn't any of the other fields in
PgBackendStatus need it? There's plenty of other fields written to
without a lock, and several of those are also 8 bytes (so it's not a
case of assuming that 8 byte reads might not be atomic, but for byte
reads are).

Greetings,

Andres Freund

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

01 August 2019, 06:45:45

On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote:
> > On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote:
> > > What reason to use pg_atomic_uint64?
> >
> > The queryid is read and written without holding any lock on the PGPROC
> > entry, so the pg_atomic_uint64 will guarantee that we get a consistent
> > value in pg_stat_get_activity().  Other reads shouldn't be a problem
> > as far as I remember.
>
> Hm, I don't think that's necessary in this case. That's what the
> st_changecount protocol is trying to ensure, no?
>
>         /*
>          * To avoid locking overhead, we use the following protocol: a backend
>          * increments st_changecount before modifying its entry, and again after
>          * finishing a modification.  A would-be reader should note the value of
>          * st_changecount, copy the entry into private memory, then check
>          * st_changecount again.  If the value hasn't changed, and if it's even,
>          * the copy is valid; otherwise start over.  This makes updates cheap
>          * while reads are potentially expensive, but that's the tradeoff we want.
>          *
>          * The above protocol needs memory barriers to ensure that the apparent
>          * order of execution is as it desires.  Otherwise, for example, the CPU
>          * might rearrange the code so that st_changecount is incremented twice
>          * before the modification on a machine with weak memory ordering.  Hence,
>          * use the macros defined below for manipulating st_changecount, rather
>          * than touching it directly.
>          */
>         int                     st_changecount;
>
>
> And if it were necessary, why wouldn't any of the other fields in
> PgBackendStatus need it? There's plenty of other fields written to
> without a lock, and several of those are also 8 bytes (so it's not a
> case of assuming that 8 byte reads might not be atomic, but for byte
> reads are).

This patch is actually storing the queryid in PGPROC, not in
PgBackendStatus, thus the need for an atomic.  I used PGPROC because
the value needs to be available in log_line_prefix() and spi.c, so
pgstat.c / PgBackendStatus didn't seem like the best interface in that
case.  Is widening PGPROC is too expensive for this purpose?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Robert Haas

Date:

01 August 2019, 18:20:46

On Thu, Aug 1, 2019 at 2:46 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
> This patch is actually storing the queryid in PGPROC, not in
> PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> the value needs to be available in log_line_prefix() and spi.c, so
> pgstat.c / PgBackendStatus didn't seem like the best interface in that
> case.  Is widening PGPROC is too expensive for this purpose?

I doubt it.

However, I think that the fact that this patch adds 15 new calls to
pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
sign.  It seems like we ought to be able to centralize it better than
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Andres Freund

Date:

01 August 2019, 18:36:20

Hi,

On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:
> On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:
> > And if it were necessary, why wouldn't any of the other fields in
> > PgBackendStatus need it? There's plenty of other fields written to
> > without a lock, and several of those are also 8 bytes (so it's not a
> > case of assuming that 8 byte reads might not be atomic, but for byte
> > reads are).
> 
> This patch is actually storing the queryid in PGPROC, not in
> PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> the value needs to be available in log_line_prefix() and spi.c, so
> pgstat.c / PgBackendStatus didn't seem like the best interface in that
> case.

Hm. I'm not convinced that really is the case? You can just access
MyBEentry, and read and update it?  I mean, we do so at a frequency
roughtly as high as high as the new queryid updates for things like
pgstat_report_activity().  Reading the value of your own backend you'd
not need to follow the changecount algorithm, I think, because it's only
updated from the current backend. If reading were a problem, you
trivially just could have a cache in a local variable, to avoid
accessing shared memory.

> Is widening PGPROC is too expensive for this purpose?

Well, I'm mostly not a fan of putting even more in there, because it's
pretty hard to understand already. To me it architecturally status
information doesn't belong there (In fact, I'm somewhat unhappy that
wait_event_info etc in there, but that's at least commonly updated at
the same time as other fields in PGPROC).

Greetings,

Andres Freund

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Andres Freund

Date:

01 August 2019, 18:36:32

On 2019-08-01 14:20:46 -0400, Robert Haas wrote:
> However, I think that the fact that this patch adds 15 new calls to
> pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
> sign.  It seems like we ought to be able to centralize it better than
> that.

+1

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

01 August 2019, 20:42:23

On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:
> > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:
> > > And if it were necessary, why wouldn't any of the other fields in
> > > PgBackendStatus need it? There's plenty of other fields written to
> > > without a lock, and several of those are also 8 bytes (so it's not a
> > > case of assuming that 8 byte reads might not be atomic, but for byte
> > > reads are).
> >
> > This patch is actually storing the queryid in PGPROC, not in
> > PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> > the value needs to be available in log_line_prefix() and spi.c, so
> > pgstat.c / PgBackendStatus didn't seem like the best interface in that
> > case.
>
> Hm. I'm not convinced that really is the case? You can just access
> MyBEentry, and read and update it?

Sure, but it requires extra wrapper functions, and the st_changecount
dance when writing the new value.

>  I mean, we do so at a frequency
> roughtly as high as high as the new queryid updates for things like
> pgstat_report_activity().

pgstat_report_activity() is only called for top-level statement.  For
the queryid we need to track it down to all nested statements, which
could be way higher.  But pgstat_progress_update_param() is called way
more than that.

>  Reading the value of your own backend you'd
> not need to follow the changecount algorithm, I think, because it's only
> updated from the current backend. If reading were a problem, you
> trivially just could have a cache in a local variable, to avoid
> accessing shared memory.

Yes definitely, except for pgstat_get_activity(), all reads are
backend local and should be totally safe to read as is.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

01 August 2019, 20:49:48

On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-08-01 14:20:46 -0400, Robert Haas wrote:
> > However, I think that the fact that this patch adds 15 new calls to
> > pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
> > sign.  It seems like we ought to be able to centralize it better than
> > that.
>
> +1

Unfortunately I didn't find a better way to do that.  Since you can
have nested execution, I don't see how to avoid adding extra code in
every parts of query execution.

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Andres Freund

Date:

01 August 2019, 20:51:59

Hi,

On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote:
> On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote:
> > > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote:
> > > > And if it were necessary, why wouldn't any of the other fields in
> > > > PgBackendStatus need it? There's plenty of other fields written to
> > > > without a lock, and several of those are also 8 bytes (so it's not a
> > > > case of assuming that 8 byte reads might not be atomic, but for byte
> > > > reads are).
> > >
> > > This patch is actually storing the queryid in PGPROC, not in
> > > PgBackendStatus, thus the need for an atomic.  I used PGPROC because
> > > the value needs to be available in log_line_prefix() and spi.c, so
> > > pgstat.c / PgBackendStatus didn't seem like the best interface in that
> > > case.
> >
> > Hm. I'm not convinced that really is the case? You can just access
> > MyBEentry, and read and update it?
> 
> Sure, but it requires extra wrapper functions, and the st_changecount
> dance when writing the new value.

So? You need a wrapper function anyway, there's no way we're going to
add all those separate pg_atomic_write* calls directly.


> >  I mean, we do so at a frequency
> > roughtly as high as high as the new queryid updates for things like
> > pgstat_report_activity().
> 
> pgstat_report_activity() is only called for top-level statement.  For
> the queryid we need to track it down to all nested statements, which
> could be way higher.

Compared to the overhead of executing a separate query the cost of
single function call containing a MyBEentry update of an 8byte value
seems almost guaranteed to be immeasurable. The executor startup alone
is several orders of magnitude more expensive.

I also think this proposed column should probably respect
the track_activities GUC.

Greetings,

Andres Freund

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Andres Freund

Date:

01 August 2019, 21:05:14

Hi,

On 2019-08-01 22:49:48 +0200, Julien Rouhaud wrote:
> On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2019-08-01 14:20:46 -0400, Robert Haas wrote:
> > > However, I think that the fact that this patch adds 15 new calls to
> > > pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good
> > > sign.  It seems like we ought to be able to centralize it better than
> > > that.
> >
> > +1
> 
> Unfortunately I didn't find a better way to do that.  Since you can
> have nested execution, I don't see how to avoid adding extra code in
> every parts of query execution.

At least my +1 is not primarily about the number of sites that need to
handle queryid changes, but that they all need to know about the way the
queryid is stored. Including how atomicity etc is handled. That
knowledge should be in one or two places, not more. In a file where that
knowledge makes sense.

I'm *also* concerned about the number of places, as that makes it likely
that some have been missed/new ones will be introduced without the
queryid handling. But that wasn't what I was referring to above.

I'm actually quite unconvinced that it's sensible to update the global
value for nested queries. That'll mean e.g. the log_line_prefix and
pg_stat_activity values are most of the time going to be bogus while
nested, because the querystring that's associated with those will *not*
be the value that the queryid corresponds to. elog.c uses
debug_query_string to log the statement, which is only updated for
top-level queries (outside of some exceptions like parallel workers for
parallel queries in a function or stuff like that). And pg_stat_activity
is also only updated for top level queries.

Greetings,

Andres Freund

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

01 August 2019, 21:08:37

On Thu, Aug 1, 2019 at 10:52 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote:
> > Sure, but it requires extra wrapper functions, and the st_changecount
> > dance when writing the new value.
>
> So? You need a wrapper function anyway, there's no way we're going to
> add all those separate pg_atomic_write* calls directly.

Ok

> I also think this proposed column should probably respect
> the track_activities GUC.

Oh indeed, I'll fix that when I'll be sure of the semantics to implement.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

02 August 2019, 08:54:35

On Thu, Aug 1, 2019 at 11:05 PM Andres Freund <andres@anarazel.de> wrote:
>
> I'm actually quite unconvinced that it's sensible to update the global
> value for nested queries. That'll mean e.g. the log_line_prefix and
> pg_stat_activity values are most of the time going to be bogus while
> nested, because the querystring that's associated with those will *not*
> be the value that the queryid corresponds to. elog.c uses
> debug_query_string to log the statement, which is only updated for
> top-level queries (outside of some exceptions like parallel workers for
> parallel queries in a function or stuff like that). And pg_stat_activity
> is also only updated for top level queries.

Having the nested queryid seems indeed quite broken for
log_line_prefix.  However having the nested queryid in
pg_stat_activity would be convenient to track what is a long stored
functions currently doing.  Maybe we could expose something like
top_level_queryid and current_queryid instead?

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Andres Freund

Date:

02 August 2019, 23:20:48

Hi,

On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote:
> On Thu, Aug 1, 2019 at 11:05 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > I'm actually quite unconvinced that it's sensible to update the global
> > value for nested queries. That'll mean e.g. the log_line_prefix and
> > pg_stat_activity values are most of the time going to be bogus while
> > nested, because the querystring that's associated with those will *not*
> > be the value that the queryid corresponds to. elog.c uses
> > debug_query_string to log the statement, which is only updated for
> > top-level queries (outside of some exceptions like parallel workers for
> > parallel queries in a function or stuff like that). And pg_stat_activity
> > is also only updated for top level queries.
> 
> Having the nested queryid seems indeed quite broken for
> log_line_prefix.  However having the nested queryid in
> pg_stat_activity would be convenient to track what is a long stored
> functions currently doing.  Maybe we could expose something like
> top_level_queryid and current_queryid instead?

Given that the query string is the toplevel one, I think that'd just be
confusing. And given the fact that it adds *substantial* additional
complexity, I'd just rip the subcommand bits out.

Greetings,

Andres Freund

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

03 August 2019, 21:58:13

Hi,

On Sat, Aug 3, 2019 at 1:21 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote:
> > However having the nested queryid in
> > pg_stat_activity would be convenient to track what is a long stored
> > functions currently doing.  Maybe we could expose something like
> > top_level_queryid and current_queryid instead?
>
> Given that the query string is the toplevel one, I think that'd just be
> confusing. And given the fact that it adds *substantial* additional
> complexity, I'd just rip the subcommand bits out.

Ok, so here's a version that only exposes the top-level queryid only.
There can still be discrepancies with the query field, if a
multi-command string is provided.  The queryid will be updated each
time a new top level statement is executed.

As the queryid cannot be immediately known, and may never exist at all
if a query fails to parse, here are the heuristic I used to update the
stored queryid:

- it's reset to 0 each time pgstat_report_activity(STATE_RUNNING) is
called.  This way, we're sure that we don't display last query's
queryid in the logs if the next query fails to parse
- it's also reset to 0 at the beginning of exec_simple_query() loop on
the parsetree_list (for multi-command string case)
- pg_analyze_and_rewrite() and pg_analyze_and_rewrite_params() will
report the new queryid after parse analysis.
- a non-zero queryid will only be updated if the stored one is zero

This should also work as intended for background worker using SPI,
provided that they correctly call pgstat_report_activity.  I also
modified ExecInitParallelPlan() to publish the queryId in the
serialized plannedStmt, so ParallelQueryMain() can report it to make
the queryid available in the parallel workers too.

Note that this patch makes it clear that a zero queryid means no
queryid computed (and NULL will be displayed in such case in
pg_stat_activity).  pg_stat_statements already makes sure that it
cannot compute a zero queryid.

It also assume that any extension computing a queryid will do that in
the post_parse_analysis hook, which seems like a sane requirement.  We
may want to have a dedicated hook for that instead, if more people get
interested in having the queryid only, possibly different
implementations, if it becomes available outside pgss.

Attachment

queryid_exposure_v4.diff

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

legrand legrand

Date:

04 August 2019, 07:04:01

>  However having the nested queryid in 
> pg_stat_activity would be convenient to track
> what is a long stored functions currently doing.

+1

And  this could permit to get wait event sampling per queryid when
pg_stat_statements.track = all

Regards 
PAscal



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Kyotaro Horiguchi

Date:

05 August 2019, 07:28:24

At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <legrand_legrand@hotmail.com> wrote in
<1564902241482-0.post@n3.nabble.com>
> >  However having the nested queryid in 
> > pg_stat_activity would be convenient to track
> > what is a long stored functions currently doing.
> 
> +1
> 
> And  this could permit to get wait event sampling per queryid when
> pg_stat_statements.track = all

I'm strongly on this side emotionally, but also I'm on Tom and
Andres's side that exposing querid that way is not the right
thing.

Doing that means we don't need exact correspondence between
top-level query and queryId (in nested or multistatement queries)
in this patch.  pg_stat_statements will allow us to do the same
thing by having additional uint64[MaxBackends] array in
pgssSharedState, instead of expanding PgBackendStatus array in
core by the same size.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

05 August 2019, 08:35:11

On Mon, Aug 5, 2019 at 9:28 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <legrand_legrand@hotmail.com> wrote in
<1564902241482-0.post@n3.nabble.com>
> > >  However having the nested queryid in
> > > pg_stat_activity would be convenient to track
> > > what is a long stored functions currently doing.
> >
> > +1
> >
> > And  this could permit to get wait event sampling per queryid when
> > pg_stat_statements.track = all
>
> I'm strongly on this side emotionally, but also I'm on Tom and
> Andres's side that exposing querid that way is not the right
> thing.
>
> Doing that means we don't need exact correspondence between
> top-level query and queryId (in nested or multistatement queries)
> in this patch.  pg_stat_statements will allow us to do the same
> thing by having additional uint64[MaxBackends] array in
> pgssSharedState, instead of expanding PgBackendStatus array in
> core by the same size.

Sure, but the problem with this approach is that all extensions that
compute their own queryid would have to do the same.  I hope that we
can come up with an approach friendlier for those extensions.

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

legrand legrand

Date:

05 August 2019, 17:30:29

Kyotaro Horiguchi-4 wrote
> At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <

> legrand_legrand@

> > wrote in <

> 1564902241482-0.post@.nabble

>>
>> >  However having the nested queryid in
>> > pg_stat_activity would be convenient to track
>> > what is a long stored functions currently doing.
>>
>> +1
>>
>> And  this could permit to get wait event sampling per queryid when
>> pg_stat_statements.track = all
>
> I'm strongly on this side emotionally, but also I'm on Tom and
> Andres's side that exposing querid that way is not the right
> thing.
>
> Doing that means we don't need exact correspondence between
> top-level query and queryId (in nested or multistatement queries)
> in this patch.  pg_stat_statements will allow us to do the same
> thing by having additional uint64[MaxBackends] array in
> pgssSharedState, instead of expanding PgBackendStatus array in
> core by the same size.
>
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center

Hi Kyotaro,
Thank you for this answer.
What you propose here is already available
Inside pg_stat_sql_plans extension (a derivative from
Pg_stat_statements and pg_store_plans)
And I’m used to this queryid behavior with top Level
Queries...
My emotion was high but I will accept it !
Regards
PAscal




--
Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Evgeny Efimkin

Date:

07 August 2019, 09:03:21

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

HI!
patch is look good for me.

The new status of this patch is: Ready for Committer

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Michael Paquier

Date:

11 September 2019, 04:45:23

On Wed, Aug 07, 2019 at 09:03:21AM +0000, Evgeny Efimkin wrote:
> The new status of this patch is: Ready for Committer

I may be wrong of course, but it looks that this is wanted and the
current shape of the patch looks sensible:
- Register the query ID using a backend entry.
- Only consider the top-level query.

An invalid query ID is assumed to be 0 in the patch, per the way it is
defined in pg_stat_statements.  However this also maps with the case
where we have a utility statement.

+    * We only report the top-level query identifiers.  The stored queryid is
+    * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
s/call/calls/

+   /*
+    * We only report the top-level query identifiers.  The stored queryid is
+    * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
+    * an explicit call to this function.  If the saved query identifier is not
+    * zero it means that it's not a top-level command, so ignore the one
+    * provided unless it's an explicit call to reset the identifier.
+    */
+   if (queryId != 0 && beentry->st_queryid != 0)
+       return;
Hmm.  I am wondering if we shouldn't have an API dedicated to the
reset of the query ID.  That logic looks rather brittle..

Wouldn't it be better (and more consistent) to update the query ID in
parse_analyze_varparams() and parse_analyze() as well after going
through the post_parse_analyze hook instead of pg_analyze_and_rewrite?

+   /*
+    * If a new query is started, we reset the query identifier as it'll only
+    * be known after parse analysis, to avoid reporting last query's
+    * identifier.
+    */
+   if (state == STATE_RUNNING)
+       beentry->st_queryid = 0
I don't quite get why you don't reset the counter in other cases as
well.  If the backend entry is idle in transaction or in an idle
state, it seems to me that we should not report the query ID of the
last query run in the transaction.  And that would make the reset in
exec_simple_query() unnecessary, no?
--
Michael

Attachment

signature.asc

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

11 September 2019, 16:30:22

Thanks for looking at it!

On Wed, Sep 11, 2019 at 6:45 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> An invalid query ID is assumed to be 0 in the patch, per the way it is
> defined in pg_stat_statements.  However this also maps with the case
> where we have a utility statement.

Oh indeed.  Which means that if a utility statements later calls
parse_analyze or friends, this patch would report an unexpected
queryid.  That's at least possible for something like

COPY (SELECT * FROM tbl) TO ...

The thing is that pg_stat_statements assigns a 0 queryid in the
post_parse_analyze_hook to recognize utility statements and avoid
tracking instrumentation twice in case of utility statements, and then
compute a queryid base on a hash of the query text.  Maybe we could
instead fully reserve queryid "2" for utility statements (so forcing
queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
of only 0), and use "2" as the identifier for utility statement
instead of "0"?

> +   /*
> +    * We only report the top-level query identifiers.  The stored queryid is
> +    * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with
> +    * an explicit call to this function.  If the saved query identifier is not
> +    * zero it means that it's not a top-level command, so ignore the one
> +    * provided unless it's an explicit call to reset the identifier.
> +    */
> +   if (queryId != 0 && beentry->st_queryid != 0)
> +       return;
> Hmm.  I am wondering if we shouldn't have an API dedicated to the
> reset of the query ID.  That logic looks rather brittle..

How about adding a "bool force" parameter to allow resetting the queryid to 0?

> Wouldn't it be better (and more consistent) to update the query ID in
> parse_analyze_varparams() and parse_analyze() as well after going
> through the post_parse_analyze hook instead of pg_analyze_and_rewrite?

I thought about it without knowing what would be best.  I'll change to
report the queryid right after calling post_parse_analyze_hook then.

> +   /*
> +    * If a new query is started, we reset the query identifier as it'll only
> +    * be known after parse analysis, to avoid reporting last query's
> +    * identifier.
> +    */
> +   if (state == STATE_RUNNING)
> +       beentry->st_queryid = 0
> I don't quite get why you don't reset the counter in other cases as
> well.  If the backend entry is idle in transaction or in an idle
> state, it seems to me that we should not report the query ID of the
> last query run in the transaction.  And that would make the reset in
> exec_simple_query() unnecessary, no?

I'm reproducing the same behavior as for the query text, ie. showing
the information about the last executed query text if state is idle:

+     <entry><structfield>queryid</structfield></entry>
+     <entry><type>bigint</type></entry>
+     <entry>Identifier of this backend's most recent query. If
+      <structfield>state</structfield> is <literal>active</literal> this field
+      shows the identifier of the currently executing query. In all other
+      states, it shows the identifier of last query that was executed.

I think that showing the last executed query's queryid is as useful as
the query text.  Also, while avoiding a reset in exec_simple_query()
it'd be required to do such reset in case of error during query
execution, so that wouldn't make things quite simpler..

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Michael Paquier

Date:

11 November 2019, 08:37:30

On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:
> The thing is that pg_stat_statements assigns a 0 queryid in the
> post_parse_analyze_hook to recognize utility statements and avoid
> tracking instrumentation twice in case of utility statements, and then
> compute a queryid base on a hash of the query text.  Maybe we could
> instead fully reserve queryid "2" for utility statements (so forcing
> queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
> of only 0), and use "2" as the identifier for utility statement
> instead of "0"?

Hmm.  Not sure.  At this stage it would be nice to gather more input
on the matter, and FWIW, I don't like much the assumption that a query
ID of 0 is perhaps a utility statement, or perhaps nothing depending
on the state of a backend entry, or even perhaps something else
depending how on how modules make use and define such query IDs.
--
Michael

Attachment

signature.asc

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

13 November 2019, 03:15:23

On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote:
> On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:
> > The thing is that pg_stat_statements assigns a 0 queryid in the
> > post_parse_analyze_hook to recognize utility statements and avoid
> > tracking instrumentation twice in case of utility statements, and then
> > compute a queryid base on a hash of the query text.  Maybe we could
> > instead fully reserve queryid "2" for utility statements (so forcing
> > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
> > of only 0), and use "2" as the identifier for utility statement
> > instead of "0"?
> 
> Hmm.  Not sure.  At this stage it would be nice to gather more input
> on the matter, and FWIW, I don't like much the assumption that a query
> ID of 0 is perhaps a utility statement, or perhaps nothing depending
> on the state of a backend entry, or even perhaps something else
> depending how on how modules make use and define such query IDs.

I thought each extension would export a function to compute the query
id, and you would all that function with the pg_stat_activity.query
string.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

13 November 2019, 11:53:09

On Wed, Nov 13, 2019 at 4:15 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote:
> > On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote:
> > > The thing is that pg_stat_statements assigns a 0 queryid in the
> > > post_parse_analyze_hook to recognize utility statements and avoid
> > > tracking instrumentation twice in case of utility statements, and then
> > > compute a queryid base on a hash of the query text.  Maybe we could
> > > instead fully reserve queryid "2" for utility statements (so forcing
> > > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead
> > > of only 0), and use "2" as the identifier for utility statement
> > > instead of "0"?
> >
> > Hmm.  Not sure.  At this stage it would be nice to gather more input
> > on the matter, and FWIW, I don't like much the assumption that a query
> > ID of 0 is perhaps a utility statement, or perhaps nothing depending
> > on the state of a backend entry, or even perhaps something else
> > depending how on how modules make use and define such query IDs.
>
> I thought each extension would export a function to compute the query
> id, and you would all that function with the pg_stat_activity.query
> string.

I'd really like to have the queryid function available through SQL,
but I think that this specific case wouldn't work very well for
pg_stat_statements' approach as it's working with oid.  The query
string in pg_stat_activity is the user provided one rather than a
fully-qualified version, so in order to get that query's queryid, you
need to know the exact search_path in use in that backend, and that's
not something available.

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Michael Paquier

Date:

29 November 2019, 06:19:49

On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
> I'd really like to have the queryid function available through SQL,
> but I think that this specific case wouldn't work very well for
> pg_stat_statements' approach as it's working with oid.  The query
> string in pg_stat_activity is the user provided one rather than a
> fully-qualified version, so in order to get that query's queryid, you
> need to know the exact search_path in use in that backend, and that's
> not something available.

Yeah..  So, we have a patch marked as ready for committer here, and it
seems to me that we have a couple of issues to discuss more about
first particularly this query ID of 0.  Again, do others have more
any input to offer?
--
Michael

Attachment

signature.asc

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Michael Paquier

Date:

29 November 2019, 06:20:58

On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
>> I'd really like to have the queryid function available through SQL,
>> but I think that this specific case wouldn't work very well for
>> pg_stat_statements' approach as it's working with oid.  The query
>> string in pg_stat_activity is the user provided one rather than a
>> fully-qualified version, so in order to get that query's queryid, you
>> need to know the exact search_path in use in that backend, and that's
>> not something available.
>
> Yeah..  So, we have a patch marked as ready for committer here, and it
> seems to me that we have a couple of issues to discuss more about
> first particularly this query ID of 0.  Again, do others have more
> any input to offer?

And while on it, the latest patch does not apply, so a rebase is
needed here.
--
Michael

Attachment

signature.asc

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

29 November 2019, 08:39:09

On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
> >> I'd really like to have the queryid function available through SQL,
> >> but I think that this specific case wouldn't work very well for
> >> pg_stat_statements' approach as it's working with oid.  The query
> >> string in pg_stat_activity is the user provided one rather than a
> >> fully-qualified version, so in order to get that query's queryid, you
> >> need to know the exact search_path in use in that backend, and that's
> >> not something available.
> >
> > Yeah..  So, we have a patch marked as ready for committer here, and it
> > seems to me that we have a couple of issues to discuss more about
> > first particularly this query ID of 0.  Again, do others have more
> > any input to offer?

I just realized that with current infrastructure it's not possible to
display a utility queryid.  We need to recognize utility to not
process the counters twice (once in processUtility, once in the
underlying executor),  so we don't provide a queryid for utility
statements in parse analysis.  Current magic value 0 has the side
effect of showing an invalid queryid for all utilty statements, and
using a magic value different from 0 will just always display that
magic value.  We could instead add another field in the Query and
PlannedStmt structs, say "int queryid_flags", that extensions could
use for their needs?

> And while on it, the latest patch does not apply, so a rebase is
> needed here.

Yep, I noticed that this morning.  I already rebased the patch
locally, I'll send a new version with new modifications when we reach
an agreement on the utility issue.

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Tomas Vondra

Date:

01 February 2020, 11:30:45

On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:
>On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>>
>> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
>> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
>> >> I'd really like to have the queryid function available through SQL,
>> >> but I think that this specific case wouldn't work very well for
>> >> pg_stat_statements' approach as it's working with oid.  The query
>> >> string in pg_stat_activity is the user provided one rather than a
>> >> fully-qualified version, so in order to get that query's queryid, you
>> >> need to know the exact search_path in use in that backend, and that's
>> >> not something available.
>> >
>> > Yeah..  So, we have a patch marked as ready for committer here, and it
>> > seems to me that we have a couple of issues to discuss more about
>> > first particularly this query ID of 0.  Again, do others have more
>> > any input to offer?
>
>I just realized that with current infrastructure it's not possible to
>display a utility queryid.  We need to recognize utility to not
>process the counters twice (once in processUtility, once in the
>underlying executor),  so we don't provide a queryid for utility
>statements in parse analysis.  Current magic value 0 has the side
>effect of showing an invalid queryid for all utilty statements, and
>using a magic value different from 0 will just always display that
>magic value.  We could instead add another field in the Query and
>PlannedStmt structs, say "int queryid_flags", that extensions could
>use for their needs?
>
>> And while on it, the latest patch does not apply, so a rebase is
>> needed here.
>
>Yep, I noticed that this morning.  I already rebased the patch
>locally, I'll send a new version with new modifications when we reach
>an agreement on the utility issue.
>

Well, this patch was in WoA since November, but now that I look at it
that might have been wrong - we're clearly waiting for agreement on how
to handle queryid for utility commands. I suspect the WoA status might
have been driving people away from this thread :-(

I've switched the patch to "needs review" and moved it to the next CF.
What I think needs to happen is we get a patch implementing one of the
proposed solutions, and discuss that.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

05 February 2020, 14:32:36

On Sat, Feb 1, 2020 at 12:30 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote:
> >On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote:
> >>
> >> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote:
> >> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote:
> >> >> I'd really like to have the queryid function available through SQL,
> >> >> but I think that this specific case wouldn't work very well for
> >> >> pg_stat_statements' approach as it's working with oid.  The query
> >> >> string in pg_stat_activity is the user provided one rather than a
> >> >> fully-qualified version, so in order to get that query's queryid, you
> >> >> need to know the exact search_path in use in that backend, and that's
> >> >> not something available.
> >> >
> >> > Yeah..  So, we have a patch marked as ready for committer here, and it
> >> > seems to me that we have a couple of issues to discuss more about
> >> > first particularly this query ID of 0.  Again, do others have more
> >> > any input to offer?
> >
> >I just realized that with current infrastructure it's not possible to
> >display a utility queryid.  We need to recognize utility to not
> >process the counters twice (once in processUtility, once in the
> >underlying executor),  so we don't provide a queryid for utility
> >statements in parse analysis.  Current magic value 0 has the side
> >effect of showing an invalid queryid for all utilty statements, and
> >using a magic value different from 0 will just always display that
> >magic value.  We could instead add another field in the Query and
> >PlannedStmt structs, say "int queryid_flags", that extensions could
> >use for their needs?
> >
> >> And while on it, the latest patch does not apply, so a rebase is
> >> needed here.
> >
> >Yep, I noticed that this morning.  I already rebased the patch
> >locally, I'll send a new version with new modifications when we reach
> >an agreement on the utility issue.
> >
>
> Well, this patch was in WoA since November, but now that I look at it
> that might have been wrong - we're clearly waiting for agreement on how
> to handle queryid for utility commands. I suspect the WoA status might
> have been driving people away from this thread :-(

Oh, indeed.

> I've switched the patch to "needs review" and moved it to the next CF.

Thanks

> What I think needs to happen is we get a patch implementing one of the
> proposed solutions, and discuss that.

There's also the possibility to reserve 1 bit of the hash to know if
this is a utility command or not, although I don't recall right now
all the possible issues with utility commands and some special
handling of them.  I'll work on it before the next commitfest.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Robert Haas

Date:

06 February 2020, 19:59:09

On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
> There's also the possibility to reserve 1 bit of the hash to know if
> this is a utility command or not, although I don't recall right now
> all the possible issues with utility commands and some special
> handling of them.  I'll work on it before the next commitfest.

FWIW, I don't really see why it would be bad to have 0 mean that
"there's no query ID for some reason" without caring whether that's
because the current statement is a utility statement or because
there's no statement in progress at all or whatever else. The user
probably doesn't need our help to distinguish between "no statement"
and "utility statement", right?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

07 February 2020, 10:12:50

On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote:
> On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
> > There's also the possibility to reserve 1 bit of the hash to know if
> > this is a utility command or not, although I don't recall right now
> > all the possible issues with utility commands and some special
> > handling of them.  I'll work on it before the next commitfest.
>
> FWIW, I don't really see why it would be bad to have 0 mean that
> "there's no query ID for some reason" without caring whether that's
> because the current statement is a utility statement or because
> there's no statement in progress at all or whatever else. The user
> probably doesn't need our help to distinguish between "no statement"
> and "utility statement", right?

Sure, but if we don't fix that it means that we also won't expose any queryid
for utility statement, even if pg_stat_statements is configured to track those
(with a very poor queryid handling, but still).

While looking at this again, I realized that pg_stat_statements doesn't compute
a queryid during the post parse analysis hook just to make sure that no query
identifier will be set during executorStart and the rest of executor functions.

AFAICT, that can't happen anyway since pg_plan_queries() will discard any
computed queryid for utility statements.  This seems to be an oversight due to
original pg_stat_statements implementation, so I fixed this.

Then, as processUtility is called between parse analysis and executor, I think
that we can simply work around this by computing utility statements query
identifier during parse analysis, removing it in pgss_ProcessUtility and
keeping a copy of it for the pgss_store calls in that function, as done in the
attached v5.

This fixes everything except EXECUTE statements, which has to get the
underlying query's queryid.  The problem is that EXECUTE won't get through
parse analysis, so while it's correctly handled for execution and pgss_store,
it's not being exposed in pg_stat_activity and log_line_prefix.  To fix it, I
added an extra call to pgstat_report_queryid in executorStart.  As this
function is a no-op if a queryid is already exposed, this shouldn't cause any
harm and fix any other cases of query execution that don't go through parse
analysis.

Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those
statements will always be reported with a NULL/0 queryid, but this is
consistent as it's also not present in pg_stat_statements() SRF.

Attachment

queryid_exposure-v5.diff

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

03 March 2020, 15:24:59

On Fri, Feb 7, 2020 at 11:12 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote:
> > On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote:
> > > There's also the possibility to reserve 1 bit of the hash to know if
> > > this is a utility command or not, although I don't recall right now
> > > all the possible issues with utility commands and some special
> > > handling of them.  I'll work on it before the next commitfest.
> >
> > FWIW, I don't really see why it would be bad to have 0 mean that
> > "there's no query ID for some reason" without caring whether that's
> > because the current statement is a utility statement or because
> > there's no statement in progress at all or whatever else. The user
> > probably doesn't need our help to distinguish between "no statement"
> > and "utility statement", right?
>
> Sure, but if we don't fix that it means that we also won't expose any queryid
> for utility statement, even if pg_stat_statements is configured to track those
> (with a very poor queryid handling, but still).
>
> While looking at this again, I realized that pg_stat_statements doesn't compute
> a queryid during the post parse analysis hook just to make sure that no query
> identifier will be set during executorStart and the rest of executor functions.
>
> AFAICT, that can't happen anyway since pg_plan_queries() will discard any
> computed queryid for utility statements.  This seems to be an oversight due to
> original pg_stat_statements implementation, so I fixed this.
>
> Then, as processUtility is called between parse analysis and executor, I think
> that we can simply work around this by computing utility statements query
> identifier during parse analysis, removing it in pgss_ProcessUtility and
> keeping a copy of it for the pgss_store calls in that function, as done in the
> attached v5.
>
> This fixes everything except EXECUTE statements, which has to get the
> underlying query's queryid.  The problem is that EXECUTE won't get through
> parse analysis, so while it's correctly handled for execution and pgss_store,
> it's not being exposed in pg_stat_activity and log_line_prefix.  To fix it, I
> added an extra call to pgstat_report_queryid in executorStart.  As this
> function is a no-op if a queryid is already exposed, this shouldn't cause any
> harm and fix any other cases of query execution that don't go through parse
> analysis.
>
> Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those
> statements will always be reported with a NULL/0 queryid, but this is
> consistent as it's also not present in pg_stat_statements() SRF.

cfbot reports a failure since 2f9661311b (command completion tag
change), so here's a rebased v6, no change otherwise.

Attachment

queryid_exposure-v6.diff

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

08 March 2020, 14:26:44

On Fri, Jun 28, 2019 at 11:49:53AM -0700, Peter Geoghegan wrote:
> On Tue, Mar 19, 2019 at 12:38 PM legrand legrand
> <legrand_legrand@hotmail.com> wrote:
> > Would it make sense to add it in auto explain ?
> > I don't know for explain itself, but maybe ...
>
> I think that it should appear in EXPLAIN. pg_stat_statements already
> cannot have a query hash of zero, so it might be okay to display it
> only when its value is non-zero.

I had forgotten about this.  After looking at it, I can see a few issues.

For now post_parse_analyze_hook isn't called for the underlying statement, so
we don't have the queryid.  And we can't compute the queryid for the underlying
query in the initial post_parse_analyze_hook call as we don't want the executor
to have a queryid set in that case to avoid cumulating counters for both the
explain and the query.

We could add an extra call in ExplainQuery, but this will be ignored by
pg_stat_statements unless you set pg_stat_statements.track to all. Also,
pgss_post_parse_analyze will try to record an entry with the normalized query
text if no one exists yet and if any constant where removed.  The problem is
that, as I already mentioned in [1], the underlying query doesn't have
query_location or query_len valued, so the recorded query text will at least
contain the explain part of the input query.

[1] https://www.postgresql.org/message-id/CAOBaU_Y-y%2BVOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A%40mail.gmail.com

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

14 March 2020, 17:53:51

On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote:
>
> cfbot reports a failure since 2f9661311b (command completion tag
> change), so here's a rebased v6, no change otherwise.

Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks
again to cfbot, rebased v7 attached.

Attachment

v7-0001-Expose-queryid-in-pg_stat_activity-and-log_line_p.patch

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

16 March 2020, 14:43:12

On Sat, Mar 14, 2020 at 06:53:51PM +0100, Julien Rouhaud wrote:
> On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote:
> >
> > cfbot reports a failure since 2f9661311b (command completion tag
> > change), so here's a rebased v6, no change otherwise.
>
>
> Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks
> again to cfbot, rebased v7 attached.


Bit repetita.

Attachment

v8-0001-Expose-queryid-in-pg_stat_activity-and-log_line_p.patch

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

02 April 2020, 13:25:06

New conflict, rebased v9 attached.

Attachment

v9-0001-Expose-queryid-in-pg_stat_activity-and-log_line_p.patch

Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?

From

Tatsuro Yamada

Date:

07 April 2020, 06:40:34

Hi Julien,

On 2020/04/02 22:25, Julien Rouhaud wrote:
> New conflict, rebased v9 attached.

I tested the patch on the head (c7654f6a3) and
the result was fine. See below:

$ make installcheck-world
=====================
  All 1 tests passed.
=====================


Regards,
Tatsuro Yamada

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?

From

Julien Rouhaud

Date:

08 April 2020, 14:37:45

On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
>
> Hi Julien,
>
> On 2020/04/02 22:25, Julien Rouhaud wrote:
> > New conflict, rebased v9 attached.
>
> I tested the patch on the head (c7654f6a3) and
> the result was fine. See below:
>
> $ make installcheck-world
> =====================
>   All 1 tests passed.
> =====================

Thanks Yamada-san!  Unfortunately this patch still didn't attract any
committer, so I moved it to the next commitfest.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Atsushi Torikoshi

Date:

14 July 2020, 10:11:02

Hi,

v9 patch fails to apply to HEAD, could you check and rebase it?

And here are minor typos.

79 + * utility statements. Note that we don't compute a queryId for prepared
80 + * statemets related utility, as those will inherit from the underlying

81 + * statements's one (except DEALLOCATE which is entirely untracked).

statemets -> statements

statements's -> statements' or statement's?

Regards,

Atsushi Torikoshi

On Wed, Apr 8, 2020 at 11:38 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
>
> Hi Julien,
>
> On 2020/04/02 22:25, Julien Rouhaud wrote:
> > New conflict, rebased v9 attached.
>
> I tested the patch on the head (c7654f6a3) and
> the result was fine. See below:
>
> $ make installcheck-world
> =====================
> All 1 tests passed.
> =====================

Thanks Yamada-san! Unfortunately this patch still didn't attract any
committer, so I moved it to the next commitfest.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

14 July 2020, 11:24:53

On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote:
> Hi,
> 
> v9 patch fails to apply to HEAD, could you check and rebase it?

Thanks for the notice, v10 attached!

> And here are minor typos.
> 
>  79 +        * utility statements.  Note that we don't compute a queryId
> for prepared
>  80 +        * statemets related utility, as those will inherit from the
> underlying
>  81 +        * statements's one (except DEALLOCATE which is entirely
> untracked).
> 
> statemets -> statements
> statements's -> statements' or statement's?

Thanks!  I went with "statement's".

Attachment

v10-0001-Expose-queryid-in-pg_stat_activity-and-log_line_.patch

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

torikoshia

Date:

28 July 2020, 08:07:02

On 2020-07-14 20:24, Julien Rouhaud wrote:
> On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote:
>> Hi,
>> 
>> v9 patch fails to apply to HEAD, could you check and rebase it?
> 
> Thanks for the notice, v10 attached!
> 
>> And here are minor typos.
>> 
>>  79 +        * utility statements.  Note that we don't compute a 
>> queryId
>> for prepared
>>  80 +        * statemets related utility, as those will inherit from 
>> the
>> underlying
>>  81 +        * statements's one (except DEALLOCATE which is entirely
>> untracked).
>> 
>> statemets -> statements
>> statements's -> statements' or statement's?
> 
> Thanks!  I went with "statement's".

Thanks for updating!
I tested the patch setting log_statement = 'all', but %Q in 
log_line_prefix
was always 0 even when pg_stat_statements.queryid and
pg_stat_activity.queryid are not 0.

Is this an intentional behavior?


```
   $ initdb --no-locale -D data


   $ edit postgresql.conf
     shared_preload_libraries = 'pg_stat_statements'
     logging_collector = on
     log_line_prefix = '%m [%p] queryid:%Q '
     log_statement = 'all'

   $ pg_ctl start -D data

   $ psql
   =# CREATE EXTENSION pg_stat_statements;

   =# CREATE TABLE t1 (i int);
   =# INSERT INTO t1 VALUES (0),(1);
   =# SELECT queryid, query FROM pg_stat_activity;

   -- query ids are all 0 on the log
   $ view log
   2020-07-28 15:57:58.475 EDT [4480] queryid:0 LOG:  statement: CREATE 
TABLE t1 (i int);
   2020-07-28 15:58:13.730 EDT [4480] queryid:0 LOG:  statement: INSERT 
INTO t1 VALUES (0),(1);
   2020-07-28 15:59:28.389 EDT [4480] queryid:0 LOG:  statement: SELECT * 
FROM t1;

   -- on pg_stat_activity and pgss, query ids are not 0
   $ psql
   =# SELECT queryid, query FROM pg_stat_activity WHERE query LIKE 
'%t1%';
          queryid        |                                query
   
----------------------+----------------------------------------------------------------------
     1109063694563750779 | SELECT * FROM t1;
    -2582225123719476948 | SELECT queryid, query FROM pg_stat_activity 
WHERE query LIKE '%t1%';
   (2 rows)

   =# SELECT queryid, query FROM pg_stat_statements WHERE query LIKE 
'%t1%';
          queryid        |              query
   ----------------------+---------------------------------
    -5028988130796701553 | CREATE TABLE t1 (i int)
     1109063694563750779 | SELECT * FROM t1
     2726469050076420724 | INSERT INTO t1 VALUES ($1),($2)

```


And here is a minor typo.
  optionnally -> optionally


> 753 +       /* query identifier, optionnally computed using 
> post_parse_analyze_hook */


Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

28 July 2020, 08:55:04

On Tue, Jul 28, 2020 at 10:07 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
>
> Thanks for updating!
> I tested the patch setting log_statement = 'all', but %Q in
> log_line_prefix
> was always 0 even when pg_stat_statements.queryid and
> pg_stat_activity.queryid are not 0.
>
> Is this an intentional behavior?
>
>[...]

Thanks for the tests!  That's indeed an expected behavior (although I
wasn't aware of it), which isn't documented in this patch (I'll fix
it).  The reason for that is that log_statements is done right after
parsing the query:

    /*
     * Do basic parsing of the query or queries (this should be safe even if
     * we are in aborted transaction state!)
     */
    parsetree_list = pg_parse_query(query_string);

    /* Log immediately if dictated by log_statement */
    if (check_log_statement(parsetree_list))
    {
        ereport(LOG,
                (errmsg("statement: %s", query_string),
                 errhidestmt(true),
                 errdetail_execute(parsetree_list)));
        was_logged = true;
    }

As parse analysis is not yet done, no queryid can be computed at that
point, so we always print 0.  That's a limitation that can't be
removed without changing the semantics of log_statements, so we'll
probably have to live with it.

> And here is a minor typo.
>   optionnally -> optionally
>
>
> > 753 +       /* query identifier, optionnally computed using
> > post_parse_analyze_hook */

Thanks, I fixed it locally!

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

19 August 2020, 14:19:30

On Tue, Jul 28, 2020 at 10:55:04AM +0200, Julien Rouhaud wrote:
> On Tue, Jul 28, 2020 at 10:07 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> >
> > Thanks for updating!
> > I tested the patch setting log_statement = 'all', but %Q in
> > log_line_prefix
> > was always 0 even when pg_stat_statements.queryid and
> > pg_stat_activity.queryid are not 0.
> >
> > Is this an intentional behavior?
> >
> >[...]
> 
> Thanks for the tests!  That's indeed an expected behavior (although I
> wasn't aware of it), which isn't documented in this patch (I'll fix
> it).  The reason for that is that log_statements is done right after
> parsing the query:
> 
>     /*
>      * Do basic parsing of the query or queries (this should be safe even if
>      * we are in aborted transaction state!)
>      */
>     parsetree_list = pg_parse_query(query_string);
> 
>     /* Log immediately if dictated by log_statement */
>     if (check_log_statement(parsetree_list))
>     {
>         ereport(LOG,
>                 (errmsg("statement: %s", query_string),
>                  errhidestmt(true),
>                  errdetail_execute(parsetree_list)));
>         was_logged = true;
>     }
> 
> As parse analysis is not yet done, no queryid can be computed at that
> point, so we always print 0.  That's a limitation that can't be
> removed without changing the semantics of log_statements, so we'll
> probably have to live with it.
> 
> > And here is a minor typo.
> >   optionnally -> optionally
> >
> >
> > > 753 +       /* query identifier, optionnally computed using
> > > post_parse_analyze_hook */
> 
> Thanks, I fixed it locally!


Recent conflict, rebased v11 attached.

Attachment

v11-0001-Expose-queryid-in-pg_stat_activity-and-log_line_.patch

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

05 October 2020, 21:24:06

On Wed, Aug 19, 2020 at 04:19:30PM +0200, Julien Rouhaud wrote:
> Similarly to other fields in pg_stat_activity, only the queryid from the top
> level statements are exposed, and if the backends status isn't active then the
> queryid from the last executed statements is displayed.
> 
> Also add a %Q placeholder to include the queryid in the log_line_prefix, which
> will also only expose top level statements.

I would like to apply this patch (I know it has been in the commitfest
since July 2019), but I have some questions about the user API.  Does it
make sense to have a column in pg_stat_actvity and an option in
log_line_prefix that will be empty unless pg_stat_statements is
installed?  Is there no clean way to move the query hash computation out
of pg_stat_statements and into the main code so the query id is always
visible?  (Also, did we decide _not_ to make the pg_stat_statements
queryid always a positive value?)

Also, in the doc patch:

    By default, query identifiers are not computed, so this field will always
    be null, unless an additional module that compute query identifiers, such
    as <xref linkend="pgstatstatements"/>, is configured.

why are you saying "such as"?  Isn't pg_stat_statements the only way to
see the queryid?  This command allowed the queryid to be displayed in
pg_stat_activity:

    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tom Lane

Date:

05 October 2020, 21:42:59

Bruce Momjian <bruce@momjian.us> writes:
> I would like to apply this patch (I know it has been in the commitfest
> since July 2019), but I have some questions about the user API.  Does it
> make sense to have a column in pg_stat_actvity and an option in
> log_line_prefix that will be empty unless pg_stat_statements is
> installed?  Is there no clean way to move the query hash computation out
> of pg_stat_statements and into the main code so the query id is always
> visible?  (Also, did we decide _not_ to make the pg_stat_statements
> queryid always a positive value?)

FWIW, I think this proposal is a mess.  I was willing to hold my nose
and have a queryId field in the internal Query struct without any solid
consensus about what its semantics are and which extensions get to use it.
Exposing it to end users seems like a bridge too far, though.  In
particular, I'm afraid that that will cause people to expect it to have
consistent values across PG versions, or even just across architectures
within one version.

The larger picture here is that there's lots of room to doubt whether
pg_stat_statements' decisions about what to ignore or include in the ID
will be satisfactory to everybody.  If that were not so, we'd just move
the computation into core.

            regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Alvaro Herrera

Date:

05 October 2020, 22:58:42

On 2020-Oct-05, Tom Lane wrote:

> FWIW, I think this proposal is a mess.  I was willing to hold my nose
> and have a queryId field in the internal Query struct without any solid
> consensus about what its semantics are and which extensions get to use it.
> Exposing it to end users seems like a bridge too far, though.  In
> particular, I'm afraid that that will cause people to expect it to have
> consistent values across PG versions, or even just across architectures
> within one version.

I wonder if it would help to purposefully change the computation so that
it is not -- for instance, hash the system_identifier as initial value.
Then users would be forced to accept that it'll change as soon as it
migrates to another server or is upgraded to a new major version.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

06 October 2020, 02:18:19

On Mon, Oct  5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote:
> On 2020-Oct-05, Tom Lane wrote:
> 
> > FWIW, I think this proposal is a mess.  I was willing to hold my nose
> > and have a queryId field in the internal Query struct without any solid
> > consensus about what its semantics are and which extensions get to use it.
> > Exposing it to end users seems like a bridge too far, though.  In
> > particular, I'm afraid that that will cause people to expect it to have
> > consistent values across PG versions, or even just across architectures
> > within one version.
> 
> I wonder if it would help to purposefully change the computation so that
> it is not -- for instance, hash the system_identifier as initial value.
> Then users would be forced to accept that it'll change as soon as it
> migrates to another server or is upgraded to a new major version.

That seems like a good idea, but it would prevent cross-cluster
same-major-version comparisons, which seems like a negative.  Perhaps we
should add the major version into the hash to handle this.  Ideally,
let's just put a queryid-hash-version into to the hash, so if we change
the computation, we just update the hash version and nothing matches
anymore.

I do think the queryid has to display independent of pg_stat_statements,
because I can see people using queryid for log file and pg_stat_activity
comparisons.  I also think the ability to have queryid accessible is an
important feature outside of pg_stat_statements, so I do think we need a
way to move this idea forward.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

06 October 2020, 03:11:27

On Tue, Oct 6, 2020 at 10:18 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Oct  5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote:
> > On 2020-Oct-05, Tom Lane wrote:
> >
> > > FWIW, I think this proposal is a mess.  I was willing to hold my nose
> > > and have a queryId field in the internal Query struct without any solid
> > > consensus about what its semantics are and which extensions get to use it.
> > > Exposing it to end users seems like a bridge too far, though.  In
> > > particular, I'm afraid that that will cause people to expect it to have
> > > consistent values across PG versions, or even just across architectures
> > > within one version.
> >
> > I wonder if it would help to purposefully change the computation so that
> > it is not -- for instance, hash the system_identifier as initial value.
> > Then users would be forced to accept that it'll change as soon as it
> > migrates to another server or is upgraded to a new major version.
>
> That seems like a good idea, but it would prevent cross-cluster
> same-major-version comparisons, which seems like a negative.  Perhaps we
> should add the major version into the hash to handle this.  Ideally,
> let's just put a queryid-hash-version into to the hash, so if we change
> the computation, we just update the hash version and nothing matches
> anymore.
>
> I do think the queryid has to display independent of pg_stat_statements,
> because I can see people using queryid for log file and pg_stat_activity
> comparisons.  I also think the ability to have queryid accessible is an
> important feature outside of pg_stat_statements, so I do think we need a
> way to move this idea forward.

For the record, for now any extension can compute a queryid and there
are at least 2 other published extensions that already do that, one of
them having different semantics on how to compute the queryid.  I'm
not sure that we'll ever get a consensus on those semantics due to
performance tradeoff, so removing the ability to let people put their
own code for that doesn't seem like the best way forward.

Maybe we could add a new hook for only queryid computation, and add a
GUC to let people choose between no queryid computed, core computation
(current pg_stat_statement) and 3rd party plugin?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

06 October 2020, 03:23:50

On Tue, Oct  6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:
> > I do think the queryid has to display independent of pg_stat_statements,
> > because I can see people using queryid for log file and pg_stat_activity
> > comparisons.  I also think the ability to have queryid accessible is an
> > important feature outside of pg_stat_statements, so I do think we need a
> > way to move this idea forward.
> 
> For the record, for now any extension can compute a queryid and there
> are at least 2 other published extensions that already do that, one of
> them having different semantics on how to compute the queryid.  I'm
> not sure that we'll ever get a consensus on those semantics due to
> performance tradeoff, so removing the ability to let people put their
> own code for that doesn't seem like the best way forward.
> 
> Maybe we could add a new hook for only queryid computation, and add a
> GUC to let people choose between no queryid computed, core computation
> (current pg_stat_statement) and 3rd party plugin?

That all seems very complicated.  If we go in that direction, I suggest
we just give up getting any of this into core.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Michael Paquier

Date:

06 October 2020, 05:02:29

On Mon, Oct 05, 2020 at 05:24:06PM -0400, Bruce Momjian wrote:
> (Also, did we decide _not_ to make the pg_stat_statements queryid
> always a positive value?)

This specific point has been discussed a couple of years ago, please
see cff440d and its related thread:
https://www.postgresql.org/message-id/CA+TgmobG_Kp4cBKFmsznUAaM1GWW6hhRNiZC0KjRMOOeYnz5Yw@mail.gmail.com
--
Michael

Attachment

signature.asc

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Michael Paquier

Date:

06 October 2020, 05:34:58

On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote:
> On Tue, Oct  6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:
>> Maybe we could add a new hook for only queryid computation, and add a
>> GUC to let people choose between no queryid computed, core computation
>> (current pg_stat_statement) and 3rd party plugin?
>
> That all seems very complicated.  If we go in that direction, I suggest
> we just give up getting any of this into core.

A GUC would have at least the advantage to make the computation
consistent for any system willing to consume it, with the option to
not pay any potential performance impact, though I have to admit that
just moving the query ID computation of PGSS into core may not be the
best option as a query ID of 0 means the same thing for a utility, for
an initialization, and for a backend running a query with an unknown
value, but that could be worked out.

FWIW, I think that adding the system ID in the hash is too
restrictive, as it could be interesting for users to do stat
comparisons across multiple systems running the same major version.
It would be better to not give any strong guarantee that the query ID
computed will remain consistent across major versions so as it is
possible to keep improving it.  Also, if nothing has been done that
changes the hashing computation, I see little benefit in forcing a
breakage by adding something like PG_MAJORVERSION_NUM or such in the
hash computation.
--
Michael

Attachment

signature.asc

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

06 October 2020, 13:22:29

On Tue, Oct  6, 2020 at 02:34:58PM +0900, Michael Paquier wrote:
> On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote:
> > On Tue, Oct  6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote:
> >> Maybe we could add a new hook for only queryid computation, and add a
> >> GUC to let people choose between no queryid computed, core computation
> >> (current pg_stat_statement) and 3rd party plugin?
> > 
> > That all seems very complicated.  If we go in that direction, I suggest
> > we just give up getting any of this into core.
> 
> A GUC would have at least the advantage to make the computation
> consistent for any system willing to consume it, with the option to
> not pay any potential performance impact, though I have to admit that
> just moving the query ID computation of PGSS into core may not be the
> best option as a query ID of 0 means the same thing for a utility, for
> an initialization, and for a backend running a query with an unknown
> value, but that could be worked out.
> 
> FWIW, I think that adding the system ID in the hash is too
> restrictive, as it could be interesting for users to do stat
> comparisons across multiple systems running the same major version.
> It would be better to not give any strong guarantee that the query ID
> computed will remain consistent across major versions so as it is
> possible to keep improving it.  Also, if nothing has been done that
> changes the hashing computation, I see little benefit in forcing a
> breakage by adding something like PG_MAJORVERSION_NUM or such in the
> hash computation.

I thought some more about this.  First, I think having the queryid hash
code in the server, without requiring pg_stat_statements, is a
requirement --- I think too many people will want to use this feature
independent of pg_stat_statements.  Second, I understand the desire to
have different hash computation methods, depending on what level of
detail/matching you want.

I propose moving the pg_stat_statements queryid hash code into the
server (with a version number), and also adding a postgressql.conf
variable that lets you control how detailed the queryid hash is
computed.  This addresses the problem of people wanting different hash
methods.

When computing a hash, the queryid detail level and version number will
be mixed into the hash, so only a hash that used a similar query and
identical queryid detail level would match.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Michael Paquier

Date:

07 October 2020, 01:42:49

On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> I propose moving the pg_stat_statements queryid hash code into the
> server (with a version number), and also adding a postgresql.conf
> variable that lets you control how detailed the queryid hash is
> computed.  This addresses the problem of people wanting different hash
> methods.

In terms of making this part expendable in the future, there could be
a point in having an enum here, but are we sure that we will have a
need for that in the future?  What I get from this discussion is that
we want a unique source of truth that users can consume, and that the
only source of truth proposed is the PGSS hashing.  We may change the
way we compute the query ID in the future, for example if it gets
expanded to some utility statements, etc.  But that would be
controlled by the version number in the hash, not the GUC itself.

> When computing a hash, the queryid detail level and version number will
> be mixed into the hash, so only a hash that used a similar query and
> identical queryid detail level would match.

Yes, having a version number directly dependent on the hashing sounds
like a good compromise to me.
--
Michael

Attachment

signature.asc

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

07 October 2020, 01:53:46

On Wed, Oct  7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:
> On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> > I propose moving the pg_stat_statements queryid hash code into the
> > server (with a version number), and also adding a postgresql.conf
> > variable that lets you control how detailed the queryid hash is
> > computed.  This addresses the problem of people wanting different hash
> > methods.
> 
> In terms of making this part expendable in the future, there could be
> a point in having an enum here, but are we sure that we will have a
> need for that in the future?  What I get from this discussion is that
> we want a unique source of truth that users can consume, and that the
> only source of truth proposed is the PGSS hashing.  We may change the
> way we compute the query ID in the future, for example if it gets
> expanded to some utility statements, etc.  But that would be
> controlled by the version number in the hash, not the GUC itself.

Oh, if that is true, then I agree let's just go with the version number.

> > When computing a hash, the queryid detail level and version number will
> > be mixed into the hash, so only a hash that used a similar query and
> > identical queryid detail level would match.
> 
> Yes, having a version number directly dependent on the hashing sounds
> like a good compromise to me.

Good, much simpler.  I think there is enough demand for a queryid that I
would like to get this moving forward.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

12 October 2020, 08:20:05

On Wed, Oct 7, 2020 at 9:53 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Wed, Oct  7, 2020 at 10:42:49AM +0900, Michael Paquier wrote:
> > On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote:
> > > I propose moving the pg_stat_statements queryid hash code into the
> > > server (with a version number), and also adding a postgresql.conf
> > > variable that lets you control how detailed the queryid hash is
> > > computed.  This addresses the problem of people wanting different hash
> > > methods.
> >
> > In terms of making this part expendable in the future, there could be
> > a point in having an enum here, but are we sure that we will have a
> > need for that in the future?  What I get from this discussion is that
> > we want a unique source of truth that users can consume, and that the
> > only source of truth proposed is the PGSS hashing.  We may change the
> > way we compute the query ID in the future, for example if it gets
> > expanded to some utility statements, etc.  But that would be
> > controlled by the version number in the hash, not the GUC itself.
>
> Oh, if that is true, then I agree let's just go with the version number.

But there are many people that aren't happy with the current hashing
approach.  If we're going to move the computation in core, shouldn't
we listen to their complaints and let them pay some probably quite
high overhead to base the hash on name and/or fully qualified name
rather than OID?
For instance people using logical replication to upgrade to a newer
version may want to easily compare query performance on the new
version, or people with multi-tenant databases may want to ignore the
schema name to keep a low number of different queryid.

It would probably still be possible to have a custom queryid hashing
by disabling the core one and computing a new one in a custom
extension, but that seems a bit hackish.

Jumping back on Tom's point that there are judgment calls on what is
examined or not, after a quick look I see at least two possible
problems of ignored clauses:
- WITH TIES clause
- OVERRIDING clause

I personally think that they shouldn't be ignored, but I don't know if
they were only forgotten or ignored on purpose.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

12 October 2020, 14:14:45

On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote:
> But there are many people that aren't happy with the current hashing
> approach.  If we're going to move the computation in core, shouldn't
> we listen to their complaints and let them pay some probably quite
> high overhead to base the hash on name and/or fully qualified name
> rather than OID?
> For instance people using logical replication to upgrade to a newer
> version may want to easily compare query performance on the new
> version, or people with multi-tenant databases may want to ignore the
> schema name to keep a low number of different queryid.

Well, we have to consider how complex the user interface has to be to
allow more flexibility.  We don't need to allow every option a user will
want.

With a version number, we have the ability to improve the algorithm or
add customization, but for the first use, we are probably better off
keeping it simple.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Robert Haas

Date:

12 October 2020, 17:30:50

On Mon, Oct 12, 2020 at 10:14 AM Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote:
> > But there are many people that aren't happy with the current hashing
> > approach.  If we're going to move the computation in core, shouldn't
> > we listen to their complaints and let them pay some probably quite
> > high overhead to base the hash on name and/or fully qualified name
> > rather than OID?
> > For instance people using logical replication to upgrade to a newer
> > version may want to easily compare query performance on the new
> > version, or people with multi-tenant databases may want to ignore the
> > schema name to keep a low number of different queryid.
>
> Well, we have to consider how complex the user interface has to be to
> allow more flexibility.  We don't need to allow every option a user will
> want.
>
> With a version number, we have the ability to improve the algorithm or
> add customization, but for the first use, we are probably better off
> keeping it simple.

I thought your earlier idea of allowing this to be controlled by a GUC
was good. There could be a default method built into core, matching
what pg_stat_statements does, so you could select no hashing or that
method no matter what. Then extensions could provide other methods
which could be selected via the GUC.

I don't really understand how a version number helps. It's not like
there is going to be a v2 that is in all ways better than v1. If there
are different algorithms here, they are going to be customized for
different needs.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tom Lane

Date:

12 October 2020, 18:26:15

Robert Haas <robertmhaas@gmail.com> writes:
> I don't really understand how a version number helps. It's not like
> there is going to be a v2 that is in all ways better than v1. If there
> are different algorithms here, they are going to be customized for
> different needs.

Yeah, I agree --- a version number is the wrong way to think about this.
It's gonna be more like algorithm foo versus algorithm bar versus
algorithm baz, where each one is better for a specific set of use-cases.
Julien already noted the point about hashing object OIDs versus object
names; one can easily imagine disagreeing with pg_stat_statement's
choices about ignoring values of constants; other properties of statements
might be irrelevant for some use-cases; and so on.

I'm okay with moving pg_stat_statement's existing algorithm into core as
long as there's a way for extensions to override it.  With proper design,
that would allow extensions that do override it to coexist with
pg_stat_statements (thereby redefining the latter's idea of which
statements are "the same"), which is something that doesn't really work
nicely today.

            regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

12 October 2020, 19:54:43

On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I don't really understand how a version number helps. It's not like
> > there is going to be a v2 that is in all ways better than v1. If there
> > are different algorithms here, they are going to be customized for
> > different needs.
> 
> Yeah, I agree --- a version number is the wrong way to think about this.
> It's gonna be more like algorithm foo versus algorithm bar versus
> algorithm baz, where each one is better for a specific set of use-cases.
> Julien already noted the point about hashing object OIDs versus object
> names; one can easily imagine disagreeing with pg_stat_statement's
> choices about ignoring values of constants; other properties of statements
> might be irrelevant for some use-cases; and so on.

The version number was to invalidate _all_ query hashes if the
algorithm is slightly modified, rather than invalidating just some of
them, which could lead to confusion.  The idea of selectable hash
algorithms is nice if people feel there is sufficient need for that.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tom Lane

Date:

12 October 2020, 20:07:30

Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
>> Yeah, I agree --- a version number is the wrong way to think about this.

> The version number was to invalidate _all_ query hashes if the
> algorithm is slightly modified, rather than invalidating just some of
> them, which could lead to confusion.

Color me skeptical as to the use-case for that.  From users' standpoints,
the hash is mainly going to change when we change the set of parse node
fields that get hashed.  Which is going to happen at every major release
and no (or at least epsilon) minor releases.  So I do not see a point in
tracking an algorithm version number as such.  Seems like make-work.

            regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

12 October 2020, 20:53:40

On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> >> Yeah, I agree --- a version number is the wrong way to think about this.
> 
> > The version number was to invalidate _all_ query hashes if the
> > algorithm is slightly modified, rather than invalidating just some of
> > them, which could lead to confusion.
> 
> Color me skeptical as to the use-case for that.  From users' standpoints,
> the hash is mainly going to change when we change the set of parse node
> fields that get hashed.  Which is going to happen at every major release
> and no (or at least epsilon) minor releases.  So I do not see a point in
> tracking an algorithm version number as such.  Seems like make-work.

OK, I came up with the hash idea only to address one of your concerns
about mismatched hashes for algorithm improvements/changes.  Seems we
might as well just document that cross-version hashes are different.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

  The usefulness of a cup is in its emptiness, Bruce Lee

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

14 October 2020, 09:43:33

On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> > >> Yeah, I agree --- a version number is the wrong way to think about this.
> >
> > > The version number was to invalidate _all_ query hashes if the
> > > algorithm is slightly modified, rather than invalidating just some of
> > > them, which could lead to confusion.
> >
> > Color me skeptical as to the use-case for that.  From users' standpoints,
> > the hash is mainly going to change when we change the set of parse node
> > fields that get hashed.  Which is going to happen at every major release
> > and no (or at least epsilon) minor releases.  So I do not see a point in
> > tracking an algorithm version number as such.  Seems like make-work.
>
> OK, I came up with the hash idea only to address one of your concerns
> about mismatched hashes for algorithm improvements/changes.  Seems we
> might as well just document that cross-version hashes are different.

Ok, so I tried to implement what seems to be the consensus.  First
attached patch moves the current pgss queryid computation in core,
with a new compute_queryid GUC (on/off).  One thing I don't really
like about this patch is that the JumbleState that pgss needs in order
to normalize the query string (the constants location and such) has to
be done by the core while computing the queryid and provided to pgss
in post_parse_analyse hook.  That isn't ideal as it looks very
specific to pgss needs.  On the other hand it means that you can now
use pgss with custom queryid heuristics by disabling compute_queryid
and having your module doing only that in post_parse_analyse_hook.
You'll however need to be careful to configure
shared_preload_libraries such that your custom module's
post_parse_analyse_hook is called first, so pgss' one can be called
with the needed JumbleState.  Note that if no JumbleState is provided
pgss will store non normalized queries, but will otherwise behave as
intended.

The 2nd patch is the rebased original queryid exposure patch.  No big
changes, except that it now handles utility statements queryid
generated during post_parse_analysis, same as regular queries.  This
should simplify the work needed for custom queryid third party
modules.

The 3rd patch changes explain (verbose) to display the queryid if one
has been generated, whether by core or a third-party module.  For
instance:

rjuju=# set compute_queryid = on;
SET
rjuju=# explain (verbose) select relname from pg_class;
                              QUERY PLAN
-----------------------------------------------------------------------
 Seq Scan on pg_catalog.pg_class  (cost=0.00..16.90 rows=390 width=64)
   Output: relname
 Query Identifier: -5494854185674379299
(3 rows)

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

14 October 2020, 12:25:00

On Wed, Oct 14, 2020 at 5:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote:
> > > >> Yeah, I agree --- a version number is the wrong way to think about this.
> > >
> > > > The version number was to invalidate _all_ query hashes if the
> > > > algorithm is slightly modified, rather than invalidating just some of
> > > > them, which could lead to confusion.
> > >
> > > Color me skeptical as to the use-case for that.  From users' standpoints,
> > > the hash is mainly going to change when we change the set of parse node
> > > fields that get hashed.  Which is going to happen at every major release
> > > and no (or at least epsilon) minor releases.  So I do not see a point in
> > > tracking an algorithm version number as such.  Seems like make-work.
> >
> > OK, I came up with the hash idea only to address one of your concerns
> > about mismatched hashes for algorithm improvements/changes.  Seems we
> > might as well just document that cross-version hashes are different.
>
> Ok, so I tried to implement what seems to be the consensus.  First
> attached patch moves the current pgss queryid computation in core,
> with a new compute_queryid GUC (on/off).  One thing I don't really
> like about this patch is that the JumbleState that pgss needs in order
> to normalize the query string (the constants location and such) has to
> be done by the core while computing the queryid and provided to pgss
> in post_parse_analyse hook.  That isn't ideal as it looks very
> specific to pgss needs.  On the other hand it means that you can now
> use pgss with custom queryid heuristics by disabling compute_queryid
> and having your module doing only that in post_parse_analyse_hook.
> You'll however need to be careful to configure
> shared_preload_libraries such that your custom module's
> post_parse_analyse_hook is called first, so pgss' one can be called
> with the needed JumbleState.  Note that if no JumbleState is provided
> pgss will store non normalized queries, but will otherwise behave as
> intended.
>
> The 2nd patch is the rebased original queryid exposure patch.  No big
> changes, except that it now handles utility statements queryid
> generated during post_parse_analysis, same as regular queries.  This
> should simplify the work needed for custom queryid third party
> modules.
>
> The 3rd patch changes explain (verbose) to display the queryid if one
> has been generated, whether by core or a third-party module.  For
> instance:
>
> rjuju=# set compute_queryid = on;
> SET
> rjuju=# explain (verbose) select relname from pg_class;
>                               QUERY PLAN
> -----------------------------------------------------------------------
>  Seq Scan on pg_catalog.pg_class  (cost=0.00..16.90 rows=390 width=64)
>    Output: relname
>  Query Identifier: -5494854185674379299
> (3 rows)

There was a possibly uninitialized var issue in the previous patches
(thanks cfbot), v13 fixes that.

On Sun, Oct 18, 2020 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Wait ... what?  I've been thinking that this GUC is just to enable or
> > disable the computation of query ID, not to change the algorithm to do
> > so.  Do we really need to allow different algorithms in different
> > sessions?
>
> We established that some time ago, no?

I thought we established the need for allowing different algorithms,
but I assumed globally not per session.  Anyway, allowing to enable or
disable compute_queryid per session would technically allow that,
assuming that you have another module loaded that computes a queryid
only if no-one was already computed.  In that case pg_stat_statements
works as you would expect, you will get a new entry, with a duplicated
query text.

With a bit more thinking, there's at least one use case where it's
interesting to disable pg_stat_statements: queries using temporary
tables.  In that case you're guaranteed to generate an infinity of
different queryid.  That doesn't really help since you're not
aggregating anything anymore, and it also makes pg_stat_statements
virtually unusable as once you have a workload that needs frequent
eviction, the overhead is so bad that you basically have to disable
pg_stat_statements.  We could alternatively add a GUC to disable
queryid computation when one of the tables is a temporary table, but
that's yet one among many considerations that are probably best
answered with a custom implementation.

I'm also attaching an updated patch with some attempt to improve the
documentation.  I mention that in-core algorithm may not suits
everyone's needs, but we don't actually document what heuristics are.
Should we give more details on them and what are the most direct
consequences?

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

07 January 2021, 17:07:19

On Sun, Oct 18, 2020 at 4:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sun, Oct 18, 2020 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > > Wait ... what?  I've been thinking that this GUC is just to enable or
> > > disable the computation of query ID, not to change the algorithm to do
> > > so.  Do we really need to allow different algorithms in different
> > > sessions?
> >
> > We established that some time ago, no?
>
> I thought we established the need for allowing different algorithms,
> but I assumed globally not per session.  Anyway, allowing to enable or
> disable compute_queryid per session would technically allow that,
> assuming that you have another module loaded that computes a queryid
> only if no-one was already computed.  In that case pg_stat_statements
> works as you would expect, you will get a new entry, with a duplicated
> query text.
>
> With a bit more thinking, there's at least one use case where it's
> interesting to disable pg_stat_statements: queries using temporary
> tables.  In that case you're guaranteed to generate an infinity of
> different queryid.  That doesn't really help since you're not
> aggregating anything anymore, and it also makes pg_stat_statements
> virtually unusable as once you have a workload that needs frequent
> eviction, the overhead is so bad that you basically have to disable
> pg_stat_statements.  We could alternatively add a GUC to disable
> queryid computation when one of the tables is a temporary table, but
> that's yet one among many considerations that are probably best
> answered with a custom implementation.
>
> I'm also attaching an updated patch with some attempt to improve the
> documentation.  I mention that in-core algorithm may not suits
> everyone's needs, but we don't actually document what heuristics are.
> Should we give more details on them and what are the most direct
> consequences?

v15 that fixes recent conflicts.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

19 January 2021, 16:43:25

On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> v15 that fixes recent conflicts.

Rebase only, thanks to the cfbot!  V16 attached.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tatsuro Yamada

Date:

19 January 2021, 23:58:22

Hi Julien,

> Rebase only, thanks to the cfbot!  V16 attached.

I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
my result is the following. Attached file is regression.diffs.

========================
  1 of 202 tests failed.
========================

The differences that caused some tests to fail can be viewed in the
file "/home/postgres/PG140/src/test/regress/regression.diffs".  A copy of the test summary that you see
above is saved in the file "/home/postgres/PG140/src/test/regress/regression.out".


src/test/regress/regression.diffs
---------------------------------
diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out
/home/postgres/PG140/src/test/regress/results/rules.out
--- /home/postgres/PG140/src/test/regress/expected/rules.out    2021-01-20 08:41:16.383175559 +0900
+++ /home/postgres/PG140/src/test/regress/results/rules.out 2021-01-20 08:43:46.589171774 +0900
@@ -1760,10 +1760,9 @@
      s.state,
      s.backend_xid,
      s.backend_xmin,
-    s.queryid,
      s.query,
      s.backend_type
-   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type,
wait_event,xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port,
backend_xid,backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn,
ssl_client_serial,ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid)
 
+   FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type,
wait_event,xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port,
backend_xid,backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn,
ssl_client_serial,ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid)
 
...

Thanks,
Tatsuro Yamada

Attachment

regression.diffs

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tatsuro Yamada

Date:

20 January 2021, 02:05:54

Hi Julien,


>> Rebase only, thanks to the cfbot!  V16 attached.
> 
> I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
> my result is the following. Attached file is regression.diffs.


Sorry, my environment was not suitable for the test when I sent my previous email.
I fixed my environment and tested it again, and it was a success. See below:

=======================
  All 202 tests passed.
=======================

Regards,
Tatsuro Yamada

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

20 January 2021, 02:28:54

Hello Yamada-san,

On Wed, Jan 20, 2021 at 10:06 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
>
> Hi Julien,
>
>
> >> Rebase only, thanks to the cfbot!  V16 attached.
> >
> > I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and
> > my result is the following. Attached file is regression.diffs.
>
>
> Sorry, my environment was not suitable for the test when I sent my previous email.
> I fixed my environment and tested it again, and it was a success. See below:
>
> =======================
>   All 202 tests passed.
> =======================

No worries, thanks a lot for testing!

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

17 March 2021, 16:49:07

On Wed, Mar 17, 2021 at 12:24:44PM -0400, Bruce Momjian wrote:
> On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote:
> > 
> > 
> > st 17. 3. 2021 v 17:03 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
> > 
> >     Bruce Momjian <bruce@momjian.us> writes:
> >     > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote:
> >     >> I still say that it's a serious mistake to sanctify a query ID
> >     calculation
> >     >> method that was designed only for pg_stat_statement's needs as the one
> >     >> true way to do it.  But that's what exposing it in a core view would do.
> > 
> >     > OK, I am fine with creating a new method, and maybe having
> >     > pg_stat_statements use it.  Is that the direction we should be going in?
> > 
> >     The point is that we've understood Query.queryId as something that
> >     different extensions might calculate differently for their own needs.
> >     In particular it's easy to imagine extensions that want an ID that is
> >     less fuzzy than what pg_stat_statements wants.  We never had a plan for
> >     how two such extensions could co-exist, but at least it was possible
> >     to use one if you didn't use another.  If this gets moved into core
> >     then there will basically be only one way that anyone can do it.
> > 
> >     Maybe what we need is a design for allowing more than one query ID.
> > 
> > 
> > Theoretically there can be a hook for calculation of queryid, that can be by
> > used extension. Default can be assigned with a method that is used by
> > pg_stat_statements.
> 
> Yes, that is what the code patch says it does.
> 
> > I don't think it is possible to use more different query id for
> > pg_stat_statements so this solution can be simple.
> 
> Agreed.

Actually, putting the query identifer computation in the core makes it way more
tunable, even if it's conterintuitive.  What it means is that you can now chose
to use usual pgss' algorithm or a different one for log_line_prefix and
pg_stat_activity.queryid, but also that you can now use pgss with a different
query id algorithm.  That's another thing that user were asking for a long
time.

I originally suggested to make it clearer by having an enum GUC rather than a
boolean, say compute_queryid = [ none | core | external ], and if set to
external then a hook would be explicitely called.  Right now, "none" and
"external" are binded with compute_queryid = off, and depends on whether an
extension is computing a queryid during post_parse_analyse_hook.

It could later be extended to suit other needs if we ever come to some
agreement (for instance "legacy", "logical_replication_stable" or whatever
better name we can find for something that doesn't depend on Oid).

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Robert Haas

Date:

17 March 2021, 20:04:44

On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> I originally suggested to make it clearer by having an enum GUC rather than a
> boolean, say compute_queryid = [ none | core | external ], and if set to
> external then a hook would be explicitely called.  Right now, "none" and
> "external" are binded with compute_queryid = off, and depends on whether an
> extension is computing a queryid during post_parse_analyse_hook.

I would just make it a Boolean and have a hook. The Boolean controls
whether it gets computed at all, and the hook lets an external module
override the way it gets computed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

17 March 2021, 22:32:16

On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote:
> On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > I originally suggested to make it clearer by having an enum GUC rather than a
> > boolean, say compute_queryid = [ none | core | external ], and if set to
> > external then a hook would be explicitely called.  Right now, "none" and
> > "external" are binded with compute_queryid = off, and depends on whether an
> > extension is computing a queryid during post_parse_analyse_hook.
> 
> I would just make it a Boolean and have a hook. The Boolean controls
> whether it gets computed at all, and the hook lets an external module
> override the way it gets computed.

OK, is that what everyone wants?  I think that is what the patch already
does.

I think having multiple queryids used in a single cluster is much too
confusing to support.  You would have to label and control which queryid
is displayed by pg_stat_activity and log_line_prefix, and that seems too
confusing and not useful.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

17 March 2021, 23:29:56

On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:
> On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote:
> > On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > I originally suggested to make it clearer by having an enum GUC rather than a
> > > boolean, say compute_queryid = [ none | core | external ], and if set to
> > > external then a hook would be explicitely called.  Right now, "none" and
> > > "external" are binded with compute_queryid = off, and depends on whether an
> > > extension is computing a queryid during post_parse_analyse_hook.
> > 
> > I would just make it a Boolean and have a hook. The Boolean controls
> > whether it gets computed at all, and the hook lets an external module
> > override the way it gets computed.
> 
> OK, is that what everyone wants?  I think that is what the patch already
> does.

Note exactly.  Right now a custom queryid can be computed even if
compute_queryid is off, if some extension does that in post_parse_analyze_hook.

I'm assuming that what Robert was thinking was more like:

if (compute_queryid)
{
    if (queryid_hook)
        queryId = queryid_hook(...);
    else
        queryId = JumbeQuery(...);
}
else
    queryId = 0;

And that should be done *after* post_parse_analyse_hook so that it's clear that
this hook is no longer the place to compute queryid.

Is that what should be done?

> I think having multiple queryids used in a single cluster is much too
> confusing to support.  You would have to label and control which queryid
> is displayed by pg_stat_activity and log_line_prefix, and that seems too
> confusing and not useful.

I agree.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

18 March 2021, 13:47:29

On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:
> On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:
> > OK, is that what everyone wants?  I think that is what the patch already
> > does.
> 
> Note exactly.  Right now a custom queryid can be computed even if
> compute_queryid is off, if some extension does that in post_parse_analyze_hook.
> 
> I'm assuming that what Robert was thinking was more like:
> 
> if (compute_queryid)
> {
>     if (queryid_hook)
>         queryId = queryid_hook(...);
>     else
>         queryId = JumbeQuery(...);
> }
> else
>     queryId = 0;
> 
> And that should be done *after* post_parse_analyse_hook so that it's clear that
> this hook is no longer the place to compute queryid.
> 
> Is that what should be done?

No, I don't think so.  I think having extensions change behavior
controlled by GUCs is a bad interface.

The docs are going to say that you have to enable compute_queryid to see
the query id in pg_stat_activity and log_line_prefix, but if you install
an extension, the query id will be visible even if you don't have
compute_queryid enabled.  I think you need to only honor the hook if
compute_queryid is enabled, and update the pg_stat_statements docs to
say you have to enable compute_queryid for pg_stat_statements to work.

Also, should it be compute_queryid or compute_query_id?

Also, the overhead of computing the query id was reported as 2% --- that
seems quite high for what it does.  Do we know why it is so high?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

18 March 2021, 18:06:56

On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote:
> On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:
> > On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote:
> > > OK, is that what everyone wants?  I think that is what the patch already
> > > does.
> > 
> > Note exactly.  Right now a custom queryid can be computed even if
> > compute_queryid is off, if some extension does that in post_parse_analyze_hook.
> > 
> > I'm assuming that what Robert was thinking was more like:
> > 
> > if (compute_queryid)
> > {
> >     if (queryid_hook)
> >         queryId = queryid_hook(...);
> >     else
> >         queryId = JumbeQuery(...);
> > }
> > else
> >     queryId = 0;
> > 
> > And that should be done *after* post_parse_analyse_hook so that it's clear that
> > this hook is no longer the place to compute queryid.
> > 
> > Is that what should be done?
> 
> No, I don't think so.  I think having extensions change behavior
> controlled by GUCs is a bad interface.
> 
> The docs are going to say that you have to enable compute_queryid to see
> the query id in pg_stat_activity and log_line_prefix, but if you install
> an extension, the query id will be visible even if you don't have
> compute_queryid enabled.  I think you need to only honor the hook if
> compute_queryid is enabled, and update the pg_stat_statements docs to
> say you have to enable compute_queryid for pg_stat_statements to work.

I'm confused, what you described really looks like what I described.

Let me try to clarify:

- if compute_queryid is off, a queryid should never be seen no matter how hard
  an extension tries

- if compute_queryid is on, the calculation will be done by the core
  (using pgss JumbeQuery) unless an extension computed one already.  The only
  way to know what algorithm is used is to check the list of extension loaded.

- if some extension calculates a queryid during post_parse_analyze_hook, we
  will always reset it.

Is that the approach you want?

Note that the only way to not honor the hook is iff the new GUC is disabled is
to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook if
the new GUC is off, and we don't want to pay the queryid calculation overhead
if the admin explicitly said it wasn't needed.

> Also, should it be compute_queryid or compute_query_id?

Maybe compute_query_identifier?

> Also, the overhead of computing the query id was reported as 2% --- that
> seems quite high for what it does.  Do we know why it is so high?

The 2% was a worst case scenario, for a query with a single join over
ridiculously small pg_class and pg_attribute, in read only.  The whole workload
was in shared buffers so the planning and execution is quite fast.  Adding some
complexity in the query really limited the overhead.

Note that this was done on an old laptop with quite slow CPU.  Maybe
someone with a better hardware than a 5/6yo laptop could get some more
realistic results (I unfortunately don't have anything to try on).

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

18 March 2021, 19:23:49

On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:
> On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote:
> > On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote:
> > > Note exactly.  Right now a custom queryid can be computed even if
> > > compute_queryid is off, if some extension does that in post_parse_analyze_hook.

The above text is the part that made me think an extension could display
a query id even if disabled by the GUC.

> > The docs are going to say that you have to enable compute_queryid to see
> > the query id in pg_stat_activity and log_line_prefix, but if you install
> > an extension, the query id will be visible even if you don't have
> > compute_queryid enabled.  I think you need to only honor the hook if
> > compute_queryid is enabled, and update the pg_stat_statements docs to
> > say you have to enable compute_queryid for pg_stat_statements to work.
> 
> I'm confused, what you described really looks like what I described.
> 
> Let me try to clarify:
> 
> - if compute_queryid is off, a queryid should never be seen no matter how hard
>   an extension tries

Oh, OK.  I can see an extension setting the query id on its own --- we
can't prevent that from happening.  It is probably enough to tell
extensions to honor the GUC, since they would want it enabled so it
displays in pg_stat_activity and log_line_prefix.

> - if compute_queryid is on, the calculation will be done by the core
>   (using pgss JumbeQuery) unless an extension computed one already.  The only
>   way to know what algorithm is used is to check the list of extension loaded.

OK.

> - if some extension calculates a queryid during post_parse_analyze_hook, we
>   will always reset it.

OK, good.

> Is that the approach you want?

Yes, I think so.

> Note that the only way to not honor the hook is iff the new GUC is disabled is
> to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook if
> the new GUC is off, and we don't want to pay the queryid calculation overhead
> if the admin explicitly said it wasn't needed.

Right, let's just get the extensions to honor the GUC --- we don't need
to block them or anything.

> > Also, should it be compute_queryid or compute_query_id?
> 
> Maybe compute_query_identifier?

I think compute_query_id works, and is shorter.

> > Also, the overhead of computing the query id was reported as 2% --- that
> > seems quite high for what it does.  Do we know why it is so high?
> 
> The 2% was a worst case scenario, for a query with a single join over
> ridiculously small pg_class and pg_attribute, in read only.  The whole workload
> was in shared buffers so the planning and execution is quite fast.  Adding some
> complexity in the query really limited the overhead.
> 
> Note that this was done on an old laptop with quite slow CPU.  Maybe
> someone with a better hardware than a 5/6yo laptop could get some more
> realistic results (I unfortunately don't have anything to try on).

OK, good to know.  I can run some tests here if people would like me to.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

19 March 2021, 03:16:50

On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote:
> On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:
> 
> The above text is the part that made me think an extension could display
> a query id even if disabled by the GUC.

With the last version of the patch I sent it was the case.

> Oh, OK.  I can see an extension setting the query id on its own --- we
> can't prevent that from happening.  It is probably enough to tell
> extensions to honor the GUC, since they would want it enabled so it
> displays in pg_stat_activity and log_line_prefix.

Ok.  So no new hook, and we keep using post_parse_analyze_hook as the official
way to have custom queryid implementation, with this new behavior:

> > - if some extension calculates a queryid during post_parse_analyze_hook, we
> >   will always reset it.
> 
> OK, good.

Now that I'm back on the code I remember why I did it this way.  It's
unfortunately not really possible to make things work this way.

pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
computed, as it's where we know where the constants are recorded.  It means:

- we have to call post_parse_analyze_hook *after* doing core queryid
  calculation
- if users want to use a third party module to calculate a queryid, they'll
  have to make sure that the module's post_parse_analyze_hook is called
  *before* pg_stat_statements' one.
- even if they do so, they'll still have to pay the price of core queryid
  calculation

So it would be very hard to configure and will be too expensive.  I think that
we have to choose to either we make compute_query_id only trigger core
calculation (like it was in previous patch version), or introduce a new hook.

> I think compute_query_id works, and is shorter.

WFM.

> OK, good to know.  I can run some tests here if people would like me to.

+1.  A read only pgbench will be some kind od worse case scenario that can be
used I think.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

19 March 2021, 13:29:06

On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote:
> On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote:
> > On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote:
> > 
> > The above text is the part that made me think an extension could display
> > a query id even if disabled by the GUC.
> 
> With the last version of the patch I sent it was the case.
> 
> > Oh, OK.  I can see an extension setting the query id on its own --- we
> > can't prevent that from happening.  It is probably enough to tell
> > extensions to honor the GUC, since they would want it enabled so it
> > displays in pg_stat_activity and log_line_prefix.
> 
> Ok.  So no new hook, and we keep using post_parse_analyze_hook as the official
> way to have custom queryid implementation, with this new behavior:
> 
> > > - if some extension calculates a queryid during post_parse_analyze_hook, we
> > >   will always reset it.
> > 
> > OK, good.
> 
> Now that I'm back on the code I remember why I did it this way.  It's
> unfortunately not really possible to make things work this way.
> 
> pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
> computed, as it's where we know where the constants are recorded.  It means:
> 
> - we have to call post_parse_analyze_hook *after* doing core queryid
>   calculation
> - if users want to use a third party module to calculate a queryid, they'll
>   have to make sure that the module's post_parse_analyze_hook is called
>   *before* pg_stat_statements' one.
> - even if they do so, they'll still have to pay the price of core queryid
>   calculation

OK, that makes perfect sense.  I think the best solution is to document
that compute_query_id just controls the built-in computation of the
query id, and that extensions can also compute it if this is off, and
pg_stat_activity and log_line_prefix will display built-in or extension
computed query ids.

It might be interesting someday to check if the hook changed a
pre-computed query id and warn the user in the logs, but that could
cause more log-spam problems than help.  I am a little worried that
someone might have compute_query_id enabled and then install an
extension that overwrites it, but we will just have to document this
issue.  Hopefully extensions will be clear that they are computing their
own query id.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Hannu Krosing

Date:

19 March 2021, 13:54:16

On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote:
>
> OK, that makes perfect sense.  I think the best solution is to document
> that compute_query_id just controls the built-in computation of the
> query id, and that extensions can also compute it if this is off, and
> pg_stat_activity and log_line_prefix will display built-in or extension
> computed query ids.
>
> It might be interesting someday to check if the hook changed a
> pre-computed query id and warn the user in the logs, but that could
> cause more log-spam problems than help.

The log-spam could be mitigated by logging it just once per connection
the first time it is overridden

Also, we could ask the extensions to expose the "method name" in a read-only GUC

so one can do

SHOW compute_query_id_method;

and get the name of method use

compute_query_id_method
------------------------------------
builtin

And it may even dynamically change to indicate the overriding of builtin

compute_query_id_method
---------------------------------------------------
fancy_compute_query_id (overrides builtin)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

19 March 2021, 14:27:51

On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:
> On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote:
> > Now that I'm back on the code I remember why I did it this way.  It's
> > unfortunately not really possible to make things work this way.
> > 
> > pg_stat_statements' post_parse_analyze_hook relies on a queryid already being
> > computed, as it's where we know where the constants are recorded.  It means:
> > 
> > - we have to call post_parse_analyze_hook *after* doing core queryid
> >   calculation
> > - if users want to use a third party module to calculate a queryid, they'll
> >   have to make sure that the module's post_parse_analyze_hook is called
> >   *before* pg_stat_statements' one.
> > - even if they do so, they'll still have to pay the price of core queryid
> >   calculation
> 
> OK, that makes perfect sense.  I think the best solution is to document
> that compute_query_id just controls the built-in computation of the
> query id, and that extensions can also compute it if this is off, and
> pg_stat_activity and log_line_prefix will display built-in or extension
> computed query ids.

So the last version of the patch should implement that behavior right?  It's
just missing some explicit guidance that third-party extensions should only
calculate a queryid if compute_query_id is off

> It might be interesting someday to check if the hook changed a
> pre-computed query id and warn the user in the logs, but that could
> cause more log-spam problems than help.  I am a little worried that
> someone might have compute_query_id enabled and then install an
> extension that overwrites it, but we will just have to document this
> issue.  Hopefully extensions will be clear that they are computing their
> own query id.

I agree.  And hopefully they will split the queryid calculation from the rest
of the extension so that users can use the combination they want.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

19 March 2021, 14:35:21

On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote:
> On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote:
> >
> > OK, that makes perfect sense.  I think the best solution is to document
> > that compute_query_id just controls the built-in computation of the
> > query id, and that extensions can also compute it if this is off, and
> > pg_stat_activity and log_line_prefix will display built-in or extension
> > computed query ids.
> >
> > It might be interesting someday to check if the hook changed a
> > pre-computed query id and warn the user in the logs, but that could
> > cause more log-spam problems than help.
> 
> The log-spam could be mitigated by logging it just once per connection
> the first time it is overridden

Yes, but it might still generate a significant amount of additional lines.

If extensions authors follow the recommendations and only calculate a queryid
when compute_query_id is off, it shoule be easy to check that you have
everything setup properly.

> Also, we could ask the extensions to expose the "method name" in a read-only GUC
> 
> so one can do
> 
> SHOW compute_query_id_method;
> 
> and get the name of method use
> 
> compute_query_id_method
> ------------------------------------
> builtin
> 
> And it may even dynamically change to indicate the overriding of builtin
> 
> compute_query_id_method
> ---------------------------------------------------
> fancy_compute_query_id (overrides builtin)

This could be nice, but I'm not sure that it would work well if someones
install multiple extensions that calculate a queryid (which would be silly but
still), or load another one at runtime.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

19 March 2021, 16:53:18

On Fri, Mar 19, 2021 at 10:35:21PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote:
> > On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote:
> > The log-spam could be mitigated by logging it just once per connection
> > the first time it is overridden
> 
> Yes, but it might still generate a significant amount of additional lines.
> 
> If extensions authors follow the recommendations and only calculate a queryid
> when compute_query_id is off, it shoule be easy to check that you have
> everything setup properly.

Seems extensions that want to generate their own query id should just
error out with a message to the log file if compute_query_id is set ---
that should fix the entire issue --- but see below.

> > Also, we could ask the extensions to expose the "method name" in a read-only GUC
> > 
> > so one can do
> > 
> > SHOW compute_query_id_method;
> > 
> > and get the name of method use
> > 
> > compute_query_id_method
> > ------------------------------------
> > builtin
> > 
> > And it may even dynamically change to indicate the overriding of builtin
> > 
> > compute_query_id_method
> > ---------------------------------------------------
> > fancy_compute_query_id (overrides builtin)
> 
> This could be nice, but I'm not sure that it would work well if someones
> install multiple extensions that calculate a queryid (which would be silly but
> still), or load another one at runtime.

Well, given we don't really want to support multiple query id types
being generated or displayed, the "error out" above should fix it. 

Let's do this --- tell extensions to error out if the query id is
already set, either by compute_query_id or another extension.  If an
extension wants to generate its own query id and store is internal to
the extension, that is fine, but the server-displayed query id should be
generated once and never overwritten by an extension.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

19 March 2021, 16:54:43

On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:
> > OK, that makes perfect sense.  I think the best solution is to document
> > that compute_query_id just controls the built-in computation of the
> > query id, and that extensions can also compute it if this is off, and
> > pg_stat_activity and log_line_prefix will display built-in or extension
> > computed query ids.
> 
> So the last version of the patch should implement that behavior right?  It's
> just missing some explicit guidance that third-party extensions should only
> calculate a queryid if compute_query_id is off

Yes, I think we are now down to just how the extensions should be told
to behave, and how we document this --- see the email I just sent.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Hannu Krosing

Date:

20 March 2021, 00:03:16

It would be really convenient if user-visible serialisations of the query id had something that identifies the computation method.

maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.

This would immediately show in logs at what point the id calculator was changed

On Fri, Mar 19, 2021 at 5:54 PM Bruce Momjian <bruce@momjian.us> wrote:

On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:
> > OK, that makes perfect sense. I think the best solution is to document
> > that compute_query_id just controls the built-in computation of the
> > query id, and that extensions can also compute it if this is off, and
> > pg_stat_activity and log_line_prefix will display built-in or extension
> > computed query ids.
>
> So the last version of the patch should implement that behavior right? It's
> just missing some explicit guidance that third-party extensions should only
> calculate a queryid if compute_query_id is off

Yes, I think we are now down to just how the extensions should be told
to behave, and how we document this --- see the email I just sent.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

20 March 2021, 00:10:54

On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote:
> It would be really convenient if user-visible serialisations of the query id
> had something that identifies the computation method.
> 
> maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.
> 
> This would immediately show in logs at what point the id calculator was changed

Yeah, but it an integer, and I don't think we want to change that.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

20 March 2021, 05:28:33

On Fri, Mar 19, 2021 at 08:10:54PM -0400, Bruce Momjian wrote:
> On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote:
> > It would be really convenient if user-visible serialisations of the query id
> > had something that identifies the computation method.
> > 
> > maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.
> > 
> > This would immediately show in logs at what point the id calculator was changed
> 
> Yeah, but it an integer, and I don't think we want to change that.

Also, with Bruce's approach to ask extensions to error out if they would
overwrite a queryid the only way to change the calculation method is a restart.
So only one source can exist in the system.

Hopefully that's a big enough hammer that administrators will know what method
they're using.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

20 March 2021, 06:12:34

On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:
> 
> Well, given we don't really want to support multiple query id types
> being generated or displayed, the "error out" above should fix it. 
> 
> Let's do this --- tell extensions to error out if the query id is
> already set, either by compute_query_id or another extension.  If an
> extension wants to generate its own query id and store is internal to
> the extension, that is fine, but the server-displayed query id should be
> generated once and never overwritten by an extension.

Agreed, this will ensure that you won't dynamically change the queryid source.

We should also document that changing it requires a restart and calling
pg_stat_statements_reset() afterwards.

v19 adds some changes, plus extra documentation for pg_stat_statements about
the requirement for a queryid to be calculated, and a note that all documented
details only apply for in-core source.  I'm not sure if this is still the best
place to document those details anymore though.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

22 March 2021, 21:55:54

On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:
> > 
> > Well, given we don't really want to support multiple query id types
> > being generated or displayed, the "error out" above should fix it. 
> > 
> > Let's do this --- tell extensions to error out if the query id is
> > already set, either by compute_query_id or another extension.  If an
> > extension wants to generate its own query id and store is internal to
> > the extension, that is fine, but the server-displayed query id should be
> > generated once and never overwritten by an extension.
> 
> Agreed, this will ensure that you won't dynamically change the queryid source.
> 
> We should also document that changing it requires a restart and calling
> pg_stat_statements_reset() afterwards.
> 
> v19 adds some changes, plus extra documentation for pg_stat_statements about
> the requirement for a queryid to be calculated, and a note that all documented
> details only apply for in-core source.  I'm not sure if this is still the best
> place to document those details anymore though.

OK, after reading the entire thread, I don't think there are any
remaining open issues with this patch and I think this is ready for
committing.  I have adjusted the doc section of the patches, attached. 
I have marked myself as committer in the commitfest app and hope to
apply it in the next few days based on feedback.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Zhihong Yu

Date:

23 March 2021, 00:17:15

Hi,

For queryjumble.c :

The year should be updated.

Same with queryjumble.h

Cheers

On Mon, Mar 22, 2021 at 2:56 PM Bruce Momjian <bruce@momjian.us> wrote:

On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:
> >
> > Well, given we don't really want to support multiple query id types
> > being generated or displayed, the "error out" above should fix it.
> >
> > Let's do this --- tell extensions to error out if the query id is
> > already set, either by compute_query_id or another extension. If an
> > extension wants to generate its own query id and store is internal to
> > the extension, that is fine, but the server-displayed query id should be
> > generated once and never overwritten by an extension.
>
> Agreed, this will ensure that you won't dynamically change the queryid source.
>
> We should also document that changing it requires a restart and calling
> pg_stat_statements_reset() afterwards.
>
> v19 adds some changes, plus extra documentation for pg_stat_statements about
> the requirement for a queryid to be calculated, and a note that all documented
> details only apply for in-core source. I'm not sure if this is still the best
> place to document those details anymore though.

OK, after reading the entire thread, I don't think there are any
remaining open issues with this patch and I think this is ready for
committing. I have adjusted the doc section of the patches, attached.
I have marked myself as committer in the commitfest app and hope to
apply it in the next few days based on feedback.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

23 March 2021, 00:43:40

On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:
> Hi,
> For queryjumble.c :
> 
> + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> 
> The year should be updated.
> Same with queryjumble.h

Thanks, fixed.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

23 March 2021, 06:31:25

On Mon, Mar 22, 2021 at 05:55:54PM -0400, Bruce Momjian wrote:
> 
> OK, after reading the entire thread, I don't think there are any
> remaining open issues with this patch and I think this is ready for
> committing.  I have adjusted the doc section of the patches, attached. 
> I have marked myself as committer in the commitfest app and hope to
> apply it in the next few days based on feedback.

Thanks a lot Bruce!

I looked at the changes in the attached patches and that's a clear
improvements, thanks a lot for that.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

23 March 2021, 06:36:27

On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote:
> On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:
> > Hi,
> > For queryjumble.c :
> > 
> > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> > 
> > The year should be updated.
> > Same with queryjumble.h
> 
> Thanks, fixed.

Thanks also for taking care of that.  While at it I see that current HEAD has a
lot of files with the same problem:

$ git grep "\-2020"
config/config.guess:#   Copyright 1992-2020 Free Software Foundation, Inc.
config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:#   Copyright 1992-2020 Free Software Foundation, Inc.
config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc.
contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global Development Group
src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL Global Development Group
src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global Development Group
src/bin/pg_rewind/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2015-2020.
src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL Global Development Group
src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2009-2020.

Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
non .po files?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

23 March 2021, 14:34:38

On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote:
> On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote:
> > On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote:
> > > Hi,
> > > For queryjumble.c :
> > > 
> > > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> > > 
> > > The year should be updated.
> > > Same with queryjumble.h
> > 
> > Thanks, fixed.
> 
> Thanks also for taking care of that.  While at it I see that current HEAD has a
> lot of files with the same problem:
> 
> $ git grep "\-2020"
> config/config.guess:#   Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.sub:#   Copyright 1992-2020 Free Software Foundation, Inc.
> config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc.
> contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global Development Group
> src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL Global Development Group
> src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global Development Group
> src/bin/pg_rewind/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2015-2020.
> src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
> src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL Global Development Group
> src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2009-2020.
> 
> Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
> non .po files?

No, I don't think so.  We don't change the Free Software Foundation
copyrights, and the .po files get loaded from another repository
occasionally.  The hex/sha copyrights came from patches developed in
2020 but committed in 2021.  These will mostly be corrected in 2022.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Alvaro Herrera

Date:

23 March 2021, 15:12:03

On 2021-Mar-22, Bruce Momjian wrote:

> --- a/doc/src/sgml/ref/explain.sgml
> +++ b/doc/src/sgml/ref/explain.sgml
> @@ -136,8 +136,10 @@ ROLLBACK;
>        the output column list for each node in the plan tree, schema-qualify
>        table and function names, always label variables in expressions with
>        their range table alias, and always print the name of each trigger for
> -      which statistics are displayed.  This parameter defaults to
> -      <literal>FALSE</literal>.
> +      which statistics are displayed.  The query identifier will also be
> +      displayed if one has been compute, see <xref
> +      linkend="guc-compute-query-id"/> for more details.  This parameter
> +      defaults to <literal>FALSE</literal>.

Typo here, "has been computed".

Is the intention to commit each of these patches separately?

-- 
Álvaro Herrera       Valdivia, Chile

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

23 March 2021, 16:27:10

On Tue, Mar 23, 2021 at 12:12:03PM -0300, Álvaro Herrera wrote:
> On 2021-Mar-22, Bruce Momjian wrote:
> 
> > --- a/doc/src/sgml/ref/explain.sgml
> > +++ b/doc/src/sgml/ref/explain.sgml
> > @@ -136,8 +136,10 @@ ROLLBACK;
> >        the output column list for each node in the plan tree, schema-qualify
> >        table and function names, always label variables in expressions with
> >        their range table alias, and always print the name of each trigger for
> > -      which statistics are displayed.  This parameter defaults to
> > -      <literal>FALSE</literal>.
> > +      which statistics are displayed.  The query identifier will also be
> > +      displayed if one has been compute, see <xref
> > +      linkend="guc-compute-query-id"/> for more details.  This parameter
> > +      defaults to <literal>FALSE</literal>.
> 
> Typo here, "has been computed".

Good catch, fixed.

> Is the intention to commit each of these patches separately?

No, I was thinking of just doing a single commit.  Should I do three
commits?  I posted it as three patches since that is how it was posted
by the author, and reviewing is easier.  It also will need a catversion
bump.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

24 March 2021, 03:02:39

On Tue, Mar 23, 2021 at 10:34:38AM -0400, Bruce Momjian wrote:
> On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote:
> > 
> > Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for
> > non .po files?
> 
> No, I don't think so.  We don't change the Free Software Foundation
> copyrights, and the .po files get loaded from another repository
> occasionally.  The hex/sha copyrights came from patches developed in
> 2020 but committed in 2021.  These will mostly be corrected in 2022.

Ok, thanks for the clarification!

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

24 March 2021, 03:07:13

On Tue, Mar 23, 2021 at 12:27:10PM -0400, Bruce Momjian wrote:
> 
> No, I was thinking of just doing a single commit.  Should I do three
> commits?  I posted it as three patches since that is how it was posted
> by the author, and reviewing is easier.  It also will need a catversion
> bump.

Yes, I originally split the commit because it was easier to write this way and
it seemed better to send different patches too to ease review.

I think that it would make sense to commit the first patch separately, but I'm
fine with a single commit if you prefer.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Alvaro Herrera

Date:

24 March 2021, 08:12:35

On 2021-Mar-22, Bruce Momjian wrote:

> diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
> index e259531f60..9550de0798 100644
> --- a/src/include/catalog/pg_proc.dat
> +++ b/src/include/catalog/pg_proc.dat
> @@ -5249,9 +5249,9 @@
>    proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
>    proretset => 't', provolatile => 's', proparallel => 'r',
>    prorettype => 'record', proargtypes => 'int4',
> -  proallargtypes =>
'{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
> -  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> -  proargnames =>
'{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
> +  proallargtypes =>
'{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
> +  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> +  proargnames =>
'{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',

BTW why do you put the queryid column at the end of the column list
here?  It seems awkward.  Can we put it perhaps between state and query?


> -const char *clean_querytext(const char *query, int *location, int *len);
> +const char *CleanQuerytext(const char *query, int *location, int *len);
>  JumbleState *JumbleQuery(Query *query, const char *querytext);

I think pushing in more than one commit is a reasonable approach if they
are well-contained, but if you do that it'd be better to avoid
introducing a function with one name and renaming it in your next
commit.

-- 
Álvaro Herrera       Valdivia, Chile
"Just treat us the way you want to be treated + some extra allowance
 for ignorance."                                    (Michael Brusser)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

24 March 2021, 08:51:40

On Wed, Mar 24, 2021 at 05:12:35AM -0300, Alvaro Herrera wrote:
> On 2021-Mar-22, Bruce Momjian wrote:
> 
> > diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
> > index e259531f60..9550de0798 100644
> > --- a/src/include/catalog/pg_proc.dat
> > +++ b/src/include/catalog/pg_proc.dat
> > @@ -5249,9 +5249,9 @@
> >    proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f',
> >    proretset => 't', provolatile => 's', proparallel => 'r',
> >    prorettype => 'record', proargtypes => 'int4',
> > -  proallargtypes =>
'{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}',
> > -  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> > -  proargnames =>
'{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}',
> > +  proallargtypes =>
'{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}',
> > +  proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}',
> > +  proargnames =>
'{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}',
> 
> BTW why do you put the queryid column at the end of the column list
> here?  It seems awkward.  Can we put it perhaps between state and query?

I thought that it would be better to have it at the end as it can always be
NULL (and will be by default), which I guess was also the reason to have
leader_pid there.  I'm all in favor to have queryid near the query, and
while at it leader_pid near the pid.

> > -const char *clean_querytext(const char *query, int *location, int *len);
> > +const char *CleanQuerytext(const char *query, int *location, int *len);
> >  JumbleState *JumbleQuery(Query *query, const char *querytext);
> 
> I think pushing in more than one commit is a reasonable approach if they
> are well-contained

They should, as I incrementally built on top of the first one.  I also just
double checked the patchset and each new commit compiles and passes the
regression tests.

> but if you do that it'd be better to avoid
> introducing a function with one name and renaming it in your next
> commit.

Oops, I apparently messed a fixup when working on it.  Bruce, should I take
care of that of do you want to?  I think you have some local modifications
already I'd rather not miss some changes.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

24 March 2021, 12:13:40

On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote:
> > but if you do that it'd be better to avoid
> > introducing a function with one name and renaming it in your next
> > commit.
> 
> Oops, I apparently messed a fixup when working on it.  Bruce, should I take
> care of that of do you want to?  I think you have some local modifications
> already I'd rather not miss some changes.

I have no local modifications.  Please modify the patch I posted and
repost your version, thanks.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

24 March 2021, 15:20:49

On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote:
> On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote:
> > > but if you do that it'd be better to avoid
> > > introducing a function with one name and renaming it in your next
> > > commit.
> > 
> > Oops, I apparently messed a fixup when working on it.  Bruce, should I take
> > care of that of do you want to?  I think you have some local modifications
> > already I'd rather not miss some changes.
> 
> I have no local modifications.  Please modify the patch I posted and
> repost your version, thanks.

Ok!  I used the last version of the patch you sent and addressed the following
comments from earlier messages in attached v20:

- copyright year to 2021
- s/has has been compute/has been compute/
- use the name CleanQuerytext in the first commit

I didn't change the position of queryid in pg_stat_get_activity(), as the
"real" order is actually define in system_views.sql when creating
pg_stat_activity view.  Adding the new fields at the end of
pg_stat_get_activity() helps to keep the C code simpler and less bug prone, so
I think it's best to continue this way.

I also used the previous commit message if that helps.

On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote:
> On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote:
> > On 2021-Mar-31, Bruce Momjian wrote:
> > > 
> > > I assume it is since Alvaro didn't reply.  I am planning to apply this
> > > soon.
> > 
> > I'm afraid I don't know enough about how parallel query works to make a
> > good assessment on this being a good approach or not -- and no time at
> > present to figure it all out.
> 
> I'm far from being an expert either, but at the time I wrote it and
> looking at the code around it probably seemed sensible.  We could directly call
> pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the
> various callers though, at least there would be a single source for it.

Here's a v21 that includes the mentioned change.

On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> 
> OK, I am happy with your design decisions, thanks.

Thanks!  While double checking I noticed that I failed to remove a (now)
useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
attaching v22 to fix that, no other change.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

04 April 2021, 14:18:50

On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > 
> > OK, I am happy with your design decisions, thanks.
> 
> Thanks!  While double checking I noticed that I failed to remove a (now)
> useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
> attaching v22 to fix that, no other change.

There was a conflict since e1025044c (Split backend status and progress related
functionality out of pgstat.c).

Attached v23 is a rebase against current HEAD, and I also added a few
UINT64CONST() macro usage for consistency.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

05 April 2021, 17:16:27

On Sun, Apr  4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:
> On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > > 
> > > OK, I am happy with your design decisions, thanks.
> > 
> > Thanks!  While double checking I noticed that I failed to remove a (now)
> > useless include of pgstat.h in nodeGatherMerge.c in last version.  I'm
> > attaching v22 to fix that, no other change.
> 
> There was a conflict since e1025044c (Split backend status and progress related
> functionality out of pgstat.c).
> 
> Attached v23 is a rebase against current HEAD, and I also added a few
> UINT64CONST() macro usage for consistency.

Thanks.  I struggled with merging the statistics collection changes into
my cluster file encryption branches because my patch made changes to
code that moved to another C file.

I plan to apply this tomorrow.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Nitin Jadhav

Date:

06 April 2021, 14:35:19

I have reviewed the code. Here are a few minor comments.

+void

+pgstat_report_queryid(uint64 queryId, bool force)

+ volatile PgBackendStatus *beentry = MyBEEntry;

+ if (!beentry)

+ return;

+ /*

+ * if track_activities is disabled, st_queryid should already have been

+ * reset

+ */

+ if (!pgstat_track_activities)

+ return;

The above two conditions can be clubbed together in a single condition.

+/* ----------

+ * pgstat_get_my_queryid() -

+ *

+ * Return current backend's query identifier.

+ */

+uint64

+pgstat_get_my_queryid(void)

+ if (!MyBEEntry)

+ return 0;

+ return MyBEEntry->st_queryid;

Is it safe to directly read the data from MyBEEntry without calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly ref pgstat_get_backend_current_activity() for more information. Kindly let me know if I am wrong.

Thanks and Regards,

Nitin Jadhav

On Mon, Apr 5, 2021 at 10:46 PM Bruce Momjian <bruce@momjian.us> wrote:

On Sun, Apr 4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:
> On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > >
> > > OK, I am happy with your design decisions, thanks.
> >
> > Thanks! While double checking I noticed that I failed to remove a (now)
> > useless include of pgstat.h in nodeGatherMerge.c in last version. I'm
> > attaching v22 to fix that, no other change.
>
> There was a conflict since e1025044c (Split backend status and progress related
> functionality out of pgstat.c).
>
> Attached v23 is a rebase against current HEAD, and I also added a few
> UINT64CONST() macro usage for consistency.

Thanks. I struggled with merging the statistics collection changes into
my cluster file encryption branches because my patch made changes to
code that moved to another C file.

I plan to apply this tomorrow.

--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

06 April 2021, 15:11:02

On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:
> 
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
> 
> The above two conditions can be clubbed together in a single condition.

Right, I just kept it separate as the comment is only relevant for the 2nd
test.  I'm fine with merging both if needed.

> 2.
> +/* ----------
> + * pgstat_get_my_queryid() -
> + *
> + * Return current backend's query identifier.
> + */
> +uint64
> +pgstat_get_my_queryid(void)
> +{
> + if (!MyBEEntry)
> + return 0;
> +
> + return MyBEEntry->st_queryid;
> +}
> 
> Is it safe to directly read the data from MyBEEntry without
> calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
> ref pgstat_get_backend_current_activity() for more information. Kindly let
> me know if I am wrong.

This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it.  It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Alvaro Herrera

Date:

06 April 2021, 15:41:52

On 2021-Apr-06, Nitin Jadhav wrote:

> I have reviewed the code. Here are a few minor comments.
> 
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
> 
> The above two conditions can be clubbed together in a single condition.

I wonder if it wouldn't make more sense to put the assignment *after* we
have checked the second condition.

-- 
Álvaro Herrera       Valdivia, Chile

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

06 April 2021, 15:49:16

On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> On 2021-Apr-06, Nitin Jadhav wrote:
> 
> > I have reviewed the code. Here are a few minor comments.
> > 
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> > 
> > The above two conditions can be clubbed together in a single condition.
> 
> I wonder if it wouldn't make more sense to put the assignment *after* we
> have checked the second condition.

All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, I think we should keep this code
consistent.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Nitin Jadhav

Date:

07 April 2021, 12:45:27

>
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
>
> The above two conditions can be clubbed together in a single condition.
Right, I just kept it separate as the comment is only relevant for the 2nd
test. I'm fine with merging both if needed.

I feel we should merge both of the conditions as it is done in pgstat_report_xact_timestamp(). Probably we can write a common comment to explain both the conditions.

> 2.
> +/* ----------
> + * pgstat_get_my_queryid() -
> + *
> + * Return current backend's query identifier.
> + */
> +uint64
> +pgstat_get_my_queryid(void)
> +{
> + if (!MyBEEntry)
> + return 0;
> +
> + return MyBEEntry->st_queryid;
> +}
>
> Is it safe to directly read the data from MyBEEntry without
> calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
> ref pgstat_get_backend_current_activity() for more information. Kindly let
> me know if I am wrong.
This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it. It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.

Thanks for the explanation. Please add a comment explaining why there is no loop.

Thanks and Regards,

Nitin Jadhav

On Tue, Apr 6, 2021 at 8:40 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:
>
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
>
> The above two conditions can be clubbed together in a single condition.

Right, I just kept it separate as the comment is only relevant for the 2nd
test. I'm fine with merging both if needed.

> 2.
> +/* ----------
> + * pgstat_get_my_queryid() -
> + *
> + * Return current backend's query identifier.
> + */
> +uint64
> +pgstat_get_my_queryid(void)
> +{
> + if (!MyBEEntry)
> + return 0;
> +
> + return MyBEEntry->st_queryid;
> +}
>
> Is it safe to directly read the data from MyBEEntry without
> calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
> ref pgstat_get_backend_current_activity() for more information. Kindly let
> me know if I am wrong.

This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it. It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Nitin Jadhav

Date:

07 April 2021, 12:47:11

On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> On 2021-Apr-06, Nitin Jadhav wrote:
>
> > I have reviewed the code. Here are a few minor comments.
> >
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> >
> > The above two conditions can be clubbed together in a single condition.
>
> I wonder if it wouldn't make more sense to put the assignment *after* we
> have checked the second condition.
All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, I think we should keep this code
consistent.

I agree about this.

Thanks and Regards,

Nitin Jadhav

On Tue, Apr 6, 2021 at 9:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:

On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> On 2021-Apr-06, Nitin Jadhav wrote:
>
> > I have reviewed the code. Here are a few minor comments.
> >
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> >
> > The above two conditions can be clubbed together in a single condition.
>
> I wonder if it wouldn't make more sense to put the assignment *after* we
> have checked the second condition.

All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, I think we should keep this code
consistent.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

07 April 2021, 12:57:26

On Wed, Apr 07, 2021 at 06:15:27PM +0530, Nitin Jadhav wrote:
> 
> I feel we should merge both of the conditions as it is done in
> pgstat_report_xact_timestamp(). Probably we can write a common comment to
> explain both the conditions.
> 
> [...]
> 
> Thanks for the explanation. Please add a comment explaining why there is no
> loop.

PFA v24.

On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote:
> aOn Wed, Apr  7, 2021 at 04:15:50PM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Patch applied.  I am ready to adjust this with any improvements people
> > > might have.  Thank you for all the good feedback we got on this, and I
> > > know many users have waited a long time for this feature.
> > 
> > For starters, you could try to make the buildfarm green again.
> 
> Wow, that's odd.  The cfbot was green, so I never even looked at the
> buildfarm.  I will look at that now, and the CVS log issue.

Sorry about that.  The issue came from animals with jit_above_cost = 0
outputting more lines than expected.  I fixed that by using the same query as
before in explain.sql, as they don't generate any JIT output.

I also added the queryid to the csvlog output and fixed the documentation that
mention how to create a table to access the data.

Attachment

v1-fix_pgss.diff

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

07 April 2021, 22:15:25

On Thu, Apr 08, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote:
> 
> I also added the queryid to the csvlog output and fixed the documentation that
> mention how to create a table to access the data.

Note that I chose to output a 0 queryid if none has been computed rather that
outputting nothing.  Let me know if that's not the wanted behavior.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

07 April 2021, 22:38:40

On Thu, Apr  8, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote:
> On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote:
> > aOn Wed, Apr  7, 2021 at 04:15:50PM -0400, Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > Patch applied.  I am ready to adjust this with any improvements people
> > > > might have.  Thank you for all the good feedback we got on this, and I
> > > > know many users have waited a long time for this feature.
> > > 
> > > For starters, you could try to make the buildfarm green again.
> > 
> > Wow, that's odd.  The cfbot was green, so I never even looked at the
> > buildfarm.  I will look at that now, and the CVS log issue.
> 
> Sorry about that.  The issue came from animals with jit_above_cost = 0
> outputting more lines than expected.  I fixed that by using the same query as
> before in explain.sql, as they don't generate any JIT output.

Yes, I just came to the same conclusion, that 'SELECT 1' didn't generate
the proper output lines to allow explain_filter() to strip out the JIT
lines.  I have applied your patch for this, which should fix the build
farm. (I see my first green report now.)

> I also added the queryid to the csvlog output and fixed the documentation that
> mention how to create a table to access the data.

Uh, I think your patch missed a few things.  First, you use "%zd"
(size_t) for the printf string, but calls to pgstat_get_my_queryid() in
src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
fits in a BIGINT SQL column.

Also, you missed the SGML paragraph doc change, but you correctly
changed the SQL table definition.

I am attaching my version of the patch.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Attachment

csv.diff

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Tom Lane

Date:

07 April 2021, 23:01:25

Bruce Momjian <bruce@momjian.us> writes:
> Uh, I think your patch missed a few things.  First, you use "%zd"
> (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> fits in a BIGINT SQL column.

Neither is correct.  Project standard these days for printing [u]int64
is to write "%lld" or "%llu", with an explicit (long long) cast on
the printf argument.

            regards, tom lane

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

07 April 2021, 23:38:35

On Wed, Apr  7, 2021 at 07:01:25PM -0400, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > Uh, I think your patch missed a few things.  First, you use "%zd"
> > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> > src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> > fits in a BIGINT SQL column.
> 
> Neither is correct.  Project standard these days for printing [u]int64
> is to write "%lld" or "%llu", with an explicit (long long) cast on
> the printf argument.

Yep, got it.  The attached patch fixes all the calls to use %lld, and
adds casts.  In implementing cvslog, I noticed that internally we pass
the hash as uint64, but output as int64, which I think is a requirement
for how pg_stat_statements has output it, and the use of bigint.  Is
that OK?

I am also confused about the inconsistency of calling the GUC
compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
make it pg_stat_activity.query_id, it doesn't match most of the other
*id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Attachment

csv.diff

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

08 April 2021, 00:47:48

On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote:
> On Wed, Apr  7, 2021 at 07:01:25PM -0400, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > Uh, I think your patch missed a few things.  First, you use "%zd"
> > > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> > > src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> > > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> > > fits in a BIGINT SQL column.
> > 
> > Neither is correct.  Project standard these days for printing [u]int64
> > is to write "%lld" or "%llu", with an explicit (long long) cast on
> > the printf argument.
> 
> Yep, got it.  The attached patch fixes all the calls to use %lld, and
> adds casts.  In implementing cvslog, I noticed that internally we pass
> the hash as uint64, but output as int64, which I think is a requirement
> for how pg_stat_statements has output it, and the use of bigint.  Is
> that OK?

Indeed, this is due to how we expose the value in SQL.  The original discussion
is at
https://www.postgresql.org/message-id/CAH2-WzkueMfAmY3onoXLi+g67SJoKY65Cg9Z1QOhSyhCEU8w3g@mail.gmail.com.
As far as I know this is OK, as we want to show consistent values everywhere.

> I am also confused about the inconsistency of calling the GUC
> compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
> make it pg_stat_activity.query_id, it doesn't match most of the other
> *id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
> OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.

Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
would make more sense.

@@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)

     appendStringInfoChar(&buf, '\n');

+    /* query id */
+    appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid());
+    appendStringInfoChar(&buf, ',');
+

     /* If in the syslogger process, try to write messages direct to file */
     if (MyBackendType == B_LOGGER)
         write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);


Unless I'm missing something this will output the query id in the next log
line?  The new code should be added before the newline is output, and the comma
should also be output before the queryid.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

08 April 2021, 00:54:02

On Thu, Apr  8, 2021 at 08:47:48AM +0800, Julien Rouhaud wrote:
> On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote:
> > On Wed, Apr  7, 2021 at 07:01:25PM -0400, Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > Uh, I think your patch missed a few things.  First, you use "%zd"
> > > > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in
> > > > src/backend/utils/error/elog.c used "%ld".  Which is correct?  I see
> > > > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64
> > > > fits in a BIGINT SQL column.
> > > 
> > > Neither is correct.  Project standard these days for printing [u]int64
> > > is to write "%lld" or "%llu", with an explicit (long long) cast on
> > > the printf argument.
> > 
> > Yep, got it.  The attached patch fixes all the calls to use %lld, and
> > adds casts.  In implementing cvslog, I noticed that internally we pass
> > the hash as uint64, but output as int64, which I think is a requirement
> > for how pg_stat_statements has output it, and the use of bigint.  Is
> > that OK?
> 
> Indeed, this is due to how we expose the value in SQL.  The original discussion
> is at
> https://www.postgresql.org/message-id/CAH2-WzkueMfAmY3onoXLi+g67SJoKY65Cg9Z1QOhSyhCEU8w3g@mail.gmail.com.
> As far as I know this is OK, as we want to show consistent values everywhere.

OK, yes, I do remember the discussion.  I was wondering if there should
be a C comment about this anywhere.

> > I am also confused about the inconsistency of calling the GUC
> > compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
> > make it pg_stat_activity.query_id, it doesn't match most of the other
> > *id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
> > OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.
> 
> Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
> would make more sense.

OK, let me work on a patch to change that part.

> @@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata)
> 
>      appendStringInfoChar(&buf, '\n');
> 
> +    /* query id */
> +    appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid());
> +    appendStringInfoChar(&buf, ',');
> +
> 
>      /* If in the syslogger process, try to write messages direct to file */
>      if (MyBackendType == B_LOGGER)
>          write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG);
>
> Unless I'm missing something this will output the query id in the next log
> line?  The new code should be added before the newline is output, and the comma
> should also be output before the queryid.

Yes, correct, updated patch attached.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Attachment

csv.diff

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

08 April 2021, 01:00:40

On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > > I am also confused about the inconsistency of calling the GUC
> > > compute_query_id (with underscore), but pg_stat_activity.queryid.  If we
> > > make it pg_stat_activity.query_id, it doesn't match most of the other
> > > *id columsns in the table, leader_pid, usesysid, backend_xid.  Is that
> > > OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong.
> > 
> > Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id
> > would make more sense.
> 
> OK, let me work on a patch to change that part.

Uh, it is 'queryid' in pg_stat_statements:

    https://www.postgresql.org/docs/13/pgstatstatements.html

    queryid bigint
    Internal hash code, computed from the statement's parse tree

I am not sure if we should have pg_stat_activity use underscore, or the
GUC use underscore.  The problem is that queryid can easily look like
quer-yid.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

08 April 2021, 02:31:01

On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > Unless I'm missing something this will output the query id in the next log
> > line?  The new code should be added before the newline is output, and the comma
> > should also be output before the queryid.
> 
> Yes, correct, updated patch attached.

Patch applied.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

08 April 2021, 02:38:08

On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote:
> On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > > Unless I'm missing something this will output the query id in the next log
> > > line?  The new code should be added before the newline is output, and the comma
> > > should also be output before the queryid.
> > 
> > Yes, correct, updated patch attached.
> 
> Patch applied.

Thanks!  And I agree with using query_id in the new field names while keeping
queryid for pg_stat_statements to avoid unnecessary query breakage.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

08 April 2021, 02:42:29

On Thu, Apr  8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:
> On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote:
> > On Wed, Apr  7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote:
> > > > Unless I'm missing something this will output the query id in the next log
> > > > line?  The new code should be added before the newline is output, and the comma
> > > > should also be output before the queryid.
> > > 
> > > Yes, correct, updated patch attached.
> > 
> > Patch applied.
> 
> Thanks!  And I agree with using query_id in the new field names while keeping
> queryid for pg_stat_statements to avoid unnecessary query breakage.

I think we need more feedback from the group.  Do people agree with the
idea above?  The question is what to call:

    GUC compute_queryid
    pg_stat_activity.queryid
    pg_stat_statements.queryid

using "queryid" or "query_id", and do they have to match?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Alvaro Herrera

Date:

08 April 2021, 03:27:04

On 2021-Apr-07, Bruce Momjian wrote:

> On Thu, Apr  8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote:

> > Thanks!  And I agree with using query_id in the new field names while keeping
> > queryid for pg_stat_statements to avoid unnecessary query breakage.
> 
> I think we need more feedback from the group.  Do people agree with the
> idea above?  The question is what to call:
> 
>     GUC compute_queryid
>     pg_stat_activity.queryid
>     pg_stat_statements.queryid
> 
> using "queryid" or "query_id", and do they have to match?

Seems a matter of personal preference.  Mine is to have the underscore
everywhere in backend code (where this is new), and let it without the
underscore in pg_stat_statements to avoid breaking existing code.  Seems
to match what Julien is saying.

-- 
Álvaro Herrera       Valdivia, Chile
"La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

08 April 2021, 04:17:58

On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:
> 
> Patch applied.  I am ready to adjust this with any improvements people
> might have.  Thank you for all the good feedback we got on this, and I
> know many users have waited a long time for this feature.

Thanks a lot Bruce and everyone!  I hope that the users who waited a long time
for this will find everything they need.

Just to validate that this patchset also allows user to use pg_stat_statements,
any additional third-party module and the new added infrastructure with the
queryid algorithm of their choice, I created a POC extension ([1]) which works
as expected.

Basically:

SHOW shared_preload_libraries;
 shared_preload_libraries
--------------------------
 pg_stat_statements, pg_queryid
(1 row)

SET pg_queryid.use_object_names TO on;
SET pg_queryid.ignore_schema TO on;

CREATE SCHEMA ns1; CREATE TABLE ns1.tbl1(id integer);
CREATE SCHEMA ns2; CREATE TABLE ns2.tbl1(id integer);

SET search_path TO ns1;
SELECT COUNT(*) FROM tbl1;
SET search_path TO ns2;
SELECT COUNT(*) FROM tbl1;

SELECT queryid, query, calls
FROM public.pg_stat_statements
WHERE query LIKE '%tbl%';
       queryid       |           query           | calls
---------------------+---------------------------+-------
 4629593225724429059 | SELECT count(*) from tbl1 |     2
(1 row)

So whether that's a good idea to do that or not, users now have a choice.

[1]: https://github.com/rjuju/pg_queryid

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Thomas Munro

Date:

08 April 2021, 11:36:48

Hi Julien, Bruce,

A warning appears on 32 bit systems:

In file included from pgstatfuncs.c:15:
pgstatfuncs.c: In function 'pg_stat_get_activity':
../../../../src/include/postgres.h:593:29: warning: cast to pointer
from integer of different size [-Wint-to-pointer-cast]
  593 | #define DatumGetPointer(X) ((Pointer) (X))
      |                             ^
../../../../src/include/postgres.h:678:42: note: in expansion of macro
'DatumGetPointer'
  678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X)))
      |                                          ^~~~~~~~~~~~~~~
pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64'
  920 |     values[29] = DatumGetUInt64(beentry->st_queryid);
      |                  ^~~~~~~~~~~~~~

Hmm, maybe this should be UInt64GetDatum()?

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

08 April 2021, 12:12:54

On Thu, Apr 08, 2021 at 11:36:48PM +1200, Thomas Munro wrote:
> Hi Julien, Bruce,
> 
> A warning appears on 32 bit systems:
> 
> In file included from pgstatfuncs.c:15:
> pgstatfuncs.c: In function 'pg_stat_get_activity':
> ../../../../src/include/postgres.h:593:29: warning: cast to pointer
> from integer of different size [-Wint-to-pointer-cast]
>   593 | #define DatumGetPointer(X) ((Pointer) (X))
>       |                             ^
> ../../../../src/include/postgres.h:678:42: note: in expansion of macro
> 'DatumGetPointer'
>   678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X)))
>       |                                          ^~~~~~~~~~~~~~~
> pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64'
>   920 |     values[29] = DatumGetUInt64(beentry->st_queryid);
>       |                  ^~~~~~~~~~~~~~

Wow, that's really embarrassing :(

> Hmm, maybe this should be UInt64GetDatum()?

Yes definitely.  I'm attaching the previous patch for force_parallel_mode to
not forget it + a new one for this issue.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Amit Kapila

Date:

08 April 2021, 12:16:07

On Thu, Apr 8, 2021 at 9:47 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote:
> >
> > Patch applied.  I am ready to adjust this with any improvements people
> > might have.  Thank you for all the good feedback we got on this, and I
> > know many users have waited a long time for this feature.
>
> Thanks a lot Bruce and everyone!  I hope that the users who waited a long time
> for this will find everything they need.
>

@@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
    /* Setting debug_query_string for individual workers */
    debug_query_string = queryDesc->sourceText;

-   /* Report workers' query for monitoring purposes */
+   /* Report workers' query and queryId for monitoring purposes */
    pgstat_report_activity(STATE_RUNNING, debug_query_string);
+   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);


Below lines down in ParallelQueryMain, we call ExecutorStart which
will report queryid, so do we need it here?

-- 
With Regards,
Amit Kapila.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

08 April 2021, 12:27:20

On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:
> 
> @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
>     /* Setting debug_query_string for individual workers */
>     debug_query_string = queryDesc->sourceText;
> 
> -   /* Report workers' query for monitoring purposes */
> +   /* Report workers' query and queryId for monitoring purposes */
>     pgstat_report_activity(STATE_RUNNING, debug_query_string);
> +   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
> 
> 
> Below lines down in ParallelQueryMain, we call ExecutorStart which
> will report queryid, so do we need it here?

Correct, it's not actually needed.  The overhead should be negligible but let's
get rid of it.  Updated fix patchset attached.

Attachment

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

08 April 2021, 13:31:27

On Thu, Apr 08, 2021 at 08:27:20PM +0800, Julien Rouhaud wrote:
> On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote:
> > 
> > @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc)
> >     /* Setting debug_query_string for individual workers */
> >     debug_query_string = queryDesc->sourceText;
> > 
> > -   /* Report workers' query for monitoring purposes */
> > +   /* Report workers' query and queryId for monitoring purposes */
> >     pgstat_report_activity(STATE_RUNNING, debug_query_string);
> > +   pgstat_report_queryid(queryDesc->plannedstmt->queryId, false);
> > 
> > 
> > Below lines down in ParallelQueryMain, we call ExecutorStart which
> > will report queryid, so do we need it here?
> 
> Correct, it's not actually needed.  The overhead should be negligible but let's
> get rid of it.  Updated fix patchset attached.

Sorry I messed up the last commit, v4 is ok.

On Thu, Apr  8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote:
> On Thu, Apr  8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:
> > On 2021-Apr-08, Bruce Momjian wrote:
> > 
> > > pg_stat_activity.queryid is new, but I can imagine cases where you would
> > > join pg_stat_activity to pg_stat_statements to get an estimate of how
> > > long the query will take --- having one using an underscore and another
> > > one not seems odd.
> > 
> > OK.  So far, you have one vote for queryid (your own) and two votes for
> > query_id (mine and Julien's).  And even yourself were hesitating about
> > it earlier in the thread.
> 
> OK, if people are fine with pg_stat_activity.query_id not matching
> pg_stat_statements.queryid, I am fine with that.  I just don't want
> someone to say it was a big mistake later.  ;-)

OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
have not changed any of the APIs which existed before this feature was
added, and are called "queryid" or "queryId" --- it is kind of a mess. 
I assume I should leave those unchanged.  It will also need a catversion
bump.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Attachment

query_id.diff.gz

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

13 April 2021, 08:06:25

On Mon, Apr 12, 2021 at 10:12:46PM -0400, Bruce Momjian wrote:
> On Thu, Apr  8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote:
> > On Thu, Apr  8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote:
> > > On 2021-Apr-08, Bruce Momjian wrote:
> > > 
> > > > pg_stat_activity.queryid is new, but I can imagine cases where you would
> > > > join pg_stat_activity to pg_stat_statements to get an estimate of how
> > > > long the query will take --- having one using an underscore and another
> > > > one not seems odd.
> > > 
> > > OK.  So far, you have one vote for queryid (your own) and two votes for
> > > query_id (mine and Julien's).  And even yourself were hesitating about
> > > it earlier in the thread.
> > 
> > OK, if people are fine with pg_stat_activity.query_id not matching
> > pg_stat_statements.queryid, I am fine with that.  I just don't want
> > someone to say it was a big mistake later.  ;-)
> 
> OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> have not changed any of the APIs which existed before this feature was
> added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> I assume I should leave those unchanged.  It will also need a catversion
> bump.

-    uint64        st_queryid;
+    uint64        st_query_id;

I thought we would internally keep queryid/queryId, at least for the variable
names as this is the name of the saved field in PlannedStmt.

-extern void pgstat_report_queryid(uint64 queryId, bool force);
+extern void pgstat_report_query_id(uint64 queryId, bool force);

But if we don't then it should be "uint64 query_id".

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Alvaro Herrera

Date:

13 April 2021, 17:30:16

On 2021-Apr-12, Bruce Momjian wrote:

> OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> have not changed any of the APIs which existed before this feature was
> added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> I assume I should leave those unchanged.  It will also need a catversion
> bump.

I think it is fine actually.  These names appear in structs Query and
PlannedStmt, and every single member of those already uses camelCase
naming.  Changing those to use "query_id" would look out of place.
You did change the one in PgBackendStatus to st_query_id, which also
matches the naming style in that struct, so that looks fine also.

So I'm -1 on Julien's first proposed change, and +1 on his second
proposed change (the name of the first argument of
pgstat_report_query_id should be query_id).

-- 
Álvaro Herrera       Valdivia, Chile

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

14 April 2021, 18:33:26

On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> On 2021-Apr-12, Bruce Momjian wrote:
> 
> > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> > have not changed any of the APIs which existed before this feature was
> > added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> > I assume I should leave those unchanged.  It will also need a catversion
> > bump.
> 
> I think it is fine actually.  These names appear in structs Query and
> PlannedStmt, and every single member of those already uses camelCase
> naming.  Changing those to use "query_id" would look out of place.
> You did change the one in PgBackendStatus to st_query_id, which also
> matches the naming style in that struct, so that looks fine also.
> 
> So I'm -1 on Julien's first proposed change, and +1 on his second
> proposed change (the name of the first argument of
> pgstat_report_query_id should be query_id).

Thanks for your analysis.  Updated patch attached with the change
suggested above.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Attachment

query_id.diff.gz

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

15 April 2021, 08:11:42

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:
> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> > 
> > So I'm -1 on Julien's first proposed change, and +1 on his second
> > proposed change (the name of the first argument of
> > pgstat_report_query_id should be query_id).
> 
> Thanks for your analysis.  Updated patch attached with the change
> suggested above.

Thanks Bruce.  It looks good to me.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Bruce Momjian

Date:

20 April 2021, 16:22:44

On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:
> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
> > On 2021-Apr-12, Bruce Momjian wrote:
> > 
> > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
> > > have not changed any of the APIs which existed before this feature was
> > > added, and are called "queryid" or "queryId" --- it is kind of a mess. 
> > > I assume I should leave those unchanged.  It will also need a catversion
> > > bump.
> > 
> > I think it is fine actually.  These names appear in structs Query and
> > PlannedStmt, and every single member of those already uses camelCase
> > naming.  Changing those to use "query_id" would look out of place.
> > You did change the one in PgBackendStatus to st_query_id, which also
> > matches the naming style in that struct, so that looks fine also.
> > 
> > So I'm -1 on Julien's first proposed change, and +1 on his second
> > proposed change (the name of the first argument of
> > pgstat_report_query_id should be query_id).
> 
> Thanks for your analysis.  Updated patch attached with the change
> suggested above.

Patch applied.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Fujii Masao

Date:

21 April 2021, 15:28:11


On 2021/04/21 1:22, Bruce Momjian wrote:
> On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote:
>> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote:
>>> On 2021-Apr-12, Bruce Momjian wrote:
>>>
>>>> OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I
>>>> have not changed any of the APIs which existed before this feature was
>>>> added, and are called "queryid" or "queryId" --- it is kind of a mess.
>>>> I assume I should leave those unchanged.  It will also need a catversion
>>>> bump.
>>>
>>> I think it is fine actually.  These names appear in structs Query and
>>> PlannedStmt, and every single member of those already uses camelCase
>>> naming.  Changing those to use "query_id" would look out of place.
>>> You did change the one in PgBackendStatus to st_query_id, which also
>>> matches the naming style in that struct, so that looks fine also.
>>>
>>> So I'm -1 on Julien's first proposed change, and +1 on his second
>>> proposed change (the name of the first argument of
>>> pgstat_report_query_id should be query_id).
>>
>> Thanks for your analysis.  Updated patch attached with the change
>> suggested above.
> 
> Patch applied.

I found another small issue in pg_stat_statements docs. The following
description in the docs should be updated so that toplevel is included?

> This view contains one row for each distinct database ID, user ID and query ID

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

22 April 2021, 09:23:18

On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> 
> I found another small issue in pg_stat_statements docs. The following
> description in the docs should be updated so that toplevel is included?
> 
> > This view contains one row for each distinct database ID, user ID and query ID

Indeed!  I'm adding Magnus in Cc.

PFA a patch to fix at, and also mention that toplevel will only
contain True values if pg_stat_statements.track is set to top.

Attachment

v1-0001-doc-Mention-that-toplevel-is-part-of-pg_stat_stat.patch

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Fujii Masao

Date:

23 April 2021, 07:10:37


On 2021/04/22 18:23, Julien Rouhaud wrote:
> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
>>
>> I found another small issue in pg_stat_statements docs. The following
>> description in the docs should be updated so that toplevel is included?
>>
>>> This view contains one row for each distinct database ID, user ID and query ID
> 
> Indeed!  I'm adding Magnus in Cc.
> 
> PFA a patch to fix at, and also mention that toplevel will only
> contain True values if pg_stat_statements.track is set to top.

Thanks for the patch! LGTM.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Magnus Hagander

Date:

23 April 2021, 09:46:34

On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/04/22 18:23, Julien Rouhaud wrote:
> > On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> >>
> >> I found another small issue in pg_stat_statements docs. The following
> >> description in the docs should be updated so that toplevel is included?
> >>
> >>> This view contains one row for each distinct database ID, user ID and query ID
> >
> > Indeed!  I'm adding Magnus in Cc.
> >
> > PFA a patch to fix at, and also mention that toplevel will only
> > contain True values if pg_stat_statements.track is set to top.
>
> Thanks for the patch! LGTM.

Agreed, in general. But going by the example a few lines down, I
changed the second part to:
        True if the query was executed as a top level statement
+       (if <varname>pg_stat_statements.track</varname> is set to
+       <literal>all</literal>, otherwise always false)

(changes the wording, but also the name of the parameter is
pg_stat_statements.track, not pg_stat_statements.toplevel (that's the
column, not the parameter). Same error in the commit msg except there
you called it pg_stat_statements.top - but that one needed some more
fix as well)

With those changes, applied. Thanks!

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Fujii Masao

Date:

23 April 2021, 10:04:48


On 2021/04/23 18:46, Magnus Hagander wrote:
> On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2021/04/22 18:23, Julien Rouhaud wrote:
>>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
>>>>
>>>> I found another small issue in pg_stat_statements docs. The following
>>>> description in the docs should be updated so that toplevel is included?
>>>>
>>>>> This view contains one row for each distinct database ID, user ID and query ID
>>>
>>> Indeed!  I'm adding Magnus in Cc.
>>>
>>> PFA a patch to fix at, and also mention that toplevel will only
>>> contain True values if pg_stat_statements.track is set to top.
>>
>> Thanks for the patch! LGTM.
> 
> Agreed, in general. But going by the example a few lines down, I
> changed the second part to:
>          True if the query was executed as a top level statement
> +       (if <varname>pg_stat_statements.track</varname> is set to
> +       <literal>all</literal>, otherwise always false)

Isn't this confusing? Users may mistakenly read this as that the toplevel
column always indicates false if pg_stat_statements.track is not "all".


> (changes the wording, but also the name of the parameter is
> pg_stat_statements.track, not pg_stat_statements.toplevel (that's the
> column, not the parameter). Same error in the commit msg except there
> you called it pg_stat_statements.top - but that one needed some more
> fix as well)
> 
> With those changes, applied. Thanks!

Thanks!

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Magnus Hagander

Date:

23 April 2021, 10:11:31

On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/04/23 18:46, Magnus Hagander wrote:
> > On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2021/04/22 18:23, Julien Rouhaud wrote:
> >>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> >>>>
> >>>> I found another small issue in pg_stat_statements docs. The following
> >>>> description in the docs should be updated so that toplevel is included?
> >>>>
> >>>>> This view contains one row for each distinct database ID, user ID and query ID
> >>>
> >>> Indeed!  I'm adding Magnus in Cc.
> >>>
> >>> PFA a patch to fix at, and also mention that toplevel will only
> >>> contain True values if pg_stat_statements.track is set to top.
> >>
> >> Thanks for the patch! LGTM.
> >
> > Agreed, in general. But going by the example a few lines down, I
> > changed the second part to:
> >          True if the query was executed as a top level statement
> > +       (if <varname>pg_stat_statements.track</varname> is set to
> > +       <literal>all</literal>, otherwise always false)
>
> Isn't this confusing? Users may mistakenly read this as that the toplevel
> column always indicates false if pg_stat_statements.track is not "all".

Hmm. I think you're right. It should say "always true", shouldn't it?
So not just confusing, but completely wrong? :)


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Fujii Masao

Date:

23 April 2021, 10:40:33


On 2021/04/23 19:11, Magnus Hagander wrote:
> On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
> <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2021/04/23 18:46, Magnus Hagander wrote:
>>> On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2021/04/22 18:23, Julien Rouhaud wrote:
>>>>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
>>>>>>
>>>>>> I found another small issue in pg_stat_statements docs. The following
>>>>>> description in the docs should be updated so that toplevel is included?
>>>>>>
>>>>>>> This view contains one row for each distinct database ID, user ID and query ID
>>>>>
>>>>> Indeed!  I'm adding Magnus in Cc.
>>>>>
>>>>> PFA a patch to fix at, and also mention that toplevel will only
>>>>> contain True values if pg_stat_statements.track is set to top.
>>>>
>>>> Thanks for the patch! LGTM.
>>>
>>> Agreed, in general. But going by the example a few lines down, I
>>> changed the second part to:
>>>           True if the query was executed as a top level statement
>>> +       (if <varname>pg_stat_statements.track</varname> is set to
>>> +       <literal>all</literal>, otherwise always false)
>>
>> Isn't this confusing? Users may mistakenly read this as that the toplevel
>> column always indicates false if pg_stat_statements.track is not "all".
> 
> Hmm. I think you're right. It should say "always true", shouldn't it?

You're thinking something like the following?

     True if the query was executed as a top level statement
     (if <varname>pg_stat_statements.track</varname> is set to
     <literal>top</literal>, always true)

> So not just confusing, but completely wrong? :)

Yeah :)

I'm fine with the original wording by Julien.
Of course, the parameter name should be corrected as you did, though.

Or what about the following?

     True if the query was executed as a top level statement
     (this can be <literal>false</literal> only if
     <varname>pg_stat_statements.track</varname> is set to
     <literal>all</literal> and nested statements are also tracked)

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Magnus Hagander

Date:

17 May 2021, 09:02:29

On Fri, Apr 23, 2021 at 12:40 PM Fujii Masao
<masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2021/04/23 19:11, Magnus Hagander wrote:
> > On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao
> > <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2021/04/23 18:46, Magnus Hagander wrote:
> >>> On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2021/04/22 18:23, Julien Rouhaud wrote:
> >>>>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote:
> >>>>>>
> >>>>>> I found another small issue in pg_stat_statements docs. The following
> >>>>>> description in the docs should be updated so that toplevel is included?
> >>>>>>
> >>>>>>> This view contains one row for each distinct database ID, user ID and query ID
> >>>>>
> >>>>> Indeed!  I'm adding Magnus in Cc.
> >>>>>
> >>>>> PFA a patch to fix at, and also mention that toplevel will only
> >>>>> contain True values if pg_stat_statements.track is set to top.
> >>>>
> >>>> Thanks for the patch! LGTM.
> >>>
> >>> Agreed, in general. But going by the example a few lines down, I
> >>> changed the second part to:
> >>>           True if the query was executed as a top level statement
> >>> +       (if <varname>pg_stat_statements.track</varname> is set to
> >>> +       <literal>all</literal>, otherwise always false)
> >>
> >> Isn't this confusing? Users may mistakenly read this as that the toplevel
> >> column always indicates false if pg_stat_statements.track is not "all".
> >
> > Hmm. I think you're right. It should say "always true", shouldn't it?
>
> You're thinking something like the following?
>
>      True if the query was executed as a top level statement
>      (if <varname>pg_stat_statements.track</varname> is set to
>      <literal>top</literal>, always true)
>
> > So not just confusing, but completely wrong? :)
>
> Yeah :)

Ugh. I completely lost track of this email.

I've applied the change suggested above with another slight reordering
of the words:

+       (always true if <varname>pg_stat_statements.track</varname> is set to
+       <literal>top</literal>)


> I'm fine with the original wording by Julien.
> Of course, the parameter name should be corrected as you did, though.
>
> Or what about the following?
>
>      True if the query was executed as a top level statement
>      (this can be <literal>false</literal> only if
>      <varname>pg_stat_statements.track</varname> is set to
>      <literal>all</literal> and nested statements are also tracked)

I found my suggestion, once the final reordering of words was done,
easier to parse.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Peter Eisentraut

Date:

12 July 2021, 20:02:59

On 22.04.21 11:23, Julien Rouhaud wrote:
>      The statistics gathered by the module are made available via a
>      view named <structname>pg_stat_statements</structname>.  This view
> -   contains one row for each distinct database ID, user ID and query
> -   ID (up to the maximum number of distinct statements that the module
> +   contains one row for each distinct database ID, user ID, query ID and
> +   toplevel (up to the maximum number of distinct statements that the module
>      can track).  The columns of the view are shown in
>      <xref linkend="pgstatstatements-columns"/>.

I'm having trouble parsing this new sentence.  It now says essentially

"This view contains one row for each distinct database ID, each distinct 
user ID, each distinct query ID, and each distinct toplevel."

That last part doesn't make sense.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

13 July 2021, 06:10:23

On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote:
> On 22.04.21 11:23, Julien Rouhaud wrote:
> >      The statistics gathered by the module are made available via a
> >      view named <structname>pg_stat_statements</structname>.  This view
> > -   contains one row for each distinct database ID, user ID and query
> > -   ID (up to the maximum number of distinct statements that the module
> > +   contains one row for each distinct database ID, user ID, query ID and
> > +   toplevel (up to the maximum number of distinct statements that the module
> >      can track).  The columns of the view are shown in
> >      <xref linkend="pgstatstatements-columns"/>.
> 
> I'm having trouble parsing this new sentence.  It now says essentially
> 
> "This view contains one row for each distinct database ID, each distinct
> user ID, each distinct query ID, and each distinct toplevel."

Isn't it each distinct permutation of all those fields?

> That last part doesn't make sense.

I'm not sure what you mean by that.  Maybe it's not really self explanatory
without referring to what toplevel is, which is a bool flag stating whether the
statement was exected as a top level statement or not.

So every distinct permutation of (dbid, userid, queryid) can indeed be stored
twice, if pg_stat_statements.track is set to all.  However in practice most
statements are not executed both as top level and nested statements.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Magnus Hagander

Date:

13 July 2021, 08:58:12

On Tue, Jul 13, 2021 at 8:10 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote:
> > On 22.04.21 11:23, Julien Rouhaud wrote:
> > >      The statistics gathered by the module are made available via a
> > >      view named <structname>pg_stat_statements</structname>.  This view
> > > -   contains one row for each distinct database ID, user ID and query
> > > -   ID (up to the maximum number of distinct statements that the module
> > > +   contains one row for each distinct database ID, user ID, query ID and
> > > +   toplevel (up to the maximum number of distinct statements that the module
> > >      can track).  The columns of the view are shown in
> > >      <xref linkend="pgstatstatements-columns"/>.
> >
> > I'm having trouble parsing this new sentence.  It now says essentially
> >
> > "This view contains one row for each distinct database ID, each distinct
> > user ID, each distinct query ID, and each distinct toplevel."
>
> Isn't it each distinct permutation of all those fields?

Maybe a problem for the readability of it is that the three other
fields are listed by their description and not by their fieldname, and
toplevel is fieldname?

Maybe "each distinct database id, each distinct user id, each distinct
query id, and whether it is a top level statement or not"?

Or maybe "each distinct combination of database id, user id, query id
and whether it's a top level statement or not"?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Julien Rouhaud

Date:

13 July 2021, 09:38:52

On Tue, Jul 13, 2021 at 10:58:12AM +0200, Magnus Hagander wrote:
> 
> Maybe a problem for the readability of it is that the three other
> fields are listed by their description and not by their fieldname, and
> toplevel is fieldname?

I think so too.

> Maybe "each distinct database id, each distinct user id, each distinct
> query id, and whether it is a top level statement or not"?
> 
> Or maybe "each distinct combination of database id, user id, query id
> and whether it's a top level statement or not"?

I like the 2nd one better.  What about "and its top level status"?  It would
keep the sentence short and the full description is right after if needed.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Peter Eisentraut

Date:

14 July 2021, 04:36:48

On 13.07.21 10:58, Magnus Hagander wrote:
> Maybe "each distinct database id, each distinct user id, each distinct
> query id, and whether it is a top level statement or not"?
> 
> Or maybe "each distinct combination of database id, user id, query id
> and whether it's a top level statement or not"?

Okay, now I understand what is meant here.  The second one sounds good 
to me.

Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

From

Magnus Hagander

Date:

14 July 2021, 09:09:54

On Wed, Jul 14, 2021 at 6:36 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> On 13.07.21 10:58, Magnus Hagander wrote:
> > Maybe "each distinct database id, each distinct user id, each distinct
> > query id, and whether it is a top level statement or not"?
> >
> > Or maybe "each distinct combination of database id, user id, query id
> > and whether it's a top level statement or not"?
>
> Okay, now I understand what is meant here.  The second one sounds good
> to me.

Thanks, will push a fix like that.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/