Thread: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
Hey pg developers,
Do you think if we can add queryId into the pg_stat_get_activity function and ultimatly expose it in the view? It would be easier to track "similar" query's performance over time easier.
Thanks a lot!
Yun
Do you think if we can add queryId into the pg_stat_get_activity function and ultimatly expose it in the view? It would be easier to track "similar" query's performance over time easier.
Thanks a lot!
Yun
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Yun Li <liyunjuanyong@gmail.com> writes: > Do you think if we can add queryId into the pg_stat_get_activity function > and ultimatly expose it in the view? It would be easier to track "similar" > query's performance over time easier. No, we're not likely to do that, because it would mean (1) baking one single definition of "query ID" into the core system and (2) paying the cost to calculate that ID all the time. pg_stat_statements has a notion of query ID, but that notion might be quite inappropriate for other usages, which is why it's an extension and not core. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Robert Haas
Date:
On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yun Li <liyunjuanyong@gmail.com> writes: > > Do you think if we can add queryId into the pg_stat_get_activity function > > and ultimatly expose it in the view? It would be easier to track "similar" > > query's performance over time easier. > > No, we're not likely to do that, because it would mean (1) baking one > single definition of "query ID" into the core system and (2) paying > the cost to calculate that ID all the time. > > pg_stat_statements has a notion of query ID, but that notion might be > quite inappropriate for other usages, which is why it's an extension > and not core. Having written an extension that also wanted a query ID, I disagree with this position. There's only one query ID field available, and you can't use two extensions that care about query ID unless they compute it the same way, and replicating all the code that computes the query ID into each new extension that wants one sucks. I think we should actually bite the bullet and move all of that code into core, and then just let extensions say whether they care about it getting set. Also, I think this is now the third independent request to expose query ID in pg_stat_statements. I think we should give the people what they want. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> pg_stat_statements has a notion of query ID, but that notion might be >> quite inappropriate for other usages, which is why it's an extension >> and not core. > Having written an extension that also wanted a query ID, I disagree > with this position. [ shrug... ] The fact remains that pg_stat_statements's definition is pretty lame. There's a lot of judgment calls in which query fields it chooses to examine or ignore, and there's been no attempt at all to make the ID PG-version-independent, and I rather doubt that it's platform-independent either. Nor will the IDs survive a dump/reload even on the same server, since object OIDs will likely change. These things are OK, or at least mostly tolerable, for pg_stat_statements' usage ... but I don't think it's a good idea to have the core code dictating that definition to all extensions. Right now, if you have an extension that needs some other query-ID definition, you can do it, you just can't run that extension alongside pg_stat_statements. But you'll be out of luck if the core code starts filling that field. I'd be happier about having the core code compute a query ID if we had a definition that was not so obviously slapped together. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Sat, Mar 16, 2019 at 5:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> pg_stat_statements has a notion of query ID, but that notion might be > >> quite inappropriate for other usages, which is why it's an extension > >> and not core. > > > Having written an extension that also wanted a query ID, I disagree > > with this position. > > [ shrug... ] The fact remains that pg_stat_statements's definition is > pretty lame. There's a lot of judgment calls in which query fields > it chooses to examine or ignore, and there's been no attempt at all > to make the ID PG-version-independent, and I rather doubt that it's > platform-independent either. Nor will the IDs survive a dump/reload > even on the same server, since object OIDs will likely change. > > These things are OK, or at least mostly tolerable, for pg_stat_statements' > usage ... but I don't think it's a good idea to have the core code > dictating that definition to all extensions. Right now, if you have > an extension that needs some other query-ID definition, you can do it, > you just can't run that extension alongside pg_stat_statements. > But you'll be out of luck if the core code starts filling that field. > > I'd be happier about having the core code compute a query ID if we > had a definition that was not so obviously slapped together. But the queryId itself is stored in core. Exposing it in pg_stat_activity or log_line_prefix would still allow users to choose the implementation of their choice, or none. That seems like a different complaint from asking pgss integration in core to have all its metrics available by default (or at least without a restart). Maybe we could add a GUC for pg_stat_statements to choose whether it should set the queryid itself and not, if anyone wants to have its metrics but with different queryid semantics?
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
legrand legrand
Date:
Hello, This is available in https://github.com/legrandlegrand/pg_stat_sql_plans extension with a specific function pgssp_backend_queryid(pid) that permits to join pg_stat_activity with pg_stat_sql_plans (that is similar to pg_stat_statements) and also permits to collect samples of wait events per query id. This extension computes its own queryid based on a normalized query text (that doesn't change after table drop/create). Maybe that queryid calculation should stay in a dedicated extension, permiting to users to choose their queryid definition. Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Thanks a lot for really good points!! I did not expected I will get this many points of view. :P
I have identical experience with Robert when other extension calculate the id different as PGSS, PGSS will overwritten that id when it is on. But Tom got a point that if we centralize the logic that pgss has, then other extension will have no way to change it unless we have some new config to toggle pointed out by Julien. Also Tom got the concern about the current PGSS jumble query logic is not bullet proof and may take time then impact the perf.
Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to the pg_stat_get_activity and ultimately exposed in pg_stat_activity view? Then no matter whether PGSS is on or off, or however the customer extensions are updating that filed, we expose that field in that view then enable user to leverage that id to join with pgss or their extension. Will this sounds a good idea?
Thanks again,
Yun
On Sat, Mar 16, 2019 at 11:01 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Sat, Mar 16, 2019 at 5:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Fri, Mar 15, 2019 at 9:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> pg_stat_statements has a notion of query ID, but that notion might be
> >> quite inappropriate for other usages, which is why it's an extension
> >> and not core.
>
> > Having written an extension that also wanted a query ID, I disagree
> > with this position.
>
> [ shrug... ] The fact remains that pg_stat_statements's definition is
> pretty lame. There's a lot of judgment calls in which query fields
> it chooses to examine or ignore, and there's been no attempt at all
> to make the ID PG-version-independent, and I rather doubt that it's
> platform-independent either. Nor will the IDs survive a dump/reload
> even on the same server, since object OIDs will likely change.
>
> These things are OK, or at least mostly tolerable, for pg_stat_statements'
> usage ... but I don't think it's a good idea to have the core code
> dictating that definition to all extensions. Right now, if you have
> an extension that needs some other query-ID definition, you can do it,
> you just can't run that extension alongside pg_stat_statements.
> But you'll be out of luck if the core code starts filling that field.
>
> I'd be happier about having the core code compute a query ID if we
> had a definition that was not so obviously slapped together.
But the queryId itself is stored in core. Exposing it in
pg_stat_activity or log_line_prefix would still allow users to choose
the implementation of their choice, or none. That seems like a
different complaint from asking pgss integration in core to have all
its metrics available by default (or at least without a restart).
Maybe we could add a GUC for pg_stat_statements to choose whether it
should set the queryid itself and not, if anyone wants to have its
metrics but with different queryid semantics?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Nikolay Samokhvalov
Date:
Hello
On Sat, Mar 16, 2019 at 7:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
Also, I think this is now the third independent request to expose
query ID in pg_stat_statements. I think we should give the people
what they want.
Count me as the 4th.
This would be a very important feature for automated query analysis.
pg_stat_statements lacks query examples, and the only way to get them is from the logs.
Where we don't have queryid as well. So people end up either doing it manually or writing
yet another set of nasty regular expressions.
Routing query analysis s a crucial for any large project. If there are chances to implement
queryid for pg_stat_activity (or anything that will allow to automate query analysis)
in Postgres 12 or later -- this would be a great news and huge support for engineers.
Same level as recently implemented sampling for statement logging.
By the way, if queryid goes to the core someday, I'm sure it is worth to consider using
it in logs as well.
Thanks,
Nik
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote: > > Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to the pg_stat_get_activityand ultimately exposed in pg_stat_activity view? Then no matter whether PGSS is on or off, or howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage thatid to join with pgss or their extension. Will this sounds a good idea? I'd greatly welcome expose queryid exposure in pg_stat_activity, and also in log_line_prefix. I'm afraid that it's too late for pg12 inclusion, but I'll be happy to provide a patch for that for pg13.
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Maksim Milyutin
Date:
On 3/16/19 5:32 PM, Robert Haas wrote: > There's only one query ID field available, and > you can't use two extensions that care about query ID unless they > compute it the same way, and replicating all the code that computes > the query ID into each new extension that wants one sucks. I think we > should actually bite the bullet and move all of that code into core, > and then just let extensions say whether they care about it getting > set. +1. But I think that enough to integrate into core the query normalization routine and store generalized query strings (from which the queryId is produced) in shared memory (for example, hashtable that maps queryId to the text representation of generalized query). And activate normalization routine and filling the table of generalized queries by specified GUC. This allows to unbind extensions that require queryId from using pg_stat_statements and consider such computing of queryId as canonical. -- Regards, Maksim Milyutin
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Tue, Mar 19, 2019 at 2:45 PM Maksim Milyutin <milyutinma@gmail.com> wrote: > > But I think that enough to integrate into core the query normalization > routine and store generalized query strings (from which the queryId is > produced) in shared memory (for example, hashtable that maps queryId to > the text representation of generalized query). That's more or less how pg_stat_statements was previously behaving, and it had too many problems. Current implementation, with an external file, is a better alternative. > And activate > normalization routine and filling the table of generalized queries by > specified GUC. > > This allows to unbind extensions that require queryId from using > pg_stat_statements and consider such computing of queryId as canonical. The problem I see with this approach is that if you want a different implementation, you'll have to reimplement the in-core normalised queries saving and retrieval, but with a different set of SQL-visible functions. I don't think that's it's acceptable, unless we add a specific hook for query normalisation and queryid computing. But it isn't ideal either, as it would be a total mess if someone changes the implementation without resetting the previously saved normalised queries.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote: > > > > Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to the pg_stat_get_activityand ultimately exposed in pg_stat_activity view? Then no matter whether PGSS is on or off, or howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage thatid to join with pgss or their extension. Will this sounds a good idea? > > I'd greatly welcome expose queryid exposure in pg_stat_activity, and > also in log_line_prefix. I'm afraid that it's too late for pg12 > inclusion, but I'll be happy to provide a patch for that for pg13. Here's a prototype patch for queryid exposure in pg_stat_activity and log_line prefix.
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Jim Finnerty
Date:
The queryId depends on oids, so it is not stable enough for some purposes. For example, to create a SQL identifier that survives across a server upgrade, or that can be shipped to another database, the queryId isn't usable. The apg_plan_mgmt extensions keeps both its own stable SQL identifier as well as the queryId, so it can be used to join to pg_stat_statements if desired. If we were to standardize on one SQL identifier, it should be stable enough to survive a major version upgrade or to be the same in different databases. ----- Jim Finnerty, AWS, Amazon Aurora PostgreSQL -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Robert Haas
Date:
On Tue, Mar 19, 2019 at 1:24 PM Jim Finnerty <jfinnert@amazon.com> wrote: > The queryId depends on oids, so it is not stable enough for some purposes. > For example, to create a SQL identifier that survives across a server > upgrade, or that can be shipped to another database, the queryId isn't > usable. > > The apg_plan_mgmt extensions keeps both its own stable SQL identifier as > well as the queryId, so it can be used to join to pg_stat_statements if > desired. If we were to standardize on one SQL identifier, it should be > stable enough to survive a major version upgrade or to be the same in > different databases. If Amazon would like to open-source its (AIUI) proprietary technology for computing query IDs and propose it for inclusion in PostgreSQL, cool, but I think that is a separate question from whether people would like more convenient access to the query ID technology that we have today. I think it's 100% clear that they would like that, even as things stand, and therefore it does not make sense to block that behind Amazon deciding to share what it already has or somebody else trying to reimplement it. If we need to have a space for both a core-standard query ID and another query ID that is available for extension use, adding one more field to struct Query, so we can have both coreQueryId and extensionQueryId or whatever, would be easy to do. It appears that there's more use case than I would have guessed for custom query IDs. On the other hand, it also appears that a lot of people would be very, very happy to just be able to see the query ID field that already exists, both in pg_stat_statements in pg_stat_activity, and we shouldn't throw up unnecessary impediments in the way of making that happen, at least IMHO. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
legrand legrand
Date:
Great, thank you Julien ! Would it make sense to add it in auto explain ? I don't know for explain itself, but maybe ... Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Tue, Mar 19, 2019 at 8:38 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > Would it make sense to add it in auto explain ? > I don't know for explain itself, but maybe ... I'd think that people interested in getting the queryid in the logs would configure the log_line_prefix to display it consistently rather than having it in only a subset of cases, so that's probably not really needed.
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
legrand legrand
Date:
Hi Jim, Robert, As this is a distinct subject from adding QueryId to pg_stat_activity, would it be possible to continue the discussion "new QueryId definition" (for postgres open source software) here: https://www.postgresql.org/message-id/1553029215728-0.post@n3.nabble.com Thanks in advance. Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
RE: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
legrand legrand
Date:
>> Would it make sense to add it in auto explain ?
>> I don't know for explain itself, but maybe ...
> I'd think that people interested in getting the queryid in the logs
> would configure the log_line_prefix to display it consistently rather
> than having it in only a subset of cases, so that's probably not
> really needed.
Ok.
Shoudn't you add this to commitfest ?
>> I don't know for explain itself, but maybe ...
> I'd think that people interested in getting the queryid in the logs
> would configure the log_line_prefix to display it consistently rather
> than having it in only a subset of cases, so that's probably not
> really needed.
Ok.
Shoudn't you add this to commitfest ?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Mon, Mar 25, 2019 at 12:36 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > > >> Would it make sense to add it in auto explain ? > >> I don't know for explain itself, but maybe ... > > > I'd think that people interested in getting the queryid in the logs > > would configure the log_line_prefix to display it consistently rather > > than having it in only a subset of cases, so that's probably not > > really needed. > > Ok. > Shoudn't you add this to commitfest ? I added it last week, see https://commitfest.postgresql.org/23/2069/
RE: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
legrand legrand
Date:
>> Shoudn't you add this to commitfest ?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Tue, Mar 19, 2019 at 3:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote: > > > > > > Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to thepg_stat_get_activity and ultimately exposed in pg_stat_activity view? Then no matter whether PGSS is on or off, or howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage thatid to join with pgss or their extension. Will this sounds a good idea? > > > > I'd greatly welcome expose queryid exposure in pg_stat_activity, and > > also in log_line_prefix. I'm afraid that it's too late for pg12 > > inclusion, but I'll be happy to provide a patch for that for pg13. > > Here's a prototype patch for queryid exposure in pg_stat_activity and > log_line prefix. Patch doesn't apply anymore, PFA rebased v2.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Fri, Jun 28, 2019 at 4:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Mar 19, 2019 at 3:51 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Mon, Mar 18, 2019 at 7:33 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > On Mon, Mar 18, 2019 at 6:23 PM Yun Li <liyunjuanyong@gmail.com> wrote: > > > > > > > > Let's take one step back. Since queryId is stored in core as Julien pointed out, can we just add that global to thepg_stat_get_activity and ultimately exposed in pg_stat_activity view? Then no matter whether PGSS is on or off, or howeverthe customer extensions are updating that filed, we expose that field in that view then enable user to leverage thatid to join with pgss or their extension. Will this sounds a good idea? > > > > > > I'd greatly welcome expose queryid exposure in pg_stat_activity, and > > > also in log_line_prefix. I'm afraid that it's too late for pg12 > > > inclusion, but I'll be happy to provide a patch for that for pg13. > > > > Here's a prototype patch for queryid exposure in pg_stat_activity and > > log_line prefix. > > Patch doesn't apply anymore, PFA rebased v2. Sorry, I missed the new pg_stat_gssapi view.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Peter Geoghegan
Date:
On Tue, Mar 19, 2019 at 12:00 PM Robert Haas <robertmhaas@gmail.com> wrote: > On the other hand, it also appears that a lot of people would be very, > very happy to just be able to see the query ID field that already > exists, both in pg_stat_statements in pg_stat_activity, and we > shouldn't throw up unnecessary impediments in the way of making that > happen, at least IMHO. +1. pg_stat_statements will already lose all the statistics that it aggregated in the event of a hard crash. The trade-off that the query jumbling logic makes is not a bad one, all things considered. -- Peter Geoghegan
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Peter Geoghegan
Date:
On Tue, Mar 19, 2019 at 12:38 PM legrand legrand <legrand_legrand@hotmail.com> wrote: > Would it make sense to add it in auto explain ? > I don't know for explain itself, but maybe ... I think that it should appear in EXPLAIN. pg_stat_statements already cannot have a query hash of zero, so it might be okay to display it only when its value is non-zero. -- Peter Geoghegan
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Evgeny Efimkin
Date:
What reason to use pg_atomic_uint64? In docs: occured - > occurred
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
Hello, On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote: > > What reason to use pg_atomic_uint64? The queryid is read and written without holding any lock on the PGPROC entry, so the pg_atomic_uint64 will guarantee that we get a consistent value in pg_stat_get_activity(). Other reads shouldn't be a problem as far as I remember. > In docs: > occured - > occurred Thanks! I fixed it on my local branch.
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Andres Freund
Date:
Hi, On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote: > On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote: > > What reason to use pg_atomic_uint64? > > The queryid is read and written without holding any lock on the PGPROC > entry, so the pg_atomic_uint64 will guarantee that we get a consistent > value in pg_stat_get_activity(). Other reads shouldn't be a problem > as far as I remember. Hm, I don't think that's necessary in this case. That's what the st_changecount protocol is trying to ensure, no? /* * To avoid locking overhead, we use the following protocol: a backend * increments st_changecount before modifying its entry, and again after * finishing a modification. A would-be reader should note the value of * st_changecount, copy the entry into private memory, then check * st_changecount again. If the value hasn't changed, and if it's even, * the copy is valid; otherwise start over. This makes updates cheap * while reads are potentially expensive, but that's the tradeoff we want. * * The above protocol needs memory barriers to ensure that the apparent * order of execution is as it desires. Otherwise, for example, the CPU * might rearrange the code so that st_changecount is incremented twice * before the modification on a machine with weak memory ordering. Hence, * use the macros defined below for manipulating st_changecount, rather * than touching it directly. */ int st_changecount; And if it were necessary, why wouldn't any of the other fields in PgBackendStatus need it? There's plenty of other fields written to without a lock, and several of those are also 8 bytes (so it's not a case of assuming that 8 byte reads might not be atomic, but for byte reads are). Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-07-31 23:51:40 +0200, Julien Rouhaud wrote: > > On Wed, Jul 31, 2019 at 10:55 AM Evgeny Efimkin <efimkin@yandex-team.ru> wrote: > > > What reason to use pg_atomic_uint64? > > > > The queryid is read and written without holding any lock on the PGPROC > > entry, so the pg_atomic_uint64 will guarantee that we get a consistent > > value in pg_stat_get_activity(). Other reads shouldn't be a problem > > as far as I remember. > > Hm, I don't think that's necessary in this case. That's what the > st_changecount protocol is trying to ensure, no? > > /* > * To avoid locking overhead, we use the following protocol: a backend > * increments st_changecount before modifying its entry, and again after > * finishing a modification. A would-be reader should note the value of > * st_changecount, copy the entry into private memory, then check > * st_changecount again. If the value hasn't changed, and if it's even, > * the copy is valid; otherwise start over. This makes updates cheap > * while reads are potentially expensive, but that's the tradeoff we want. > * > * The above protocol needs memory barriers to ensure that the apparent > * order of execution is as it desires. Otherwise, for example, the CPU > * might rearrange the code so that st_changecount is incremented twice > * before the modification on a machine with weak memory ordering. Hence, > * use the macros defined below for manipulating st_changecount, rather > * than touching it directly. > */ > int st_changecount; > > > And if it were necessary, why wouldn't any of the other fields in > PgBackendStatus need it? There's plenty of other fields written to > without a lock, and several of those are also 8 bytes (so it's not a > case of assuming that 8 byte reads might not be atomic, but for byte > reads are). This patch is actually storing the queryid in PGPROC, not in PgBackendStatus, thus the need for an atomic. I used PGPROC because the value needs to be available in log_line_prefix() and spi.c, so pgstat.c / PgBackendStatus didn't seem like the best interface in that case. Is widening PGPROC is too expensive for this purpose?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Robert Haas
Date:
On Thu, Aug 1, 2019 at 2:46 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote: > This patch is actually storing the queryid in PGPROC, not in > PgBackendStatus, thus the need for an atomic. I used PGPROC because > the value needs to be available in log_line_prefix() and spi.c, so > pgstat.c / PgBackendStatus didn't seem like the best interface in that > case. Is widening PGPROC is too expensive for this purpose? I doubt it. However, I think that the fact that this patch adds 15 new calls to pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good sign. It seems like we ought to be able to centralize it better than that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Andres Freund
Date:
Hi, On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote: > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote: > > And if it were necessary, why wouldn't any of the other fields in > > PgBackendStatus need it? There's plenty of other fields written to > > without a lock, and several of those are also 8 bytes (so it's not a > > case of assuming that 8 byte reads might not be atomic, but for byte > > reads are). > > This patch is actually storing the queryid in PGPROC, not in > PgBackendStatus, thus the need for an atomic. I used PGPROC because > the value needs to be available in log_line_prefix() and spi.c, so > pgstat.c / PgBackendStatus didn't seem like the best interface in that > case. Hm. I'm not convinced that really is the case? You can just access MyBEentry, and read and update it? I mean, we do so at a frequency roughtly as high as high as the new queryid updates for things like pgstat_report_activity(). Reading the value of your own backend you'd not need to follow the changecount algorithm, I think, because it's only updated from the current backend. If reading were a problem, you trivially just could have a cache in a local variable, to avoid accessing shared memory. > Is widening PGPROC is too expensive for this purpose? Well, I'm mostly not a fan of putting even more in there, because it's pretty hard to understand already. To me it architecturally status information doesn't belong there (In fact, I'm somewhat unhappy that wait_event_info etc in there, but that's at least commonly updated at the same time as other fields in PGPROC). Greetings, Andres Freund
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Andres Freund
Date:
On 2019-08-01 14:20:46 -0400, Robert Haas wrote: > However, I think that the fact that this patch adds 15 new calls to > pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good > sign. It seems like we ought to be able to centralize it better than > that. +1
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote: > > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote: > > > And if it were necessary, why wouldn't any of the other fields in > > > PgBackendStatus need it? There's plenty of other fields written to > > > without a lock, and several of those are also 8 bytes (so it's not a > > > case of assuming that 8 byte reads might not be atomic, but for byte > > > reads are). > > > > This patch is actually storing the queryid in PGPROC, not in > > PgBackendStatus, thus the need for an atomic. I used PGPROC because > > the value needs to be available in log_line_prefix() and spi.c, so > > pgstat.c / PgBackendStatus didn't seem like the best interface in that > > case. > > Hm. I'm not convinced that really is the case? You can just access > MyBEentry, and read and update it? Sure, but it requires extra wrapper functions, and the st_changecount dance when writing the new value. > I mean, we do so at a frequency > roughtly as high as high as the new queryid updates for things like > pgstat_report_activity(). pgstat_report_activity() is only called for top-level statement. For the queryid we need to track it down to all nested statements, which could be way higher. But pgstat_progress_update_param() is called way more than that. > Reading the value of your own backend you'd > not need to follow the changecount algorithm, I think, because it's only > updated from the current backend. If reading were a problem, you > trivially just could have a cache in a local variable, to avoid > accessing shared memory. Yes definitely, except for pgstat_get_activity(), all reads are backend local and should be totally safe to read as is.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-08-01 14:20:46 -0400, Robert Haas wrote: > > However, I think that the fact that this patch adds 15 new calls to > > pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good > > sign. It seems like we ought to be able to centralize it better than > > that. > > +1 Unfortunately I didn't find a better way to do that. Since you can have nested execution, I don't see how to avoid adding extra code in every parts of query execution.
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Andres Freund
Date:
Hi, On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote: > On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2019-08-01 08:45:45 +0200, Julien Rouhaud wrote: > > > On Wed, Jul 31, 2019 at 11:59 PM Andres Freund <andres@anarazel.de> wrote: > > > > And if it were necessary, why wouldn't any of the other fields in > > > > PgBackendStatus need it? There's plenty of other fields written to > > > > without a lock, and several of those are also 8 bytes (so it's not a > > > > case of assuming that 8 byte reads might not be atomic, but for byte > > > > reads are). > > > > > > This patch is actually storing the queryid in PGPROC, not in > > > PgBackendStatus, thus the need for an atomic. I used PGPROC because > > > the value needs to be available in log_line_prefix() and spi.c, so > > > pgstat.c / PgBackendStatus didn't seem like the best interface in that > > > case. > > > > Hm. I'm not convinced that really is the case? You can just access > > MyBEentry, and read and update it? > > Sure, but it requires extra wrapper functions, and the st_changecount > dance when writing the new value. So? You need a wrapper function anyway, there's no way we're going to add all those separate pg_atomic_write* calls directly. > > I mean, we do so at a frequency > > roughtly as high as high as the new queryid updates for things like > > pgstat_report_activity(). > > pgstat_report_activity() is only called for top-level statement. For > the queryid we need to track it down to all nested statements, which > could be way higher. Compared to the overhead of executing a separate query the cost of single function call containing a MyBEentry update of an 8byte value seems almost guaranteed to be immeasurable. The executor startup alone is several orders of magnitude more expensive. I also think this proposed column should probably respect the track_activities GUC. Greetings, Andres Freund
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Andres Freund
Date:
Hi, On 2019-08-01 22:49:48 +0200, Julien Rouhaud wrote: > On Thu, Aug 1, 2019 at 8:36 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2019-08-01 14:20:46 -0400, Robert Haas wrote: > > > However, I think that the fact that this patch adds 15 new calls to > > > pg_atomic_write_u64(&MyProc->queryId, ...) is probably not a good > > > sign. It seems like we ought to be able to centralize it better than > > > that. > > > > +1 > > Unfortunately I didn't find a better way to do that. Since you can > have nested execution, I don't see how to avoid adding extra code in > every parts of query execution. At least my +1 is not primarily about the number of sites that need to handle queryid changes, but that they all need to know about the way the queryid is stored. Including how atomicity etc is handled. That knowledge should be in one or two places, not more. In a file where that knowledge makes sense. I'm *also* concerned about the number of places, as that makes it likely that some have been missed/new ones will be introduced without the queryid handling. But that wasn't what I was referring to above. I'm actually quite unconvinced that it's sensible to update the global value for nested queries. That'll mean e.g. the log_line_prefix and pg_stat_activity values are most of the time going to be bogus while nested, because the querystring that's associated with those will *not* be the value that the queryid corresponds to. elog.c uses debug_query_string to log the statement, which is only updated for top-level queries (outside of some exceptions like parallel workers for parallel queries in a function or stuff like that). And pg_stat_activity is also only updated for top level queries. Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Thu, Aug 1, 2019 at 10:52 PM Andres Freund <andres@anarazel.de> wrote: > > On 2019-08-01 22:42:23 +0200, Julien Rouhaud wrote: > > Sure, but it requires extra wrapper functions, and the st_changecount > > dance when writing the new value. > > So? You need a wrapper function anyway, there's no way we're going to > add all those separate pg_atomic_write* calls directly. Ok > I also think this proposed column should probably respect > the track_activities GUC. Oh indeed, I'll fix that when I'll be sure of the semantics to implement.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Thu, Aug 1, 2019 at 11:05 PM Andres Freund <andres@anarazel.de> wrote: > > I'm actually quite unconvinced that it's sensible to update the global > value for nested queries. That'll mean e.g. the log_line_prefix and > pg_stat_activity values are most of the time going to be bogus while > nested, because the querystring that's associated with those will *not* > be the value that the queryid corresponds to. elog.c uses > debug_query_string to log the statement, which is only updated for > top-level queries (outside of some exceptions like parallel workers for > parallel queries in a function or stuff like that). And pg_stat_activity > is also only updated for top level queries. Having the nested queryid seems indeed quite broken for log_line_prefix. However having the nested queryid in pg_stat_activity would be convenient to track what is a long stored functions currently doing. Maybe we could expose something like top_level_queryid and current_queryid instead?
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Andres Freund
Date:
Hi, On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote: > On Thu, Aug 1, 2019 at 11:05 PM Andres Freund <andres@anarazel.de> wrote: > > > > I'm actually quite unconvinced that it's sensible to update the global > > value for nested queries. That'll mean e.g. the log_line_prefix and > > pg_stat_activity values are most of the time going to be bogus while > > nested, because the querystring that's associated with those will *not* > > be the value that the queryid corresponds to. elog.c uses > > debug_query_string to log the statement, which is only updated for > > top-level queries (outside of some exceptions like parallel workers for > > parallel queries in a function or stuff like that). And pg_stat_activity > > is also only updated for top level queries. > > Having the nested queryid seems indeed quite broken for > log_line_prefix. However having the nested queryid in > pg_stat_activity would be convenient to track what is a long stored > functions currently doing. Maybe we could expose something like > top_level_queryid and current_queryid instead? Given that the query string is the toplevel one, I think that'd just be confusing. And given the fact that it adds *substantial* additional complexity, I'd just rip the subcommand bits out. Greetings, Andres Freund
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
Hi, On Sat, Aug 3, 2019 at 1:21 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-08-02 10:54:35 +0200, Julien Rouhaud wrote: > > However having the nested queryid in > > pg_stat_activity would be convenient to track what is a long stored > > functions currently doing. Maybe we could expose something like > > top_level_queryid and current_queryid instead? > > Given that the query string is the toplevel one, I think that'd just be > confusing. And given the fact that it adds *substantial* additional > complexity, I'd just rip the subcommand bits out. Ok, so here's a version that only exposes the top-level queryid only. There can still be discrepancies with the query field, if a multi-command string is provided. The queryid will be updated each time a new top level statement is executed. As the queryid cannot be immediately known, and may never exist at all if a query fails to parse, here are the heuristic I used to update the stored queryid: - it's reset to 0 each time pgstat_report_activity(STATE_RUNNING) is called. This way, we're sure that we don't display last query's queryid in the logs if the next query fails to parse - it's also reset to 0 at the beginning of exec_simple_query() loop on the parsetree_list (for multi-command string case) - pg_analyze_and_rewrite() and pg_analyze_and_rewrite_params() will report the new queryid after parse analysis. - a non-zero queryid will only be updated if the stored one is zero This should also work as intended for background worker using SPI, provided that they correctly call pgstat_report_activity. I also modified ExecInitParallelPlan() to publish the queryId in the serialized plannedStmt, so ParallelQueryMain() can report it to make the queryid available in the parallel workers too. Note that this patch makes it clear that a zero queryid means no queryid computed (and NULL will be displayed in such case in pg_stat_activity). pg_stat_statements already makes sure that it cannot compute a zero queryid. It also assume that any extension computing a queryid will do that in the post_parse_analysis hook, which seems like a sane requirement. We may want to have a dedicated hook for that instead, if more people get interested in having the queryid only, possibly different implementations, if it becomes available outside pgss.
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
legrand legrand
Date:
> However having the nested queryid in > pg_stat_activity would be convenient to track > what is a long stored functions currently doing. +1 And this could permit to get wait event sampling per queryid when pg_stat_statements.track = all Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Kyotaro Horiguchi
Date:
At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <legrand_legrand@hotmail.com> wrote in <1564902241482-0.post@n3.nabble.com> > > However having the nested queryid in > > pg_stat_activity would be convenient to track > > what is a long stored functions currently doing. > > +1 > > And this could permit to get wait event sampling per queryid when > pg_stat_statements.track = all I'm strongly on this side emotionally, but also I'm on Tom and Andres's side that exposing querid that way is not the right thing. Doing that means we don't need exact correspondence between top-level query and queryId (in nested or multistatement queries) in this patch. pg_stat_statements will allow us to do the same thing by having additional uint64[MaxBackends] array in pgssSharedState, instead of expanding PgBackendStatus array in core by the same size. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Mon, Aug 5, 2019 at 9:28 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand <legrand_legrand@hotmail.com> wrote in <1564902241482-0.post@n3.nabble.com> > > > However having the nested queryid in > > > pg_stat_activity would be convenient to track > > > what is a long stored functions currently doing. > > > > +1 > > > > And this could permit to get wait event sampling per queryid when > > pg_stat_statements.track = all > > I'm strongly on this side emotionally, but also I'm on Tom and > Andres's side that exposing querid that way is not the right > thing. > > Doing that means we don't need exact correspondence between > top-level query and queryId (in nested or multistatement queries) > in this patch. pg_stat_statements will allow us to do the same > thing by having additional uint64[MaxBackends] array in > pgssSharedState, instead of expanding PgBackendStatus array in > core by the same size. Sure, but the problem with this approach is that all extensions that compute their own queryid would have to do the same. I hope that we can come up with an approach friendlier for those extensions.
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
legrand legrand
Date:
Kyotaro Horiguchi-4 wrote > At Sun, 4 Aug 2019 00:04:01 -0700 (MST), legrand legrand < > legrand_legrand@ > > wrote in < > 1564902241482-0.post@.nabble >> >> > However having the nested queryid in >> > pg_stat_activity would be convenient to track >> > what is a long stored functions currently doing. >> >> +1 >> >> And this could permit to get wait event sampling per queryid when >> pg_stat_statements.track = all > > I'm strongly on this side emotionally, but also I'm on Tom and > Andres's side that exposing querid that way is not the right > thing. > > Doing that means we don't need exact correspondence between > top-level query and queryId (in nested or multistatement queries) > in this patch. pg_stat_statements will allow us to do the same > thing by having additional uint64[MaxBackends] array in > pgssSharedState, instead of expanding PgBackendStatus array in > core by the same size. > > regards. > > -- > Kyotaro Horiguchi > NTT Open Source Software Center Hi Kyotaro, Thank you for this answer. What you propose here is already available Inside pg_stat_sql_plans extension (a derivative from Pg_stat_statements and pg_store_plans) And I’m used to this queryid behavior with top Level Queries... My emotion was high but I will accept it ! Regards PAscal -- Sent from: https://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Evgeny Efimkin
Date:
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: tested, passed Documentation: tested, passed HI! patch is look good for me. The new status of this patch is: Ready for Committer
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Michael Paquier
Date:
On Wed, Aug 07, 2019 at 09:03:21AM +0000, Evgeny Efimkin wrote: > The new status of this patch is: Ready for Committer I may be wrong of course, but it looks that this is wanted and the current shape of the patch looks sensible: - Register the query ID using a backend entry. - Only consider the top-level query. An invalid query ID is assumed to be 0 in the patch, per the way it is defined in pg_stat_statements. However this also maps with the case where we have a utility statement. + * We only report the top-level query identifiers. The stored queryid is + * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with s/call/calls/ + /* + * We only report the top-level query identifiers. The stored queryid is + * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with + * an explicit call to this function. If the saved query identifier is not + * zero it means that it's not a top-level command, so ignore the one + * provided unless it's an explicit call to reset the identifier. + */ + if (queryId != 0 && beentry->st_queryid != 0) + return; Hmm. I am wondering if we shouldn't have an API dedicated to the reset of the query ID. That logic looks rather brittle.. Wouldn't it be better (and more consistent) to update the query ID in parse_analyze_varparams() and parse_analyze() as well after going through the post_parse_analyze hook instead of pg_analyze_and_rewrite? + /* + * If a new query is started, we reset the query identifier as it'll only + * be known after parse analysis, to avoid reporting last query's + * identifier. + */ + if (state == STATE_RUNNING) + beentry->st_queryid = 0 I don't quite get why you don't reset the counter in other cases as well. If the backend entry is idle in transaction or in an idle state, it seems to me that we should not report the query ID of the last query run in the transaction. And that would make the reset in exec_simple_query() unnecessary, no? -- Michael
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
Thanks for looking at it! On Wed, Sep 11, 2019 at 6:45 AM Michael Paquier <michael@paquier.xyz> wrote: > > An invalid query ID is assumed to be 0 in the patch, per the way it is > defined in pg_stat_statements. However this also maps with the case > where we have a utility statement. Oh indeed. Which means that if a utility statements later calls parse_analyze or friends, this patch would report an unexpected queryid. That's at least possible for something like COPY (SELECT * FROM tbl) TO ... The thing is that pg_stat_statements assigns a 0 queryid in the post_parse_analyze_hook to recognize utility statements and avoid tracking instrumentation twice in case of utility statements, and then compute a queryid base on a hash of the query text. Maybe we could instead fully reserve queryid "2" for utility statements (so forcing queryid "1" for standard queries if jumbling returns 0 *or* 2 instead of only 0), and use "2" as the identifier for utility statement instead of "0"? > + /* > + * We only report the top-level query identifiers. The stored queryid is > + * reset when a backend call pgstat_report_activity(STATE_RUNNING), or with > + * an explicit call to this function. If the saved query identifier is not > + * zero it means that it's not a top-level command, so ignore the one > + * provided unless it's an explicit call to reset the identifier. > + */ > + if (queryId != 0 && beentry->st_queryid != 0) > + return; > Hmm. I am wondering if we shouldn't have an API dedicated to the > reset of the query ID. That logic looks rather brittle.. How about adding a "bool force" parameter to allow resetting the queryid to 0? > Wouldn't it be better (and more consistent) to update the query ID in > parse_analyze_varparams() and parse_analyze() as well after going > through the post_parse_analyze hook instead of pg_analyze_and_rewrite? I thought about it without knowing what would be best. I'll change to report the queryid right after calling post_parse_analyze_hook then. > + /* > + * If a new query is started, we reset the query identifier as it'll only > + * be known after parse analysis, to avoid reporting last query's > + * identifier. > + */ > + if (state == STATE_RUNNING) > + beentry->st_queryid = 0 > I don't quite get why you don't reset the counter in other cases as > well. If the backend entry is idle in transaction or in an idle > state, it seems to me that we should not report the query ID of the > last query run in the transaction. And that would make the reset in > exec_simple_query() unnecessary, no? I'm reproducing the same behavior as for the query text, ie. showing the information about the last executed query text if state is idle: + <entry><structfield>queryid</structfield></entry> + <entry><type>bigint</type></entry> + <entry>Identifier of this backend's most recent query. If + <structfield>state</structfield> is <literal>active</literal> this field + shows the identifier of the currently executing query. In all other + states, it shows the identifier of last query that was executed. I think that showing the last executed query's queryid is as useful as the query text. Also, while avoiding a reset in exec_simple_query() it'd be required to do such reset in case of error during query execution, so that wouldn't make things quite simpler..
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Michael Paquier
Date:
On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote: > The thing is that pg_stat_statements assigns a 0 queryid in the > post_parse_analyze_hook to recognize utility statements and avoid > tracking instrumentation twice in case of utility statements, and then > compute a queryid base on a hash of the query text. Maybe we could > instead fully reserve queryid "2" for utility statements (so forcing > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead > of only 0), and use "2" as the identifier for utility statement > instead of "0"? Hmm. Not sure. At this stage it would be nice to gather more input on the matter, and FWIW, I don't like much the assumption that a query ID of 0 is perhaps a utility statement, or perhaps nothing depending on the state of a backend entry, or even perhaps something else depending how on how modules make use and define such query IDs. -- Michael
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote: > On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote: > > The thing is that pg_stat_statements assigns a 0 queryid in the > > post_parse_analyze_hook to recognize utility statements and avoid > > tracking instrumentation twice in case of utility statements, and then > > compute a queryid base on a hash of the query text. Maybe we could > > instead fully reserve queryid "2" for utility statements (so forcing > > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead > > of only 0), and use "2" as the identifier for utility statement > > instead of "0"? > > Hmm. Not sure. At this stage it would be nice to gather more input > on the matter, and FWIW, I don't like much the assumption that a query > ID of 0 is perhaps a utility statement, or perhaps nothing depending > on the state of a backend entry, or even perhaps something else > depending how on how modules make use and define such query IDs. I thought each extension would export a function to compute the query id, and you would all that function with the pg_stat_activity.query string. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Wed, Nov 13, 2019 at 4:15 AM Bruce Momjian <bruce@momjian.us> wrote: > > On Mon, Nov 11, 2019 at 05:37:30PM +0900, Michael Paquier wrote: > > On Wed, Sep 11, 2019 at 06:30:22PM +0200, Julien Rouhaud wrote: > > > The thing is that pg_stat_statements assigns a 0 queryid in the > > > post_parse_analyze_hook to recognize utility statements and avoid > > > tracking instrumentation twice in case of utility statements, and then > > > compute a queryid base on a hash of the query text. Maybe we could > > > instead fully reserve queryid "2" for utility statements (so forcing > > > queryid "1" for standard queries if jumbling returns 0 *or* 2 instead > > > of only 0), and use "2" as the identifier for utility statement > > > instead of "0"? > > > > Hmm. Not sure. At this stage it would be nice to gather more input > > on the matter, and FWIW, I don't like much the assumption that a query > > ID of 0 is perhaps a utility statement, or perhaps nothing depending > > on the state of a backend entry, or even perhaps something else > > depending how on how modules make use and define such query IDs. > > I thought each extension would export a function to compute the query > id, and you would all that function with the pg_stat_activity.query > string. I'd really like to have the queryid function available through SQL, but I think that this specific case wouldn't work very well for pg_stat_statements' approach as it's working with oid. The query string in pg_stat_activity is the user provided one rather than a fully-qualified version, so in order to get that query's queryid, you need to know the exact search_path in use in that backend, and that's not something available.
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Michael Paquier
Date:
On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: > I'd really like to have the queryid function available through SQL, > but I think that this specific case wouldn't work very well for > pg_stat_statements' approach as it's working with oid. The query > string in pg_stat_activity is the user provided one rather than a > fully-qualified version, so in order to get that query's queryid, you > need to know the exact search_path in use in that backend, and that's > not something available. Yeah.. So, we have a patch marked as ready for committer here, and it seems to me that we have a couple of issues to discuss more about first particularly this query ID of 0. Again, do others have more any input to offer? -- Michael
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Michael Paquier
Date:
On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: >> I'd really like to have the queryid function available through SQL, >> but I think that this specific case wouldn't work very well for >> pg_stat_statements' approach as it's working with oid. The query >> string in pg_stat_activity is the user provided one rather than a >> fully-qualified version, so in order to get that query's queryid, you >> need to know the exact search_path in use in that backend, and that's >> not something available. > > Yeah.. So, we have a patch marked as ready for committer here, and it > seems to me that we have a couple of issues to discuss more about > first particularly this query ID of 0. Again, do others have more > any input to offer? And while on it, the latest patch does not apply, so a rebase is needed here. -- Michael
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: > > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: > >> I'd really like to have the queryid function available through SQL, > >> but I think that this specific case wouldn't work very well for > >> pg_stat_statements' approach as it's working with oid. The query > >> string in pg_stat_activity is the user provided one rather than a > >> fully-qualified version, so in order to get that query's queryid, you > >> need to know the exact search_path in use in that backend, and that's > >> not something available. > > > > Yeah.. So, we have a patch marked as ready for committer here, and it > > seems to me that we have a couple of issues to discuss more about > > first particularly this query ID of 0. Again, do others have more > > any input to offer? I just realized that with current infrastructure it's not possible to display a utility queryid. We need to recognize utility to not process the counters twice (once in processUtility, once in the underlying executor), so we don't provide a queryid for utility statements in parse analysis. Current magic value 0 has the side effect of showing an invalid queryid for all utilty statements, and using a magic value different from 0 will just always display that magic value. We could instead add another field in the Query and PlannedStmt structs, say "int queryid_flags", that extensions could use for their needs? > And while on it, the latest patch does not apply, so a rebase is > needed here. Yep, I noticed that this morning. I already rebased the patch locally, I'll send a new version with new modifications when we reach an agreement on the utility issue.
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Tomas Vondra
Date:
On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote: >On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote: >> >> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: >> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: >> >> I'd really like to have the queryid function available through SQL, >> >> but I think that this specific case wouldn't work very well for >> >> pg_stat_statements' approach as it's working with oid. The query >> >> string in pg_stat_activity is the user provided one rather than a >> >> fully-qualified version, so in order to get that query's queryid, you >> >> need to know the exact search_path in use in that backend, and that's >> >> not something available. >> > >> > Yeah.. So, we have a patch marked as ready for committer here, and it >> > seems to me that we have a couple of issues to discuss more about >> > first particularly this query ID of 0. Again, do others have more >> > any input to offer? > >I just realized that with current infrastructure it's not possible to >display a utility queryid. We need to recognize utility to not >process the counters twice (once in processUtility, once in the >underlying executor), so we don't provide a queryid for utility >statements in parse analysis. Current magic value 0 has the side >effect of showing an invalid queryid for all utilty statements, and >using a magic value different from 0 will just always display that >magic value. We could instead add another field in the Query and >PlannedStmt structs, say "int queryid_flags", that extensions could >use for their needs? > >> And while on it, the latest patch does not apply, so a rebase is >> needed here. > >Yep, I noticed that this morning. I already rebased the patch >locally, I'll send a new version with new modifications when we reach >an agreement on the utility issue. > Well, this patch was in WoA since November, but now that I look at it that might have been wrong - we're clearly waiting for agreement on how to handle queryid for utility commands. I suspect the WoA status might have been driving people away from this thread :-( I've switched the patch to "needs review" and moved it to the next CF. What I think needs to happen is we get a patch implementing one of the proposed solutions, and discuss that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Sat, Feb 1, 2020 at 12:30 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Fri, Nov 29, 2019 at 09:39:09AM +0100, Julien Rouhaud wrote: > >On Fri, Nov 29, 2019 at 7:21 AM Michael Paquier <michael@paquier.xyz> wrote: > >> > >> On Fri, Nov 29, 2019 at 03:19:49PM +0900, Michael Paquier wrote: > >> > On Wed, Nov 13, 2019 at 12:53:09PM +0100, Julien Rouhaud wrote: > >> >> I'd really like to have the queryid function available through SQL, > >> >> but I think that this specific case wouldn't work very well for > >> >> pg_stat_statements' approach as it's working with oid. The query > >> >> string in pg_stat_activity is the user provided one rather than a > >> >> fully-qualified version, so in order to get that query's queryid, you > >> >> need to know the exact search_path in use in that backend, and that's > >> >> not something available. > >> > > >> > Yeah.. So, we have a patch marked as ready for committer here, and it > >> > seems to me that we have a couple of issues to discuss more about > >> > first particularly this query ID of 0. Again, do others have more > >> > any input to offer? > > > >I just realized that with current infrastructure it's not possible to > >display a utility queryid. We need to recognize utility to not > >process the counters twice (once in processUtility, once in the > >underlying executor), so we don't provide a queryid for utility > >statements in parse analysis. Current magic value 0 has the side > >effect of showing an invalid queryid for all utilty statements, and > >using a magic value different from 0 will just always display that > >magic value. We could instead add another field in the Query and > >PlannedStmt structs, say "int queryid_flags", that extensions could > >use for their needs? > > > >> And while on it, the latest patch does not apply, so a rebase is > >> needed here. > > > >Yep, I noticed that this morning. I already rebased the patch > >locally, I'll send a new version with new modifications when we reach > >an agreement on the utility issue. > > > > Well, this patch was in WoA since November, but now that I look at it > that might have been wrong - we're clearly waiting for agreement on how > to handle queryid for utility commands. I suspect the WoA status might > have been driving people away from this thread :-( Oh, indeed. > I've switched the patch to "needs review" and moved it to the next CF. Thanks > What I think needs to happen is we get a patch implementing one of the > proposed solutions, and discuss that. There's also the possibility to reserve 1 bit of the hash to know if this is a utility command or not, although I don't recall right now all the possible issues with utility commands and some special handling of them. I'll work on it before the next commitfest.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Robert Haas
Date:
On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote: > There's also the possibility to reserve 1 bit of the hash to know if > this is a utility command or not, although I don't recall right now > all the possible issues with utility commands and some special > handling of them. I'll work on it before the next commitfest. FWIW, I don't really see why it would be bad to have 0 mean that "there's no query ID for some reason" without caring whether that's because the current statement is a utility statement or because there's no statement in progress at all or whatever else. The user probably doesn't need our help to distinguish between "no statement" and "utility statement", right? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote: > On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote: > > There's also the possibility to reserve 1 bit of the hash to know if > > this is a utility command or not, although I don't recall right now > > all the possible issues with utility commands and some special > > handling of them. I'll work on it before the next commitfest. > > FWIW, I don't really see why it would be bad to have 0 mean that > "there's no query ID for some reason" without caring whether that's > because the current statement is a utility statement or because > there's no statement in progress at all or whatever else. The user > probably doesn't need our help to distinguish between "no statement" > and "utility statement", right? Sure, but if we don't fix that it means that we also won't expose any queryid for utility statement, even if pg_stat_statements is configured to track those (with a very poor queryid handling, but still). While looking at this again, I realized that pg_stat_statements doesn't compute a queryid during the post parse analysis hook just to make sure that no query identifier will be set during executorStart and the rest of executor functions. AFAICT, that can't happen anyway since pg_plan_queries() will discard any computed queryid for utility statements. This seems to be an oversight due to original pg_stat_statements implementation, so I fixed this. Then, as processUtility is called between parse analysis and executor, I think that we can simply work around this by computing utility statements query identifier during parse analysis, removing it in pgss_ProcessUtility and keeping a copy of it for the pgss_store calls in that function, as done in the attached v5. This fixes everything except EXECUTE statements, which has to get the underlying query's queryid. The problem is that EXECUTE won't get through parse analysis, so while it's correctly handled for execution and pgss_store, it's not being exposed in pg_stat_activity and log_line_prefix. To fix it, I added an extra call to pgstat_report_queryid in executorStart. As this function is a no-op if a queryid is already exposed, this shouldn't cause any harm and fix any other cases of query execution that don't go through parse analysis. Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those statements will always be reported with a NULL/0 queryid, but this is consistent as it's also not present in pg_stat_statements() SRF.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Fri, Feb 7, 2020 at 11:12 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Thu, Feb 06, 2020 at 02:59:09PM -0500, Robert Haas wrote: > > On Wed, Feb 5, 2020 at 9:32 AM Julien Rouhaud <julien.rouhaud@free.fr> wrote: > > > There's also the possibility to reserve 1 bit of the hash to know if > > > this is a utility command or not, although I don't recall right now > > > all the possible issues with utility commands and some special > > > handling of them. I'll work on it before the next commitfest. > > > > FWIW, I don't really see why it would be bad to have 0 mean that > > "there's no query ID for some reason" without caring whether that's > > because the current statement is a utility statement or because > > there's no statement in progress at all or whatever else. The user > > probably doesn't need our help to distinguish between "no statement" > > and "utility statement", right? > > Sure, but if we don't fix that it means that we also won't expose any queryid > for utility statement, even if pg_stat_statements is configured to track those > (with a very poor queryid handling, but still). > > While looking at this again, I realized that pg_stat_statements doesn't compute > a queryid during the post parse analysis hook just to make sure that no query > identifier will be set during executorStart and the rest of executor functions. > > AFAICT, that can't happen anyway since pg_plan_queries() will discard any > computed queryid for utility statements. This seems to be an oversight due to > original pg_stat_statements implementation, so I fixed this. > > Then, as processUtility is called between parse analysis and executor, I think > that we can simply work around this by computing utility statements query > identifier during parse analysis, removing it in pgss_ProcessUtility and > keeping a copy of it for the pgss_store calls in that function, as done in the > attached v5. > > This fixes everything except EXECUTE statements, which has to get the > underlying query's queryid. The problem is that EXECUTE won't get through > parse analysis, so while it's correctly handled for execution and pgss_store, > it's not being exposed in pg_stat_activity and log_line_prefix. To fix it, I > added an extra call to pgstat_report_queryid in executorStart. As this > function is a no-op if a queryid is already exposed, this shouldn't cause any > harm and fix any other cases of query execution that don't go through parse > analysis. > > Finally, DEALLOCATE is entirely ignored by pg_stat_statements, so those > statements will always be reported with a NULL/0 queryid, but this is > consistent as it's also not present in pg_stat_statements() SRF. cfbot reports a failure since 2f9661311b (command completion tag change), so here's a rebased v6, no change otherwise.
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Jun 28, 2019 at 11:49:53AM -0700, Peter Geoghegan wrote: > On Tue, Mar 19, 2019 at 12:38 PM legrand legrand > <legrand_legrand@hotmail.com> wrote: > > Would it make sense to add it in auto explain ? > > I don't know for explain itself, but maybe ... > > I think that it should appear in EXPLAIN. pg_stat_statements already > cannot have a query hash of zero, so it might be okay to display it > only when its value is non-zero. I had forgotten about this. After looking at it, I can see a few issues. For now post_parse_analyze_hook isn't called for the underlying statement, so we don't have the queryid. And we can't compute the queryid for the underlying query in the initial post_parse_analyze_hook call as we don't want the executor to have a queryid set in that case to avoid cumulating counters for both the explain and the query. We could add an extra call in ExplainQuery, but this will be ignored by pg_stat_statements unless you set pg_stat_statements.track to all. Also, pgss_post_parse_analyze will try to record an entry with the normalized query text if no one exists yet and if any constant where removed. The problem is that, as I already mentioned in [1], the underlying query doesn't have query_location or query_len valued, so the recorded query text will at least contain the explain part of the input query. [1] https://www.postgresql.org/message-id/CAOBaU_Y-y%2BVOhTZgDOuDk6-9V72-ZXdWccXo_kx0P4DDBEEh9A%40mail.gmail.com
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote: > > cfbot reports a failure since 2f9661311b (command completion tag > change), so here's a rebased v6, no change otherwise. Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks again to cfbot, rebased v7 attached.
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Sat, Mar 14, 2020 at 06:53:51PM +0100, Julien Rouhaud wrote: > On Tue, Mar 03, 2020 at 04:24:59PM +0100, Julien Rouhaud wrote: > > > > cfbot reports a failure since 2f9661311b (command completion tag > > change), so here's a rebased v6, no change otherwise. > > > Conflict with 8e8a0becb3 (Unify several ways to tracking backend type), thanks > again to cfbot, rebased v7 attached. Bit repetita.
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
New conflict, rebased v9 attached.
Attachment
Re: Feature improvement: can we add queryId forpg_catalog.pg_stat_activity view?
From
Tatsuro Yamada
Date:
Hi Julien, On 2020/04/02 22:25, Julien Rouhaud wrote: > New conflict, rebased v9 attached. I tested the patch on the head (c7654f6a3) and the result was fine. See below: $ make installcheck-world ===================== All 1 tests passed. ===================== Regards, Tatsuro Yamada
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activityview?
From
Julien Rouhaud
Date:
On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote: > > Hi Julien, > > On 2020/04/02 22:25, Julien Rouhaud wrote: > > New conflict, rebased v9 attached. > > I tested the patch on the head (c7654f6a3) and > the result was fine. See below: > > $ make installcheck-world > ===================== > All 1 tests passed. > ===================== Thanks Yamada-san! Unfortunately this patch still didn't attract any committer, so I moved it to the next commitfest.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Atsushi Torikoshi
Date:
Hi,
v9 patch fails to apply to HEAD, could you check and rebase it?
And here are minor typos.
79 + * utility statements. Note that we don't compute a queryId for prepared
80 + * statemets related utility, as those will inherit from the underlying
80 + * statemets related utility, as those will inherit from the underlying
81 + * statements's one (except DEALLOCATE which is entirely untracked).
statemets -> statements
statements's -> statements' or statement's?
Regards,
--
Atsushi Torikoshi
On Wed, Apr 8, 2020 at 11:38 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Apr 7, 2020 at 8:40 AM Tatsuro Yamada
<tatsuro.yamada.tf@nttcom.co.jp> wrote:
>
> Hi Julien,
>
> On 2020/04/02 22:25, Julien Rouhaud wrote:
> > New conflict, rebased v9 attached.
>
> I tested the patch on the head (c7654f6a3) and
> the result was fine. See below:
>
> $ make installcheck-world
> =====================
> All 1 tests passed.
> =====================
Thanks Yamada-san! Unfortunately this patch still didn't attract any
committer, so I moved it to the next commitfest.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote: > Hi, > > v9 patch fails to apply to HEAD, could you check and rebase it? Thanks for the notice, v10 attached! > And here are minor typos. > > 79 + * utility statements. Note that we don't compute a queryId > for prepared > 80 + * statemets related utility, as those will inherit from the > underlying > 81 + * statements's one (except DEALLOCATE which is entirely > untracked). > > statemets -> statements > statements's -> statements' or statement's? Thanks! I went with "statement's".
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
torikoshia
Date:
On 2020-07-14 20:24, Julien Rouhaud wrote: > On Tue, Jul 14, 2020 at 07:11:02PM +0900, Atsushi Torikoshi wrote: >> Hi, >> >> v9 patch fails to apply to HEAD, could you check and rebase it? > > Thanks for the notice, v10 attached! > >> And here are minor typos. >> >> 79 + * utility statements. Note that we don't compute a >> queryId >> for prepared >> 80 + * statemets related utility, as those will inherit from >> the >> underlying >> 81 + * statements's one (except DEALLOCATE which is entirely >> untracked). >> >> statemets -> statements >> statements's -> statements' or statement's? > > Thanks! I went with "statement's". Thanks for updating! I tested the patch setting log_statement = 'all', but %Q in log_line_prefix was always 0 even when pg_stat_statements.queryid and pg_stat_activity.queryid are not 0. Is this an intentional behavior? ``` $ initdb --no-locale -D data $ edit postgresql.conf shared_preload_libraries = 'pg_stat_statements' logging_collector = on log_line_prefix = '%m [%p] queryid:%Q ' log_statement = 'all' $ pg_ctl start -D data $ psql =# CREATE EXTENSION pg_stat_statements; =# CREATE TABLE t1 (i int); =# INSERT INTO t1 VALUES (0),(1); =# SELECT queryid, query FROM pg_stat_activity; -- query ids are all 0 on the log $ view log 2020-07-28 15:57:58.475 EDT [4480] queryid:0 LOG: statement: CREATE TABLE t1 (i int); 2020-07-28 15:58:13.730 EDT [4480] queryid:0 LOG: statement: INSERT INTO t1 VALUES (0),(1); 2020-07-28 15:59:28.389 EDT [4480] queryid:0 LOG: statement: SELECT * FROM t1; -- on pg_stat_activity and pgss, query ids are not 0 $ psql =# SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%'; queryid | query ----------------------+---------------------------------------------------------------------- 1109063694563750779 | SELECT * FROM t1; -2582225123719476948 | SELECT queryid, query FROM pg_stat_activity WHERE query LIKE '%t1%'; (2 rows) =# SELECT queryid, query FROM pg_stat_statements WHERE query LIKE '%t1%'; queryid | query ----------------------+--------------------------------- -5028988130796701553 | CREATE TABLE t1 (i int) 1109063694563750779 | SELECT * FROM t1 2726469050076420724 | INSERT INTO t1 VALUES ($1),($2) ``` And here is a minor typo. optionnally -> optionally > 753 + /* query identifier, optionnally computed using > post_parse_analyze_hook */ Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Jul 28, 2020 at 10:07 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > Thanks for updating! > I tested the patch setting log_statement = 'all', but %Q in > log_line_prefix > was always 0 even when pg_stat_statements.queryid and > pg_stat_activity.queryid are not 0. > > Is this an intentional behavior? > >[...] Thanks for the tests! That's indeed an expected behavior (although I wasn't aware of it), which isn't documented in this patch (I'll fix it). The reason for that is that log_statements is done right after parsing the query: /* * Do basic parsing of the query or queries (this should be safe even if * we are in aborted transaction state!) */ parsetree_list = pg_parse_query(query_string); /* Log immediately if dictated by log_statement */ if (check_log_statement(parsetree_list)) { ereport(LOG, (errmsg("statement: %s", query_string), errhidestmt(true), errdetail_execute(parsetree_list))); was_logged = true; } As parse analysis is not yet done, no queryid can be computed at that point, so we always print 0. That's a limitation that can't be removed without changing the semantics of log_statements, so we'll probably have to live with it. > And here is a minor typo. > optionnally -> optionally > > > > 753 + /* query identifier, optionnally computed using > > post_parse_analyze_hook */ Thanks, I fixed it locally!
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Jul 28, 2020 at 10:55:04AM +0200, Julien Rouhaud wrote: > On Tue, Jul 28, 2020 at 10:07 AM torikoshia <torikoshia@oss.nttdata.com> wrote: > > > > Thanks for updating! > > I tested the patch setting log_statement = 'all', but %Q in > > log_line_prefix > > was always 0 even when pg_stat_statements.queryid and > > pg_stat_activity.queryid are not 0. > > > > Is this an intentional behavior? > > > >[...] > > Thanks for the tests! That's indeed an expected behavior (although I > wasn't aware of it), which isn't documented in this patch (I'll fix > it). The reason for that is that log_statements is done right after > parsing the query: > > /* > * Do basic parsing of the query or queries (this should be safe even if > * we are in aborted transaction state!) > */ > parsetree_list = pg_parse_query(query_string); > > /* Log immediately if dictated by log_statement */ > if (check_log_statement(parsetree_list)) > { > ereport(LOG, > (errmsg("statement: %s", query_string), > errhidestmt(true), > errdetail_execute(parsetree_list))); > was_logged = true; > } > > As parse analysis is not yet done, no queryid can be computed at that > point, so we always print 0. That's a limitation that can't be > removed without changing the semantics of log_statements, so we'll > probably have to live with it. > > > And here is a minor typo. > > optionnally -> optionally > > > > > > > 753 + /* query identifier, optionnally computed using > > > post_parse_analyze_hook */ > > Thanks, I fixed it locally! Recent conflict, rebased v11 attached.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Aug 19, 2020 at 04:19:30PM +0200, Julien Rouhaud wrote: > Similarly to other fields in pg_stat_activity, only the queryid from the top > level statements are exposed, and if the backends status isn't active then the > queryid from the last executed statements is displayed. > > Also add a %Q placeholder to include the queryid in the log_line_prefix, which > will also only expose top level statements. I would like to apply this patch (I know it has been in the commitfest since July 2019), but I have some questions about the user API. Does it make sense to have a column in pg_stat_actvity and an option in log_line_prefix that will be empty unless pg_stat_statements is installed? Is there no clean way to move the query hash computation out of pg_stat_statements and into the main code so the query id is always visible? (Also, did we decide _not_ to make the pg_stat_statements queryid always a positive value?) Also, in the doc patch: By default, query identifiers are not computed, so this field will always be null, unless an additional module that compute query identifiers, such as <xref linkend="pgstatstatements"/>, is configured. why are you saying "such as"? Isn't pg_stat_statements the only way to see the queryid? This command allowed the queryid to be displayed in pg_stat_activity: ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements'; -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes: > I would like to apply this patch (I know it has been in the commitfest > since July 2019), but I have some questions about the user API. Does it > make sense to have a column in pg_stat_actvity and an option in > log_line_prefix that will be empty unless pg_stat_statements is > installed? Is there no clean way to move the query hash computation out > of pg_stat_statements and into the main code so the query id is always > visible? (Also, did we decide _not_ to make the pg_stat_statements > queryid always a positive value?) FWIW, I think this proposal is a mess. I was willing to hold my nose and have a queryId field in the internal Query struct without any solid consensus about what its semantics are and which extensions get to use it. Exposing it to end users seems like a bridge too far, though. In particular, I'm afraid that that will cause people to expect it to have consistent values across PG versions, or even just across architectures within one version. The larger picture here is that there's lots of room to doubt whether pg_stat_statements' decisions about what to ignore or include in the ID will be satisfactory to everybody. If that were not so, we'd just move the computation into core. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2020-Oct-05, Tom Lane wrote: > FWIW, I think this proposal is a mess. I was willing to hold my nose > and have a queryId field in the internal Query struct without any solid > consensus about what its semantics are and which extensions get to use it. > Exposing it to end users seems like a bridge too far, though. In > particular, I'm afraid that that will cause people to expect it to have > consistent values across PG versions, or even just across architectures > within one version. I wonder if it would help to purposefully change the computation so that it is not -- for instance, hash the system_identifier as initial value. Then users would be forced to accept that it'll change as soon as it migrates to another server or is upgraded to a new major version.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Mon, Oct 5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote: > On 2020-Oct-05, Tom Lane wrote: > > > FWIW, I think this proposal is a mess. I was willing to hold my nose > > and have a queryId field in the internal Query struct without any solid > > consensus about what its semantics are and which extensions get to use it. > > Exposing it to end users seems like a bridge too far, though. In > > particular, I'm afraid that that will cause people to expect it to have > > consistent values across PG versions, or even just across architectures > > within one version. > > I wonder if it would help to purposefully change the computation so that > it is not -- for instance, hash the system_identifier as initial value. > Then users would be forced to accept that it'll change as soon as it > migrates to another server or is upgraded to a new major version. That seems like a good idea, but it would prevent cross-cluster same-major-version comparisons, which seems like a negative. Perhaps we should add the major version into the hash to handle this. Ideally, let's just put a queryid-hash-version into to the hash, so if we change the computation, we just update the hash version and nothing matches anymore. I do think the queryid has to display independent of pg_stat_statements, because I can see people using queryid for log file and pg_stat_activity comparisons. I also think the ability to have queryid accessible is an important feature outside of pg_stat_statements, so I do think we need a way to move this idea forward. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Oct 6, 2020 at 10:18 AM Bruce Momjian <bruce@momjian.us> wrote: > > On Mon, Oct 5, 2020 at 07:58:42PM -0300, Álvaro Herrera wrote: > > On 2020-Oct-05, Tom Lane wrote: > > > > > FWIW, I think this proposal is a mess. I was willing to hold my nose > > > and have a queryId field in the internal Query struct without any solid > > > consensus about what its semantics are and which extensions get to use it. > > > Exposing it to end users seems like a bridge too far, though. In > > > particular, I'm afraid that that will cause people to expect it to have > > > consistent values across PG versions, or even just across architectures > > > within one version. > > > > I wonder if it would help to purposefully change the computation so that > > it is not -- for instance, hash the system_identifier as initial value. > > Then users would be forced to accept that it'll change as soon as it > > migrates to another server or is upgraded to a new major version. > > That seems like a good idea, but it would prevent cross-cluster > same-major-version comparisons, which seems like a negative. Perhaps we > should add the major version into the hash to handle this. Ideally, > let's just put a queryid-hash-version into to the hash, so if we change > the computation, we just update the hash version and nothing matches > anymore. > > I do think the queryid has to display independent of pg_stat_statements, > because I can see people using queryid for log file and pg_stat_activity > comparisons. I also think the ability to have queryid accessible is an > important feature outside of pg_stat_statements, so I do think we need a > way to move this idea forward. For the record, for now any extension can compute a queryid and there are at least 2 other published extensions that already do that, one of them having different semantics on how to compute the queryid. I'm not sure that we'll ever get a consensus on those semantics due to performance tradeoff, so removing the ability to let people put their own code for that doesn't seem like the best way forward. Maybe we could add a new hook for only queryid computation, and add a GUC to let people choose between no queryid computed, core computation (current pg_stat_statement) and 3rd party plugin?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote: > > I do think the queryid has to display independent of pg_stat_statements, > > because I can see people using queryid for log file and pg_stat_activity > > comparisons. I also think the ability to have queryid accessible is an > > important feature outside of pg_stat_statements, so I do think we need a > > way to move this idea forward. > > For the record, for now any extension can compute a queryid and there > are at least 2 other published extensions that already do that, one of > them having different semantics on how to compute the queryid. I'm > not sure that we'll ever get a consensus on those semantics due to > performance tradeoff, so removing the ability to let people put their > own code for that doesn't seem like the best way forward. > > Maybe we could add a new hook for only queryid computation, and add a > GUC to let people choose between no queryid computed, core computation > (current pg_stat_statement) and 3rd party plugin? That all seems very complicated. If we go in that direction, I suggest we just give up getting any of this into core. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Michael Paquier
Date:
On Mon, Oct 05, 2020 at 05:24:06PM -0400, Bruce Momjian wrote: > (Also, did we decide _not_ to make the pg_stat_statements queryid > always a positive value?) This specific point has been discussed a couple of years ago, please see cff440d and its related thread: https://www.postgresql.org/message-id/CA+TgmobG_Kp4cBKFmsznUAaM1GWW6hhRNiZC0KjRMOOeYnz5Yw@mail.gmail.com -- Michael
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Michael Paquier
Date:
On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote: > On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote: >> Maybe we could add a new hook for only queryid computation, and add a >> GUC to let people choose between no queryid computed, core computation >> (current pg_stat_statement) and 3rd party plugin? > > That all seems very complicated. If we go in that direction, I suggest > we just give up getting any of this into core. A GUC would have at least the advantage to make the computation consistent for any system willing to consume it, with the option to not pay any potential performance impact, though I have to admit that just moving the query ID computation of PGSS into core may not be the best option as a query ID of 0 means the same thing for a utility, for an initialization, and for a backend running a query with an unknown value, but that could be worked out. FWIW, I think that adding the system ID in the hash is too restrictive, as it could be interesting for users to do stat comparisons across multiple systems running the same major version. It would be better to not give any strong guarantee that the query ID computed will remain consistent across major versions so as it is possible to keep improving it. Also, if nothing has been done that changes the hashing computation, I see little benefit in forcing a breakage by adding something like PG_MAJORVERSION_NUM or such in the hash computation. -- Michael
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Tue, Oct 6, 2020 at 02:34:58PM +0900, Michael Paquier wrote: > On Mon, Oct 05, 2020 at 11:23:50PM -0400, Bruce Momjian wrote: > > On Tue, Oct 6, 2020 at 11:11:27AM +0800, Julien Rouhaud wrote: > >> Maybe we could add a new hook for only queryid computation, and add a > >> GUC to let people choose between no queryid computed, core computation > >> (current pg_stat_statement) and 3rd party plugin? > > > > That all seems very complicated. If we go in that direction, I suggest > > we just give up getting any of this into core. > > A GUC would have at least the advantage to make the computation > consistent for any system willing to consume it, with the option to > not pay any potential performance impact, though I have to admit that > just moving the query ID computation of PGSS into core may not be the > best option as a query ID of 0 means the same thing for a utility, for > an initialization, and for a backend running a query with an unknown > value, but that could be worked out. > > FWIW, I think that adding the system ID in the hash is too > restrictive, as it could be interesting for users to do stat > comparisons across multiple systems running the same major version. > It would be better to not give any strong guarantee that the query ID > computed will remain consistent across major versions so as it is > possible to keep improving it. Also, if nothing has been done that > changes the hashing computation, I see little benefit in forcing a > breakage by adding something like PG_MAJORVERSION_NUM or such in the > hash computation. I thought some more about this. First, I think having the queryid hash code in the server, without requiring pg_stat_statements, is a requirement --- I think too many people will want to use this feature independent of pg_stat_statements. Second, I understand the desire to have different hash computation methods, depending on what level of detail/matching you want. I propose moving the pg_stat_statements queryid hash code into the server (with a version number), and also adding a postgressql.conf variable that lets you control how detailed the queryid hash is computed. This addresses the problem of people wanting different hash methods. When computing a hash, the queryid detail level and version number will be mixed into the hash, so only a hash that used a similar query and identical queryid detail level would match. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Michael Paquier
Date:
On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote: > I propose moving the pg_stat_statements queryid hash code into the > server (with a version number), and also adding a postgresql.conf > variable that lets you control how detailed the queryid hash is > computed. This addresses the problem of people wanting different hash > methods. In terms of making this part expendable in the future, there could be a point in having an enum here, but are we sure that we will have a need for that in the future? What I get from this discussion is that we want a unique source of truth that users can consume, and that the only source of truth proposed is the PGSS hashing. We may change the way we compute the query ID in the future, for example if it gets expanded to some utility statements, etc. But that would be controlled by the version number in the hash, not the GUC itself. > When computing a hash, the queryid detail level and version number will > be mixed into the hash, so only a hash that used a similar query and > identical queryid detail level would match. Yes, having a version number directly dependent on the hashing sounds like a good compromise to me. -- Michael
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Oct 7, 2020 at 10:42:49AM +0900, Michael Paquier wrote: > On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote: > > I propose moving the pg_stat_statements queryid hash code into the > > server (with a version number), and also adding a postgresql.conf > > variable that lets you control how detailed the queryid hash is > > computed. This addresses the problem of people wanting different hash > > methods. > > In terms of making this part expendable in the future, there could be > a point in having an enum here, but are we sure that we will have a > need for that in the future? What I get from this discussion is that > we want a unique source of truth that users can consume, and that the > only source of truth proposed is the PGSS hashing. We may change the > way we compute the query ID in the future, for example if it gets > expanded to some utility statements, etc. But that would be > controlled by the version number in the hash, not the GUC itself. Oh, if that is true, then I agree let's just go with the version number. > > When computing a hash, the queryid detail level and version number will > > be mixed into the hash, so only a hash that used a similar query and > > identical queryid detail level would match. > > Yes, having a version number directly dependent on the hashing sounds > like a good compromise to me. Good, much simpler. I think there is enough demand for a queryid that I would like to get this moving forward. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Oct 7, 2020 at 9:53 AM Bruce Momjian <bruce@momjian.us> wrote: > > On Wed, Oct 7, 2020 at 10:42:49AM +0900, Michael Paquier wrote: > > On Tue, Oct 06, 2020 at 09:22:29AM -0400, Bruce Momjian wrote: > > > I propose moving the pg_stat_statements queryid hash code into the > > > server (with a version number), and also adding a postgresql.conf > > > variable that lets you control how detailed the queryid hash is > > > computed. This addresses the problem of people wanting different hash > > > methods. > > > > In terms of making this part expendable in the future, there could be > > a point in having an enum here, but are we sure that we will have a > > need for that in the future? What I get from this discussion is that > > we want a unique source of truth that users can consume, and that the > > only source of truth proposed is the PGSS hashing. We may change the > > way we compute the query ID in the future, for example if it gets > > expanded to some utility statements, etc. But that would be > > controlled by the version number in the hash, not the GUC itself. > > Oh, if that is true, then I agree let's just go with the version number. But there are many people that aren't happy with the current hashing approach. If we're going to move the computation in core, shouldn't we listen to their complaints and let them pay some probably quite high overhead to base the hash on name and/or fully qualified name rather than OID? For instance people using logical replication to upgrade to a newer version may want to easily compare query performance on the new version, or people with multi-tenant databases may want to ignore the schema name to keep a low number of different queryid. It would probably still be possible to have a custom queryid hashing by disabling the core one and computing a new one in a custom extension, but that seems a bit hackish. Jumping back on Tom's point that there are judgment calls on what is examined or not, after a quick look I see at least two possible problems of ignored clauses: - WITH TIES clause - OVERRIDING clause I personally think that they shouldn't be ignored, but I don't know if they were only forgotten or ignored on purpose.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote: > But there are many people that aren't happy with the current hashing > approach. If we're going to move the computation in core, shouldn't > we listen to their complaints and let them pay some probably quite > high overhead to base the hash on name and/or fully qualified name > rather than OID? > For instance people using logical replication to upgrade to a newer > version may want to easily compare query performance on the new > version, or people with multi-tenant databases may want to ignore the > schema name to keep a low number of different queryid. Well, we have to consider how complex the user interface has to be to allow more flexibility. We don't need to allow every option a user will want. With a version number, we have the ability to improve the algorithm or add customization, but for the first use, we are probably better off keeping it simple. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Robert Haas
Date:
On Mon, Oct 12, 2020 at 10:14 AM Bruce Momjian <bruce@momjian.us> wrote: > On Mon, Oct 12, 2020 at 04:20:05PM +0800, Julien Rouhaud wrote: > > But there are many people that aren't happy with the current hashing > > approach. If we're going to move the computation in core, shouldn't > > we listen to their complaints and let them pay some probably quite > > high overhead to base the hash on name and/or fully qualified name > > rather than OID? > > For instance people using logical replication to upgrade to a newer > > version may want to easily compare query performance on the new > > version, or people with multi-tenant databases may want to ignore the > > schema name to keep a low number of different queryid. > > Well, we have to consider how complex the user interface has to be to > allow more flexibility. We don't need to allow every option a user will > want. > > With a version number, we have the ability to improve the algorithm or > add customization, but for the first use, we are probably better off > keeping it simple. I thought your earlier idea of allowing this to be controlled by a GUC was good. There could be a default method built into core, matching what pg_stat_statements does, so you could select no hashing or that method no matter what. Then extensions could provide other methods which could be selected via the GUC. I don't really understand how a version number helps. It's not like there is going to be a v2 that is in all ways better than v1. If there are different algorithms here, they are going to be customized for different needs. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > I don't really understand how a version number helps. It's not like > there is going to be a v2 that is in all ways better than v1. If there > are different algorithms here, they are going to be customized for > different needs. Yeah, I agree --- a version number is the wrong way to think about this. It's gonna be more like algorithm foo versus algorithm bar versus algorithm baz, where each one is better for a specific set of use-cases. Julien already noted the point about hashing object OIDs versus object names; one can easily imagine disagreeing with pg_stat_statement's choices about ignoring values of constants; other properties of statements might be irrelevant for some use-cases; and so on. I'm okay with moving pg_stat_statement's existing algorithm into core as long as there's a way for extensions to override it. With proper design, that would allow extensions that do override it to coexist with pg_stat_statements (thereby redefining the latter's idea of which statements are "the same"), which is something that doesn't really work nicely today. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > I don't really understand how a version number helps. It's not like > > there is going to be a v2 that is in all ways better than v1. If there > > are different algorithms here, they are going to be customized for > > different needs. > > Yeah, I agree --- a version number is the wrong way to think about this. > It's gonna be more like algorithm foo versus algorithm bar versus > algorithm baz, where each one is better for a specific set of use-cases. > Julien already noted the point about hashing object OIDs versus object > names; one can easily imagine disagreeing with pg_stat_statement's > choices about ignoring values of constants; other properties of statements > might be irrelevant for some use-cases; and so on. The version number was to invalidate _all_ query hashes if the algorithm is slightly modified, rather than invalidating just some of them, which could lead to confusion. The idea of selectable hash algorithms is nice if people feel there is sufficient need for that. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes: > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: >> Yeah, I agree --- a version number is the wrong way to think about this. > The version number was to invalidate _all_ query hashes if the > algorithm is slightly modified, rather than invalidating just some of > them, which could lead to confusion. Color me skeptical as to the use-case for that. From users' standpoints, the hash is mainly going to change when we change the set of parse node fields that get hashed. Which is going to happen at every major release and no (or at least epsilon) minor releases. So I do not see a point in tracking an algorithm version number as such. Seems like make-work. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > >> Yeah, I agree --- a version number is the wrong way to think about this. > > > The version number was to invalidate _all_ query hashes if the > > algorithm is slightly modified, rather than invalidating just some of > > them, which could lead to confusion. > > Color me skeptical as to the use-case for that. From users' standpoints, > the hash is mainly going to change when we change the set of parse node > fields that get hashed. Which is going to happen at every major release > and no (or at least epsilon) minor releases. So I do not see a point in > tracking an algorithm version number as such. Seems like make-work. OK, I came up with the hash idea only to address one of your concerns about mismatched hashes for algorithm improvements/changes. Seems we might as well just document that cross-version hashes are different. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote: > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > >> Yeah, I agree --- a version number is the wrong way to think about this. > > > > > The version number was to invalidate _all_ query hashes if the > > > algorithm is slightly modified, rather than invalidating just some of > > > them, which could lead to confusion. > > > > Color me skeptical as to the use-case for that. From users' standpoints, > > the hash is mainly going to change when we change the set of parse node > > fields that get hashed. Which is going to happen at every major release > > and no (or at least epsilon) minor releases. So I do not see a point in > > tracking an algorithm version number as such. Seems like make-work. > > OK, I came up with the hash idea only to address one of your concerns > about mismatched hashes for algorithm improvements/changes. Seems we > might as well just document that cross-version hashes are different. Ok, so I tried to implement what seems to be the consensus. First attached patch moves the current pgss queryid computation in core, with a new compute_queryid GUC (on/off). One thing I don't really like about this patch is that the JumbleState that pgss needs in order to normalize the query string (the constants location and such) has to be done by the core while computing the queryid and provided to pgss in post_parse_analyse hook. That isn't ideal as it looks very specific to pgss needs. On the other hand it means that you can now use pgss with custom queryid heuristics by disabling compute_queryid and having your module doing only that in post_parse_analyse_hook. You'll however need to be careful to configure shared_preload_libraries such that your custom module's post_parse_analyse_hook is called first, so pgss' one can be called with the needed JumbleState. Note that if no JumbleState is provided pgss will store non normalized queries, but will otherwise behave as intended. The 2nd patch is the rebased original queryid exposure patch. No big changes, except that it now handles utility statements queryid generated during post_parse_analysis, same as regular queries. This should simplify the work needed for custom queryid third party modules. The 3rd patch changes explain (verbose) to display the queryid if one has been generated, whether by core or a third-party module. For instance: rjuju=# set compute_queryid = on; SET rjuju=# explain (verbose) select relname from pg_class; QUERY PLAN ----------------------------------------------------------------------- Seq Scan on pg_catalog.pg_class (cost=0.00..16.90 rows=390 width=64) Output: relname Query Identifier: -5494854185674379299 (3 rows)
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Oct 14, 2020 at 5:43 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote: > > > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > > >> Yeah, I agree --- a version number is the wrong way to think about this. > > > > > > > The version number was to invalidate _all_ query hashes if the > > > > algorithm is slightly modified, rather than invalidating just some of > > > > them, which could lead to confusion. > > > > > > Color me skeptical as to the use-case for that. From users' standpoints, > > > the hash is mainly going to change when we change the set of parse node > > > fields that get hashed. Which is going to happen at every major release > > > and no (or at least epsilon) minor releases. So I do not see a point in > > > tracking an algorithm version number as such. Seems like make-work. > > > > OK, I came up with the hash idea only to address one of your concerns > > about mismatched hashes for algorithm improvements/changes. Seems we > > might as well just document that cross-version hashes are different. > > Ok, so I tried to implement what seems to be the consensus. First > attached patch moves the current pgss queryid computation in core, > with a new compute_queryid GUC (on/off). One thing I don't really > like about this patch is that the JumbleState that pgss needs in order > to normalize the query string (the constants location and such) has to > be done by the core while computing the queryid and provided to pgss > in post_parse_analyse hook. That isn't ideal as it looks very > specific to pgss needs. On the other hand it means that you can now > use pgss with custom queryid heuristics by disabling compute_queryid > and having your module doing only that in post_parse_analyse_hook. > You'll however need to be careful to configure > shared_preload_libraries such that your custom module's > post_parse_analyse_hook is called first, so pgss' one can be called > with the needed JumbleState. Note that if no JumbleState is provided > pgss will store non normalized queries, but will otherwise behave as > intended. > > The 2nd patch is the rebased original queryid exposure patch. No big > changes, except that it now handles utility statements queryid > generated during post_parse_analysis, same as regular queries. This > should simplify the work needed for custom queryid third party > modules. > > The 3rd patch changes explain (verbose) to display the queryid if one > has been generated, whether by core or a third-party module. For > instance: > > rjuju=# set compute_queryid = on; > SET > rjuju=# explain (verbose) select relname from pg_class; > QUERY PLAN > ----------------------------------------------------------------------- > Seq Scan on pg_catalog.pg_class (cost=0.00..16.90 rows=390 width=64) > Output: relname > Query Identifier: -5494854185674379299 > (3 rows) There was a possibly uninitialized var issue in the previous patches (thanks cfbot), v13 fixes that.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote: > On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote: > > > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > > >> Yeah, I agree --- a version number is the wrong way to think about this. > > > > > > > The version number was to invalidate _all_ query hashes if the > > > > algorithm is slightly modified, rather than invalidating just some of > > > > them, which could lead to confusion. > > > > > > Color me skeptical as to the use-case for that. From users' standpoints, > > > the hash is mainly going to change when we change the set of parse node > > > fields that get hashed. Which is going to happen at every major release > > > and no (or at least epsilon) minor releases. So I do not see a point in > > > tracking an algorithm version number as such. Seems like make-work. > > > > OK, I came up with the hash idea only to address one of your concerns > > about mismatched hashes for algorithm improvements/changes. Seems we > > might as well just document that cross-version hashes are different. > > Ok, so I tried to implement what seems to be the consensus. First > attached patch moves the current pgss queryid computation in core, > with a new compute_queryid GUC (on/off). One thing I don't really Why would someone turn compute_queryid off? Overhead? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian <bruce@momjian.us> wrote: > > On Wed, Oct 14, 2020 at 05:43:33PM +0800, Julien Rouhaud wrote: > > On Tue, Oct 13, 2020 at 4:53 AM Bruce Momjian <bruce@momjian.us> wrote: > > > > > > On Mon, Oct 12, 2020 at 04:07:30PM -0400, Tom Lane wrote: > > > > Bruce Momjian <bruce@momjian.us> writes: > > > > > On Mon, Oct 12, 2020 at 02:26:15PM -0400, Tom Lane wrote: > > > > >> Yeah, I agree --- a version number is the wrong way to think about this. > > > > > > > > > The version number was to invalidate _all_ query hashes if the > > > > > algorithm is slightly modified, rather than invalidating just some of > > > > > them, which could lead to confusion. > > > > > > > > Color me skeptical as to the use-case for that. From users' standpoints, > > > > the hash is mainly going to change when we change the set of parse node > > > > fields that get hashed. Which is going to happen at every major release > > > > and no (or at least epsilon) minor releases. So I do not see a point in > > > > tracking an algorithm version number as such. Seems like make-work. > > > > > > OK, I came up with the hash idea only to address one of your concerns > > > about mismatched hashes for algorithm improvements/changes. Seems we > > > might as well just document that cross-version hashes are different. > > > > Ok, so I tried to implement what seems to be the consensus. First > > attached patch moves the current pgss queryid computation in core, > > with a new compute_queryid GUC (on/off). One thing I don't really > > Why would someone turn compute_queryid off? Overhead? Yes, or possibly to use a different algorithm.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Oct 14, 2020 at 10:21:24PM +0800, Julien Rouhaud wrote: > On Wed, Oct 14, 2020 at 10:09 PM Bruce Momjian <bruce@momjian.us> wrote: > > > > OK, I came up with the hash idea only to address one of your concerns > > > > about mismatched hashes for algorithm improvements/changes. Seems we > > > > might as well just document that cross-version hashes are different. > > > > > > Ok, so I tried to implement what seems to be the consensus. First > > > attached patch moves the current pgss queryid computation in core, > > > with a new compute_queryid GUC (on/off). One thing I don't really > > > > Why would someone turn compute_queryid off? Overhead? > > Yes, or possibly to use a different algorithm. Is there a measureable overhead when this is turned on, since it is off by default and maybe should default to on. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes: > Is there a measureable overhead when this is turned on, since it is off > by default and maybe should default to on. I don't believe that "default to on" can even be in the discussion. There is no in-core feature that would use this by default. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Oct 14, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > Is there a measureable overhead when this is turned on, since it is off > > by default and maybe should default to on. > > I don't believe that "default to on" can even be in the discussion. > There is no in-core feature that would use this by default. If the 2nd patch is applied there would be pg_stat_activity.queryid column, but I doubt that's a strong enough argument.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote: > On Wed, Oct 14, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Bruce Momjian <bruce@momjian.us> writes: > > > Is there a measureable overhead when this is turned on, since it is off > > > by default and maybe should default to on. > > > > I don't believe that "default to on" can even be in the discussion. > > There is no in-core feature that would use this by default. > > If the 2nd patch is applied there would be pg_stat_activity.queryid > column, but I doubt that's a strong enough argument. There is that, and log_line_prefix, which I can imaging being useful. My point is that if the queryid is visible, there should be a reason it defaults to show empty. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote: > > On Wed, Oct 14, 2020 at 10:34:31PM +0800, Julien Rouhaud wrote: > > On Wed, Oct 14, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > > > Bruce Momjian <bruce@momjian.us> writes: > > > > Is there a measureable overhead when this is turned on, since it is off > > > > by default and maybe should default to on. > > > > > > I don't believe that "default to on" can even be in the discussion. > > > There is no in-core feature that would use this by default. > > > > If the 2nd patch is applied there would be pg_stat_activity.queryid > > column, but I doubt that's a strong enough argument. > > There is that, and log_line_prefix, which I can imaging being useful. > My point is that if the queryid is visible, there should be a reason it > defaults to show empty. I did some naive benchmarking. Using a custom pgbench script with this query: SELECT * FROM pg_class c JOIN pg_attribute a ON a.attrelid = c.oid ORDER BY 1 DESC LIMIT 1; I can see around 2% overhead (this query is reported with ~ 3ms latency average). Adding a few joins, overhead goes down to 1%. Adding on top of the join some WHERE and GROUP BY conditions, overhead goes down to 0.2% (at that point average latency is around 9ms on my laptop). So having this enabled by default is probably only going to hit people with OLTP-style workload with a majority of queries running in a couple of milliseconds or less, which isn't that uncommon.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote: > > There is that, and log_line_prefix, which I can imaging being useful. > > My point is that if the queryid is visible, there should be a reason it > > defaults to show empty. > > I did some naive benchmarking. Using a custom pgbench script with this query: > > SELECT * > FROM pg_class c > JOIN pg_attribute a ON a.attrelid = c.oid > ORDER BY 1 DESC > LIMIT 1; > > I can see around 2% overhead (this query is reported with ~ 3ms > latency average). Adding a few joins, overhead goes down to 1%. That number is too high to enable this by default. I suggest we either improve the performance of this, or clearly document that you have to enable the hash computation to see the pg_stat_activity and log_line_prefix fields. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2020-Oct-16, Bruce Momjian wrote: > On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > > I did some naive benchmarking. Using a custom pgbench script with this query: > > I can see around 2% overhead (this query is reported with ~ 3ms > > latency average). Adding a few joins, overhead goes down to 1%. > > That number is too high to enable this by default. I suggest we either > improve the performance of this, or clearly document that you have to > enable the hash computation to see the pg_stat_activity and > log_line_prefix fields. Agreed. This is similar to how we used to deal with query strings: an optional feature, disabled by default (cf. commit b13c9686d084). In this case, I suppose using pg_stat_statement would require to have it enabled, and it'd just not collect anything if disabled. Similarly, the field would show NULL in pg_stat_activity or an empty string in log_line_prefix/CSV logs. So users that want it can easily have it, and users that don't are not paying the price. For maximum user-friendliness, pg_stat_statement could be loaded and shmem-initialized even when query ID computation is turned off, and you'd be able to enable query ID computation with just SIGHUP; so you don't have to restart the server in order to enable statement tracking. (I suppose we would forbid users from disabling query ID with SET, though.)
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > In this case, I suppose using pg_stat_statement would require to have it > enabled, and it'd just not collect anything if disabled. Alternatively, pg_stat_statement might be able to force it on (applying a non-overridable PGC_INTERNAL-level setting) on load? Not sure if that'd be desirable or not. If the behavior of pg_stat_statement is to do nothing when it sees a query without the ID calculated (which I guess it'd have to) then there's a potential security issue if the GUC is USERSET level: a user could hide her queries from pg_stat_statement by turning the GUC off. So this line of thought suggests the GUC needs to be at least SUSET, and maybe higher ... doesn't pg_stat_statement need it to have the same value cluster-wide? regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Fri, Oct 16, 2020 at 01:03:55PM -0300, Álvaro Herrera wrote: > On 2020-Oct-16, Bruce Momjian wrote: > > > On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > > > > I did some naive benchmarking. Using a custom pgbench script with this query: > > > > I can see around 2% overhead (this query is reported with ~ 3ms > > > latency average). Adding a few joins, overhead goes down to 1%. > > > > That number is too high to enable this by default. I suggest we either > > improve the performance of this, or clearly document that you have to > > enable the hash computation to see the pg_stat_activity and > > log_line_prefix fields. > > Agreed. This is similar to how we used to deal with query strings: an > optional feature, disabled by default (cf. commit b13c9686d084). > > In this case, I suppose using pg_stat_statement would require to have it > enabled, and it'd just not collect anything if disabled. Similarly, the > field would show NULL in pg_stat_activity or an empty string in > log_line_prefix/CSV logs. Yes, and at each use point, e.g., pg_stat_activity, log_line_prefix, we have to remind people how to turn hash compuation on. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com The usefulness of a cup is in its emptiness, Bruce Lee
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Sat, Oct 17, 2020 at 12:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > In this case, I suppose using pg_stat_statement would require to have it > > enabled, and it'd just not collect anything if disabled. Yes, my idea was to be able to have pg_stat_statements enabled even if no queryid is computed without that being a problem, and the patch I sent should handle that properly, as pgss_store (and a few other places) check for a non-zero queryid before doing any work. Also, we can't have pg_stat_statements have any specific behavior based on the new GUC, as there could alternatively be another module that handles the queryid generation. > Alternatively, pg_stat_statement might be able to force it on > (applying a non-overridable PGC_INTERNAL-level setting) on load? > Not sure if that'd be desirable or not. > > If the behavior of pg_stat_statement is to do nothing when it > sees a query without the ID calculated (which I guess it'd have to) Yes that's what it does. > then there's a potential security issue if the GUC is USERSET level: > a user could hide her queries from pg_stat_statement by turning the > GUC off. So this line of thought suggests the GUC needs to be at > least SUSET, and maybe higher ... doesn't pg_stat_statement need it > to have the same value cluster-wide? Well, I don't think that there's any guarantee that pg_stat_statemens will display all activity that has been run, since there's a limited amount of (userid, dbid, queryid) that can be stored, but I agree that allowing random user to hide their activity isn't nice. Note that I defined the GUC as SUSET, but maybe it should be SIGHUP?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Oct 16, 2020 at 11:04 PM Bruce Momjian <bruce@momjian.us> wrote: > > On Thu, Oct 15, 2020 at 11:41:23AM +0800, Julien Rouhaud wrote: > > On Wed, Oct 14, 2020 at 10:40 PM Bruce Momjian <bruce@momjian.us> wrote: > > > There is that, and log_line_prefix, which I can imaging being useful. > > > My point is that if the queryid is visible, there should be a reason it > > > defaults to show empty. > > > > I did some naive benchmarking. Using a custom pgbench script with this query: > > > > SELECT * > > FROM pg_class c > > JOIN pg_attribute a ON a.attrelid = c.oid > > ORDER BY 1 DESC > > LIMIT 1; > > > > I can see around 2% overhead (this query is reported with ~ 3ms > > latency average). Adding a few joins, overhead goes down to 1%. > > That number is too high to enable this by default. I suggest we either > improve the performance of this, or clearly document that you have to > enable the hash computation to see the pg_stat_activity and > log_line_prefix fields. I realize that I didn't update the documentation part to reflect the new GUC. I'll fix that and add more warnings about the requirements to have values displayed in pg_stat_acitivity and log_line_prefix.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2020-Oct-17, Julien Rouhaud wrote: > On Sat, Oct 17, 2020 at 12:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > then there's a potential security issue if the GUC is USERSET level: > > a user could hide her queries from pg_stat_statement by turning the > > GUC off. So this line of thought suggests the GUC needs to be at > > least SUSET, and maybe higher ... doesn't pg_stat_statement need it > > to have the same value cluster-wide? > > Well, I don't think that there's any guarantee that pg_stat_statemens > will display all activity that has been run, since there's a limited > amount of (userid, dbid, queryid) that can be stored, but I agree that > allowing random user to hide their activity isn't nice. Note that I > defined the GUC as SUSET, but maybe it should be SIGHUP? I don't think we should consider pg_stat_statement a bulletproof defense for security problems. It is already lossy by design. I do think it'd be preferrable if we allowed it to be disabled at the config file level only, not with SET (prevent users from hiding stuff); but I think it is useful to allow users to enable it for specific queries or for specific sessions only, while globally disabled. This might mean we need to mark it PGC_SIGHUP and then have the check hook disallow it from being changed under such-and-such conditions.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > On 2020-Oct-17, Julien Rouhaud wrote: >> On Sat, Oct 17, 2020 at 12:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> then there's a potential security issue if the GUC is USERSET level: >>> a user could hide her queries from pg_stat_statement by turning the >>> GUC off. So this line of thought suggests the GUC needs to be at >>> least SUSET, and maybe higher ... doesn't pg_stat_statement need it >>> to have the same value cluster-wide? > I don't think we should consider pg_stat_statement a bulletproof defense > for security problems. It is already lossy by design. Fair point, but if we allow several different values to be set in different sessions, what ends up happening in pg_stat_statements? On the other hand, maybe that's just a matter for documentation. "If the 'same' query is processed with two different queryID settings, that will generally result in two separate table entries, because the same ID hash is unlikely to be produced in both cases". There is certainly a use-case for wanting to be able to do this, if for example you'd like different query aggregation behavior for different applications. > I do think it'd be preferrable if we allowed it to be disabled at the > config file level only, not with SET (prevent users from hiding stuff); > but I think it is useful to allow users to enable it for specific > queries or for specific sessions only, while globally disabled. Indeed. I'm kind of talking myself into the idea that USERSET, or at most SUSET, is fine, so long as we document what happens when it has different values in different sessions. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2020-Oct-17, Tom Lane wrote: > Fair point, but if we allow several different values to be set in > different sessions, what ends up happening in pg_stat_statements? > > On the other hand, maybe that's just a matter for documentation. > "If the 'same' query is processed with two different queryID settings, > that will generally result in two separate table entries, because > the same ID hash is unlikely to be produced in both cases". Wait ... what? I've been thinking that this GUC is just to enable or disable the computation of query ID, not to change the algorithm to do so. Do we really need to allow different algorithms in different sessions?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > Wait ... what? I've been thinking that this GUC is just to enable or > disable the computation of query ID, not to change the algorithm to do > so. Do we really need to allow different algorithms in different > sessions? We established that some time ago, no? regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Sun, Oct 18, 2020 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > Wait ... what? I've been thinking that this GUC is just to enable or > > disable the computation of query ID, not to change the algorithm to do > > so. Do we really need to allow different algorithms in different > > sessions? > > We established that some time ago, no? I thought we established the need for allowing different algorithms, but I assumed globally not per session. Anyway, allowing to enable or disable compute_queryid per session would technically allow that, assuming that you have another module loaded that computes a queryid only if no-one was already computed. In that case pg_stat_statements works as you would expect, you will get a new entry, with a duplicated query text. With a bit more thinking, there's at least one use case where it's interesting to disable pg_stat_statements: queries using temporary tables. In that case you're guaranteed to generate an infinity of different queryid. That doesn't really help since you're not aggregating anything anymore, and it also makes pg_stat_statements virtually unusable as once you have a workload that needs frequent eviction, the overhead is so bad that you basically have to disable pg_stat_statements. We could alternatively add a GUC to disable queryid computation when one of the tables is a temporary table, but that's yet one among many considerations that are probably best answered with a custom implementation. I'm also attaching an updated patch with some attempt to improve the documentation. I mention that in-core algorithm may not suits everyone's needs, but we don't actually document what heuristics are. Should we give more details on them and what are the most direct consequences?
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Sun, Oct 18, 2020 at 4:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sun, Oct 18, 2020 at 12:20 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > > Wait ... what? I've been thinking that this GUC is just to enable or > > > disable the computation of query ID, not to change the algorithm to do > > > so. Do we really need to allow different algorithms in different > > > sessions? > > > > We established that some time ago, no? > > I thought we established the need for allowing different algorithms, > but I assumed globally not per session. Anyway, allowing to enable or > disable compute_queryid per session would technically allow that, > assuming that you have another module loaded that computes a queryid > only if no-one was already computed. In that case pg_stat_statements > works as you would expect, you will get a new entry, with a duplicated > query text. > > With a bit more thinking, there's at least one use case where it's > interesting to disable pg_stat_statements: queries using temporary > tables. In that case you're guaranteed to generate an infinity of > different queryid. That doesn't really help since you're not > aggregating anything anymore, and it also makes pg_stat_statements > virtually unusable as once you have a workload that needs frequent > eviction, the overhead is so bad that you basically have to disable > pg_stat_statements. We could alternatively add a GUC to disable > queryid computation when one of the tables is a temporary table, but > that's yet one among many considerations that are probably best > answered with a custom implementation. > > I'm also attaching an updated patch with some attempt to improve the > documentation. I mention that in-core algorithm may not suits > everyone's needs, but we don't actually document what heuristics are. > Should we give more details on them and what are the most direct > consequences? v15 that fixes recent conflicts.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Jan 8, 2021 at 1:07 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > v15 that fixes recent conflicts. Rebase only, thanks to the cfbot! V16 attached.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tatsuro Yamada
Date:
Hi Julien, > Rebase only, thanks to the cfbot! V16 attached. I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and my result is the following. Attached file is regression.diffs. ======================== 1 of 202 tests failed. ======================== The differences that caused some tests to fail can be viewed in the file "/home/postgres/PG140/src/test/regress/regression.diffs". A copy of the test summary that you see above is saved in the file "/home/postgres/PG140/src/test/regress/regression.out". src/test/regress/regression.diffs --------------------------------- diff -U3 /home/postgres/PG140/src/test/regress/expected/rules.out /home/postgres/PG140/src/test/regress/results/rules.out --- /home/postgres/PG140/src/test/regress/expected/rules.out 2021-01-20 08:41:16.383175559 +0900 +++ /home/postgres/PG140/src/test/regress/results/rules.out 2021-01-20 08:43:46.589171774 +0900 @@ -1760,10 +1760,9 @@ s.state, s.backend_xid, s.backend_xmin, - s.queryid, s.query, s.backend_type - FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event,xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid,backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial,ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid, queryid) + FROM ((pg_stat_get_activity(NULL::integer) s(datid, pid, usesysid, application_name, state, query, wait_event_type, wait_event,xact_start, query_start, backend_start, state_change, client_addr, client_hostname, client_port, backend_xid,backend_xmin, backend_type, ssl, sslversion, sslcipher, sslbits, sslcompression, ssl_client_dn, ssl_client_serial,ssl_issuer_dn, gss_auth, gss_princ, gss_enc, leader_pid) ... Thanks, Tatsuro Yamada
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tatsuro Yamada
Date:
Hi Julien, >> Rebase only, thanks to the cfbot! V16 attached. > > I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and > my result is the following. Attached file is regression.diffs. Sorry, my environment was not suitable for the test when I sent my previous email. I fixed my environment and tested it again, and it was a success. See below: ======================= All 202 tests passed. ======================= Regards, Tatsuro Yamada
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
Hello Yamada-san, On Wed, Jan 20, 2021 at 10:06 AM Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp> wrote: > > Hi Julien, > > > >> Rebase only, thanks to the cfbot! V16 attached. > > > > I tested the v16 patch on a0efda88a by using "make installcheck-parallel", and > > my result is the following. Attached file is regression.diffs. > > > Sorry, my environment was not suitable for the test when I sent my previous email. > I fixed my environment and tested it again, and it was a success. See below: > > ======================= > All 202 tests passed. > ======================= No worries, thanks a lot for testing!
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Mar 17, 2021 at 12:24:44PM -0400, Bruce Momjian wrote: > On Wed, Mar 17, 2021 at 05:16:50PM +0100, Pavel Stehule wrote: > > > > > > st 17. 3. 2021 v 17:03 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: > > > > Bruce Momjian <bruce@momjian.us> writes: > > > On Wed, Mar 17, 2021 at 11:28:38AM -0400, Tom Lane wrote: > > >> I still say that it's a serious mistake to sanctify a query ID > > calculation > > >> method that was designed only for pg_stat_statement's needs as the one > > >> true way to do it. But that's what exposing it in a core view would do. > > > > > OK, I am fine with creating a new method, and maybe having > > > pg_stat_statements use it. Is that the direction we should be going in? > > > > The point is that we've understood Query.queryId as something that > > different extensions might calculate differently for their own needs. > > In particular it's easy to imagine extensions that want an ID that is > > less fuzzy than what pg_stat_statements wants. We never had a plan for > > how two such extensions could co-exist, but at least it was possible > > to use one if you didn't use another. If this gets moved into core > > then there will basically be only one way that anyone can do it. > > > > Maybe what we need is a design for allowing more than one query ID. > > > > > > Theoretically there can be a hook for calculation of queryid, that can be by > > used extension. Default can be assigned with a method that is used by > > pg_stat_statements. > > Yes, that is what the code patch says it does. > > > I don't think it is possible to use more different query id for > > pg_stat_statements so this solution can be simple. > > Agreed. Actually, putting the query identifer computation in the core makes it way more tunable, even if it's conterintuitive. What it means is that you can now chose to use usual pgss' algorithm or a different one for log_line_prefix and pg_stat_activity.queryid, but also that you can now use pgss with a different query id algorithm. That's another thing that user were asking for a long time. I originally suggested to make it clearer by having an enum GUC rather than a boolean, say compute_queryid = [ none | core | external ], and if set to external then a hook would be explicitely called. Right now, "none" and "external" are binded with compute_queryid = off, and depends on whether an extension is computing a queryid during post_parse_analyse_hook. It could later be extended to suit other needs if we ever come to some agreement (for instance "legacy", "logical_replication_stable" or whatever better name we can find for something that doesn't depend on Oid).
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Robert Haas
Date:
On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > I originally suggested to make it clearer by having an enum GUC rather than a > boolean, say compute_queryid = [ none | core | external ], and if set to > external then a hook would be explicitely called. Right now, "none" and > "external" are binded with compute_queryid = off, and depends on whether an > extension is computing a queryid during post_parse_analyse_hook. I would just make it a Boolean and have a hook. The Boolean controls whether it gets computed at all, and the hook lets an external module override the way it gets computed. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote: > On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > I originally suggested to make it clearer by having an enum GUC rather than a > > boolean, say compute_queryid = [ none | core | external ], and if set to > > external then a hook would be explicitely called. Right now, "none" and > > "external" are binded with compute_queryid = off, and depends on whether an > > extension is computing a queryid during post_parse_analyse_hook. > > I would just make it a Boolean and have a hook. The Boolean controls > whether it gets computed at all, and the hook lets an external module > override the way it gets computed. OK, is that what everyone wants? I think that is what the patch already does. I think having multiple queryids used in a single cluster is much too confusing to support. You would have to label and control which queryid is displayed by pg_stat_activity and log_line_prefix, and that seems too confusing and not useful. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote: > On Wed, Mar 17, 2021 at 04:04:44PM -0400, Robert Haas wrote: > > On Wed, Mar 17, 2021 at 12:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > I originally suggested to make it clearer by having an enum GUC rather than a > > > boolean, say compute_queryid = [ none | core | external ], and if set to > > > external then a hook would be explicitely called. Right now, "none" and > > > "external" are binded with compute_queryid = off, and depends on whether an > > > extension is computing a queryid during post_parse_analyse_hook. > > > > I would just make it a Boolean and have a hook. The Boolean controls > > whether it gets computed at all, and the hook lets an external module > > override the way it gets computed. > > OK, is that what everyone wants? I think that is what the patch already > does. Note exactly. Right now a custom queryid can be computed even if compute_queryid is off, if some extension does that in post_parse_analyze_hook. I'm assuming that what Robert was thinking was more like: if (compute_queryid) { if (queryid_hook) queryId = queryid_hook(...); else queryId = JumbeQuery(...); } else queryId = 0; And that should be done *after* post_parse_analyse_hook so that it's clear that this hook is no longer the place to compute queryid. Is that what should be done? > I think having multiple queryids used in a single cluster is much too > confusing to support. You would have to label and control which queryid > is displayed by pg_stat_activity and log_line_prefix, and that seems too > confusing and not useful. I agree.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote: > On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote: > > OK, is that what everyone wants? I think that is what the patch already > > does. > > Note exactly. Right now a custom queryid can be computed even if > compute_queryid is off, if some extension does that in post_parse_analyze_hook. > > I'm assuming that what Robert was thinking was more like: > > if (compute_queryid) > { > if (queryid_hook) > queryId = queryid_hook(...); > else > queryId = JumbeQuery(...); > } > else > queryId = 0; > > And that should be done *after* post_parse_analyse_hook so that it's clear that > this hook is no longer the place to compute queryid. > > Is that what should be done? No, I don't think so. I think having extensions change behavior controlled by GUCs is a bad interface. The docs are going to say that you have to enable compute_queryid to see the query id in pg_stat_activity and log_line_prefix, but if you install an extension, the query id will be visible even if you don't have compute_queryid enabled. I think you need to only honor the hook if compute_queryid is enabled, and update the pg_stat_statements docs to say you have to enable compute_queryid for pg_stat_statements to work. Also, should it be compute_queryid or compute_query_id? Also, the overhead of computing the query id was reported as 2% --- that seems quite high for what it does. Do we know why it is so high? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote: > On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote: > > On Wed, Mar 17, 2021 at 06:32:16PM -0400, Bruce Momjian wrote: > > > OK, is that what everyone wants? I think that is what the patch already > > > does. > > > > Note exactly. Right now a custom queryid can be computed even if > > compute_queryid is off, if some extension does that in post_parse_analyze_hook. > > > > I'm assuming that what Robert was thinking was more like: > > > > if (compute_queryid) > > { > > if (queryid_hook) > > queryId = queryid_hook(...); > > else > > queryId = JumbeQuery(...); > > } > > else > > queryId = 0; > > > > And that should be done *after* post_parse_analyse_hook so that it's clear that > > this hook is no longer the place to compute queryid. > > > > Is that what should be done? > > No, I don't think so. I think having extensions change behavior > controlled by GUCs is a bad interface. > > The docs are going to say that you have to enable compute_queryid to see > the query id in pg_stat_activity and log_line_prefix, but if you install > an extension, the query id will be visible even if you don't have > compute_queryid enabled. I think you need to only honor the hook if > compute_queryid is enabled, and update the pg_stat_statements docs to > say you have to enable compute_queryid for pg_stat_statements to work. I'm confused, what you described really looks like what I described. Let me try to clarify: - if compute_queryid is off, a queryid should never be seen no matter how hard an extension tries - if compute_queryid is on, the calculation will be done by the core (using pgss JumbeQuery) unless an extension computed one already. The only way to know what algorithm is used is to check the list of extension loaded. - if some extension calculates a queryid during post_parse_analyze_hook, we will always reset it. Is that the approach you want? Note that the only way to not honor the hook is iff the new GUC is disabled is to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook if the new GUC is off, and we don't want to pay the queryid calculation overhead if the admin explicitly said it wasn't needed. > Also, should it be compute_queryid or compute_query_id? Maybe compute_query_identifier? > Also, the overhead of computing the query id was reported as 2% --- that > seems quite high for what it does. Do we know why it is so high? The 2% was a worst case scenario, for a query with a single join over ridiculously small pg_class and pg_attribute, in read only. The whole workload was in shared buffers so the planning and execution is quite fast. Adding some complexity in the query really limited the overhead. Note that this was done on an old laptop with quite slow CPU. Maybe someone with a better hardware than a 5/6yo laptop could get some more realistic results (I unfortunately don't have anything to try on).
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote: > On Thu, Mar 18, 2021 at 09:47:29AM -0400, Bruce Momjian wrote: > > On Thu, Mar 18, 2021 at 07:29:56AM +0800, Julien Rouhaud wrote: > > > Note exactly. Right now a custom queryid can be computed even if > > > compute_queryid is off, if some extension does that in post_parse_analyze_hook. The above text is the part that made me think an extension could display a query id even if disabled by the GUC. > > The docs are going to say that you have to enable compute_queryid to see > > the query id in pg_stat_activity and log_line_prefix, but if you install > > an extension, the query id will be visible even if you don't have > > compute_queryid enabled. I think you need to only honor the hook if > > compute_queryid is enabled, and update the pg_stat_statements docs to > > say you have to enable compute_queryid for pg_stat_statements to work. > > I'm confused, what you described really looks like what I described. > > Let me try to clarify: > > - if compute_queryid is off, a queryid should never be seen no matter how hard > an extension tries Oh, OK. I can see an extension setting the query id on its own --- we can't prevent that from happening. It is probably enough to tell extensions to honor the GUC, since they would want it enabled so it displays in pg_stat_activity and log_line_prefix. > - if compute_queryid is on, the calculation will be done by the core > (using pgss JumbeQuery) unless an extension computed one already. The only > way to know what algorithm is used is to check the list of extension loaded. OK. > - if some extension calculates a queryid during post_parse_analyze_hook, we > will always reset it. OK, good. > Is that the approach you want? Yes, I think so. > Note that the only way to not honor the hook is iff the new GUC is disabled is > to have a new queryid_hook, as we can't stop calling post_parse_analyze_hook if > the new GUC is off, and we don't want to pay the queryid calculation overhead > if the admin explicitly said it wasn't needed. Right, let's just get the extensions to honor the GUC --- we don't need to block them or anything. > > Also, should it be compute_queryid or compute_query_id? > > Maybe compute_query_identifier? I think compute_query_id works, and is shorter. > > Also, the overhead of computing the query id was reported as 2% --- that > > seems quite high for what it does. Do we know why it is so high? > > The 2% was a worst case scenario, for a query with a single join over > ridiculously small pg_class and pg_attribute, in read only. The whole workload > was in shared buffers so the planning and execution is quite fast. Adding some > complexity in the query really limited the overhead. > > Note that this was done on an old laptop with quite slow CPU. Maybe > someone with a better hardware than a 5/6yo laptop could get some more > realistic results (I unfortunately don't have anything to try on). OK, good to know. I can run some tests here if people would like me to. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote: > On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote: > > The above text is the part that made me think an extension could display > a query id even if disabled by the GUC. With the last version of the patch I sent it was the case. > Oh, OK. I can see an extension setting the query id on its own --- we > can't prevent that from happening. It is probably enough to tell > extensions to honor the GUC, since they would want it enabled so it > displays in pg_stat_activity and log_line_prefix. Ok. So no new hook, and we keep using post_parse_analyze_hook as the official way to have custom queryid implementation, with this new behavior: > > - if some extension calculates a queryid during post_parse_analyze_hook, we > > will always reset it. > > OK, good. Now that I'm back on the code I remember why I did it this way. It's unfortunately not really possible to make things work this way. pg_stat_statements' post_parse_analyze_hook relies on a queryid already being computed, as it's where we know where the constants are recorded. It means: - we have to call post_parse_analyze_hook *after* doing core queryid calculation - if users want to use a third party module to calculate a queryid, they'll have to make sure that the module's post_parse_analyze_hook is called *before* pg_stat_statements' one. - even if they do so, they'll still have to pay the price of core queryid calculation So it would be very hard to configure and will be too expensive. I think that we have to choose to either we make compute_query_id only trigger core calculation (like it was in previous patch version), or introduce a new hook. > I think compute_query_id works, and is shorter. WFM. > OK, good to know. I can run some tests here if people would like me to. +1. A read only pgbench will be some kind od worse case scenario that can be used I think.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote: > On Thu, Mar 18, 2021 at 03:23:49PM -0400, Bruce Momjian wrote: > > On Fri, Mar 19, 2021 at 02:06:56AM +0800, Julien Rouhaud wrote: > > > > The above text is the part that made me think an extension could display > > a query id even if disabled by the GUC. > > With the last version of the patch I sent it was the case. > > > Oh, OK. I can see an extension setting the query id on its own --- we > > can't prevent that from happening. It is probably enough to tell > > extensions to honor the GUC, since they would want it enabled so it > > displays in pg_stat_activity and log_line_prefix. > > Ok. So no new hook, and we keep using post_parse_analyze_hook as the official > way to have custom queryid implementation, with this new behavior: > > > > - if some extension calculates a queryid during post_parse_analyze_hook, we > > > will always reset it. > > > > OK, good. > > Now that I'm back on the code I remember why I did it this way. It's > unfortunately not really possible to make things work this way. > > pg_stat_statements' post_parse_analyze_hook relies on a queryid already being > computed, as it's where we know where the constants are recorded. It means: > > - we have to call post_parse_analyze_hook *after* doing core queryid > calculation > - if users want to use a third party module to calculate a queryid, they'll > have to make sure that the module's post_parse_analyze_hook is called > *before* pg_stat_statements' one. > - even if they do so, they'll still have to pay the price of core queryid > calculation OK, that makes perfect sense. I think the best solution is to document that compute_query_id just controls the built-in computation of the query id, and that extensions can also compute it if this is off, and pg_stat_activity and log_line_prefix will display built-in or extension computed query ids. It might be interesting someday to check if the hook changed a pre-computed query id and warn the user in the logs, but that could cause more log-spam problems than help. I am a little worried that someone might have compute_query_id enabled and then install an extension that overwrites it, but we will just have to document this issue. Hopefully extensions will be clear that they are computing their own query id. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Hannu Krosing
Date:
On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote: > > OK, that makes perfect sense. I think the best solution is to document > that compute_query_id just controls the built-in computation of the > query id, and that extensions can also compute it if this is off, and > pg_stat_activity and log_line_prefix will display built-in or extension > computed query ids. > > It might be interesting someday to check if the hook changed a > pre-computed query id and warn the user in the logs, but that could > cause more log-spam problems than help. The log-spam could be mitigated by logging it just once per connection the first time it is overridden Also, we could ask the extensions to expose the "method name" in a read-only GUC so one can do SHOW compute_query_id_method; and get the name of method use compute_query_id_method ------------------------------------ builtin And it may even dynamically change to indicate the overriding of builtin compute_query_id_method --------------------------------------------------- fancy_compute_query_id (overrides builtin)
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote: > On Fri, Mar 19, 2021 at 11:16:50AM +0800, Julien Rouhaud wrote: > > Now that I'm back on the code I remember why I did it this way. It's > > unfortunately not really possible to make things work this way. > > > > pg_stat_statements' post_parse_analyze_hook relies on a queryid already being > > computed, as it's where we know where the constants are recorded. It means: > > > > - we have to call post_parse_analyze_hook *after* doing core queryid > > calculation > > - if users want to use a third party module to calculate a queryid, they'll > > have to make sure that the module's post_parse_analyze_hook is called > > *before* pg_stat_statements' one. > > - even if they do so, they'll still have to pay the price of core queryid > > calculation > > OK, that makes perfect sense. I think the best solution is to document > that compute_query_id just controls the built-in computation of the > query id, and that extensions can also compute it if this is off, and > pg_stat_activity and log_line_prefix will display built-in or extension > computed query ids. So the last version of the patch should implement that behavior right? It's just missing some explicit guidance that third-party extensions should only calculate a queryid if compute_query_id is off > It might be interesting someday to check if the hook changed a > pre-computed query id and warn the user in the logs, but that could > cause more log-spam problems than help. I am a little worried that > someone might have compute_query_id enabled and then install an > extension that overwrites it, but we will just have to document this > issue. Hopefully extensions will be clear that they are computing their > own query id. I agree. And hopefully they will split the queryid calculation from the rest of the extension so that users can use the combination they want.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote: > On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote: > > > > OK, that makes perfect sense. I think the best solution is to document > > that compute_query_id just controls the built-in computation of the > > query id, and that extensions can also compute it if this is off, and > > pg_stat_activity and log_line_prefix will display built-in or extension > > computed query ids. > > > > It might be interesting someday to check if the hook changed a > > pre-computed query id and warn the user in the logs, but that could > > cause more log-spam problems than help. > > The log-spam could be mitigated by logging it just once per connection > the first time it is overridden Yes, but it might still generate a significant amount of additional lines. If extensions authors follow the recommendations and only calculate a queryid when compute_query_id is off, it shoule be easy to check that you have everything setup properly. > Also, we could ask the extensions to expose the "method name" in a read-only GUC > > so one can do > > SHOW compute_query_id_method; > > and get the name of method use > > compute_query_id_method > ------------------------------------ > builtin > > And it may even dynamically change to indicate the overriding of builtin > > compute_query_id_method > --------------------------------------------------- > fancy_compute_query_id (overrides builtin) This could be nice, but I'm not sure that it would work well if someones install multiple extensions that calculate a queryid (which would be silly but still), or load another one at runtime.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Fri, Mar 19, 2021 at 10:35:21PM +0800, Julien Rouhaud wrote: > On Fri, Mar 19, 2021 at 02:54:16PM +0100, Hannu Krosing wrote: > > On Fri, Mar 19, 2021 at 2:29 PM Bruce Momjian <bruce@momjian.us> wrote: > > The log-spam could be mitigated by logging it just once per connection > > the first time it is overridden > > Yes, but it might still generate a significant amount of additional lines. > > If extensions authors follow the recommendations and only calculate a queryid > when compute_query_id is off, it shoule be easy to check that you have > everything setup properly. Seems extensions that want to generate their own query id should just error out with a message to the log file if compute_query_id is set --- that should fix the entire issue --- but see below. > > Also, we could ask the extensions to expose the "method name" in a read-only GUC > > > > so one can do > > > > SHOW compute_query_id_method; > > > > and get the name of method use > > > > compute_query_id_method > > ------------------------------------ > > builtin > > > > And it may even dynamically change to indicate the overriding of builtin > > > > compute_query_id_method > > --------------------------------------------------- > > fancy_compute_query_id (overrides builtin) > > This could be nice, but I'm not sure that it would work well if someones > install multiple extensions that calculate a queryid (which would be silly but > still), or load another one at runtime. Well, given we don't really want to support multiple query id types being generated or displayed, the "error out" above should fix it. Let's do this --- tell extensions to error out if the query id is already set, either by compute_query_id or another extension. If an extension wants to generate its own query id and store is internal to the extension, that is fine, but the server-displayed query id should be generated once and never overwritten by an extension. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote: > On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote: > > OK, that makes perfect sense. I think the best solution is to document > > that compute_query_id just controls the built-in computation of the > > query id, and that extensions can also compute it if this is off, and > > pg_stat_activity and log_line_prefix will display built-in or extension > > computed query ids. > > So the last version of the patch should implement that behavior right? It's > just missing some explicit guidance that third-party extensions should only > calculate a queryid if compute_query_id is off Yes, I think we are now down to just how the extensions should be told to behave, and how we document this --- see the email I just sent. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Hannu Krosing
Date:
It would be really convenient if user-visible serialisations of the query id had something that identifies the computation method.
maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.
This would immediately show in logs at what point the id calculator was changed
maybe prefix 'N' for internal, 'S' for pg_stat_statements etc.
This would immediately show in logs at what point the id calculator was changed
On Fri, Mar 19, 2021 at 5:54 PM Bruce Momjian <bruce@momjian.us> wrote:
On Fri, Mar 19, 2021 at 10:27:51PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 09:29:06AM -0400, Bruce Momjian wrote:
> > OK, that makes perfect sense. I think the best solution is to document
> > that compute_query_id just controls the built-in computation of the
> > query id, and that extensions can also compute it if this is off, and
> > pg_stat_activity and log_line_prefix will display built-in or extension
> > computed query ids.
>
> So the last version of the patch should implement that behavior right? It's
> just missing some explicit guidance that third-party extensions should only
> calculate a queryid if compute_query_id is off
Yes, I think we are now down to just how the extensions should be told
to behave, and how we document this --- see the email I just sent.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote: > It would be really convenient if user-visible serialisations of the query id > had something that identifies the computation method. > > maybe prefix 'N' for internal, 'S' for pg_stat_statements etc. > > This would immediately show in logs at what point the id calculator was changed Yeah, but it an integer, and I don't think we want to change that. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Mar 19, 2021 at 08:10:54PM -0400, Bruce Momjian wrote: > On Sat, Mar 20, 2021 at 01:03:16AM +0100, Hannu Krosing wrote: > > It would be really convenient if user-visible serialisations of the query id > > had something that identifies the computation method. > > > > maybe prefix 'N' for internal, 'S' for pg_stat_statements etc. > > > > This would immediately show in logs at what point the id calculator was changed > > Yeah, but it an integer, and I don't think we want to change that. Also, with Bruce's approach to ask extensions to error out if they would overwrite a queryid the only way to change the calculation method is a restart. So only one source can exist in the system. Hopefully that's a big enough hammer that administrators will know what method they're using.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote: > > Well, given we don't really want to support multiple query id types > being generated or displayed, the "error out" above should fix it. > > Let's do this --- tell extensions to error out if the query id is > already set, either by compute_query_id or another extension. If an > extension wants to generate its own query id and store is internal to > the extension, that is fine, but the server-displayed query id should be > generated once and never overwritten by an extension. Agreed, this will ensure that you won't dynamically change the queryid source. We should also document that changing it requires a restart and calling pg_stat_statements_reset() afterwards. v19 adds some changes, plus extra documentation for pg_stat_statements about the requirement for a queryid to be calculated, and a note that all documented details only apply for in-core source. I'm not sure if this is still the best place to document those details anymore though.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote: > On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote: > > > > Well, given we don't really want to support multiple query id types > > being generated or displayed, the "error out" above should fix it. > > > > Let's do this --- tell extensions to error out if the query id is > > already set, either by compute_query_id or another extension. If an > > extension wants to generate its own query id and store is internal to > > the extension, that is fine, but the server-displayed query id should be > > generated once and never overwritten by an extension. > > Agreed, this will ensure that you won't dynamically change the queryid source. > > We should also document that changing it requires a restart and calling > pg_stat_statements_reset() afterwards. > > v19 adds some changes, plus extra documentation for pg_stat_statements about > the requirement for a queryid to be calculated, and a note that all documented > details only apply for in-core source. I'm not sure if this is still the best > place to document those details anymore though. OK, after reading the entire thread, I don't think there are any remaining open issues with this patch and I think this is ready for committing. I have adjusted the doc section of the patches, attached. I have marked myself as committer in the commitfest app and hope to apply it in the next few days based on feedback. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Zhihong Yu
Date:
Hi,
For queryjumble.c :
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
The year should be updated.
Same with queryjumble.h
Cheers
On Mon, Mar 22, 2021 at 2:56 PM Bruce Momjian <bruce@momjian.us> wrote:
On Sat, Mar 20, 2021 at 02:12:34PM +0800, Julien Rouhaud wrote:
> On Fri, Mar 19, 2021 at 12:53:18PM -0400, Bruce Momjian wrote:
> >
> > Well, given we don't really want to support multiple query id types
> > being generated or displayed, the "error out" above should fix it.
> >
> > Let's do this --- tell extensions to error out if the query id is
> > already set, either by compute_query_id or another extension. If an
> > extension wants to generate its own query id and store is internal to
> > the extension, that is fine, but the server-displayed query id should be
> > generated once and never overwritten by an extension.
>
> Agreed, this will ensure that you won't dynamically change the queryid source.
>
> We should also document that changing it requires a restart and calling
> pg_stat_statements_reset() afterwards.
>
> v19 adds some changes, plus extra documentation for pg_stat_statements about
> the requirement for a queryid to be calculated, and a note that all documented
> details only apply for in-core source. I'm not sure if this is still the best
> place to document those details anymore though.
OK, after reading the entire thread, I don't think there are any
remaining open issues with this patch and I think this is ready for
committing. I have adjusted the doc section of the patches, attached.
I have marked myself as committer in the commitfest app and hope to
apply it in the next few days based on feedback.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote: > Hi, > For queryjumble.c : > > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > > The year should be updated. > Same with queryjumble.h Thanks, fixed. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Mon, Mar 22, 2021 at 05:55:54PM -0400, Bruce Momjian wrote: > > OK, after reading the entire thread, I don't think there are any > remaining open issues with this patch and I think this is ready for > committing. I have adjusted the doc section of the patches, attached. > I have marked myself as committer in the commitfest app and hope to > apply it in the next few days based on feedback. Thanks a lot Bruce! I looked at the changes in the attached patches and that's a clear improvements, thanks a lot for that.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote: > On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote: > > Hi, > > For queryjumble.c : > > > > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > > > > The year should be updated. > > Same with queryjumble.h > > Thanks, fixed. Thanks also for taking care of that. While at it I see that current HEAD has a lot of files with the same problem: $ git grep "\-2020" config/config.guess:# Copyright 1992-2020 Free Software Foundation, Inc. config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc. config/config.sub:# Copyright 1992-2020 Free Software Foundation, Inc. config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc. contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global Development Group src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL Global Development Group src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global Development Group src/bin/pg_rewind/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2015-2020. src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL Global Development Group src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2009-2020. Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for non .po files?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote: > On Mon, Mar 22, 2021 at 08:43:40PM -0400, Bruce Momjian wrote: > > On Mon, Mar 22, 2021 at 05:17:15PM -0700, Zhihong Yu wrote: > > > Hi, > > > For queryjumble.c : > > > > > > + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > > > > > > The year should be updated. > > > Same with queryjumble.h > > > > Thanks, fixed. > > Thanks also for taking care of that. While at it I see that current HEAD has a > lot of files with the same problem: > > $ git grep "\-2020" > config/config.guess:# Copyright 1992-2020 Free Software Foundation, Inc. > config/config.guess:Copyright 1992-2020 Free Software Foundation, Inc. > config/config.sub:# Copyright 1992-2020 Free Software Foundation, Inc. > config/config.sub:Copyright 1992-2020 Free Software Foundation, Inc. > contrib/pageinspect/gistfuncs.c: * Copyright (c) 2014-2020, PostgreSQL Global Development Group > src/backend/rewrite/rewriteSearchCycle.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/backend/utils/adt/jsonbsubs.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/bin/pg_archivecleanup/po/de.po:# Copyright (C) 2019-2020 PostgreSQL Global Development Group > src/bin/pg_rewind/po/de.po:# Copyright (C) 2015-2020 PostgreSQL Global Development Group > src/bin/pg_rewind/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2015-2020. > src/common/hex.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/common/sha1.c: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/common/sha1_int.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/include/common/hex.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/include/common/sha1.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/include/port/pg_iovec.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/include/rewrite/rewriteSearchCycle.h: * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group > src/interfaces/ecpg/preproc/po/de.po:# Copyright (C) 2009-2020 PostgreSQL Global Development Group > src/interfaces/ecpg/preproc/po/de.po:# Peter Eisentraut <peter@eisentraut.org>, 2009-2020. > > Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for > non .po files? No, I don't think so. We don't change the Free Software Foundation copyrights, and the .po files get loaded from another repository occasionally. The hex/sha copyrights came from patches developed in 2020 but committed in 2021. These will mostly be corrected in 2022. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Mar-22, Bruce Momjian wrote: > --- a/doc/src/sgml/ref/explain.sgml > +++ b/doc/src/sgml/ref/explain.sgml > @@ -136,8 +136,10 @@ ROLLBACK; > the output column list for each node in the plan tree, schema-qualify > table and function names, always label variables in expressions with > their range table alias, and always print the name of each trigger for > - which statistics are displayed. This parameter defaults to > - <literal>FALSE</literal>. > + which statistics are displayed. The query identifier will also be > + displayed if one has been compute, see <xref > + linkend="guc-compute-query-id"/> for more details. This parameter > + defaults to <literal>FALSE</literal>. Typo here, "has been computed". Is the intention to commit each of these patches separately? -- Álvaro Herrera Valdivia, Chile
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Tue, Mar 23, 2021 at 12:12:03PM -0300, Álvaro Herrera wrote: > On 2021-Mar-22, Bruce Momjian wrote: > > > --- a/doc/src/sgml/ref/explain.sgml > > +++ b/doc/src/sgml/ref/explain.sgml > > @@ -136,8 +136,10 @@ ROLLBACK; > > the output column list for each node in the plan tree, schema-qualify > > table and function names, always label variables in expressions with > > their range table alias, and always print the name of each trigger for > > - which statistics are displayed. This parameter defaults to > > - <literal>FALSE</literal>. > > + which statistics are displayed. The query identifier will also be > > + displayed if one has been compute, see <xref > > + linkend="guc-compute-query-id"/> for more details. This parameter > > + defaults to <literal>FALSE</literal>. > > Typo here, "has been computed". Good catch, fixed. > Is the intention to commit each of these patches separately? No, I was thinking of just doing a single commit. Should I do three commits? I posted it as three patches since that is how it was posted by the author, and reviewing is easier. It also will need a catversion bump. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Mar 23, 2021 at 10:34:38AM -0400, Bruce Momjian wrote: > On Tue, Mar 23, 2021 at 02:36:27PM +0800, Julien Rouhaud wrote: > > > > Is that an oversight in ca3b37487be333a1d241dab1bbdd17a211a88f43, at least for > > non .po files? > > No, I don't think so. We don't change the Free Software Foundation > copyrights, and the .po files get loaded from another repository > occasionally. The hex/sha copyrights came from patches developed in > 2020 but committed in 2021. These will mostly be corrected in 2022. Ok, thanks for the clarification!
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Mar 23, 2021 at 12:27:10PM -0400, Bruce Momjian wrote: > > No, I was thinking of just doing a single commit. Should I do three > commits? I posted it as three patches since that is how it was posted > by the author, and reviewing is easier. It also will need a catversion > bump. Yes, I originally split the commit because it was easier to write this way and it seemed better to send different patches too to ease review. I think that it would make sense to commit the first patch separately, but I'm fine with a single commit if you prefer.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Mar-22, Bruce Momjian wrote: > diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat > index e259531f60..9550de0798 100644 > --- a/src/include/catalog/pg_proc.dat > +++ b/src/include/catalog/pg_proc.dat > @@ -5249,9 +5249,9 @@ > proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f', > proretset => 't', provolatile => 's', proparallel => 'r', > prorettype => 'record', proargtypes => 'int4', > - proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}', > - proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', > - proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}', > + proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}', > + proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', > + proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}', BTW why do you put the queryid column at the end of the column list here? It seems awkward. Can we put it perhaps between state and query? > -const char *clean_querytext(const char *query, int *location, int *len); > +const char *CleanQuerytext(const char *query, int *location, int *len); > JumbleState *JumbleQuery(Query *query, const char *querytext); I think pushing in more than one commit is a reasonable approach if they are well-contained, but if you do that it'd be better to avoid introducing a function with one name and renaming it in your next commit. -- Álvaro Herrera Valdivia, Chile "Just treat us the way you want to be treated + some extra allowance for ignorance." (Michael Brusser)
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Mar 24, 2021 at 05:12:35AM -0300, Alvaro Herrera wrote: > On 2021-Mar-22, Bruce Momjian wrote: > > > diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat > > index e259531f60..9550de0798 100644 > > --- a/src/include/catalog/pg_proc.dat > > +++ b/src/include/catalog/pg_proc.dat > > @@ -5249,9 +5249,9 @@ > > proname => 'pg_stat_get_activity', prorows => '100', proisstrict => 'f', > > proretset => 't', provolatile => 's', proparallel => 'r', > > prorettype => 'record', proargtypes => 'int4', > > - proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4}', > > - proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', > > - proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid}', > > + proallargtypes => '{int4,oid,int4,oid,text,text,text,text,text,timestamptz,timestamptz,timestamptz,timestamptz,inet,text,int4,xid,xid,text,bool,text,text,int4,text,numeric,text,bool,text,bool,int4,int8}', > > + proargmodes => '{i,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o,o}', > > + proargnames => '{pid,datid,pid,usesysid,application_name,state,query,wait_event_type,wait_event,xact_start,query_start,backend_start,state_change,client_addr,client_hostname,client_port,backend_xid,backend_xmin,backend_type,ssl,sslversion,sslcipher,sslbits,ssl_client_dn,ssl_client_serial,ssl_issuer_dn,gss_auth,gss_princ,gss_enc,leader_pid,queryid}', > > BTW why do you put the queryid column at the end of the column list > here? It seems awkward. Can we put it perhaps between state and query? I thought that it would be better to have it at the end as it can always be NULL (and will be by default), which I guess was also the reason to have leader_pid there. I'm all in favor to have queryid near the query, and while at it leader_pid near the pid. > > -const char *clean_querytext(const char *query, int *location, int *len); > > +const char *CleanQuerytext(const char *query, int *location, int *len); > > JumbleState *JumbleQuery(Query *query, const char *querytext); > > I think pushing in more than one commit is a reasonable approach if they > are well-contained They should, as I incrementally built on top of the first one. I also just double checked the patchset and each new commit compiles and passes the regression tests. > but if you do that it'd be better to avoid > introducing a function with one name and renaming it in your next > commit. Oops, I apparently messed a fixup when working on it. Bruce, should I take care of that of do you want to? I think you have some local modifications already I'd rather not miss some changes.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote: > > but if you do that it'd be better to avoid > > introducing a function with one name and renaming it in your next > > commit. > > Oops, I apparently messed a fixup when working on it. Bruce, should I take > care of that of do you want to? I think you have some local modifications > already I'd rather not miss some changes. I have no local modifications. Please modify the patch I posted and repost your version, thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote: > On Wed, Mar 24, 2021 at 04:51:40PM +0800, Julien Rouhaud wrote: > > > but if you do that it'd be better to avoid > > > introducing a function with one name and renaming it in your next > > > commit. > > > > Oops, I apparently messed a fixup when working on it. Bruce, should I take > > care of that of do you want to? I think you have some local modifications > > already I'd rather not miss some changes. > > I have no local modifications. Please modify the patch I posted and > repost your version, thanks. Ok! I used the last version of the patch you sent and addressed the following comments from earlier messages in attached v20: - copyright year to 2021 - s/has has been compute/has been compute/ - use the name CleanQuerytext in the first commit I didn't change the position of queryid in pg_stat_get_activity(), as the "real" order is actually define in system_views.sql when creating pg_stat_activity view. Adding the new fields at the end of pg_stat_get_activity() helps to keep the C code simpler and less bug prone, so I think it's best to continue this way. I also used the previous commit message if that helps.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Mar-24, Julien Rouhaud wrote: > From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001 > From: Bruce Momjian <bruce@momjian.us> > Date: Mon, 22 Mar 2021 17:43:23 -0400 > Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and > src/backend/executor/execMain.c | 9 ++ > src/backend/executor/execParallel.c | 14 ++- > src/backend/executor/nodeGather.c | 3 +- > src/backend/executor/nodeGatherMerge.c | 4 +- Hmm... I find it odd that there's executor code that acquires the current query ID from pgstat, after having been put there by planner or ExecutorStart itself. Seems like a modularity violation. I wonder if it would make more sense to have the value maybe in struct EState (or perhaps there's a better place -- but I don't think they have a way to reach the QueryDesc anyhow), put there by ExecutorStart, so that places such as execParallel, nodeGather etc don't have to fetch it from pgstat but from EState. -- Álvaro Herrera Valdivia, Chile "We're here to devour each other alive" (Hobbes)
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote: > On 2021-Mar-24, Julien Rouhaud wrote: > > > From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001 > > From: Bruce Momjian <bruce@momjian.us> > > Date: Mon, 22 Mar 2021 17:43:23 -0400 > > Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and > > > src/backend/executor/execMain.c | 9 ++ > > src/backend/executor/execParallel.c | 14 ++- > > src/backend/executor/nodeGather.c | 3 +- > > src/backend/executor/nodeGatherMerge.c | 4 +- > > Hmm... > > I find it odd that there's executor code that acquires the current query > ID from pgstat, after having been put there by planner or ExecutorStart > itself. Seems like a modularity violation. I wonder if it would make > more sense to have the value maybe in struct EState (or perhaps there's > a better place -- but I don't think they have a way to reach the > QueryDesc anyhow), put there by ExecutorStart, so that places such as > execParallel, nodeGather etc don't have to fetch it from pgstat but from > EState. The current queryid is already available in the Estate, as the underlying PlannedStmt contains it. The problem is that we want to display the top level queryid, not the current query one, and the top level queryid is held in pgstat.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Mar 24, 2021 at 11:20:49PM +0800, Julien Rouhaud wrote: > On Wed, Mar 24, 2021 at 08:13:40AM -0400, Bruce Momjian wrote: > > I have no local modifications. Please modify the patch I posted and > > repost your version, thanks. > > Ok! I used the last version of the patch you sent and addressed the following > comments from earlier messages in attached v20: > > - copyright year to 2021 > - s/has has been compute/has been compute/ > - use the name CleanQuerytext in the first commit My apologies --- yes, I made those two changes after I posted my version of the patch. I should have reposted my version with those changes. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote: > On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote: > > On 2021-Mar-24, Julien Rouhaud wrote: > > > > > From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001 > > > From: Bruce Momjian <bruce@momjian.us> > > > Date: Mon, 22 Mar 2021 17:43:23 -0400 > > > Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and > > > > > src/backend/executor/execMain.c | 9 ++ > > > src/backend/executor/execParallel.c | 14 ++- > > > src/backend/executor/nodeGather.c | 3 +- > > > src/backend/executor/nodeGatherMerge.c | 4 +- > > > > Hmm... > > > > I find it odd that there's executor code that acquires the current query > > ID from pgstat, after having been put there by planner or ExecutorStart > > itself. Seems like a modularity violation. I wonder if it would make > > more sense to have the value maybe in struct EState (or perhaps there's > > a better place -- but I don't think they have a way to reach the > > QueryDesc anyhow), put there by ExecutorStart, so that places such as > > execParallel, nodeGather etc don't have to fetch it from pgstat but from > > EState. > > The current queryid is already available in the Estate, as the underlying > PlannedStmt contains it. The problem is that we want to display the top level > queryid, not the current query one, and the top level queryid is held in > pgstat. So is the current approach ok? If not I'm afraid that detecting and caching the top level queryid in the executor parts would lead to some code duplication.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Mar 31, 2021 at 11:25:32AM +0800, Julien Rouhaud wrote: > On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote: > > On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote: > > > On 2021-Mar-24, Julien Rouhaud wrote: > > > > > > > From e08c9d5fc86ba722844d97000798de868890aba3 Mon Sep 17 00:00:00 2001 > > > > From: Bruce Momjian <bruce@momjian.us> > > > > Date: Mon, 22 Mar 2021 17:43:23 -0400 > > > > Subject: [PATCH v20 2/3] Expose queryid in pg_stat_activity and > > > > > > > src/backend/executor/execMain.c | 9 ++ > > > > src/backend/executor/execParallel.c | 14 ++- > > > > src/backend/executor/nodeGather.c | 3 +- > > > > src/backend/executor/nodeGatherMerge.c | 4 +- > > > > > > Hmm... > > > > > > I find it odd that there's executor code that acquires the current query > > > ID from pgstat, after having been put there by planner or ExecutorStart > > > itself. Seems like a modularity violation. I wonder if it would make > > > more sense to have the value maybe in struct EState (or perhaps there's > > > a better place -- but I don't think they have a way to reach the > > > QueryDesc anyhow), put there by ExecutorStart, so that places such as > > > execParallel, nodeGather etc don't have to fetch it from pgstat but from > > > EState. > > > > The current queryid is already available in the Estate, as the underlying > > PlannedStmt contains it. The problem is that we want to display the top level > > queryid, not the current query one, and the top level queryid is held in > > pgstat. > > So is the current approach ok? If not I'm afraid that detecting and caching > the top level queryid in the executor parts would lead to some code > duplication. I assume it is since Alvaro didn't reply. I am planning to apply this soon. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Mar-31, Bruce Momjian wrote: > On Wed, Mar 31, 2021 at 11:25:32AM +0800, Julien Rouhaud wrote: > > On Thu, Mar 25, 2021 at 10:36:38AM +0800, Julien Rouhaud wrote: > > > On Wed, Mar 24, 2021 at 01:02:00PM -0300, Alvaro Herrera wrote: > > > > I find it odd that there's executor code that acquires the current query > > > > ID from pgstat, after having been put there by planner or ExecutorStart > > > > itself. Seems like a modularity violation. I wonder if it would make > > > > more sense to have the value maybe in struct EState (or perhaps there's > > > > a better place -- but I don't think they have a way to reach the > > > > QueryDesc anyhow), put there by ExecutorStart, so that places such as > > > > execParallel, nodeGather etc don't have to fetch it from pgstat but from > > > > EState. > > > > > > The current queryid is already available in the Estate, as the underlying > > > PlannedStmt contains it. The problem is that we want to display the top level > > > queryid, not the current query one, and the top level queryid is held in > > > pgstat. > > > > So is the current approach ok? If not I'm afraid that detecting and caching > > the top level queryid in the executor parts would lead to some code > > duplication. > > I assume it is since Alvaro didn't reply. I am planning to apply this > soon. I'm afraid I don't know enough about how parallel query works to make a good assessment on this being a good approach or not -- and no time at present to figure it all out. -- Álvaro Herrera 39°49'30"S 73°17'W "I think my standards have lowered enough that now I think 'good design' is when the page doesn't irritate the living f*ck out of me." (JWZ)
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote: > On 2021-Mar-31, Bruce Momjian wrote: > > > > I assume it is since Alvaro didn't reply. I am planning to apply this > > soon. > > I'm afraid I don't know enough about how parallel query works to make a > good assessment on this being a good approach or not -- and no time at > present to figure it all out. I'm far from being an expert either, but at the time I wrote it and looking at the code around it probably seemed sensible. We could directly call pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the various callers though, at least there would be a single source for it.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote: > On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote: > > On 2021-Mar-31, Bruce Momjian wrote: > > > > > > I assume it is since Alvaro didn't reply. I am planning to apply this > > > soon. > > > > I'm afraid I don't know enough about how parallel query works to make a > > good assessment on this being a good approach or not -- and no time at > > present to figure it all out. > > I'm far from being an expert either, but at the time I wrote it and > looking at the code around it probably seemed sensible. We could directly call > pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the > various callers though, at least there would be a single source for it. Here's a v21 that includes the mentioned change.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 1, 2021 at 11:30:15PM +0800, Julien Rouhaud wrote: > On Thu, Apr 01, 2021 at 11:05:24PM +0800, Julien Rouhaud wrote: > > On Wed, Mar 31, 2021 at 11:18:45AM -0300, Alvaro Herrera wrote: > > > On 2021-Mar-31, Bruce Momjian wrote: > > > > > > > > I assume it is since Alvaro didn't reply. I am planning to apply this > > > > soon. > > > > > > I'm afraid I don't know enough about how parallel query works to make a > > > good assessment on this being a good approach or not -- and no time at > > > present to figure it all out. > > > > I'm far from being an expert either, but at the time I wrote it and > > looking at the code around it probably seemed sensible. We could directly call > > pgstat_get_my_queryid() in ExecSerializePlan() rather than passing it from the > > various callers though, at least there would be a single source for it. > > Here's a v21 that includes the mentioned change. You are using: /* ---------- * pgstat_get_my_queryid() - * * Return current backend's query identifier. */ uint64 pgstat_get_my_queryid(void) { if (!MyBEEntry) return 0; return MyBEEntry->st_queryid; } Looking at log_statement: /* Log immediately if dictated by log_statement */ if (check_log_statement(parsetree_list)) { ereport(LOG, (errmsg("statement: %s", query_string), errhidestmt(true), errdetail_execute(parsetree_list))); was_logged = true; } it uses the global variable query_string. I wonder if the query hash should be a global variable too --- this would more clearly match how we handle top-level info like query_string. Digging into the stats system to get top-level info does seem odd. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote: > You are using: > > /* ---------- > * pgstat_get_my_queryid() - > * > * Return current backend's query identifier. > */ > uint64 > pgstat_get_my_queryid(void) > { > if (!MyBEEntry) > return 0; > > return MyBEEntry->st_queryid; > } > > Looking at log_statement: > > /* Log immediately if dictated by log_statement */ > if (check_log_statement(parsetree_list)) > { > ereport(LOG, > (errmsg("statement: %s", query_string), > errhidestmt(true), > errdetail_execute(parsetree_list))); > was_logged = true; > } > > it uses the global variable query_string. I wonder if the query hash > should be a global variable too --- this would more clearly match how we > handle top-level info like query_string. Digging into the stats system > to get top-level info does seem odd. Also, if you go in that direction, make sure the hash it set in the same places the query string is set, though I am unclear how extensions would handle that. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 01, 2021 at 01:59:15PM -0400, Bruce Momjian wrote: > On Thu, Apr 1, 2021 at 01:56:42PM -0400, Bruce Momjian wrote: > > You are using: > > > > /* ---------- > > * pgstat_get_my_queryid() - > > * > > * Return current backend's query identifier. > > */ > > uint64 > > pgstat_get_my_queryid(void) > > { > > if (!MyBEEntry) > > return 0; > > > > return MyBEEntry->st_queryid; > > } > > > > Looking at log_statement: > > > > /* Log immediately if dictated by log_statement */ > > if (check_log_statement(parsetree_list)) > > { > > ereport(LOG, > > (errmsg("statement: %s", query_string), > > errhidestmt(true), > > errdetail_execute(parsetree_list))); > > was_logged = true; > > } > > > > it uses the global variable query_string. Unless I'm missing something query_string isn't a global variable, it's a parameter passed to exec_simple_query() from postgresMain(). It's then passed to the stats collector to be able to be displayed in pg_stat_activity through pgstat_report_activity() a bit like what I do for the queryid. There's a global variable debug_query_string, but it's only for debugging purpose. > > I wonder if the query hash > > should be a global variable too --- this would more clearly match how we > > handle top-level info like query_string. Digging into the stats system > > to get top-level info does seem odd. The main difference is that there's a single top level query_string, even if it contains multiple statements. But there would be multiple queryid calculated in that case and we don't want to change it during a top level multi-statements execution, so we can't use the same approach. Also, the query_string is directly logged from this code path, while the queryid is logged as a log_line_prefix, and almost all the code there also retrieve information from some shared structure. And since it also has to be available in pg_stat_activity, having a single source of truth looked like a better approach. > Also, if you go in that direction, make sure the hash it set in the same > places the query string is set, though I am unclear how extensions would > handle that. It should be transparent for application, it's extracting the first queryid seen for each top level statement and export it. The rest of the code still continue to see the queryid that corresponds to the really executed single statement.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Fri, Apr 2, 2021 at 02:28:02AM +0800, Julien Rouhaud wrote: > Unless I'm missing something query_string isn't a global variable, it's a > parameter passed to exec_simple_query() from postgresMain(). > > It's then passed to the stats collector to be able to be displayed in > pg_stat_activity through pgstat_report_activity() a bit like what I do for the > queryid. > > There's a global variable debug_query_string, but it's only for debugging > purpose. > > > > I wonder if the query hash > > > should be a global variable too --- this would more clearly match how we > > > handle top-level info like query_string. Digging into the stats system > > > to get top-level info does seem odd. > > The main difference is that there's a single top level query_string, > even if it contains multiple statements. But there would be multiple queryid > calculated in that case and we don't want to change it during a top level > multi-statements execution, so we can't use the same approach. > > Also, the query_string is directly logged from this code path, while the > queryid is logged as a log_line_prefix, and almost all the code there also > retrieve information from some shared structure. > > And since it also has to be available in pg_stat_activity, having a single > source of truth looked like a better approach. > > > Also, if you go in that direction, make sure the hash it set in the same > > places the query string is set, though I am unclear how extensions would > > handle that. > > It should be transparent for application, it's extracting the first queryid > seen for each top level statement and export it. The rest of the code still > continue to see the queryid that corresponds to the really executed single > statement. OK, I am happy with your design decisions, thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote: > > OK, I am happy with your design decisions, thanks. Thanks! While double checking I noticed that I failed to remove a (now) useless include of pgstat.h in nodeGatherMerge.c in last version. I'm attaching v22 to fix that, no other change.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote: > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote: > > > > OK, I am happy with your design decisions, thanks. > > Thanks! While double checking I noticed that I failed to remove a (now) > useless include of pgstat.h in nodeGatherMerge.c in last version. I'm > attaching v22 to fix that, no other change. There was a conflict since e1025044c (Split backend status and progress related functionality out of pgstat.c). Attached v23 is a rebase against current HEAD, and I also added a few UINT64CONST() macro usage for consistency.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Sun, Apr 4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote: > On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote: > > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote: > > > > > > OK, I am happy with your design decisions, thanks. > > > > Thanks! While double checking I noticed that I failed to remove a (now) > > useless include of pgstat.h in nodeGatherMerge.c in last version. I'm > > attaching v22 to fix that, no other change. > > There was a conflict since e1025044c (Split backend status and progress related > functionality out of pgstat.c). > > Attached v23 is a rebase against current HEAD, and I also added a few > UINT64CONST() macro usage for consistency. Thanks. I struggled with merging the statistics collection changes into my cluster file encryption branches because my patch made changes to code that moved to another C file. I plan to apply this tomorrow. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Nitin Jadhav
Date:
I have reviewed the code. Here are a few minor comments.
1.
+void
+pgstat_report_queryid(uint64 queryId, bool force)
+{
+ volatile PgBackendStatus *beentry = MyBEEntry;
+
+ if (!beentry)
+ return;
+
+ /*
+ * if track_activities is disabled, st_queryid should already have been
+ * reset
+ */
+ if (!pgstat_track_activities)
+ return;
The above two conditions can be clubbed together in a single condition.
2.
+/* ----------
+ * pgstat_get_my_queryid() -
+ *
+ * Return current backend's query identifier.
+ */
+uint64
+pgstat_get_my_queryid(void)
+{
+ if (!MyBEEntry)
+ return 0;
+
+ return MyBEEntry->st_queryid;
+}
Is it safe to directly read the data from MyBEEntry without calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly ref pgstat_get_backend_current_activity() for more information. Kindly let me know if I am wrong.
Thanks and Regards,
Nitin Jadhav
On Mon, Apr 5, 2021 at 10:46 PM Bruce Momjian <bruce@momjian.us> wrote:
On Sun, Apr 4, 2021 at 10:18:50PM +0800, Julien Rouhaud wrote:
> On Fri, Apr 02, 2021 at 01:33:28PM +0800, Julien Rouhaud wrote:
> > On Thu, Apr 01, 2021 at 03:27:11PM -0400, Bruce Momjian wrote:
> > >
> > > OK, I am happy with your design decisions, thanks.
> >
> > Thanks! While double checking I noticed that I failed to remove a (now)
> > useless include of pgstat.h in nodeGatherMerge.c in last version. I'm
> > attaching v22 to fix that, no other change.
>
> There was a conflict since e1025044c (Split backend status and progress related
> functionality out of pgstat.c).
>
> Attached v23 is a rebase against current HEAD, and I also added a few
> UINT64CONST() macro usage for consistency.
Thanks. I struggled with merging the statistics collection changes into
my cluster file encryption branches because my patch made changes to
code that moved to another C file.
I plan to apply this tomorrow.
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote: > > 1. > +void > +pgstat_report_queryid(uint64 queryId, bool force) > +{ > + volatile PgBackendStatus *beentry = MyBEEntry; > + > + if (!beentry) > + return; > + > + /* > + * if track_activities is disabled, st_queryid should already have been > + * reset > + */ > + if (!pgstat_track_activities) > + return; > > The above two conditions can be clubbed together in a single condition. Right, I just kept it separate as the comment is only relevant for the 2nd test. I'm fine with merging both if needed. > 2. > +/* ---------- > + * pgstat_get_my_queryid() - > + * > + * Return current backend's query identifier. > + */ > +uint64 > +pgstat_get_my_queryid(void) > +{ > + if (!MyBEEntry) > + return 0; > + > + return MyBEEntry->st_queryid; > +} > > Is it safe to directly read the data from MyBEEntry without > calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly > ref pgstat_get_backend_current_activity() for more information. Kindly let > me know if I am wrong. This field is only written by a backend for its own entry. pg_stat_get_activity already has required protection, so the rest of the calls to read that field shouldn't have any risk of reading torn values on platform where this isn't an atomic operation due to concurrent write, as it will be from the same backend that originally wrote it. It avoids some overhead to retrieve the queryid, but if people think it's worth having the loop (or a comment explaining why there's no loop) I'm also fine with it.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Apr-06, Nitin Jadhav wrote: > I have reviewed the code. Here are a few minor comments. > > 1. > +void > +pgstat_report_queryid(uint64 queryId, bool force) > +{ > + volatile PgBackendStatus *beentry = MyBEEntry; > + > + if (!beentry) > + return; > + > + /* > + * if track_activities is disabled, st_queryid should already have been > + * reset > + */ > + if (!pgstat_track_activities) > + return; > > The above two conditions can be clubbed together in a single condition. I wonder if it wouldn't make more sense to put the assignment *after* we have checked the second condition. -- Álvaro Herrera Valdivia, Chile
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote: > On 2021-Apr-06, Nitin Jadhav wrote: > > > I have reviewed the code. Here are a few minor comments. > > > > 1. > > +void > > +pgstat_report_queryid(uint64 queryId, bool force) > > +{ > > + volatile PgBackendStatus *beentry = MyBEEntry; > > + > > + if (!beentry) > > + return; > > + > > + /* > > + * if track_activities is disabled, st_queryid should already have been > > + * reset > > + */ > > + if (!pgstat_track_activities) > > + return; > > > > The above two conditions can be clubbed together in a single condition. > > I wonder if it wouldn't make more sense to put the assignment *after* we > have checked the second condition. All other pgstat_report_* functions do the assignment before doing any test on beentry and/or pgstat_track_activities, I think we should keep this code consistent.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Nitin Jadhav
Date:
>
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
>
> The above two conditions can be clubbed together in a single condition.
Right, I just kept it separate as the comment is only relevant for the 2nd
test. I'm fine with merging both if needed.
I feel we should merge both of the conditions as it is done in pgstat_report_xact_timestamp(). Probably we can write a common comment to explain both the conditions.
> 2.
> +/* ----------
> + * pgstat_get_my_queryid() -
> + *
> + * Return current backend's query identifier.
> + */
> +uint64
> +pgstat_get_my_queryid(void)
> +{
> + if (!MyBEEntry)
> + return 0;
> +
> + return MyBEEntry->st_queryid;
> +}
>
> Is it safe to directly read the data from MyBEEntry without
> calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
> ref pgstat_get_backend_current_activity() for more information. Kindly let
> me know if I am wrong.
This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it. It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.
Thanks for the explanation. Please add a comment explaining why there is no loop.
Thanks and Regards,
Nitin Jadhav
On Tue, Apr 6, 2021 at 8:40 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Apr 06, 2021 at 08:05:19PM +0530, Nitin Jadhav wrote:
>
> 1.
> +void
> +pgstat_report_queryid(uint64 queryId, bool force)
> +{
> + volatile PgBackendStatus *beentry = MyBEEntry;
> +
> + if (!beentry)
> + return;
> +
> + /*
> + * if track_activities is disabled, st_queryid should already have been
> + * reset
> + */
> + if (!pgstat_track_activities)
> + return;
>
> The above two conditions can be clubbed together in a single condition.
Right, I just kept it separate as the comment is only relevant for the 2nd
test. I'm fine with merging both if needed.
> 2.
> +/* ----------
> + * pgstat_get_my_queryid() -
> + *
> + * Return current backend's query identifier.
> + */
> +uint64
> +pgstat_get_my_queryid(void)
> +{
> + if (!MyBEEntry)
> + return 0;
> +
> + return MyBEEntry->st_queryid;
> +}
>
> Is it safe to directly read the data from MyBEEntry without
> calling pgstat_begin_read_activity() and pgstat_end_read_activity(). Kindly
> ref pgstat_get_backend_current_activity() for more information. Kindly let
> me know if I am wrong.
This field is only written by a backend for its own entry.
pg_stat_get_activity already has required protection, so the rest of the calls
to read that field shouldn't have any risk of reading torn values on platform
where this isn't an atomic operation due to concurrent write, as it will be
from the same backend that originally wrote it. It avoids some overhead to
retrieve the queryid, but if people think it's worth having the loop (or a
comment explaining why there's no loop) I'm also fine with it.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Nitin Jadhav
Date:
On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> On 2021-Apr-06, Nitin Jadhav wrote:
>
> > I have reviewed the code. Here are a few minor comments.
> >
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> >
> > The above two conditions can be clubbed together in a single condition.
>
> I wonder if it wouldn't make more sense to put the assignment *after* we
> have checked the second condition.
All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, I think we should keep this code
consistent.
I agree about this.
Thanks and Regards,
Nitin Jadhav
On Tue, Apr 6, 2021 at 9:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Tue, Apr 06, 2021 at 11:41:52AM -0400, Alvaro Herrera wrote:
> On 2021-Apr-06, Nitin Jadhav wrote:
>
> > I have reviewed the code. Here are a few minor comments.
> >
> > 1.
> > +void
> > +pgstat_report_queryid(uint64 queryId, bool force)
> > +{
> > + volatile PgBackendStatus *beentry = MyBEEntry;
> > +
> > + if (!beentry)
> > + return;
> > +
> > + /*
> > + * if track_activities is disabled, st_queryid should already have been
> > + * reset
> > + */
> > + if (!pgstat_track_activities)
> > + return;
> >
> > The above two conditions can be clubbed together in a single condition.
>
> I wonder if it wouldn't make more sense to put the assignment *after* we
> have checked the second condition.
All other pgstat_report_* functions do the assignment before doing any test on
beentry and/or pgstat_track_activities, I think we should keep this code
consistent.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Apr 07, 2021 at 06:15:27PM +0530, Nitin Jadhav wrote: > > I feel we should merge both of the conditions as it is done in > pgstat_report_xact_timestamp(). Probably we can write a common comment to > explain both the conditions. > > [...] > > Thanks for the explanation. Please add a comment explaining why there is no > loop. PFA v24.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Apr 7, 2021 at 08:57:26PM +0800, Julien Rouhaud wrote: > On Wed, Apr 07, 2021 at 06:15:27PM +0530, Nitin Jadhav wrote: > > > > I feel we should merge both of the conditions as it is done in > > pgstat_report_xact_timestamp(). Probably we can write a common comment to > > explain both the conditions. > > > > [...] > > > > Thanks for the explanation. Please add a comment explaining why there is no > > loop. > > PFA v24. Patch applied. I am ready to adjust this with any improvements people might have. Thank you for all the good feedback we got on this, and I know many users have waited a long time for this feature. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes: > Patch applied. I am ready to adjust this with any improvements people > might have. Thank you for all the good feedback we got on this, and I > know many users have waited a long time for this feature. For starters, you could try to make the buildfarm green again. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
aOn Wed, Apr 7, 2021 at 04:15:50PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Patch applied. I am ready to adjust this with any improvements people > > might have. Thank you for all the good feedback we got on this, and I > > know many users have waited a long time for this feature. > > For starters, you could try to make the buildfarm green again. Wow, that's odd. The cfbot was green, so I never even looked at the buildfarm. I will look at that now, and the CVS log issue. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote: > aOn Wed, Apr 7, 2021 at 04:15:50PM -0400, Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > Patch applied. I am ready to adjust this with any improvements people > > > might have. Thank you for all the good feedback we got on this, and I > > > know many users have waited a long time for this feature. > > > > For starters, you could try to make the buildfarm green again. > > Wow, that's odd. The cfbot was green, so I never even looked at the > buildfarm. I will look at that now, and the CVS log issue. Sorry about that. The issue came from animals with jit_above_cost = 0 outputting more lines than expected. I fixed that by using the same query as before in explain.sql, as they don't generate any JIT output. I also added the queryid to the csvlog output and fixed the documentation that mention how to create a table to access the data.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 08, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote: > > I also added the queryid to the csvlog output and fixed the documentation that > mention how to create a table to access the data. Note that I chose to output a 0 queryid if none has been computed rather that outputting nothing. Let me know if that's not the wanted behavior.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 8, 2021 at 05:56:25AM +0800, Julien Rouhaud wrote: > On Wed, Apr 07, 2021 at 04:22:55PM -0400, Bruce Momjian wrote: > > aOn Wed, Apr 7, 2021 at 04:15:50PM -0400, Tom Lane wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > > > Patch applied. I am ready to adjust this with any improvements people > > > > might have. Thank you for all the good feedback we got on this, and I > > > > know many users have waited a long time for this feature. > > > > > > For starters, you could try to make the buildfarm green again. > > > > Wow, that's odd. The cfbot was green, so I never even looked at the > > buildfarm. I will look at that now, and the CVS log issue. > > Sorry about that. The issue came from animals with jit_above_cost = 0 > outputting more lines than expected. I fixed that by using the same query as > before in explain.sql, as they don't generate any JIT output. Yes, I just came to the same conclusion, that 'SELECT 1' didn't generate the proper output lines to allow explain_filter() to strip out the JIT lines. I have applied your patch for this, which should fix the build farm. (I see my first green report now.) > I also added the queryid to the csvlog output and fixed the documentation that > mention how to create a table to access the data. Uh, I think your patch missed a few things. First, you use "%zd" (size_t) for the printf string, but calls to pgstat_get_my_queryid() in src/backend/utils/error/elog.c used "%ld". Which is correct? I see pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64 fits in a BIGINT SQL column. Also, you missed the SGML paragraph doc change, but you correctly changed the SQL table definition. I am attaching my version of the patch. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes: > Uh, I think your patch missed a few things. First, you use "%zd" > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in > src/backend/utils/error/elog.c used "%ld". Which is correct? I see > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64 > fits in a BIGINT SQL column. Neither is correct. Project standard these days for printing [u]int64 is to write "%lld" or "%llu", with an explicit (long long) cast on the printf argument. regards, tom lane
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Apr 7, 2021 at 07:01:25PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Uh, I think your patch missed a few things. First, you use "%zd" > > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in > > src/backend/utils/error/elog.c used "%ld". Which is correct? I see > > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64 > > fits in a BIGINT SQL column. > > Neither is correct. Project standard these days for printing [u]int64 > is to write "%lld" or "%llu", with an explicit (long long) cast on > the printf argument. Yep, got it. The attached patch fixes all the calls to use %lld, and adds casts. In implementing cvslog, I noticed that internally we pass the hash as uint64, but output as int64, which I think is a requirement for how pg_stat_statements has output it, and the use of bigint. Is that OK? I am also confused about the inconsistency of calling the GUC compute_query_id (with underscore), but pg_stat_activity.queryid. If we make it pg_stat_activity.query_id, it doesn't match most of the other *id columsns in the table, leader_pid, usesysid, backend_xid. Is that OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote: > On Wed, Apr 7, 2021 at 07:01:25PM -0400, Tom Lane wrote: > > Bruce Momjian <bruce@momjian.us> writes: > > > Uh, I think your patch missed a few things. First, you use "%zd" > > > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in > > > src/backend/utils/error/elog.c used "%ld". Which is correct? I see > > > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64 > > > fits in a BIGINT SQL column. > > > > Neither is correct. Project standard these days for printing [u]int64 > > is to write "%lld" or "%llu", with an explicit (long long) cast on > > the printf argument. > > Yep, got it. The attached patch fixes all the calls to use %lld, and > adds casts. In implementing cvslog, I noticed that internally we pass > the hash as uint64, but output as int64, which I think is a requirement > for how pg_stat_statements has output it, and the use of bigint. Is > that OK? Indeed, this is due to how we expose the value in SQL. The original discussion is at https://www.postgresql.org/message-id/CAH2-WzkueMfAmY3onoXLi+g67SJoKY65Cg9Z1QOhSyhCEU8w3g@mail.gmail.com. As far as I know this is OK, as we want to show consistent values everywhere. > I am also confused about the inconsistency of calling the GUC > compute_query_id (with underscore), but pg_stat_activity.queryid. If we > make it pg_stat_activity.query_id, it doesn't match most of the other > *id columsns in the table, leader_pid, usesysid, backend_xid. Is that > OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong. Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id would make more sense. @@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata) appendStringInfoChar(&buf, '\n'); + /* query id */ + appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid()); + appendStringInfoChar(&buf, ','); + /* If in the syslogger process, try to write messages direct to file */ if (MyBackendType == B_LOGGER) write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG); Unless I'm missing something this will output the query id in the next log line? The new code should be added before the newline is output, and the comma should also be output before the queryid.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 8, 2021 at 08:47:48AM +0800, Julien Rouhaud wrote: > On Wed, Apr 07, 2021 at 07:38:35PM -0400, Bruce Momjian wrote: > > On Wed, Apr 7, 2021 at 07:01:25PM -0400, Tom Lane wrote: > > > Bruce Momjian <bruce@momjian.us> writes: > > > > Uh, I think your patch missed a few things. First, you use "%zd" > > > > (size_t) for the printf string, but calls to pgstat_get_my_queryid() in > > > > src/backend/utils/error/elog.c used "%ld". Which is correct? I see > > > > pgstat_get_my_queryid() as returning uint64, but I didn't think a uint64 > > > > fits in a BIGINT SQL column. > > > > > > Neither is correct. Project standard these days for printing [u]int64 > > > is to write "%lld" or "%llu", with an explicit (long long) cast on > > > the printf argument. > > > > Yep, got it. The attached patch fixes all the calls to use %lld, and > > adds casts. In implementing cvslog, I noticed that internally we pass > > the hash as uint64, but output as int64, which I think is a requirement > > for how pg_stat_statements has output it, and the use of bigint. Is > > that OK? > > Indeed, this is due to how we expose the value in SQL. The original discussion > is at > https://www.postgresql.org/message-id/CAH2-WzkueMfAmY3onoXLi+g67SJoKY65Cg9Z1QOhSyhCEU8w3g@mail.gmail.com. > As far as I know this is OK, as we want to show consistent values everywhere. OK, yes, I do remember the discussion. I was wondering if there should be a C comment about this anywhere. > > I am also confused about the inconsistency of calling the GUC > > compute_query_id (with underscore), but pg_stat_activity.queryid. If we > > make it pg_stat_activity.query_id, it doesn't match most of the other > > *id columsns in the table, leader_pid, usesysid, backend_xid. Is that > > OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong. > > Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id > would make more sense. OK, let me work on a patch to change that part. > @@ -2967,6 +2967,10 @@ write_csvlog(ErrorData *edata) > > appendStringInfoChar(&buf, '\n'); > > + /* query id */ > + appendStringInfo(&buf, "%lld", (long long) pgstat_get_my_queryid()); > + appendStringInfoChar(&buf, ','); > + > > /* If in the syslogger process, try to write messages direct to file */ > if (MyBackendType == B_LOGGER) > write_syslogger_file(buf.data, buf.len, LOG_DESTINATION_CSVLOG); > > Unless I'm missing something this will output the query id in the next log > line? The new code should be added before the newline is output, and the comma > should also be output before the queryid. Yes, correct, updated patch attached. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote: > > > I am also confused about the inconsistency of calling the GUC > > > compute_query_id (with underscore), but pg_stat_activity.queryid. If we > > > make it pg_stat_activity.query_id, it doesn't match most of the other > > > *id columsns in the table, leader_pid, usesysid, backend_xid. Is that > > > OK?I know I suggested pg_stat_activity.query_id, but maybe I was wrong. > > > > Mmm, most of the columns in pg_stat_activity do have a "_", so using query_id > > would make more sense. > > OK, let me work on a patch to change that part. Uh, it is 'queryid' in pg_stat_statements: https://www.postgresql.org/docs/13/pgstatstatements.html queryid bigint Internal hash code, computed from the statement's parse tree I am not sure if we should have pg_stat_activity use underscore, or the GUC use underscore. The problem is that queryid can easily look like quer-yid. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote: > > Unless I'm missing something this will output the query id in the next log > > line? The new code should be added before the newline is output, and the comma > > should also be output before the queryid. > > Yes, correct, updated patch attached. Patch applied. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote: > On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote: > > > Unless I'm missing something this will output the query id in the next log > > > line? The new code should be added before the newline is output, and the comma > > > should also be output before the queryid. > > > > Yes, correct, updated patch attached. > > Patch applied. Thanks! And I agree with using query_id in the new field names while keeping queryid for pg_stat_statements to avoid unnecessary query breakage.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote: > On Wed, Apr 07, 2021 at 10:31:01PM -0400, Bruce Momjian wrote: > > On Wed, Apr 7, 2021 at 08:54:02PM -0400, Bruce Momjian wrote: > > > > Unless I'm missing something this will output the query id in the next log > > > > line? The new code should be added before the newline is output, and the comma > > > > should also be output before the queryid. > > > > > > Yes, correct, updated patch attached. > > > > Patch applied. > > Thanks! And I agree with using query_id in the new field names while keeping > queryid for pg_stat_statements to avoid unnecessary query breakage. I think we need more feedback from the group. Do people agree with the idea above? The question is what to call: GUC compute_queryid pg_stat_activity.queryid pg_stat_statements.queryid using "queryid" or "query_id", and do they have to match? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Apr-07, Bruce Momjian wrote: > On Thu, Apr 8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote: > > Thanks! And I agree with using query_id in the new field names while keeping > > queryid for pg_stat_statements to avoid unnecessary query breakage. > > I think we need more feedback from the group. Do people agree with the > idea above? The question is what to call: > > GUC compute_queryid > pg_stat_activity.queryid > pg_stat_statements.queryid > > using "queryid" or "query_id", and do they have to match? Seems a matter of personal preference. Mine is to have the underscore everywhere in backend code (where this is new), and let it without the underscore in pg_stat_statements to avoid breaking existing code. Seems to match what Julien is saying. -- Álvaro Herrera Valdivia, Chile "La libertad es como el dinero; el que no la sabe emplear la pierde" (Alvarez)
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote: > > Patch applied. I am ready to adjust this with any improvements people > might have. Thank you for all the good feedback we got on this, and I > know many users have waited a long time for this feature. Thanks a lot Bruce and everyone! I hope that the users who waited a long time for this will find everything they need. Just to validate that this patchset also allows user to use pg_stat_statements, any additional third-party module and the new added infrastructure with the queryid algorithm of their choice, I created a POC extension ([1]) which works as expected. Basically: SHOW shared_preload_libraries; shared_preload_libraries -------------------------- pg_stat_statements, pg_queryid (1 row) SET pg_queryid.use_object_names TO on; SET pg_queryid.ignore_schema TO on; CREATE SCHEMA ns1; CREATE TABLE ns1.tbl1(id integer); CREATE SCHEMA ns2; CREATE TABLE ns2.tbl1(id integer); SET search_path TO ns1; SELECT COUNT(*) FROM tbl1; SET search_path TO ns2; SELECT COUNT(*) FROM tbl1; SELECT queryid, query, calls FROM public.pg_stat_statements WHERE query LIKE '%tbl%'; queryid | query | calls ---------------------+---------------------------+------- 4629593225724429059 | SELECT count(*) from tbl1 | 2 (1 row) So whether that's a good idea to do that or not, users now have a choice. [1]: https://github.com/rjuju/pg_queryid
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Thomas Munro
Date:
Hi Julien, Bruce, A warning appears on 32 bit systems: In file included from pgstatfuncs.c:15: pgstatfuncs.c: In function 'pg_stat_get_activity': ../../../../src/include/postgres.h:593:29: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] 593 | #define DatumGetPointer(X) ((Pointer) (X)) | ^ ../../../../src/include/postgres.h:678:42: note: in expansion of macro 'DatumGetPointer' 678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X))) | ^~~~~~~~~~~~~~~ pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64' 920 | values[29] = DatumGetUInt64(beentry->st_queryid); | ^~~~~~~~~~~~~~ Hmm, maybe this should be UInt64GetDatum()?
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 08, 2021 at 11:36:48PM +1200, Thomas Munro wrote: > Hi Julien, Bruce, > > A warning appears on 32 bit systems: > > In file included from pgstatfuncs.c:15: > pgstatfuncs.c: In function 'pg_stat_get_activity': > ../../../../src/include/postgres.h:593:29: warning: cast to pointer > from integer of different size [-Wint-to-pointer-cast] > 593 | #define DatumGetPointer(X) ((Pointer) (X)) > | ^ > ../../../../src/include/postgres.h:678:42: note: in expansion of macro > 'DatumGetPointer' > 678 | #define DatumGetUInt64(X) (* ((uint64 *) DatumGetPointer(X))) > | ^~~~~~~~~~~~~~~ > pgstatfuncs.c:920:18: note: in expansion of macro 'DatumGetUInt64' > 920 | values[29] = DatumGetUInt64(beentry->st_queryid); > | ^~~~~~~~~~~~~~ Wow, that's really embarrassing :( > Hmm, maybe this should be UInt64GetDatum()? Yes definitely. I'm attaching the previous patch for force_parallel_mode to not forget it + a new one for this issue.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Amit Kapila
Date:
On Thu, Apr 8, 2021 at 9:47 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Wed, Apr 07, 2021 at 02:12:11PM -0400, Bruce Momjian wrote: > > > > Patch applied. I am ready to adjust this with any improvements people > > might have. Thank you for all the good feedback we got on this, and I > > know many users have waited a long time for this feature. > > Thanks a lot Bruce and everyone! I hope that the users who waited a long time > for this will find everything they need. > @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) /* Setting debug_query_string for individual workers */ debug_query_string = queryDesc->sourceText; - /* Report workers' query for monitoring purposes */ + /* Report workers' query and queryId for monitoring purposes */ pgstat_report_activity(STATE_RUNNING, debug_query_string); + pgstat_report_queryid(queryDesc->plannedstmt->queryId, false); Below lines down in ParallelQueryMain, we call ExecutorStart which will report queryid, so do we need it here? -- With Regards, Amit Kapila.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote: > > @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) > /* Setting debug_query_string for individual workers */ > debug_query_string = queryDesc->sourceText; > > - /* Report workers' query for monitoring purposes */ > + /* Report workers' query and queryId for monitoring purposes */ > pgstat_report_activity(STATE_RUNNING, debug_query_string); > + pgstat_report_queryid(queryDesc->plannedstmt->queryId, false); > > > Below lines down in ParallelQueryMain, we call ExecutorStart which > will report queryid, so do we need it here? Correct, it's not actually needed. The overhead should be negligible but let's get rid of it. Updated fix patchset attached.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 08, 2021 at 08:27:20PM +0800, Julien Rouhaud wrote: > On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote: > > > > @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) > > /* Setting debug_query_string for individual workers */ > > debug_query_string = queryDesc->sourceText; > > > > - /* Report workers' query for monitoring purposes */ > > + /* Report workers' query and queryId for monitoring purposes */ > > pgstat_report_activity(STATE_RUNNING, debug_query_string); > > + pgstat_report_queryid(queryDesc->plannedstmt->queryId, false); > > > > > > Below lines down in ParallelQueryMain, we call ExecutorStart which > > will report queryid, so do we need it here? > > Correct, it's not actually needed. The overhead should be negligible but let's > get rid of it. Updated fix patchset attached. Sorry I messed up the last commit, v4 is ok.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 8, 2021 at 09:31:27PM +0800, Julien Rouhaud wrote: > On Thu, Apr 08, 2021 at 08:27:20PM +0800, Julien Rouhaud wrote: > > On Thu, Apr 08, 2021 at 05:46:07PM +0530, Amit Kapila wrote: > > > > > > @@ -1421,8 +1421,9 @@ ParallelQueryMain(dsm_segment *seg, shm_toc *toc) > > > /* Setting debug_query_string for individual workers */ > > > debug_query_string = queryDesc->sourceText; > > > > > > - /* Report workers' query for monitoring purposes */ > > > + /* Report workers' query and queryId for monitoring purposes */ > > > pgstat_report_activity(STATE_RUNNING, debug_query_string); > > > + pgstat_report_queryid(queryDesc->plannedstmt->queryId, false); > > > > > > > > > Below lines down in ParallelQueryMain, we call ExecutorStart which > > > will report queryid, so do we need it here? > > > > Correct, it's not actually needed. The overhead should be negligible but let's > > get rid of it. Updated fix patchset attached. > > Sorry I messed up the last commit, v4 is ok. Patch applied, thanks. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Apr 7, 2021 at 11:27:04PM -0400, Álvaro Herrera wrote: > On 2021-Apr-07, Bruce Momjian wrote: > > > On Thu, Apr 8, 2021 at 10:38:08AM +0800, Julien Rouhaud wrote: > > > > Thanks! And I agree with using query_id in the new field names while keeping > > > queryid for pg_stat_statements to avoid unnecessary query breakage. > > > > I think we need more feedback from the group. Do people agree with the > > idea above? The question is what to call: > > > > GUC compute_queryid > > pg_stat_activity.queryid > > pg_stat_statements.queryid > > > > using "queryid" or "query_id", and do they have to match? > > Seems a matter of personal preference. Mine is to have the underscore > everywhere in backend code (where this is new), and let it without the > underscore in pg_stat_statements to avoid breaking existing code. Seems > to match what Julien is saying. OK, let's get some details. First, pg_stat_statements.queryid already exists (no underscore), and I don't think anyone wants to change that. pg_stat_activity.queryid is new, but I can imagine cases where you would join pg_stat_activity to pg_stat_statements to get an estimate of how long the query will take --- having one using an underscore and another one not seems odd. Also, looking at the existing pg_stat_activity columns, those don't use underscores before the "id" unless there is a modifier before the "id", e.g. "pid", "xid": SELECT attname FROM pg_namespace JOIN pg_class ON (pg_namespace.oid = relnamespace) JOIN pg_attribute ON (pg_class.oid = pg_attribute.attrelid) WHERE nspname = 'pg_catalog' AND relname = 'pg_stat_activity' AND attname ~ 'id$'; attname ------------- backend_xid datid leader_pid pid queryid usesysid We don't have a modifier before queryid. If people like query_id, and I do too, I am thinking we just keep query_id as the GUC (compute_query_id), and just accept that the GUC and SQL levels will not match. This is exactly what we have now. I brought it up to be sure this is what we want, -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote: > > OK, let's get some details. First, pg_stat_statements.queryid already > exists (no underscore), and I don't think anyone wants to change that. > > pg_stat_activity.queryid is new, but I can imagine cases where you would > join pg_stat_activity to pg_stat_statements to get an estimate of how > long the query will take --- having one using an underscore and another > one not seems odd. Indeed, and also being able to join with a USING clause rather than an ON could also save some keystrokes. But unfortunately, we already have (userid, dbid) on pg_stat_statements side vs (usesysid, datid) on pg_stat_activity side, so this unfortunately won't fix all the oddities.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Fri, Apr 9, 2021 at 12:38:29AM +0800, Julien Rouhaud wrote: > On Thu, Apr 08, 2021 at 11:34:25AM -0400, Bruce Momjian wrote: > > > > OK, let's get some details. First, pg_stat_statements.queryid already > > exists (no underscore), and I don't think anyone wants to change that. > > > > pg_stat_activity.queryid is new, but I can imagine cases where you would > > join pg_stat_activity to pg_stat_statements to get an estimate of how > > long the query will take --- having one using an underscore and another > > one not seems odd. > > Indeed, and also being able to join with a USING clause rather than an ON could > also save some keystrokes. But unfortunately, we already have (userid, dbid) > on pg_stat_statements side vs (usesysid, datid) on pg_stat_activity side, so > this unfortunately won't fix all the oddities. Wow, good point. Shame they don't match. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Apr-08, Bruce Momjian wrote: > pg_stat_activity.queryid is new, but I can imagine cases where you would > join pg_stat_activity to pg_stat_statements to get an estimate of how > long the query will take --- having one using an underscore and another > one not seems odd. OK. So far, you have one vote for queryid (your own) and two votes for query_id (mine and Julien's). And even yourself were hesitating about it earlier in the thread. -- Álvaro Herrera Valdivia, Chile
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote: > On 2021-Apr-08, Bruce Momjian wrote: > > > pg_stat_activity.queryid is new, but I can imagine cases where you would > > join pg_stat_activity to pg_stat_statements to get an estimate of how > > long the query will take --- having one using an underscore and another > > one not seems odd. > > OK. So far, you have one vote for queryid (your own) and two votes for > query_id (mine and Julien's). And even yourself were hesitating about > it earlier in the thread. OK, if people are fine with pg_stat_activity.query_id not matching pg_stat_statements.queryid, I am fine with that. I just don't want someone to say it was a big mistake later. ;-) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Thu, Apr 8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote: > On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote: > > On 2021-Apr-08, Bruce Momjian wrote: > > > > > pg_stat_activity.queryid is new, but I can imagine cases where you would > > > join pg_stat_activity to pg_stat_statements to get an estimate of how > > > long the query will take --- having one using an underscore and another > > > one not seems odd. > > > > OK. So far, you have one vote for queryid (your own) and two votes for > > query_id (mine and Julien's). And even yourself were hesitating about > > it earlier in the thread. > > OK, if people are fine with pg_stat_activity.query_id not matching > pg_stat_statements.queryid, I am fine with that. I just don't want > someone to say it was a big mistake later. ;-) OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I have not changed any of the APIs which existed before this feature was added, and are called "queryid" or "queryId" --- it is kind of a mess. I assume I should leave those unchanged. It will also need a catversion bump. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Mon, Apr 12, 2021 at 10:12:46PM -0400, Bruce Momjian wrote: > On Thu, Apr 8, 2021 at 01:01:42PM -0400, Bruce Momjian wrote: > > On Thu, Apr 8, 2021 at 12:51:06PM -0400, Álvaro Herrera wrote: > > > On 2021-Apr-08, Bruce Momjian wrote: > > > > > > > pg_stat_activity.queryid is new, but I can imagine cases where you would > > > > join pg_stat_activity to pg_stat_statements to get an estimate of how > > > > long the query will take --- having one using an underscore and another > > > > one not seems odd. > > > > > > OK. So far, you have one vote for queryid (your own) and two votes for > > > query_id (mine and Julien's). And even yourself were hesitating about > > > it earlier in the thread. > > > > OK, if people are fine with pg_stat_activity.query_id not matching > > pg_stat_statements.queryid, I am fine with that. I just don't want > > someone to say it was a big mistake later. ;-) > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I > have not changed any of the APIs which existed before this feature was > added, and are called "queryid" or "queryId" --- it is kind of a mess. > I assume I should leave those unchanged. It will also need a catversion > bump. - uint64 st_queryid; + uint64 st_query_id; I thought we would internally keep queryid/queryId, at least for the variable names as this is the name of the saved field in PlannedStmt. -extern void pgstat_report_queryid(uint64 queryId, bool force); +extern void pgstat_report_query_id(uint64 queryId, bool force); But if we don't then it should be "uint64 query_id".
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Alvaro Herrera
Date:
On 2021-Apr-12, Bruce Momjian wrote: > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I > have not changed any of the APIs which existed before this feature was > added, and are called "queryid" or "queryId" --- it is kind of a mess. > I assume I should leave those unchanged. It will also need a catversion > bump. I think it is fine actually. These names appear in structs Query and PlannedStmt, and every single member of those already uses camelCase naming. Changing those to use "query_id" would look out of place. You did change the one in PgBackendStatus to st_query_id, which also matches the naming style in that struct, so that looks fine also. So I'm -1 on Julien's first proposed change, and +1 on his second proposed change (the name of the first argument of pgstat_report_query_id should be query_id). -- Álvaro Herrera Valdivia, Chile
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote: > On 2021-Apr-12, Bruce Momjian wrote: > > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I > > have not changed any of the APIs which existed before this feature was > > added, and are called "queryid" or "queryId" --- it is kind of a mess. > > I assume I should leave those unchanged. It will also need a catversion > > bump. > > I think it is fine actually. These names appear in structs Query and > PlannedStmt, and every single member of those already uses camelCase > naming. Changing those to use "query_id" would look out of place. > You did change the one in PgBackendStatus to st_query_id, which also > matches the naming style in that struct, so that looks fine also. > > So I'm -1 on Julien's first proposed change, and +1 on his second > proposed change (the name of the first argument of > pgstat_report_query_id should be query_id). Thanks for your analysis. Updated patch attached with the change suggested above. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote: > On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote: > > > > So I'm -1 on Julien's first proposed change, and +1 on his second > > proposed change (the name of the first argument of > > pgstat_report_query_id should be query_id). > > Thanks for your analysis. Updated patch attached with the change > suggested above. Thanks Bruce. It looks good to me.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Bruce Momjian
Date:
On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote: > On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote: > > On 2021-Apr-12, Bruce Momjian wrote: > > > > > OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I > > > have not changed any of the APIs which existed before this feature was > > > added, and are called "queryid" or "queryId" --- it is kind of a mess. > > > I assume I should leave those unchanged. It will also need a catversion > > > bump. > > > > I think it is fine actually. These names appear in structs Query and > > PlannedStmt, and every single member of those already uses camelCase > > naming. Changing those to use "query_id" would look out of place. > > You did change the one in PgBackendStatus to st_query_id, which also > > matches the naming style in that struct, so that looks fine also. > > > > So I'm -1 on Julien's first proposed change, and +1 on his second > > proposed change (the name of the first argument of > > pgstat_report_query_id should be query_id). > > Thanks for your analysis. Updated patch attached with the change > suggested above. Patch applied. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Fujii Masao
Date:
On 2021/04/21 1:22, Bruce Momjian wrote: > On Wed, Apr 14, 2021 at 02:33:26PM -0400, Bruce Momjian wrote: >> On Tue, Apr 13, 2021 at 01:30:16PM -0400, Álvaro Herrera wrote: >>> On 2021-Apr-12, Bruce Momjian wrote: >>> >>>> OK, the attached patch renames pg_stat_activity.queryid to 'query_id'. I >>>> have not changed any of the APIs which existed before this feature was >>>> added, and are called "queryid" or "queryId" --- it is kind of a mess. >>>> I assume I should leave those unchanged. It will also need a catversion >>>> bump. >>> >>> I think it is fine actually. These names appear in structs Query and >>> PlannedStmt, and every single member of those already uses camelCase >>> naming. Changing those to use "query_id" would look out of place. >>> You did change the one in PgBackendStatus to st_query_id, which also >>> matches the naming style in that struct, so that looks fine also. >>> >>> So I'm -1 on Julien's first proposed change, and +1 on his second >>> proposed change (the name of the first argument of >>> pgstat_report_query_id should be query_id). >> >> Thanks for your analysis. Updated patch attached with the change >> suggested above. > > Patch applied. I found another small issue in pg_stat_statements docs. The following description in the docs should be updated so that toplevel is included? > This view contains one row for each distinct database ID, user ID and query ID Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote: > > I found another small issue in pg_stat_statements docs. The following > description in the docs should be updated so that toplevel is included? > > > This view contains one row for each distinct database ID, user ID and query ID Indeed! I'm adding Magnus in Cc. PFA a patch to fix at, and also mention that toplevel will only contain True values if pg_stat_statements.track is set to top.
Attachment
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Fujii Masao
Date:
On 2021/04/22 18:23, Julien Rouhaud wrote: > On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote: >> >> I found another small issue in pg_stat_statements docs. The following >> description in the docs should be updated so that toplevel is included? >> >>> This view contains one row for each distinct database ID, user ID and query ID > > Indeed! I'm adding Magnus in Cc. > > PFA a patch to fix at, and also mention that toplevel will only > contain True values if pg_stat_statements.track is set to top. Thanks for the patch! LGTM. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Magnus Hagander
Date:
On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2021/04/22 18:23, Julien Rouhaud wrote: > > On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote: > >> > >> I found another small issue in pg_stat_statements docs. The following > >> description in the docs should be updated so that toplevel is included? > >> > >>> This view contains one row for each distinct database ID, user ID and query ID > > > > Indeed! I'm adding Magnus in Cc. > > > > PFA a patch to fix at, and also mention that toplevel will only > > contain True values if pg_stat_statements.track is set to top. > > Thanks for the patch! LGTM. Agreed, in general. But going by the example a few lines down, I changed the second part to: True if the query was executed as a top level statement + (if <varname>pg_stat_statements.track</varname> is set to + <literal>all</literal>, otherwise always false) (changes the wording, but also the name of the parameter is pg_stat_statements.track, not pg_stat_statements.toplevel (that's the column, not the parameter). Same error in the commit msg except there you called it pg_stat_statements.top - but that one needed some more fix as well) With those changes, applied. Thanks! -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Fujii Masao
Date:
On 2021/04/23 18:46, Magnus Hagander wrote: > On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2021/04/22 18:23, Julien Rouhaud wrote: >>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote: >>>> >>>> I found another small issue in pg_stat_statements docs. The following >>>> description in the docs should be updated so that toplevel is included? >>>> >>>>> This view contains one row for each distinct database ID, user ID and query ID >>> >>> Indeed! I'm adding Magnus in Cc. >>> >>> PFA a patch to fix at, and also mention that toplevel will only >>> contain True values if pg_stat_statements.track is set to top. >> >> Thanks for the patch! LGTM. > > Agreed, in general. But going by the example a few lines down, I > changed the second part to: > True if the query was executed as a top level statement > + (if <varname>pg_stat_statements.track</varname> is set to > + <literal>all</literal>, otherwise always false) Isn't this confusing? Users may mistakenly read this as that the toplevel column always indicates false if pg_stat_statements.track is not "all". > (changes the wording, but also the name of the parameter is > pg_stat_statements.track, not pg_stat_statements.toplevel (that's the > column, not the parameter). Same error in the commit msg except there > you called it pg_stat_statements.top - but that one needed some more > fix as well) > > With those changes, applied. Thanks! Thanks! Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Magnus Hagander
Date:
On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2021/04/23 18:46, Magnus Hagander wrote: > > On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2021/04/22 18:23, Julien Rouhaud wrote: > >>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote: > >>>> > >>>> I found another small issue in pg_stat_statements docs. The following > >>>> description in the docs should be updated so that toplevel is included? > >>>> > >>>>> This view contains one row for each distinct database ID, user ID and query ID > >>> > >>> Indeed! I'm adding Magnus in Cc. > >>> > >>> PFA a patch to fix at, and also mention that toplevel will only > >>> contain True values if pg_stat_statements.track is set to top. > >> > >> Thanks for the patch! LGTM. > > > > Agreed, in general. But going by the example a few lines down, I > > changed the second part to: > > True if the query was executed as a top level statement > > + (if <varname>pg_stat_statements.track</varname> is set to > > + <literal>all</literal>, otherwise always false) > > Isn't this confusing? Users may mistakenly read this as that the toplevel > column always indicates false if pg_stat_statements.track is not "all". Hmm. I think you're right. It should say "always true", shouldn't it? So not just confusing, but completely wrong? :) -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Fujii Masao
Date:
On 2021/04/23 19:11, Magnus Hagander wrote: > On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao > <masao.fujii@oss.nttdata.com> wrote: >> >> >> >> On 2021/04/23 18:46, Magnus Hagander wrote: >>> On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>>> >>>> >>>> >>>> On 2021/04/22 18:23, Julien Rouhaud wrote: >>>>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote: >>>>>> >>>>>> I found another small issue in pg_stat_statements docs. The following >>>>>> description in the docs should be updated so that toplevel is included? >>>>>> >>>>>>> This view contains one row for each distinct database ID, user ID and query ID >>>>> >>>>> Indeed! I'm adding Magnus in Cc. >>>>> >>>>> PFA a patch to fix at, and also mention that toplevel will only >>>>> contain True values if pg_stat_statements.track is set to top. >>>> >>>> Thanks for the patch! LGTM. >>> >>> Agreed, in general. But going by the example a few lines down, I >>> changed the second part to: >>> True if the query was executed as a top level statement >>> + (if <varname>pg_stat_statements.track</varname> is set to >>> + <literal>all</literal>, otherwise always false) >> >> Isn't this confusing? Users may mistakenly read this as that the toplevel >> column always indicates false if pg_stat_statements.track is not "all". > > Hmm. I think you're right. It should say "always true", shouldn't it? You're thinking something like the following? True if the query was executed as a top level statement (if <varname>pg_stat_statements.track</varname> is set to <literal>top</literal>, always true) > So not just confusing, but completely wrong? :) Yeah :) I'm fine with the original wording by Julien. Of course, the parameter name should be corrected as you did, though. Or what about the following? True if the query was executed as a top level statement (this can be <literal>false</literal> only if <varname>pg_stat_statements.track</varname> is set to <literal>all</literal> and nested statements are also tracked) Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Magnus Hagander
Date:
On Fri, Apr 23, 2021 at 12:40 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2021/04/23 19:11, Magnus Hagander wrote: > > On Fri, Apr 23, 2021 at 12:04 PM Fujii Masao > > <masao.fujii@oss.nttdata.com> wrote: > >> > >> > >> > >> On 2021/04/23 18:46, Magnus Hagander wrote: > >>> On Fri, Apr 23, 2021 at 9:10 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > >>>> > >>>> > >>>> > >>>> On 2021/04/22 18:23, Julien Rouhaud wrote: > >>>>> On Thu, Apr 22, 2021 at 12:28:11AM +0900, Fujii Masao wrote: > >>>>>> > >>>>>> I found another small issue in pg_stat_statements docs. The following > >>>>>> description in the docs should be updated so that toplevel is included? > >>>>>> > >>>>>>> This view contains one row for each distinct database ID, user ID and query ID > >>>>> > >>>>> Indeed! I'm adding Magnus in Cc. > >>>>> > >>>>> PFA a patch to fix at, and also mention that toplevel will only > >>>>> contain True values if pg_stat_statements.track is set to top. > >>>> > >>>> Thanks for the patch! LGTM. > >>> > >>> Agreed, in general. But going by the example a few lines down, I > >>> changed the second part to: > >>> True if the query was executed as a top level statement > >>> + (if <varname>pg_stat_statements.track</varname> is set to > >>> + <literal>all</literal>, otherwise always false) > >> > >> Isn't this confusing? Users may mistakenly read this as that the toplevel > >> column always indicates false if pg_stat_statements.track is not "all". > > > > Hmm. I think you're right. It should say "always true", shouldn't it? > > You're thinking something like the following? > > True if the query was executed as a top level statement > (if <varname>pg_stat_statements.track</varname> is set to > <literal>top</literal>, always true) > > > So not just confusing, but completely wrong? :) > > Yeah :) Ugh. I completely lost track of this email. I've applied the change suggested above with another slight reordering of the words: + (always true if <varname>pg_stat_statements.track</varname> is set to + <literal>top</literal>) > I'm fine with the original wording by Julien. > Of course, the parameter name should be corrected as you did, though. > > Or what about the following? > > True if the query was executed as a top level statement > (this can be <literal>false</literal> only if > <varname>pg_stat_statements.track</varname> is set to > <literal>all</literal> and nested statements are also tracked) I found my suggestion, once the final reordering of words was done, easier to parse. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Peter Eisentraut
Date:
On 22.04.21 11:23, Julien Rouhaud wrote: > The statistics gathered by the module are made available via a > view named <structname>pg_stat_statements</structname>. This view > - contains one row for each distinct database ID, user ID and query > - ID (up to the maximum number of distinct statements that the module > + contains one row for each distinct database ID, user ID, query ID and > + toplevel (up to the maximum number of distinct statements that the module > can track). The columns of the view are shown in > <xref linkend="pgstatstatements-columns"/>. I'm having trouble parsing this new sentence. It now says essentially "This view contains one row for each distinct database ID, each distinct user ID, each distinct query ID, and each distinct toplevel." That last part doesn't make sense.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote: > On 22.04.21 11:23, Julien Rouhaud wrote: > > The statistics gathered by the module are made available via a > > view named <structname>pg_stat_statements</structname>. This view > > - contains one row for each distinct database ID, user ID and query > > - ID (up to the maximum number of distinct statements that the module > > + contains one row for each distinct database ID, user ID, query ID and > > + toplevel (up to the maximum number of distinct statements that the module > > can track). The columns of the view are shown in > > <xref linkend="pgstatstatements-columns"/>. > > I'm having trouble parsing this new sentence. It now says essentially > > "This view contains one row for each distinct database ID, each distinct > user ID, each distinct query ID, and each distinct toplevel." Isn't it each distinct permutation of all those fields? > That last part doesn't make sense. I'm not sure what you mean by that. Maybe it's not really self explanatory without referring to what toplevel is, which is a bool flag stating whether the statement was exected as a top level statement or not. So every distinct permutation of (dbid, userid, queryid) can indeed be stored twice, if pg_stat_statements.track is set to all. However in practice most statements are not executed both as top level and nested statements.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Magnus Hagander
Date:
On Tue, Jul 13, 2021 at 8:10 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Jul 12, 2021 at 10:02:59PM +0200, Peter Eisentraut wrote: > > On 22.04.21 11:23, Julien Rouhaud wrote: > > > The statistics gathered by the module are made available via a > > > view named <structname>pg_stat_statements</structname>. This view > > > - contains one row for each distinct database ID, user ID and query > > > - ID (up to the maximum number of distinct statements that the module > > > + contains one row for each distinct database ID, user ID, query ID and > > > + toplevel (up to the maximum number of distinct statements that the module > > > can track). The columns of the view are shown in > > > <xref linkend="pgstatstatements-columns"/>. > > > > I'm having trouble parsing this new sentence. It now says essentially > > > > "This view contains one row for each distinct database ID, each distinct > > user ID, each distinct query ID, and each distinct toplevel." > > Isn't it each distinct permutation of all those fields? Maybe a problem for the readability of it is that the three other fields are listed by their description and not by their fieldname, and toplevel is fieldname? Maybe "each distinct database id, each distinct user id, each distinct query id, and whether it is a top level statement or not"? Or maybe "each distinct combination of database id, user id, query id and whether it's a top level statement or not"? -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Julien Rouhaud
Date:
On Tue, Jul 13, 2021 at 10:58:12AM +0200, Magnus Hagander wrote: > > Maybe a problem for the readability of it is that the three other > fields are listed by their description and not by their fieldname, and > toplevel is fieldname? I think so too. > Maybe "each distinct database id, each distinct user id, each distinct > query id, and whether it is a top level statement or not"? > > Or maybe "each distinct combination of database id, user id, query id > and whether it's a top level statement or not"? I like the 2nd one better. What about "and its top level status"? It would keep the sentence short and the full description is right after if needed.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Peter Eisentraut
Date:
On 13.07.21 10:58, Magnus Hagander wrote: > Maybe "each distinct database id, each distinct user id, each distinct > query id, and whether it is a top level statement or not"? > > Or maybe "each distinct combination of database id, user id, query id > and whether it's a top level statement or not"? Okay, now I understand what is meant here. The second one sounds good to me.
Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
From
Magnus Hagander
Date:
On Wed, Jul 14, 2021 at 6:36 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > > On 13.07.21 10:58, Magnus Hagander wrote: > > Maybe "each distinct database id, each distinct user id, each distinct > > query id, and whether it is a top level statement or not"? > > > > Or maybe "each distinct combination of database id, user id, query id > > and whether it's a top level statement or not"? > > Okay, now I understand what is meant here. The second one sounds good > to me. Thanks, will push a fix like that. -- Magnus Hagander Me: https://www.hagander.net/ Work: https://www.redpill-linpro.com/