Thread: Problems around compute_query_id

Problems around compute_query_id

From
Michael Paquier
Date:
Hi all,

Starting a new thread as the one that has introduced compute_query_id
is already long enough.

Fujii-san has reported on Twitter that enabling the computation of
query IDs does not work properly with log_statement as the query ID is
calculated at parse analyze time and the query is logged before that.
As far as I can see, that's really a problem as any queries logged are
combined with a query ID of 0, and log parsers would not really be
able to use that, even if the information provided by for example
log_duration gives the computed query ID prevent in pg_stat_activity.

While making the feature run on some test server, I have noticed that
%Q would log some garbage query ID for autovacuum workers that's kept
around.  That looks wrong.

Thoughts?
--
Michael

Attachment

Re: Problems around compute_query_id

From
Julien Rouhaud
Date:
On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:
> Hi all,
> 
> Starting a new thread as the one that has introduced compute_query_id
> is already long enough.
> 
> Fujii-san has reported on Twitter that enabling the computation of
> query IDs does not work properly with log_statement as the query ID is
> calculated at parse analyze time and the query is logged before that.
> As far as I can see, that's really a problem as any queries logged are
> combined with a query ID of 0, and log parsers would not really be
> able to use that, even if the information provided by for example
> log_duration gives the computed query ID prevent in pg_stat_activity.

I don't see any way around that.  The goal of log_statements is to log all
syntactically valid queries, including otherwise invalid queries.  For
instance, it would log

SELECT notacolumn;

and there's no way to compute a queryid in that case.  I think that
log_statements already causes some issues with log parsers.  At least pgbadger
recommends to avoid using that:

"Do not enable log_statement as its log format will not be parsed by pgBadger."

I think we should simply document that %Q is not compatible with
log_statements.

> While making the feature run on some test server, I have noticed that
> %Q would log some garbage query ID for autovacuum workers that's kept
> around.  That looks wrong.

I've not been able to reproduce it, do you have some hint on how to do it?

Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
the problem?



Re: Problems around compute_query_id

From
Michael Banck
Date:
Hi,

On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote:
> > Fujii-san has reported on Twitter that enabling the computation of
> > query IDs does not work properly with log_statement as the query ID is
> > calculated at parse analyze time and the query is logged before that.
> > As far as I can see, that's really a problem as any queries logged are
> > combined with a query ID of 0, and log parsers would not really be
> > able to use that, even if the information provided by for example
> > log_duration gives the computed query ID prevent in pg_stat_activity.
> 
> I don't see any way around that.  The goal of log_statements is to log all
> syntactically valid queries, including otherwise invalid queries.  For
> instance, it would log
> 
> SELECT notacolumn;
> 
> and there's no way to compute a queryid in that case.  I think that
> log_statements already causes some issues with log parsers.  At least pgbadger
> recommends to avoid using that:
> 
> "Do not enable log_statement as its log format will not be parsed by pgBadger."
> 
> I think we should simply document that %Q is not compatible with
> log_statements.

What about log_statement_sample_rate ? Does compute_query_id have the
same problem with that?


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: Problems around compute_query_id

From
Julien Rouhaud
Date:
On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:
> 
> What about log_statement_sample_rate ? Does compute_query_id have the
> same problem with that?

No, log_statement_sample_rate samples log_min_duration_statements, not
log_statements so it works as expected.



Re: Problems around compute_query_id

From
Julien Rouhaud
Date:
On Mon, Apr 12, 2021 at 03:26:33PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote:
> > 
> > What about log_statement_sample_rate ? Does compute_query_id have the
> > same problem with that?
> 
> No, log_statement_sample_rate samples log_min_duration_statements, not
> log_statements so it works as expected.

While on that topic, it's probably worth mentioning that log_duration is now
way more useful if you have the queryid in you log_line_prefix.  It avoids to
log the full query text while still being able to know what was the underlying
normalized query by dumping the content of pg_stat_statements.



Re: Problems around compute_query_id

From
Julien Rouhaud
Date:
On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
> I think we should simply document that %Q is not compatible with
> log_statements.

Hearing no objection I documented that limitation.

> 
> > While making the feature run on some test server, I have noticed that
> > %Q would log some garbage query ID for autovacuum workers that's kept
> > around.  That looks wrong.
> 
> I've not been able to reproduce it, do you have some hint on how to do it?
> 
> Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
> the problem?

It turns out that the problem was simply that some process can inherit a
PgBackendStatus for which a previous backend reported a queryid.  For processes
like autovacuum process, they will never report a new identifier so they
reported the previous one.  Resetting the field like the other ones in
pgstat_bestart() fixes the problem for autovacuum and any similar process.

Attachment

Re: Problems around compute_query_id

From
Bruce Momjian
Date:
On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote:
> On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
> > I think we should simply document that %Q is not compatible with
> > log_statements.
> 
> Hearing no objection I documented that limitation.
> 
> > 
> > > While making the feature run on some test server, I have noticed that
> > > %Q would log some garbage query ID for autovacuum workers that's kept
> > > around.  That looks wrong.
> > 
> > I've not been able to reproduce it, do you have some hint on how to do it?
> > 
> > Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
> > the problem?
> 
> It turns out that the problem was simply that some process can inherit a
> PgBackendStatus for which a previous backend reported a queryid.  For processes
> like autovacuum process, they will never report a new identifier so they
> reported the previous one.  Resetting the field like the other ones in
> pgstat_bestart() fixes the problem for autovacuum and any similar process.

I slightly adjusted the patch and applied it.  Thanks.  I think this
closes all the open issues around query_id.  :-)

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  If only the physical world exists, free will is an illusion.


Attachment

Re: Problems around compute_query_id

From
Julien Rouhaud
Date:
On Tue, Apr 20, 2021 at 12:59:10PM -0400, Bruce Momjian wrote:
> On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote:
> > On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote:
> > > I think we should simply document that %Q is not compatible with
> > > log_statements.
> > 
> > Hearing no objection I documented that limitation.
> > 
> > > 
> > > > While making the feature run on some test server, I have noticed that
> > > > %Q would log some garbage query ID for autovacuum workers that's kept
> > > > around.  That looks wrong.
> > > 
> > > I've not been able to reproduce it, do you have some hint on how to do it?
> > > 
> > > Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix
> > > the problem?
> > 
> > It turns out that the problem was simply that some process can inherit a
> > PgBackendStatus for which a previous backend reported a queryid.  For processes
> > like autovacuum process, they will never report a new identifier so they
> > reported the previous one.  Resetting the field like the other ones in
> > pgstat_bestart() fixes the problem for autovacuum and any similar process.
> 
> I slightly adjusted the patch and applied it.  Thanks.  I think this
> closes all the open issues around query_id.  :-)

Thanks a lot Bruce!  There was also [1], but Michael already committed the
proposed fix, so I also think that all open issues for query_id are not taken
care of!

[1]: https://postgr.es/m/CAJcOf-fXyb2QiDbwftD813UF70w-+BsK-03bFp1GrijXU9GQYQ@mail.gmail.com