Thread: Problems around compute_query_id
Hi all, Starting a new thread as the one that has introduced compute_query_id is already long enough. Fujii-san has reported on Twitter that enabling the computation of query IDs does not work properly with log_statement as the query ID is calculated at parse analyze time and the query is logged before that. As far as I can see, that's really a problem as any queries logged are combined with a query ID of 0, and log parsers would not really be able to use that, even if the information provided by for example log_duration gives the computed query ID prevent in pg_stat_activity. While making the feature run on some test server, I have noticed that %Q would log some garbage query ID for autovacuum workers that's kept around. That looks wrong. Thoughts? -- Michael
Attachment
On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote: > Hi all, > > Starting a new thread as the one that has introduced compute_query_id > is already long enough. > > Fujii-san has reported on Twitter that enabling the computation of > query IDs does not work properly with log_statement as the query ID is > calculated at parse analyze time and the query is logged before that. > As far as I can see, that's really a problem as any queries logged are > combined with a query ID of 0, and log parsers would not really be > able to use that, even if the information provided by for example > log_duration gives the computed query ID prevent in pg_stat_activity. I don't see any way around that. The goal of log_statements is to log all syntactically valid queries, including otherwise invalid queries. For instance, it would log SELECT notacolumn; and there's no way to compute a queryid in that case. I think that log_statements already causes some issues with log parsers. At least pgbadger recommends to avoid using that: "Do not enable log_statement as its log format will not be parsed by pgBadger." I think we should simply document that %Q is not compatible with log_statements. > While making the feature run on some test server, I have noticed that > %Q would log some garbage query ID for autovacuum workers that's kept > around. That looks wrong. I've not been able to reproduce it, do you have some hint on how to do it? Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix the problem?
Hi, On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote: > On Mon, Apr 12, 2021 at 03:12:40PM +0900, Michael Paquier wrote: > > Fujii-san has reported on Twitter that enabling the computation of > > query IDs does not work properly with log_statement as the query ID is > > calculated at parse analyze time and the query is logged before that. > > As far as I can see, that's really a problem as any queries logged are > > combined with a query ID of 0, and log parsers would not really be > > able to use that, even if the information provided by for example > > log_duration gives the computed query ID prevent in pg_stat_activity. > > I don't see any way around that. The goal of log_statements is to log all > syntactically valid queries, including otherwise invalid queries. For > instance, it would log > > SELECT notacolumn; > > and there's no way to compute a queryid in that case. I think that > log_statements already causes some issues with log parsers. At least pgbadger > recommends to avoid using that: > > "Do not enable log_statement as its log format will not be parsed by pgBadger." > > I think we should simply document that %Q is not compatible with > log_statements. What about log_statement_sample_rate ? Does compute_query_id have the same problem with that? Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote: > > What about log_statement_sample_rate ? Does compute_query_id have the > same problem with that? No, log_statement_sample_rate samples log_min_duration_statements, not log_statements so it works as expected.
On Mon, Apr 12, 2021 at 03:26:33PM +0800, Julien Rouhaud wrote: > On Mon, Apr 12, 2021 at 09:20:07AM +0200, Michael Banck wrote: > > > > What about log_statement_sample_rate ? Does compute_query_id have the > > same problem with that? > > No, log_statement_sample_rate samples log_min_duration_statements, not > log_statements so it works as expected. While on that topic, it's probably worth mentioning that log_duration is now way more useful if you have the queryid in you log_line_prefix. It avoids to log the full query text while still being able to know what was the underlying normalized query by dumping the content of pg_stat_statements.
On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote: > I think we should simply document that %Q is not compatible with > log_statements. Hearing no objection I documented that limitation. > > > While making the feature run on some test server, I have noticed that > > %Q would log some garbage query ID for autovacuum workers that's kept > > around. That looks wrong. > > I've not been able to reproduce it, do you have some hint on how to do it? > > Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix > the problem? It turns out that the problem was simply that some process can inherit a PgBackendStatus for which a previous backend reported a queryid. For processes like autovacuum process, they will never report a new identifier so they reported the previous one. Resetting the field like the other ones in pgstat_bestart() fixes the problem for autovacuum and any similar process.
Attachment
On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote: > On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote: > > I think we should simply document that %Q is not compatible with > > log_statements. > > Hearing no objection I documented that limitation. > > > > > > While making the feature run on some test server, I have noticed that > > > %Q would log some garbage query ID for autovacuum workers that's kept > > > around. That looks wrong. > > > > I've not been able to reproduce it, do you have some hint on how to do it? > > > > Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix > > the problem? > > It turns out that the problem was simply that some process can inherit a > PgBackendStatus for which a previous backend reported a queryid. For processes > like autovacuum process, they will never report a new identifier so they > reported the previous one. Resetting the field like the other ones in > pgstat_bestart() fixes the problem for autovacuum and any similar process. I slightly adjusted the patch and applied it. Thanks. I think this closes all the open issues around query_id. :-) -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Attachment
On Tue, Apr 20, 2021 at 12:59:10PM -0400, Bruce Momjian wrote: > On Thu, Apr 15, 2021 at 03:43:59PM +0800, Julien Rouhaud wrote: > > On Mon, Apr 12, 2021 at 02:56:59PM +0800, Julien Rouhaud wrote: > > > I think we should simply document that %Q is not compatible with > > > log_statements. > > > > Hearing no objection I documented that limitation. > > > > > > > > > While making the feature run on some test server, I have noticed that > > > > %Q would log some garbage query ID for autovacuum workers that's kept > > > > around. That looks wrong. > > > > > > I've not been able to reproduce it, do you have some hint on how to do it? > > > > > > Maybe setting a zero queryid at the beginning of AutoVacWorkerMain() could fix > > > the problem? > > > > It turns out that the problem was simply that some process can inherit a > > PgBackendStatus for which a previous backend reported a queryid. For processes > > like autovacuum process, they will never report a new identifier so they > > reported the previous one. Resetting the field like the other ones in > > pgstat_bestart() fixes the problem for autovacuum and any similar process. > > I slightly adjusted the patch and applied it. Thanks. I think this > closes all the open issues around query_id. :-) Thanks a lot Bruce! There was also [1], but Michael already committed the proposed fix, so I also think that all open issues for query_id are not taken care of! [1]: https://postgr.es/m/CAJcOf-fXyb2QiDbwftD813UF70w-+BsK-03bFp1GrijXU9GQYQ@mail.gmail.com