Thread: compute_query_id
hi,
I noticed this new param compute_query_id in pg14beta.
it is interesting as I was long wanting to identify a query with a unique id like we have for http requests etc so that we can trace the query all the way to shards via FDW etc.
but i cannot see them in the logs even after setting compute_query_id on.
i also read
compute_query_id_query for the same.
how is the compute_query_id actually calculated?
why does it show 0 in logs for random sql queries.
log_line_prefix = '%Q :'
0 :LOG: statement: select * from pg_stat_activity;
i mean pid already was doing the job to identify the query and its children even it logs,
but i know pid will get recycled.
tldr;
how is compute_query_id different from pid to identify some query running ?
can it be passed on to FDW queries ? for tracing etc ?
am i getting its use case totally wrong :)
--
Thanks,
Vijay
Mumbai, India
On Thu, Jun 17, 2021 at 08:09:54PM +0530, Vijaykumar Jain wrote:
> how is the compute_query_id actually calculated?

It's the exact same implementation that was extracted from pg_stat_statements.
You have some implementation details at
https://www.postgresql.org/docs/current/pgstatstatements.html.

> why does it show 0 in logs for random sql queries.
> log_line_prefix = '%Q :'
> 0 :LOG: statement: select * from pg_stat_activity;

It means that you haven't enabled it:

2021-06-17 22:46:16.231 CST [11246] queryid=0 LOG: duration: 4.971 ms statement: select * from pg_stat_activity ;
2021-06-17 22:46:25.383 CST [11246] queryid=0 LOG: duration: 0.284 ms statement: set compute_query_id = on;
2021-06-17 22:46:28.744 CST [11246] queryid=941978042436931562 LOG: duration: 1.725 ms statement: select * from pg_stat_activity;

> i mean pid already was doing the job to identify the query and its children
> even it logs,
> but i know pid will get recycled.

I'm not sure that I understand that question. The pid will identify a backend,
and that backend can execute 0, 1 or a lot of different queries. The query_id
will uniquely identify statements after some normalization and removing the
constant parts (so for instance "select 1;" and "Select 2 ;" will have the
same identifier). Having only that information in the log can be useful on its
own, but you usually get way more benefit using additional modules like
pg_stat_statements.
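
To make the "select 1;" / "Select 2 ;" remark concrete, here is a toy sketch in Python. It is only an illustration and not PostgreSQL's algorithm: the real identifier is computed from the post-parse-analysis tree, whereas this sketch works on the raw query text. The function name toy_query_id and the regex-plus-SHA-256 approach are assumptions made up for the example.

```python
import hashlib
import re


def toy_query_id(sql: str) -> int:
    """Toy stand-in for a query identifier (NOT PostgreSQL's algorithm).

    Normalizes the query text, replaces constants with a placeholder,
    and hashes the result down to a signed 64-bit integer.
    """
    # Drop surrounding whitespace and a trailing semicolon, ignore case.
    text = sql.strip().rstrip(";").strip().lower()
    # Replace string literals first, then standalone numeric literals.
    text = re.sub(r"'(?:[^']|'')*'", "?", text)
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)
    # Collapse runs of whitespace so spacing does not matter.
    text = re.sub(r"\s+", " ", text)
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Keep 8 bytes and read them as signed, so the value fits a bigint.
    return int.from_bytes(digest[:8], "big", signed=True)


# Both statements reduce to the same normalized text "select ?".
same = toy_query_id("select 1;") == toy_query_id("Select 2 ;")
print(same)  # True
```

The point is only that the identifier names a class of statements rather than one execution, which is why it differs from a pid or a per-request trace id.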
On Thu, 17 Jun 2021 at 20:20, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Jun 17, 2021 at 08:09:54PM +0530, Vijaykumar Jain wrote:
> > how is the compute_query_id actually calculated?
>
> > why does it show 0 in logs for random sql queries.
> > log_line_prefix = '%Q :'
> > 0 :LOG: statement: select * from pg_stat_activity;
>
> It means that you haven't enabled it:
>
> 2021-06-17 22:46:16.231 CST [11246] queryid=0 LOG: duration: 4.971 ms statement: select * from pg_stat_activity ;
> 2021-06-17 22:46:25.383 CST [11246] queryid=0 LOG: duration: 0.284 ms statement: set compute_query_id = on;
> 2021-06-17 22:46:28.744 CST [11246] queryid=941978042436931562 LOG: duration: 1.725 ms statement: select * from pg_stat_activity;
>

psql test
psql (14beta1)
Type "help" for help.

test=# show log_line_prefix;
        log_line_prefix
--------------------------------
 [timestamp=%t] [query_id=%Q] :
(1 row)

test=# show compute_query_id;
 compute_query_id
------------------
 on
(1 row)

test=# show log_statement;
 log_statement
---------------
 all
(1 row)

test=# select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid();
-[ RECORD 1 ]-----------------------------------------------------------------------------
query_id | -4293879703199833131
query    | select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid();
pid      | 2640

from the logs, i get the id for some queries, but not all

[timestamp=2021-06-17 20:37:59 IST] [query_id=488416992746849793] :ERROR: relation "t" already exists
[timestamp=2021-06-17 20:37:59 IST] [query_id=488416992746849793] :STATEMENT: create table t(id int);
[timestamp=2021-06-17 20:38:22 IST] [query_id=0] :LOG: statement: prepare qq(int) as select count(1) from t where id = $1;
[timestamp=2021-06-17 20:38:29 IST] [query_id=0] :LOG: statement: execute qq(1);
[timestamp=2021-06-17 20:38:29 IST] [query_id=0] :DETAIL: prepare: prepare qq(int) as select count(1) from t where id = $1;
[timestamp=2021-06-17 20:38:32 IST] [query_id=0] :LOG: statement: execute qq(2);
[timestamp=2021-06-17 20:38:32 IST] [query_id=0] :DETAIL: prepare: prepare qq(int) as select count(1) from t where id = $1;
[timestamp=2021-06-17 20:39:25 IST] [query_id=0] :LOG: statement: select query_id, query, pid from pg_stat_activity where pid = 0;
[timestamp=2021-06-17 20:40:36 IST] [query_id=0] :LOG: statement: select count(1) from t;
[timestamp=2021-06-17 20:40:47 IST] [query_id=0] :LOG: statement: select count(1) from t where id < 100;

test=# explain (analyze,verbose) select * from t where id < floor((random() * 100)::int);
                                  QUERY PLAN
------------------------------------------------------------------------------------
 Append (cost=0.00..3.10 rows=3 width=4) (actual time=0.009..0.014 rows=3 loops=1)
 ....
 Query Identifier: 1051405225525186795
 Planning Time: 0.090 ms
 Execution Time: 0.030 ms
(13 rows)

test=# select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid();
       query_id       |                                      query                                      | pid
----------------------+---------------------------------------------------------------------------------+------
 -4293879703199833131 | select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid(); | 2671
(1 row)

but in logs

[timestamp=2021-06-17 20:46:47 IST] [query_id=0] :LOG: statement: explain select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid();
[timestamp=2021-06-17 20:46:54 IST] [query_id=0] :LOG: statement: explain select query_id, query, pid from pg_stat_activity where pid > 100;
[timestamp=2021-06-17 20:46:58 IST] [query_id=0] :LOG: statement: explain analyze select query_id, query, pid from pg_stat_activity where pid > 100;
[timestamp=2021-06-17 20:47:25 IST] [query_id=0] :LOG: statement: explain analyze select * from t where id < floor((random() * 100)::int);
[timestamp=2021-06-17 20:48:16 IST] [query_id=0] :LOG: statement: explain (analyze,verbose) select * from t where id < floor((random() * 100)::int);
[timestamp=2021-06-17 20:48:38 IST] [query_id=0] :LOG: statement: select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid();

not sure, if i am missing something obvious?

> I'm not sure that I understand that question. The pid will identify a backend,
> and that backend can execute 0, 1 or a lot of different queries. The query_id
> will uniquely identify statements after some normalization and removing the
> constant parts (so for instance "select 1;" and "Select 2 ;" will have the
> same identifier). Having only that information in the log can be useful on its
> own, but you usually get way more benefit using additional modules like
> pg_stat_statements.

Thanks, that helps, but I still do not see them in logs.

************************************* corrects myself
..... ok now it works, when i set log_min_duration_statement =0 to log
all statements.

test=# show log_min_duration_statement;
 log_min_duration_statement
----------------------------
 0
(1 row)

test=# explain (analyze,verbose) select * from t where id < floor((random() * 100)::int);
                                  QUERY PLAN
------------------------------------------------------------------------------------
 Append (cost=0.00..3.10 rows=3 width=4) (actual time=0.018..0.022 rows=3 loops=1)
   -> Seq Scan on public.t0 t_1 (cost=0.00..1.03 rows=1 width=4) (actual time=0.018..0.018 rows=1 loops=1)
        Output: t_1.id
        Filter: ((t_1.id)::double precision < floor((((random() * '100'::double precision))::integer)::double precision))
   -> Seq Scan on public.t1 t_2 (cost=0.00..1.03 rows=1 width=4) (actual time=0.001..0.002 rows=1 loops=1)
        Output: t_2.id
        Filter: ((t_2.id)::double precision < floor((((random() * '100'::double precision))::integer)::double precision))
   -> Seq Scan on public.t2 t_3 (cost=0.00..1.03 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=1)
        Output: t_3.id
        Filter: ((t_3.id)::double precision < floor((((random() * '100'::double precision))::integer)::double precision))
 Query Identifier: 1051405225525186795
 Planning Time: 0.353 ms
 Execution Time: 0.034 ms
(13 rows)

test=# select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid();
       query_id       |                                      query                                      | pid
----------------------+---------------------------------------------------------------------------------+------
 -4293879703199833131 | select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid(); | 2777
(1 row)

[timestamp=2021-06-17 20:54:12 IST] [query_id=-1477439429101745134] :LOG: duration: 0.040 ms statement: show log_min_duration_statement;
[timestamp=2021-06-17 20:54:16 IST] [query_id=1308619569072270555] :LOG: duration: 0.882 ms statement: explain (analyze,verbose) select * from t where id < floor((random() * 100)::int);
[timestamp=2021-06-17 20:54:19 IST] [query_id=-4293879703199833131] :LOG: duration: 1.201 ms statement: select query_id, query, pid from pg_stat_activity where pid = pg_backend_pid();

all good, Julien.

--
Thanks,
Vijay
Mumbai, India
On Thu, Jun 17, 2021 at 08:57:02PM +0530, Vijaykumar Jain wrote:
>
> test=# show log_line_prefix;
>         log_line_prefix
> --------------------------------
>  [timestamp=%t] [query_id=%Q] :
> (1 row)
>
> test=# show compute_query_id;
>  compute_query_id
> ------------------
>  on
> (1 row)
>
> test=# show log_statement;
>  log_statement
> ---------------
>  all
> (1 row)
>
> ************************************* corrects myself
> ..... ok now it works, when i set log_min_duration_statement =0 to log
> all statements.
>
> test=# show log_min_duration_statement;
>  log_min_duration_statement
> ----------------------------
>  0
> (1 row)

Yes, unfortunately log_statement is not compatible with compute_query_id. This
is documented at
https://www.postgresql.org/docs/devel/runtime-config-logging.html#GUC-LOG-LINE-PREFIX:

> The %Q escape always reports a zero identifier for lines output by
> log_statement because log_statement generates output before an identifier can
> be calculated, including invalid statements for which an identifier cannot be
> calculated.
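
For anyone post-processing such logs, the following is a minimal Python sketch that groups log lines by identifier and separates out the zero-identifier lines described above. It assumes the exact log_line_prefix used in this thread ('[timestamp=%t] [query_id=%Q] :'); the names LINE_RE and group_by_query_id are made up for the example, and the sample lines are copied from the messages above.

```python
import re
from collections import defaultdict

# Matches the prefix '[timestamp=%t] [query_id=%Q] :' used in this thread.
LINE_RE = re.compile(
    r"^\[timestamp=(?P<ts>[^\]]+)\] \[query_id=(?P<qid>-?\d+)\] :(?P<rest>.*)$"
)


def group_by_query_id(lines):
    """Return {query_id: [message, ...]} for lines matching the prefix."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # continuation line or a different prefix: skip it
        groups[int(match.group("qid"))].append(match.group("rest"))
    return dict(groups)


sample = [
    "[timestamp=2021-06-17 20:40:36 IST] [query_id=0] :LOG: statement: "
    "select count(1) from t;",
    "[timestamp=2021-06-17 20:54:19 IST] [query_id=-4293879703199833131] "
    ":LOG: duration: 1.201 ms statement: select query_id, query, pid "
    "from pg_stat_activity where pid = pg_backend_pid();",
]

groups = group_by_query_id(sample)
# Identifier 0 means "no identifier available", e.g. log_statement output.
unidentified = groups.get(0, [])
print(len(unidentified), sorted(groups))
```

Lines grouped under 0 are not one statement class; they are simply lines for which no identifier was available at the time they were written.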