Thread: compute_query_id

compute_query_id

From
Vijaykumar Jain
Date:
hi,

I noticed this new param compute_query_id in pg14beta.
it is interesting as I was long wanting to identify a query with a unique id like we have for http requests etc so that we can trace the query all the way to shards via FDW etc.

but i cannot see them in the logs even after setting compute_query_id on.
i also read
compute_query_id_query  for the same.
how is the compute_query_id actually calculated? 
why does it show 0 in logs for random sql queries. 
log_line_prefix = '%Q :'
0 :LOG:  statement: select * from pg_stat_activity;

i mean pid already was doing the job to identify the query and its children even it logs,
but i know pid will get recycled.

tldr;
how is compute_query_id different from pid to identify some query running ?
can it be passed on to FDW queries ? for tracing etc ?

am i totally getting its use case totally wrong :) 


--
Thanks,
Vijay
Mumbai, India

Re: compute_query_id

From
Julien Rouhaud
Date:
On Thu, Jun 17, 2021 at 08:09:54PM +0530, Vijaykumar Jain wrote:
> how is the compute_query_id actually calculated?

It's the exact same implementation that was extracted from pg_stat_statements.
You have some implementation details at
https://www.postgresql.org/docs/current/pgstatstatements.html.

> why does it show 0 in logs for random sql queries.
> log_line_prefix = '%Q :'
> 0 :LOG:  statement: select * from pg_stat_activity;

It means that you haven't enabled it:

2021-06-17 22:46:16.231 CST [11246] queryid=0 LOG:  duration: 4.971 ms  statement: select * from pg_stat_activity ;
2021-06-17 22:46:25.383 CST [11246] queryid=0 LOG:  duration: 0.284 ms  statement: set compute_query_id = on;
2021-06-17 22:46:28.744 CST [11246] queryid=941978042436931562 LOG:  duration: 1.725 ms  statement: select * from
pg_stat_activity;
 

> i mean pid already was doing the job to identify the query and its children
> even it logs,
> but i know pid will get recycled.

I'm not sure that I understand that question.  The pid will identify a backend,
and that backend can execute 0, 1 or a lot of different queries.  The query_id
will uniquely identify statements after some normalization and removing the
constant parts (so for instance "select 1;" and "Select 2  ;" will have the
same identifier).  Having only that information in the log can be useful on its
own, but you usually get way more benefit using additional modules like
pg_stat_statements.



Re: compute_query_id

From
Vijaykumar Jain
Date:
On Thu, 17 Jun 2021 at 20:20, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Jun 17, 2021 at 08:09:54PM +0530, Vijaykumar Jain wrote:
> > how is the compute_query_id actually calculated?
>
> > why does it show 0 in logs for random sql queries.
> > log_line_prefix = '%Q :'
> > 0 :LOG:  statement: select * from pg_stat_activity;
>
> It means that you haven't enabled it:
>
> 2021-06-17 22:46:16.231 CST [11246] queryid=0 LOG:  duration: 4.971 ms  statement: select * from pg_stat_activity ;
> 2021-06-17 22:46:25.383 CST [11246] queryid=0 LOG:  duration: 0.284 ms  statement: set compute_query_id = on;
> 2021-06-17 22:46:28.744 CST [11246] queryid=941978042436931562 LOG:  duration: 1.725 ms  statement: select * from
pg_stat_activity;
 
>

 psql test
psql (14beta1)
Type "help" for help.

test=# show log_line_prefix;
        log_line_prefix
--------------------------------
 [timestamp=%t] [query_id=%Q] :
(1 row)

test=# show compute_query_id;
 compute_query_id
------------------
 on
(1 row)

test=# show log_statement;
 log_statement
---------------
 all
(1 row)

test=# select query_id, query, pid from pg_stat_activity where pid =
pg_backend_pid();
-[ RECORD 1 ]-----------------------------------------------------------------------------
query_id | -4293879703199833131
query    | select query_id, query, pid from pg_stat_activity where pid
= pg_backend_pid();
pid      | 2640


from the logs, i get the id for some queries, but not all
[timestamp=2021-06-17 20:37:59 IST] [query_id=488416992746849793]
:ERROR:  relation "t" already exists
[timestamp=2021-06-17 20:37:59 IST] [query_id=488416992746849793]
:STATEMENT:  create table t(id int);
[timestamp=2021-06-17 20:38:22 IST] [query_id=0] :LOG:  statement:
prepare qq(int) as select count(1) from t where id = $1;
[timestamp=2021-06-17 20:38:29 IST] [query_id=0] :LOG:  statement:
execute qq(1);
[timestamp=2021-06-17 20:38:29 IST] [query_id=0] :DETAIL:  prepare:
prepare qq(int) as select count(1) from t where id = $1;
[timestamp=2021-06-17 20:38:32 IST] [query_id=0] :LOG:  statement:
execute qq(2);
[timestamp=2021-06-17 20:38:32 IST] [query_id=0] :DETAIL:  prepare:
prepare qq(int) as select count(1) from t where id = $1;
[timestamp=2021-06-17 20:39:25 IST] [query_id=0] :LOG:  statement:
select query_id, query, pid from pg_stat_activity where pid = 0;
[timestamp=2021-06-17 20:40:36 IST] [query_id=0] :LOG:  statement:
select count(1) from t;
[timestamp=2021-06-17 20:40:47 IST] [query_id=0] :LOG:  statement:
select count(1) from t where id < 100;

test=# explain (analyze,verbose) select * from t where id <
floor((random() * 100)::int);
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..3.10 rows=3 width=4) (actual time=0.009..0.014
rows=3 loops=1)
....
 Query Identifier: 1051405225525186795
 Planning Time: 0.090 ms
 Execution Time: 0.030 ms
(13 rows)

test=# select query_id, query, pid from pg_stat_activity where pid =
pg_backend_pid();
       query_id       |                                      query
                                 | pid
----------------------+---------------------------------------------------------------------------------+------
 -4293879703199833131 | select query_id, query, pid from
pg_stat_activity where pid = pg_backend_pid(); | 2671
(1 row)

but in logs
[timestamp=2021-06-17 20:46:47 IST] [query_id=0] :LOG:  statement:
explain select query_id, query, pid from pg_stat_activity where pid =
pg_backend_pid();
[timestamp=2021-06-17 20:46:54 IST] [query_id=0] :LOG:  statement:
explain select query_id, query, pid from pg_stat_activity where pid >
100;
[timestamp=2021-06-17 20:46:58 IST] [query_id=0] :LOG:  statement:
explain analyze select query_id, query, pid from pg_stat_activity
where pid > 100;
[timestamp=2021-06-17 20:47:25 IST] [query_id=0] :LOG:  statement:
explain analyze select * from t where id <  floor((random() *
100)::int);
[timestamp=2021-06-17 20:48:16 IST] [query_id=0] :LOG:  statement:
explain (analyze,verbose) select * from t where id <  floor((random()
* 100)::int);
[timestamp=2021-06-17 20:48:38 IST] [query_id=0] :LOG:  statement:
select query_id, query, pid from pg_stat_activity where pid =
pg_backend_pid();

not sure, if i am missing something obvious?

> I'm not sure that I understand that question.  The pid will identify a backend,
> and that backend can execute 0, 1 or a lot of different queries.  The query_id
> will uniquely identify statements after some normalization and removing the
> constant parts (so for instance "select 1;" and "Select 2  ;" will have the
> same identifier).  Having only that information in the log can be useful on its
> own, but you usually get way more benefit using additional modules like
> pg_stat_statements.

Thanks, that helps, but I still do not see them in logs.


************************************* corrects myself
..... ok now it works, when i set log_min_duration_statement =0 to log
all statements.

test=# show log_min_duration_statement;
 log_min_duration_statement
----------------------------
 0
(1 row)

test=# explain (analyze,verbose) select * from t where id <
floor((random() * 100)::int);
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..3.10 rows=3 width=4) (actual time=0.018..0.022
rows=3 loops=1)
   ->  Seq Scan on public.t0 t_1  (cost=0.00..1.03 rows=1 width=4)
(actual time=0.018..0.018 rows=1 loops=1)
         Output: t_1.id
         Filter: ((t_1.id)::double precision < floor((((random() *
'100'::double precision))::integer)::double precision))
   ->  Seq Scan on public.t1 t_2  (cost=0.00..1.03 rows=1 width=4)
(actual time=0.001..0.002 rows=1 loops=1)
         Output: t_2.id
         Filter: ((t_2.id)::double precision < floor((((random() *
'100'::double precision))::integer)::double precision))
   ->  Seq Scan on public.t2 t_3  (cost=0.00..1.03 rows=1 width=4)
(actual time=0.001..0.001 rows=1 loops=1)
         Output: t_3.id
         Filter: ((t_3.id)::double precision < floor((((random() *
'100'::double precision))::integer)::double precision))
 Query Identifier: 1051405225525186795
 Planning Time: 0.353 ms
 Execution Time: 0.034 ms
(13 rows)

test=# select query_id, query, pid from pg_stat_activity where pid =
pg_backend_pid();
       query_id       |                                      query
                                 | pid
----------------------+---------------------------------------------------------------------------------+------
 -4293879703199833131 | select query_id, query, pid from
pg_stat_activity where pid = pg_backend_pid(); | 2777
(1 row)


[timestamp=2021-06-17 20:54:12 IST] [query_id=-1477439429101745134]
:LOG:  duration: 0.040 ms  statement: show log_min_duration_statement;
[timestamp=2021-06-17 20:54:16 IST] [query_id=1308619569072270555]
:LOG:  duration: 0.882 ms  statement: explain (analyze,verbose) select
* from t where id <  floor((random() * 100)::int);
[timestamp=2021-06-17 20:54:19 IST] [query_id=-4293879703199833131]
:LOG:  duration: 1.201 ms  statement: select query_id, query, pid from
pg_stat_activity where pid = pg_backend_pid();


all good, Julien.

-- 
Thanks,
Vijay
Mumbai, India



Re: compute_query_id

From
Julien Rouhaud
Date:
On Thu, Jun 17, 2021 at 08:57:02PM +0530, Vijaykumar Jain wrote:
>
> test=# show log_line_prefix;
>         log_line_prefix
> --------------------------------
>  [timestamp=%t] [query_id=%Q] :
> (1 row)
> 
> test=# show compute_query_id;
>  compute_query_id
> ------------------
>  on
> (1 row)
> 
> test=# show log_statement;
>  log_statement
> ---------------
>  all
> (1 row)
> 
> ************************************* corrects myself
> ..... ok now it works, when i set log_min_duration_statement =0 to log
> all statements.
> 
> test=# show log_min_duration_statement;
>  log_min_duration_statement
> ----------------------------
>  0
> (1 row)

Yes, unfortunately log_statements is not compatible with compute_query_id.
This is documented at
https://www.postgresql.org/docs/devel/runtime-config-logging.html#GUC-LOG-LINE-PREFIX:

> The %Q escape always reports a zero identifier for lines output by
> log_statement because log_statement generates output before an identifier can
> be calculated, including invalid statements for which an identifier cannot be
> calculated.