Thread: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17668 Logged by: Marcus Kempe Email address: marcus.kempe@gmail.com PostgreSQL version: 14.5 Operating system: Linux Description: Consider the following case: postgres=# create function test(x bigint) returns void language sql as $$ select x ;$$; CREATE FUNCTION postgres=# select test(1); test ------ (1 row) postgres=# select calls,userid,dbid,queryid,query from public.pg_stat_statements where query like 'select test%'; calls | userid | dbid | queryid | query -------+--------+-------+----------------------+----------------- 1 | 10 | 13757 | -8729651821477628722 | select test($1) (1 row) postgres=# select test(12345678910); -- larger that INTMAX. test ------ (1 row) postgres=# select calls,userid,dbid,queryid,query from public.pg_stat_statements where query like 'select test%'; calls | userid | dbid | queryid | query -------+--------+-------+----------------------+----------------- 1 | 10 | 13757 | -8729651821477628722 | select test($1) 1 | 10 | 13757 | 1452143938866253601 | select test($1) (2 rows) So given the above 2 calls for the same function, there are 2 different queryIds generated, seemingly based on the fact that in the first case the parameter can be cast to integer, and in the second case it can not. I would have expected both calls to be normalized and fingerprinted to one and the same queryId. This behavior was similar even before queryjumble.c was introduced in PG14.
Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
From
Julien Rouhaud
Date:
Hi, On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote: > > postgres=# select calls,userid,dbid,queryid,query from > public.pg_stat_statements where query like 'select test%'; > calls | userid | dbid | queryid | query > -------+--------+-------+----------------------+----------------- > 1 | 10 | 13757 | -8729651821477628722 | select test($1) > 1 | 10 | 13757 | 1452143938866253601 | select test($1) > (2 rows) > > So given the above 2 calls for the same function, there are 2 different > queryIds generated, seemingly based on the fact that in the first case the > parameter can be cast to integer, and in the second case it can not. > > I would have expected both calls to be normalized and fingerprinted to one > and the same queryId. Agreed, and that's actually a known problem that is currently being worked on. You can look at https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com for details about the discussion and in-progress fix.
Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes: > On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote: >> I would have expected both calls to be normalized and fingerprinted to one >> and the same queryId. > Agreed, and that's actually a known problem that is currently being worked on. > You can look at > https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com > for details about the discussion and in-progress fix. Um ... that seems unrelated. AFAICS the OP is complaining about the fact that '42'::int4::int8 is not identical to '42'::int8. Well, they're not. I seriously doubt that we would or should consider trying to get queryjumble to mask that. regards, tom lane
Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
From
Julien Rouhaud
Date:
On Thu, Oct 27, 2022 at 02:28:17PM -0400, Tom Lane wrote: > Julien Rouhaud <rjuju123@gmail.com> writes: > > On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote: > >> I would have expected both calls to be normalized and fingerprinted to one > >> and the same queryId. > > > Agreed, and that's actually a known problem that is currently being worked on. > > You can look at > > https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com > > for details about the discussion and in-progress fix. > > Um ... that seems unrelated. AFAICS the OP is complaining about > the fact that '42'::int4::int8 is not identical to '42'::int8. > Well, they're not. I seriously doubt that we would or should > consider trying to get queryjumble to mask that. Oh wow I totally misread the bug report, sorry about that. A good night sleep and a coffee later I see what it's about, and I agree that we shouldn't try to mask that even if it might seem surprising.
Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
From
Marcus Kempe
Date:
Appreciate the clarifications. The only thing I would then add is that if we agree that it's unintuitive, this could be better reflected in the documentation.
Currently the only mention is the following (pgstatstatements.html):
"queries with identical texts might appear as separate entries, if they have different meanings as a result of factors such as different search_path settings."
Thanks and best regards,
Marcus Kempe
On Fri, Oct 28, 2022 at 3:28 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
On Thu, Oct 27, 2022 at 02:28:17PM -0400, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
> > On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote:
> >> I would have expected both calls to be normalized and fingerprinted to one
> >> and the same queryId.
>
> > Agreed, and that's actually a known problem that is currently being worked on.
> > You can look at
> > https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com
> > for details about the discussion and in-progress fix.
>
> Um ... that seems unrelated. AFAICS the OP is complaining about
> the fact that '42'::int4::int8 is not identical to '42'::int8.
> Well, they're not. I seriously doubt that we would or should
> consider trying to get queryjumble to mask that.
Oh wow I totally misread the bug report, sorry about that. A good night sleep
and a coffee later I see what it's about, and I agree that we shouldn't try to
mask that even if it might seem surprising.
Re: BUG #17668: Query normalization generates multiple queryId:s for calls to the same procedure
From
Alexey Ermakov
Date:
Hello, I've seen possibly related case where that might be a problem. PostgreSQL 12.8, pg_stat_statements.max=10000. There are queries like these: insert into tablename (col1, ..., col28) values ($1, ..., $28), ... ($2605, ... , $2660) on conflict do nothing. Problem is that not only we might insert different number of rows but there might be many duplicates of the same exact query in pg_stat_statements (userid and dbid are the same in all rows): select md5(query), count(*), pg_size_pretty(sum(length(query))) from pg_stat_statements where query like 'insert into tablename%' group by 1 order by 2 desc limit 10; md5 | count | pg_size_pretty ----------+-------+---------------- 170...84 | 1283 | 2679 kB 906...53 | 897 | 1724 kB 55e...78 | 676 | 12 MB ea4...ef | 642 | 11 MB 944...0f | 629 | 11 MB 04e...ec | 530 | 9138 kB 572...27 | 476 | 8575 kB 5ea...3c | 430 | 7331 kB be1...6c | 324 | 5704 kB f69...47 | 313 | 550 kB There are 8 bigint columns in the table, I suspect that we might insert into some of them values both greater and less than INT_MAX. In result pg_stat_statements buffer is full of such queries (they take 9k rows out of 10k right now, 10h since pg_stat_statements_reset call), currently total sum of query lengths is 117MB (97% of total sum). From monitoring perspective that might be a problem since calculating metrics from pg_stat_statements became more expensive and we might lose information about some queries when there is not enough space for them. But I suppose such cases are pretty rare and some of them could be fixed with COPY queries instead of inserts. -- Thanks, Alexey Ermakov On 2022-10-28 00:28, Tom Lane wrote: > Julien Rouhaud <rjuju123@gmail.com> writes: >> On Thu, Oct 27, 2022 at 01:49:18PM +0000, PG Bug reporting form wrote: >>> I would have expected both calls to be normalized and fingerprinted to one >>> and the same queryId. >> Agreed, and that's actually a known problem that is currently being worked on. >> You can look at >> https://www.postgresql.org/message-id/flat/36e5bffe-e989-194f-85c8-06e7bc88e6f7%40amazon.com >> for details about the discussion and in-progress fix. > Um ... that seems unrelated. AFAICS the OP is complaining about > the fact that '42'::int4::int8 is not identical to '42'::int8. > Well, they're not. I seriously doubt that we would or should > consider trying to get queryjumble to mask that. > > regards, tom lane > >