Home > mailing lists

Re: pg_stat_statements and "IN" conditions - Mailing list pgsql-hackers

From	Álvaro Herrera
Subject	Re: pg_stat_statements and "IN" conditions
Date	February 12 21:39:39
Msg-id	202502121839.vjr3jm6yb35h@alvherre.pgsql Whole thread Raw
In response to	Re: pg_stat_statements and "IN" conditions (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses	Re: pg_stat_statements and "IN" conditions
List	pgsql-hackers

Tree view

On 2025-Feb-12, Dmitry Dolgov wrote:

> I've been experimenting with this today, and while it's easy to
> implement,

Great.

> there is one annoying thing for which I don't have a solution
> yet. When generating a normalized version for such merged queries in
> pgss we rely on expression location, something like:
> 
>     select i from t where i in (a1, a2, a3, ..., aN);
>                                 |                 |
>                              expr loc1         expr loc2
> 
> We remember loc1 and loc2, then do not copy anything between then into
> the normalized query. Now, the expression location is only known up to
> the parsing token, without taking into account e.g. parenthesis in more
> complex expressions. Which means we don't know exactly where an
> expression starts or ends, and it's hard to correctly represent queries
> like:
> 
>     select i from t where i in (((a1)), ((a2)), ((a3)), ..., ((aN)));
>                                   |                            |
>                               expr loc1                    expr loc2
> 
> The normalized version looks like this:
> 
>     select i from t where i in (((...)));
> 
> While it does not affect the actual functionality and is purely
> cosmetic, it's quite visible and causes questions.

The nastiness level of this seems quite low, compared to what happens to
this other example if we didn't handle these easy cases:

create table t (a float);
select i from t where i in (1, 2);
select i from t where i in (1, '2');
select i from t where i in ('1', 2);
select i from t where i in ('1', '2');
select i from t where i in (1.0, 1.0);

(The point here is that the datatype differs for the constants from the
lexer down in each of these cases.)

I think it's more important to handle this better than what the posted
patch does, than improving the lexing in presence of other lexical
elements in the query.  With the current patch I get _five_
pg_stat_statements entries from these queries above, where only one of
them was able to apply merging of the elements:

       queryid        │                query                
──────────────────────┼─────────────────────────────────────
 -5783267088740508246 │ select i from t where i in ($1, $2)
  6446023427308995149 │ select i from t where i in ($1, $2)
  3778867339896201523 │ select i from t where i in (...)
 -8733218180609156532 │ select i from t where i in ($1, $2)
 -5106509834475638715 │ select i from t where i in ($1, $2)

If I understand what you're saying, it's that the extra parenthesis
cause the recorded query text be a little uglier (but the queryid still
ends up being one and the same for all queries), which seems much less
of a problem.  I'm okay saying that cases like that can be improved
later.  (It seems to me that you want to improve the way we pass the
lexed string down to pg_stat_statements, and frankly that even seems a
different problem altogether.)

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/

pgsql-hackers by date:

From: Tom Lane
Date: 12 February, 21:35:20
Subject: Re: Small memory fixes for pg_createsubcriber

From: Tom Lane
Date: 12 February, 21:40:15
Subject: Re: explain analyze rows=%.0f

Re: pg_stat_statements and "IN" conditions - Mailing list pgsql-hackers

Previous

Next