Thread: RangeTblEntry jumble omissions

RangeTblEntry jumble omissions

From
Peter Eisentraut
Date:
I think there are some fields from the RangeTblEntry struct missing in 
the jumble (function _jumbleRangeTblEntry()).  Probably, some of these 
were really just forgotten, in other cases this might be an intentional 
decision, but then it might be good to document it.  This has come up in 
thread [0] and there is a patch [1], but I figured I'd start a new 
thread here to get the attention of those who know more about 
pg_stat_statements.

I think the following fields are missing.  (See also attached patch.)

- alias

Currently, two queries like

SELECT * FROM t1 AS foo
SELECT * FROM t1 AS bar

are counted together by pg_stat_statements -- that might be ok, but they 
both get listed under whichever one is run first, so here if you are 
looking for the "AS bar" query, you won't find it.

- join_using_alias

Similar situation, currently

SELECT * FROM t1 JOIN t2 USING (a, b)
SELECT * FROM t1 JOIN t2 USING (a, b) AS x

are counted together.

- funcordinality

This was probably just forgotten.  It should be included because the 
WITH ORDINALITY clause changes the query result.

- lateral

Also probably forgotten.  A query specifying LATERAL is clearly 
different from one without it.

Thoughts?  Anything else missing perhaps?


[0]: 
https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
[1]: 
https://www.postgresql.org/message-id/attachment/154249/v2-0002-Remove-custom-_jumbleRangeTblEntry.patch
Attachment

Re: RangeTblEntry jumble omissions

From
Alvaro Herrera
Date:
On 2024-Feb-23, Peter Eisentraut wrote:

> - alias
> 
> Currently, two queries like
> 
> SELECT * FROM t1 AS foo
> SELECT * FROM t1 AS bar
> 
> are counted together by pg_stat_statements -- that might be ok, but they
> both get listed under whichever one is run first, so here if you are looking
> for the "AS bar" query, you won't find it.

Another, similar but not quite: if you do

SET search_path TO foo;
SELECT * FROM t1;
SET search_path TO bar;
SELECT * FROM t1;

and you have both foo.t1 and bar.t1, you'll get two identical-looking
queries in pg_stat_statements with different jumble IDs, with no way to
know which is which.  Not sure if the jumbling of the RTE (which
includes the OID of the table in question) itself is to blame, or
whether we want to store the relevant schemas with the entry somehow, or
what.  Obviously, failing to differentiate them would not be an
improvement.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: RangeTblEntry jumble omissions

From
Julien Rouhaud
Date:
Hi,

On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:
>
> - alias
>
> Currently, two queries like
>
> SELECT * FROM t1 AS foo
> SELECT * FROM t1 AS bar
>
> are counted together by pg_stat_statements -- that might be ok, but they
> both get listed under whichever one is run first, so here if you are looking
> for the "AS bar" query, you won't find it.

I think this one is intentional.  This alias won't change the query behavior or
the field names so it's good to avoid extraneous entries.  It's true that you
then won't find something matching "AS bar", but it's not something you can
rely on anyway.

If you first execute "select * from t1 as foo" and then "SELECT * FROM t1 AS
foo" then you won't find anything matching "AS foo" either.  There isn't even
any guarantee that the stored query text will be jumbled.

> - join_using_alias
>
> Similar situation, currently
>
> SELECT * FROM t1 JOIN t2 USING (a, b)
> SELECT * FROM t1 JOIN t2 USING (a, b) AS x
>
> are counted together.

IMHO same as above.

> - funcordinality
>
> This was probably just forgotten.  It should be included because the WITH
> ORDINALITY clause changes the query result.

Agreed.

> - lateral
>
> Also probably forgotten.  A query specifying LATERAL is clearly different
> from one without it.

Agreed.



Re: RangeTblEntry jumble omissions

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
> On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:
>> - funcordinality
>> This was probably just forgotten.  It should be included because the WITH
>> ORDINALITY clause changes the query result.

> Agreed.

Seems OK.

>> - lateral
>> Also probably forgotten.  A query specifying LATERAL is clearly different
>> from one without it.

> Agreed.

Nah ... I think that LATERAL should be ignored on essentially the
same grounds on which you argue for ignoring aliases.  If it
affects the query's semantics, it's because there is a lateral
reference in the subject subquery or function, and that reference
already contributes to the query hash.  If there is no such
reference, then LATERAL is a noise word.  It doesn't help any that
LATERAL is actually optional for functions, making it certainly a
noise word there.

IIRC, the parser+planner cooperatively fix things so that the final
state of an RTE's lateral field reflects reality.  But if we are
hashing before that's happened, it's not worth all that much.

            regards, tom lane



Re: RangeTblEntry jumble omissions

From
Julien Rouhaud
Date:
On Fri, Feb 23, 2024 at 11:00:41PM +0100, Alvaro Herrera wrote:
>
> Another, similar but not quite: if you do
>
> SET search_path TO foo;
> SELECT * FROM t1;
> SET search_path TO bar;
> SELECT * FROM t1;
>
> and you have both foo.t1 and bar.t1, you'll get two identical-looking
> queries in pg_stat_statements with different jumble IDs, with no way to
> know which is which.  Not sure if the jumbling of the RTE (which
> includes the OID of the table in question) itself is to blame, or
> whether we want to store the relevant schemas with the entry somehow, or
> what.  Obviously, failing to differentiate them would not be an
> improvement.

Yeah that's also a very old known problem.  This has been raised multiple times
(on top of my head [1], [2], [3]).  At this point I'm not exactly holding my
breath.

[1]: https://www.postgresql.org/message-id/flat/8f54c609-17c6-945b-fe13-8b07c0866420%40dalibo.com
[2]: https://www.postgresql.org/message-id/flat/9baf5c06-d6ab-c688-010c-843348e3d98c%40gmail.com
[3]: https://www.postgresql.org/message-id/flat/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com



Re: RangeTblEntry jumble omissions

From
Michael Paquier
Date:
On Fri, Feb 23, 2024 at 06:52:54PM -0500, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
>> On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:
>>> - funcordinality
>>> This was probably just forgotten.  It should be included because the WITH
>>> ORDINALITY clause changes the query result.
>
>> Agreed.
>
> Seems OK.

+1.

>>> - lateral
>>> Also probably forgotten.  A query specifying LATERAL is clearly different
>>> from one without it.
>
>> Agreed.
>
> Nah ... I think that LATERAL should be ignored on essentially the
> same grounds on which you argue for ignoring aliases.  If it
> affects the query's semantics, it's because there is a lateral
> reference in the subject subquery or function, and that reference
> already contributes to the query hash.  If there is no such
> reference, then LATERAL is a noise word.  It doesn't help any that
> LATERAL is actually optional for functions, making it certainly a
> noise word there.

Sounds like a fair argument to me.

Btw, I think that you should add a few queries to the tests of
pg_stat_statements to track the change of behavior when you have
aliases, as an effect of the fields added in the jumbling.
--
Michael

Attachment

Re: RangeTblEntry jumble omissions

From
Peter Eisentraut
Date:
On 26.02.24 02:08, Michael Paquier wrote:
> On Fri, Feb 23, 2024 at 06:52:54PM -0500, Tom Lane wrote:
>> Julien Rouhaud <rjuju123@gmail.com> writes:
>>> On Fri, Feb 23, 2024 at 04:26:53PM +0100, Peter Eisentraut wrote:
>>>> - funcordinality
>>>> This was probably just forgotten.  It should be included because the WITH
>>>> ORDINALITY clause changes the query result.
>>
>>> Agreed.
>>
>> Seems OK.
> 
> +1.

Ok, I have added funcordinality for the RTE_FUNCTION case, and left the 
others alone.