Improve handling of pg_stat_statements handling of bind "IN" variables - Mailing list pgsql-hackers

From Pavel Trukhanov
Subject Improve handling of pg_stat_statements handling of bind "IN" variables
Date
Msg-id CAF42k=JCfHMJtkAVXCzBn2XBxDC83xb4VhV7jU7enPnZ0CfEQQ@mail.gmail.com
Whole thread Raw
Responses Re: Improve handling of pg_stat_statements handling of bind "IN" variables
List pgsql-hackers
Hi Hackers

I would like to embark on a journey to try to implement this issue I
found on TODO list –
https://www.postgresql.org/message-id/flat/CAM3SWZSpdPB3uErnXWMt3q74y0r%2B84ZsOt2U3HKKes_V7O%2B0Qg%40mail.gmail.com
In short: pgss distinguishes "SELECT * WHERE id IN (1, 2)" and "SELECT
* WHERE id IN (1, 2, 3)" as two separate queryId's, resulting in
separate entries in pgss. While in practice in most cases it should be
considered as the same thing.

Though it was added in TODO by Bruce Momjian some time ago, I
personally have been annoyed by this issue, because we use pgss as a
data source in our monitoring system okmeter.io – so we've been using
some work arounds for this in our system.

The way AFAIU it is suggested to be handled in the previous thread is
to not jumble ArrayExpr recursively and just treat it as "some list of
zero or more nodes".
I have already lurked around related code, but I have stumbled on some
problems with the way I see I can implement this.

So I want to ask for advice and maybe even guidance because I'm new to
PG internals and not a regular in C coding.

1. ArrayExpr
ArrayExpr is used to represent not only "IN" clauses, but also for
example "SELECT ARRAY[1, 2, 3]" and maybe some other cases I didn't
think of.
That brings the question whether "IN (...)" should be handled
separately from actual usage of ARRAY.
Or it is okay for any ARRAY to be jumbled w/o respect to number of
entries in it?
With that, "SELECT ARRAY[1, 2]" becomes undistinguishable from "SELECT
ARRAY[1, 2, 3]" etc in pgss.

I'm asking this because I'm not sure if it would be okay to handle
both cases in the same way.
For example "SELECT ARRAY[1, 2, a] FROM table" and "SELECT ARRAY[b]
FROM table" might end up in the same pgss entry.

While a separate handling for "IN (...)" seems to require lots of
changes – starting from parser (new parser node type) and further.
How should I proceed?

2 Weird arrays - with Consts and Params or const expressions or
different types etc
SELECT * FROM test WHERE a IN (1, $1)
SELECT * FROM test WHERE a IN (1, 3+1)
SELECT * FROM test WHERE a IN (1, 2.1)
SELECT * FROM test WHERE a IN (1.1, 2.1)  etc.
How those should be handled?
Should those be indistinguishable from "IN ($1, $2, $3)" as well?
Or such non realistic usage examples are negligible and no one cares
what happens with them?

3 Tests in pgss.sql/out and Vars
I would like someone to point me in a direction of how could I
implement a test that will query
"SELECT * FROM test WHERE a IN ($1, $2, $3, ...)" with params, not
consts, because I think this is the most common case actually.
And existing tests only check consts in "IN" list.
I don't see a way to implement such a test with the existing test
infrastructure.
Though if that might be considered out of the scope for this TODO, it
would be okay with me.


I would appreciate any feedback.
---
Pavel Trukhanov



pgsql-hackers by date:

Previous
From: "Jamison, Kirk"
Date:
Subject: RE: [PATCH] Speedup truncates of relation forks
Next
From: Richard Guo
Date:
Subject: Re: Parallel grouping sets