Re: pg_stat_statements and "IN" conditions - Mailing list pgsql-hackers

From Zhihong Yu
Subject Re: pg_stat_statements and "IN" conditions
Date
Msg-id CALNJ-vQx7u-Nq_wVun72CVrHe4DcGfEurj_ikckJrRy2Pem=JA@mail.gmail.com
Whole thread Raw
In response to Re: pg_stat_statements and "IN" conditions  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: pg_stat_statements and "IN" conditions
List pgsql-hackers
Hi,
A few comments.

+                           "After this number of duplicating constants start to merge them.",

duplicating -> duplicate

+               foreach(lc, (List *) expr)
+               {
+                   Node * subExpr = (Node *) lfirst(lc);
+
+                   if (!IsA(subExpr, Const))
+                   {
+                       allConst = false;
+                       break;
+                   }
+               }

It seems the above foreach loop (within foreach(temp, (List *) node)) can be preceded with a check that allConst is true. Otherwise the loop can be skipped.

+                   if (currentExprIdx == pgss_merge_threshold - 1)
+                   {
+                       JumbleExpr(jstate, expr);
+
+                       /*
+                        * A const expr is already found, so JumbleExpr must
+                        * record it. Mark it as merged, it will be the first
+                        * merged but still present in the statement query.
+                        */
+                       Assert(jstate->clocations_count > 0);
+                       jstate->clocations[jstate->clocations_count - 1].merged = true;
+                       currentExprIdx++;
+                   }

The above snippet occurs a few times. Maybe extract into a helper method.

Cheers

On Sat, Dec 26, 2020 at 2:45 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Wed, Nov 18, 2020 at 05:04:32PM +0100, Dmitry Dolgov wrote:
> > On Wed, Aug 12, 2020 at 06:19:02PM +0200, Dmitry Dolgov wrote:
> >
> > I would like to start another thread to follow up on [1], mostly to bump up the
> > topic. Just to remind, it's about how pg_stat_statements jumbling ArrayExpr in
> > queries like:
> >
> >     SELECT something FROM table WHERE col IN (1, 2, 3, ...)
> >
> > The current implementation produces different jumble hash for every different
> > number of arguments for essentially the same query. Unfortunately a lot of ORMs
> > like to generate these types of queries, which in turn leads to
> > pg_stat_statements pollution. Ideally we want to prevent this and have only one
> > record for such a query.
> >
> > As the result of [1] I've identified two highlighted approaches to improve this
> > situation:
> >
> > * Reduce the generated ArrayExpr to an array Const immediately, in cases where
> >   all the inputs are Consts.
> >
> > * Make repeating Const to contribute nothing to the resulting hash.
> >
> > I've tried to prototype both approaches to find out pros/cons and be more
> > specific. Attached patches could not be considered a completed piece of work,
> > but they seem to work, mostly pass the tests and demonstrate the point. I would
> > like to get some high level input about them and ideally make it clear what is
> > the preferred solution to continue with.
>
> I've implemented the second approach mentioned above, this version was
> tested on our test clusters for some time without visible issues. Will
> create a CF item and would appreciate any feedback.

After more testing I found couple of things that could be improved,
namely in the presence of non-reducible constants one part of the query
was not copied into the normalized version, and this approach also could
be extended for Params. These are incorporated in the attached patch.

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Parallel Inserts in CREATE TABLE AS
Next
From: Pavel Stehule
Date:
Subject: Re: Rethinking plpgsql's assignment implementation