After loading the relation information into the cache, the best planning time for the 100-item case is around 2.789 ms, and the best for the 101-item case is around 3.159 ms.
On Fri, Jul 27, 2018 at 9:23 AM, Stephen Frost <sfrost@snowman.net> wrote:
Greetings,

* Soni M (diptatapa@gmail.com) wrote:
> Hello All, I think I found a bug here. Postgres 9.6.9 on Red Hat Enterprise
> Linux Server release 7.5 (Maipo).
[... the 99-item query ...]
> Planning time: 12.969 ms
> Execution time: 7.062 ms
[... the 100-item query ...]
> Planning time: 60.537 ms
> Execution time: 51.401 ms
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> I think it's a consequence of the limitation in predtest.c that it won't
> try to do predicate proofs using ScalarArrayOps with more than 100 array
> entries:
This is far from the first time we (or, at least, I) have seen complaints about that particular constant.

> /*
>  * Proof attempts involving large arrays in ScalarArrayOpExpr nodes are
>  * likely to require O(N^2) time, and more often than not fail anyway.
>  * So we set an arbitrary limit on the number of array elements that
>  * we will allow to be treated as an AND or OR clause.
>  * XXX is it worth exposing this as a GUC knob?
>  */
> #define MAX_SAOP_ARRAY_SIZE		100
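To make the O(N^2) remark in that comment concrete: proving that one ScalarArrayOp implies another boils down to checking that every element of one array is matched by some element of the other, which done pairwise is quadratic in the array size. A minimal sketch of that idea (this is an illustration only, not what predtest.c literally does; the function name is made up):

```python
def saop_implies(sub, sup):
    """Naive proof that `x IN sub` implies `x IN sup`: every element of
    sub must appear in sup. Checked pairwise, as a planner comparing raw
    expression nodes would, this is O(len(sub) * len(sup))."""
    return all(any(s == t for t in sup) for s in sub)
```

With 100-element arrays that is on the order of 10,000 comparisons per proof attempt, which is why a cutoff like MAX_SAOP_ARRAY_SIZE exists at all.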
Which certainly makes me think that comment in there might be worth acting on: perhaps we should make this a GUC and let users see just what would end up happening with a different choice. There could certainly be cases where it'd be better to work it out.

> Not a bug, but a tradeoff. You'd be unhappy if the planner spent longer
> devising the plan than it saved to eliminate the extra partitions.
While I agree in concept, I'm awfully curious how the "simpler" approach used when we hit the limit resulted in a 5x increase in planning time. It looks as though the extra time required to perform that elimination for at least a few more items would save us cycles somewhere else that are apparently pretty expensive.
Soni, any chance that this query was the first time all of those partitions were hit in this backend, meaning there was a lot of time required to load the relation information for them? What happens if you prime the backend by running the 100-item case once, and then do 5 runs of the 99-item and then 5 of the 100-item case?
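That priming-and-repeating procedure could be driven by generating the two IN-list variants and running each several times in the same backend session. A rough sketch, assuming the original query is a simple IN-list filter (the table and column names here are placeholders, since the full query wasn't shown):

```python
# Sketch: build the 99-item and 100-item IN-list queries so each can be
# run repeatedly via EXPLAIN ANALYZE in one primed backend session.
# "measurements" and "device_id" are placeholder names, not from the report.

def in_list_query(n, table="measurements", column="device_id"):
    """Build an EXPLAIN ANALYZE SELECT with an n-item IN list (items 1..n)."""
    items = ", ".join(str(i) for i in range(1, n + 1))
    return f"EXPLAIN ANALYZE SELECT * FROM {table} WHERE {column} IN ({items});"

if __name__ == "__main__":
    # Prime the backend once with the 100-item case, then alternate:
    # 5 runs of the 99-item query, then 5 runs of the 100-item query.
    print(in_list_query(100))
    for _ in range(5):
        print(in_list_query(99))
    for _ in range(5):
        print(in_list_query(100))
```

Feeding the generated statements to a single psql session (so the relation cache stays warm across runs) and comparing the reported planning times would answer whether the 5x difference was really cache priming.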