Re: Any disadvantages of using =ANY(ARRAY()) instead of IN? - Mailing list pgsql-performance

From Clemens Eisserer
Subject Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
Date
Msg-id CAFvQSYTwLqvPOF9iLycrAsCq0fNJ8bfOjPvbbZhc70maA=TpzQ@mail.gmail.com
Whole thread Raw
In response to Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?  (Noah Misch <noah@leadboat.com>)
Responses Re: Any disadvantages of using =ANY(ARRAY()) instead of IN?
List pgsql-performance
Hello Noah,

Thanks a lot for your feedback and explanations.

> Since you have 15+ tables at the top level, the genetic query optimizer should
> be kicking in and delivering a plan in reasonable time, albeit with plan
> quality hazards.  There's a danger zone when the deterministic planner is
> still in effect but {from,join}_collapse_limit have limited the scope of its
> investigation.  If you're in that zone and have not hand-tailored your
> explicit join order, poor plans are unsurprising.  What exact configuration
> changes are you using?

Basically only the changes, suggested here a year ago,  which made the
problem go away for less complex queries:

geqo_threshold = 20
from_collapse_limit = 13
join_collapse_limit = 13


> Hundreds of rows, no.  Consider this example:
> IN(...):
>  Total runtime: 2200.767 ms
>
> ANY(ARRAY(...)):
>  Total runtime: 11748.348 ms

In case there is an index on C, the resulting index scan is, even with
1000 elements, 3 times faster on my Notebook.
However, both queries execute in next-to-no time (15 vs 5ms).

> Filing a bug report with the content you've already posted would not add much,
> but a self-contained test case could prove useful.  Many of the deficiencies
> that can make ANY(ARRAY(...)) win do represent unimplemented planner
> intelligence more than bugs.
>
> Incidentally, you can isolate whether ANY(ARRAY(...))'s advantage comes solely
> from suppressing the subquery collapse.  Keep "IN" but tack "OFFSET 0" onto
> the subquery.  If this gives the same performance as ANY(ARRAY(...)), then the
> subquery-collapse suppression was indeed the source of advantage.

I see your point, some dumb logic to replace IN with ANY(ARRAY
wouldn't always yield better results.
I'll try to come up with a self-containing testcase.

Thanks again, Clemens

pgsql-performance by date:

Previous
From: Robert Klemme
Date:
Subject: Re: Maximum number of sequences that can be created
Next
From: Craig James
Date:
Subject: Re: Maximum number of sequences that can be created