Thread: GEQO and join_collapse_limit correlation

GEQO and join_collapse_limit correlation

From
Juan José Santamaría Flecha
Date:
Hi,

We recently had an issue in production. We have queries that are procedurally generated by an Object/Relational Mapping framework. Some of these queries are huge, involving over 120 tables.

With the following parameters the planner seemed to be getting very bad plans for some of these queries (times are from a single execution, but they are in those orders of magnitude):

----
from_collapse_limit = 14
join_collapse_limit = 14
geqo_threshold = 14
geqo_effort= 5

(cost=14691360.79..81261293.30 rows=6 width=15934)

 Planning time: 3859.928 ms
 Execution time: 6883365.973 ms
----

If we raise the join_collapse_limit to a really high value the plans are much better, but (of course) planning time gets worse:

----
from_collapse_limit = 150
join_collapse_limit = 150
geqo_threshold = 14
geqo_effort= 5

(cost=379719.44..562997.32 rows=7 width=15934)

 Planning time: 7112.416 ms
 Execution time: 7.741 ms
----

After some testing in order to lower the planning time we ended bringing down the GEQO values, and we have the best results with:

----
from_collapse_limit = 150
join_collapse_limit = 150
geqo_threshold = 2
geqo_effort= 2

(cost=406427.86..589667.55 rows=6 width=15934)

 Planning time: 2721.099 ms
 Execution time: 22.728 ms
----

Issues with the join_collapse_limit have been discussed before [1], but lowering the GEQO values seems counterintuitive based on the documentation for this parameter [2]: "Setting this value [join_collapse_limit] to geqo_threshold or more may trigger use of the GEQO planner, resulting in non-optimal plans."

What we want to know is if this mechanisms are working as intended and we can follow a similar approach in the future (lower GEQO values), or this is just a fluke for a corner case.

I have been able to reproduce a similar behaviour, to a much smaller scale, with the attached scripts in Postgres 10.



Regards,

Juan José Santamaría

Attachment

Re: GEQO and join_collapse_limit correlation

From
Tom Lane
Date:
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:
> We recently had an issue in production. We have queries that are
> procedurally generated by an Object/Relational Mapping framework. Some of
> these queries are huge, involving over 120 tables.

Yeah, you're going to have problems with that :-(

> After some testing in order to lower the planning time we ended bringing
> down the GEQO values, and we have the best results with:

> from_collapse_limit = 150
> join_collapse_limit = 150
> geqo_threshold = 2
> geqo_effort= 2

Hmm.  The trouble with this approach is that you're relying on GEQO
to find a good plan, and that's only probabilistic --- especially so
when you're reducing geqo_effort, meaning it doesn't try as many
possibilities as it otherwise might.  Basically, therefore, the
fear is that every so often you'll get a bad plan.

If the queries are fairly stylized, you might be able to get good
results by exploiting rather than bypassing join_collapse_limit:
determine what a good join order is, and then write the FROM clause
as an explicit JOIN nest in that order, and then *reduce* not raise
join_collapse_limit to force the planner to follow the syntactic
join order.  In this way you'd get rid of most of the run-time
join order search effort.  Don't know how cooperative your ORM
would be with such an approach though.

            regards, tom lane


Re: GEQO and join_collapse_limit correlation

From
Juan José Santamaría Flecha
Date:
Hi,

> After some testing in order to lower the planning time we ended bringing
> down the GEQO values, and we have the best results with:

> from_collapse_limit = 150
> join_collapse_limit = 150
> geqo_threshold = 2
> geqo_effort= 2

Hmm.  The trouble with this approach is that you're relying on GEQO
to find a good plan, and that's only probabilistic --- especially so
when you're reducing geqo_effort, meaning it doesn't try as many
possibilities as it otherwise might.  Basically, therefore, the
fear is that every so often you'll get a bad plan.

What we felt odd was having to find a balance between geqo_threshold and join_collapse_limit, lowering one was only effective after raising the other. The geqo_effort was only mofidied after we found this path, and some more testing.

In an environment with geqo_threshold=1 and join_collapse_limit=1, would the planner be GEQO exclusive (and syntactic)?

If the queries are fairly stylized, you might be able to get good
results by exploiting rather than bypassing join_collapse_limit:
determine what a good join order is, and then write the FROM clause
as an explicit JOIN nest in that order, and then *reduce* not raise
join_collapse_limit to force the planner to follow the syntactic
join order.  In this way you'd get rid of most of the run-time
join order search effort.  Don't know how cooperative your ORM
would be with such an approach though.

The ORM seems to build the join path just the other way round of what would be good for the planner. The thing we should take a good look at if it is really needed looking at +120 tables for a query that gets a pretty trivial result, but that is completely off topic.
 
                        regards, tom lane

Thanks for your repply.

Regards,

Juan José Santamaría

Re: GEQO and join_collapse_limit correlation

From
Tom Lane
Date:
=?UTF-8?Q?Juan_Jos=C3=A9_Santamar=C3=ADa_Flecha?= <juanjo.santamaria@gmail.com> writes:
> In an environment with geqo_threshold=1 and join_collapse_limit=1, would
> the planner be GEQO exclusive (and syntactic)?

GEQO's only function, basically, is to search for the join order to use.
If you're constraining the join order completely with
join_collapse_limit=1 then forcing the GEQO path to be taken would just
add pointless overhead.  (If it does anything at all ... I don't remember
the logic exactly but we might be bright enough not to bother with GEQO in
such a situation, regardless of geqo_threshold.)

            regards, tom lane


Re: GEQO and join_collapse_limit correlation

From
Juan José Santamaría Flecha
Date:
Hi,

GEQO's only function, basically, is to search for the join order to use.
If you're constraining the join order completely with
join_collapse_limit=1 then forcing the GEQO path to be taken would just
add pointless overhead.  (If it does anything at all ... I don't remember
the logic exactly but we might be bright enough not to bother with GEQO in
such a situation, regardless of geqo_threshold.)
 
Got it. Thanks a lot.

Regards,

Juan José Santamaría