GEQO and join_collapse_limit correlation - Mailing list pgsql-performance

From Juan José Santamaría Flecha
Subject GEQO and join_collapse_limit correlation
Date
Msg-id CAC+AXB3Ug25QUT+hnoZ2bk6w5oFfBbk1qaHu63G1qKm2NO=z1A@mail.gmail.com
Whole thread Raw
Responses Re: GEQO and join_collapse_limit correlation
List pgsql-performance
Hi,

We recently had an issue in production. We have queries that are procedurally generated by an Object/Relational Mapping framework. Some of these queries are huge, involving over 120 tables.

With the following parameters the planner seemed to be getting very bad plans for some of these queries (times are from a single execution, but they are in those orders of magnitude):

----
from_collapse_limit = 14
join_collapse_limit = 14
geqo_threshold = 14
geqo_effort= 5

(cost=14691360.79..81261293.30 rows=6 width=15934)

 Planning time: 3859.928 ms
 Execution time: 6883365.973 ms
----

If we raise the join_collapse_limit to a really high value the plans are much better, but (of course) planning time gets worse:

----
from_collapse_limit = 150
join_collapse_limit = 150
geqo_threshold = 14
geqo_effort= 5

(cost=379719.44..562997.32 rows=7 width=15934)

 Planning time: 7112.416 ms
 Execution time: 7.741 ms
----

After some testing in order to lower the planning time we ended bringing down the GEQO values, and we have the best results with:

----
from_collapse_limit = 150
join_collapse_limit = 150
geqo_threshold = 2
geqo_effort= 2

(cost=406427.86..589667.55 rows=6 width=15934)

 Planning time: 2721.099 ms
 Execution time: 22.728 ms
----

Issues with the join_collapse_limit have been discussed before [1], but lowering the GEQO values seems counterintuitive based on the documentation for this parameter [2]: "Setting this value [join_collapse_limit] to geqo_threshold or more may trigger use of the GEQO planner, resulting in non-optimal plans."

What we want to know is if this mechanisms are working as intended and we can follow a similar approach in the future (lower GEQO values), or this is just a fluke for a corner case.

I have been able to reproduce a similar behaviour, to a much smaller scale, with the attached scripts in Postgres 10.



Regards,

Juan José Santamaría

Attachment

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: primary key hash index
Next
From: Tom Lane
Date:
Subject: Re: GEQO and join_collapse_limit correlation