Home > mailing lists

Re: Add a greedy join search algorithm to handle large join problems - Mailing list pgsql-hackers

From	lakshmi
Subject	Re: Add a greedy join search algorithm to handle large join problems
Date	February 16 14:26:13
Msg-id	CAEvyyTjuxiaL4_brvTsA3DZVcq5qgZS=yxFyNqoMtQWVZNiT1A@mail.gmail.com Whole thread
In response to	Re: Add a greedy join search algorithm to handle large join problems (Tomas Vondra <tomas@vondra.me>)
Responses	Re: Add a greedy join search algorithm to handle large join problems
List	pgsql-hackers

Tree view

Hi Tomas,

Thank you for the clarification. You’re right that the synthetic test doesn’t strongly constrain join order and, with the default join_collapse_limit , the DP comparison was limited.

I’m now preparing follow-up tests using selected JOB queries and adjusted planner settings to provide a fair and more realistic comparison. I’ll share the results soon.

Regards,
Lakshmi

On Mon, Feb 16, 2026 at 4:27 PM Tomas Vondra <tomas@vondra.me> wrote:

On 2/16/26 11:44, lakshmi wrote:
> Hi Tomas,
>
> Thank you for the question.
> The 15-table and 20-table results I shared were obtained using a
> synthetic join workload designed to stress join-order planning and
> measure planning-time scaling, rather than a JOB or TPC-H query.
> Each query is essentially a left-deep chain of equality joins over
> simple tables. For reference, the structure is equivalent to:
>

Isn't the left-deep shape more a feature of the plan than the query?
With all the joins using the same "id" attribute, there will be one huge
equivalence class, so AFAICS this does not really force any particular
order (and we could easily pick a bushy plan).

>
> 15-table join
>
> SELECT count(*)
>
> FROM t1
>
> JOIN t2 ON t1.id <http://t1.id> = t2.id <http://t2.id>
>
> JOIN t3 ON t2.id <http://t2.id> = t3.id <http://t3.id>
>
> ...
>
> JOIN t15 ON t14.id <http://t14.id> = t15.id <http://t15.id>;
>
>
>
>
> 20-table join
>
>
>
> SELECT count(*)
>
> FROM t1
>
> JOIN t2 ON t1.id <http://t1.id> = t2.id <http://t2.id>
>
> JOIN t3 ON t2.id <http://t2.id> = t3.id <http://t3.id>
>
> ...
>
> JOIN t20 ON t19.id <http://t19.id> = t20.id <http://t20.id>;
>
>
> Regarding planner settings:
>
>
> -geqo_thresholdwas set to:
>
> a high value (e.g., 100) to force DP
>
> a low value (e.g., 2) to allow GEQO/GOO
>
>
>
> -enable_goo_join_searchwas toggled on/off depending on the comparison
> being measured.
>
>
>
> -Other planner parameters, including join_collapse_limit, were left at
> their default values.
>

Well, this means the DP problem was limited to 8. Is that a fair
comparison for the other algorithms?

>
> So these experiments mainly evaluate planning-time scaling and basic
> plan sanity on a controlled join graph, rather than realistic workload
> plan quality.
>
> I’m currently preparing additional tests using selected JOB queries to
> provide more meaningful plan-quality comparisons and will share those
> results once available.
>

OK

--
Tomas Vondra

pgsql-hackers by date:

From: Bertrand Drouvot
Date: 16 February, 14:17:32
Subject: Re: Fix uninitialized xl_running_xacts padding

From: Jakub Wartak
Date: 16 February, 14:34:17
Subject: Re: 64-bit wait_event and introduction of 32-bit wait_event_arg

Re: Add a greedy join search algorithm to handle large join problems - Mailing list pgsql-hackers

Previous

Next