Re: Add a greedy join search algorithm to handle large join problems - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Add a greedy join search algorithm to handle large join problems
Date
Msg-id CAFiTN-uOfES2p_ukrX+eTB+G+_K2zQ+_Y2tAXK2ffpbS-JDpzw@mail.gmail.com
Whole thread Raw
In response to Add a greedy join search algorithm to handle large join problems  (Chengpeng Yan <chengpeng_yan@outlook.com>)
List pgsql-hackers
On Tue, Dec 2, 2025 at 9:18 AM Chengpeng Yan <chengpeng_yan@outlook.com> wrote:
>
> Hi hackers,
>
> This patch implements GOO (Greedy Operator Ordering), a greedy
> join-order search method for large join problems, based on Fegaras (DEXA
> ’98) [1]. The algorithm repeatedly selects, among all legal joins, the
> join pair with the lowest estimated total cost, merges them, and
> continues until a single join remains. Patch attached.

Interesting.

> To get an initial sense of performance, I reused the star join /
> snowflake examples and the testing script from the thread in [2]. The
> star-join GUC in that SQL workload was replaced with
> `enable_goo_join_search`, so the same tests can run under DP (standard
> dynamic programming) / GEQO(Genetic Query Optimizer) / GOO. For these
> tests, geqo_threshold was set to 15 for DP, and to 5 for both GEQO and
> GOO. Other planner settings, including join_collapse_limit, remained at
> their defaults.
>
> On my local machine, a single-client pgbench run produces the following
> throughput (tps):
>
>                     |    DP    |   GEQO   |    GOO
> --------------------+----------+----------+-----------
> starjoin    (inner) |  1762.52 |  192.13  |  6168.89
> starjoin    (outer) |  1683.92 |  173.90  |  5626.56
> snowflake   (inner) |  1829.04 |  133.40  |  3929.57
> snowflake   (outer) |  1397.93 |   99.65  |  3040.52
>

Is pgbench the right workload to test this, I mean what are we trying
to compare here the planning time taken by DP vs GEQO vs GOO or the
quality of the plans generated by different join ordering algorithms
or both?  All pgbench queries are single table scans and there is no
involvement of the join search, so I am not sure how we can justify
these gains?


--
Regards,
Dilip Kumar
Google



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Proposal: Conflict log history table for Logical Replication
Next
From: Pavel Stehule
Date:
Subject: Re: Inline non-SQL SRFs using SupportRequestSimplify