Re: Add a greedy join search algorithm to handle large join problems - Mailing list pgsql-hackers
| From | Chengpeng Yan |
|---|---|
| Subject | Re: Add a greedy join search algorithm to handle large join problems |
| Date | |
| Msg-id | AEE7DBDE-8778-4AAC-8890-ED2B936C0854@Outlook.com |
| In response to | Re: Add a greedy join search algorithm to handle large join problems (Tomas Vondra <tomas@vondra.me>) |
| List | pgsql-hackers |
> On Dec 28, 2025, at 01:00, Tomas Vondra <tomas@vondra.me> wrote:
>
> You realize this does not actually change shared buffers size, right?
> Because that requires a restart, not just pg_reload_conf. Are you sure
> you're running with 4GB shared buffers?

Good catch, and sorry for not mentioning that explicitly. I did restart the instance and verified that the new shared_buffers setting was in effect. I’ll make sure to call this out clearly in future descriptions.

> I'm not sure SF=1 is sufficient for making any clear conclusions. No
> matter what the shared_buffers value is, it's going to fit into RAM, the
> runs are going to be fully cached.

Thanks for the comment. I agree that TPC-H at SF=1 is too small to support strong conclusions. I mainly used SF=1 as a convenient starting point to quickly validate ideas and iterate on the implementation. I’ve already run experiments on JOB and plan to share those results next, where the behavior should be more representative and the conclusions more meaningful.

> I don't quite see how we could make good decisions if the estimates are
> this wrong. Garbage in, garbage out. AFAICS this affects both the
> regular DP planning, which heavily relies on the costing, but also the
> approaches on heuristics, which still rely on estimates in some way.
>
> If the aggregate subquery is 1000x off, why should any of the following
> decisions be right? The approaches may have different tolerance for
> estimation errors - some may be "defensive" and handle misestimates
> better, but the trade off is that it may pick slower plans when the
> estimates are exact.
>
> I don't think there's an "optimal" approach picking the best plan in
> both cases. If there was, join ordering wouldn't be such a challenge.
>
> I don't think Q20 is about greediness. Poor estimates are going to be an
> issue for all approaches relying on them in some way.
>
> The issue with Q7 seems pretty inherent to greedy algorithms. Picking
> solutions that are optimal locally but not globally is the definition of
> "greedy". I don't think it matters which exact metric is used, this flaw
> is built-in. And I don't think it's solvable.

I agree that all approaches are affected by estimation errors, but as you noted, they differ in how much error they can tolerate. In its current form, GOO is still quite fragile in this respect and can exhibit severe regressions when estimates are badly off. Based on additional experiments, I also agree that a single greedy rule inevitably has cases where it performs extremely poorly; as a result, my current direction is no longer to further tune the greedy criterion itself, but to improve the robustness of the existing greedy choices through additional mechanisms.

Regarding Q20, I agree that this is not fundamentally a “greedy” issue. If the rowcount estimates were reasonably accurate, the extreme behavior observed there would likely be avoided. At the same time, when estimates are wrong by orders of magnitude, the current GOO approach does not handle this well, and improving robustness under such severe misestimation is exactly one of the main aspects I am currently working on. I have already run additional experiments in this direction, using result_size and cost as the base signals, and will share the results separately.
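To make the greediness concern concrete, here is a rough sketch of the kind of greedy loop I have in mind. This is illustrative Python only, not the patch code; the Rel structure and the selectivity() callback are made up for the example. The point is just that each merge step minimizes the estimated result size locally, with no global view:

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Rel:
    """A base or composite relation with an estimated row count."""
    names: frozenset
    rows: float

def est_join_rows(a, b, selectivity):
    # result_size estimate: |A| * |B| * sel(A, B); the selectivity()
    # callback stands in for whatever estimate the planner would supply.
    return a.rows * b.rows * selectivity(a.names, b.names)

def goo_order(rels, selectivity):
    """Greedy Operator Ordering sketch: repeatedly join the pair of
    relations with the smallest estimated result size.  Every step is
    only locally optimal, which is the built-in weakness discussed above."""
    rels = list(rels)
    steps = []
    while len(rels) > 1:
        a, b = min(combinations(rels, 2),
                   key=lambda p: est_join_rows(p[0], p[1], selectivity))
        joined = Rel(a.names | b.names, est_join_rows(a, b, selectivity))
        steps.append((set(a.names), set(b.names)))
        rels = [r for r in rels if r not in (a, b)]
        rels.append(joined)
    return steps

# Toy usage with made-up cardinalities and a constant selectivity:
base = [Rel(frozenset({"a"}), 1e6),
        Rel(frozenset({"b"}), 1e4),
        Rel(frozenset({"c"}), 1e2)]
print(goo_order(base, lambda left, right: 0.001))
```

If the per-step estimates are badly off, the merge order locks in early mistakes, which is why I am now looking at additional robustness mechanisms rather than at yet another greedy criterion.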
> Sufficient for what? Sufficient to replace DP or to replace GEQO?
>
> I don't think we'd want to replace DP with such greedy algorithm. With
> accurate estimates and modest number of joins, we should be able to find
> the best join (more or less).
>
> But this thread is not about replacing DP, I see GOO more as a GEQO
> replacement, right? Or did the goal change?

The goal has always been to use GOO as a replacement for GEQO, not DP.

> I'm not sure ignoring selectivity entirely is a good plan, though. Yes,
> if the selectivity/estimate is significantly off, the plan may be poor.
> But what exactly do you plan to use instead? Isn't a lot of the metrics
> (output size, ...) derived from the selectivity/estimate anyway?
>
> I agree the JOB benchmark may be a better fit for this. The TPC-H is
> pretty simple, and even for TPC-DS the number of joins is not all that
> high. Sorry if my initial feedback suggested these are the proper
> benchmarks to evaluate this.
>
> I suggest the evaluation should focus on cases where we expect GOO to
> actually be used in practice - as a replacement for GEQO. Only when it
> proves useful/acceptable for that use case (for sufficiently many joins
> etc.), we can start comparing with DP to figure out if we need to adjust
> the thresholds and so on.
>
> Another suggestion is to also test with larger data sets. The problems
> with SF=10 or SF=100 may be very different, even with TPC-H. Also,
> consider testing with "cold" cases, as if there's nothing cached by
> restarting the Postgres instance and drop page cache between runs.

Thanks for the feedback. I’ve already run experiments on the JOB benchmark and will share the results shortly, including a comparison against GEQO, with DP used as a baseline. I agree that the primary goal at this stage is to show that the approach is good enough for the intended use case first, before looking into secondary aspects such as threshold tuning. Larger scale factors and cold-cache runs are also part of my upcoming evaluation plan, and I agree those are important dimensions to cover.

Regarding selectivity, I agree the situation is not entirely clear-cut. As you noted, result_size is also derived from selectivity to some extent, and many bad cases share similar characteristics. However, in my initial TPC-H SF=1 experiments, selectivity-based greedy rules already showed many poor cases, and I expect this to become worse in more complex scenarios. For now, I therefore focused on cost and result_size: cost because it is a first-class concept in PostgreSQL, and result_size because it is commonly cited in the literature as a more robust signal for greedy join ordering. If you think selectivity is still an important signal to evaluate, I can certainly add further experiments for it, but based on the above I have not pursued additional selectivity-focused tests so far.
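To show the distinction I have in mind, here is a tiny, self-contained illustration. The numbers are entirely made up (this is not taken from my experiments); it only shows that, even though result_size is computed from the same selectivity estimate, a pure minimum-selectivity rule ignores the input cardinalities and can prefer a much larger intermediate result:

```python
# Two candidate first joins with made-up cardinalities and selectivities.
rows = {"big1": 1_000_000, "big2": 1_000_000, "small": 1_000}
sel = {
    ("big1", "big2"): 1e-6,   # the "best" (lowest) selectivity ...
    ("big1", "small"): 1e-4,  # ... but a much smaller output here
}

def result_size(a, b):
    # result_size is derived from the same selectivity estimate:
    # |A JOIN B| = sel(A, B) * |A| * |B|
    return sel[(a, b)] * rows[a] * rows[b]

pairs = list(sel)
by_selectivity = min(pairs, key=lambda p: sel[p])
by_result_size = min(pairs, key=lambda p: result_size(*p))

print("min selectivity picks", by_selectivity,
      "->", int(result_size(*by_selectivity)), "rows")   # 1,000,000 rows
print("min result_size picks", by_result_size,
      "->", int(result_size(*by_result_size)), "rows")   # 100,000 rows
```

Of course, when the selectivity estimate itself is off by orders of magnitude, both rules inherit the error, which is the robustness problem discussed above.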
> I realized the paper mentioned at the beginning of this thread is from
> 1998. That doesn't make it wrong, but I was wondering if there are some
> newer papers about join order search, with interesting
> alternative/better approaches.
>
> An interesting paper I found is this CIDR21 paper:
>
> Simplicity Done Right for Join Ordering
> Axel Hertzschuch, Claudio Hartmann, Dirk Habich, Wolfgang Lehner
> https://vldb.org/cidrdb/2021/simplicity-done-right-for-join-ordering.html
> https://vldb.org/cidrdb/papers/2021/cidr2021_paper01.pdf
>
> Which does something similar to our approach, although it seems to be
> more a replacement for the DP and not just for GEQO. It's meant to be
> cheaper, but also more resilient to poor join orders, as it cares about
> "upper bound" (~worst case) for the join orders.
>
> It goes beyond the scope of GOO, as the paper also talks about sampling
> to determine some of the estimates. But maybe it'd be useful (better
> than GEQO) even without that?
>
> I find the focus on the "worst case" (and only trusting baserel
> estimates) interesting. I wonder if it might help with the common
> nestloop problem, where we opt to believe a very optimistic low
> cardinality estimate, only to end with a sequence of nestloops that
> never complete.

Thanks for the pointer. I’ve also been looking at more recent work on join ordering, and there are indeed several ideas that seem relevant to the current direction, including the paper you mentioned. These are definitely directions I plan to consider more seriously.

My current plan is to first establish a reasonably solid baseline using relatively simple techniques: reducing extreme regressions and getting overall plan quality closer to DP. Once that is in place, I’d like to step back, summarize which recent papers and ideas seem most applicable, and then experiment with more advanced approaches incrementally, building on that foundation.

Thanks for all the detailed feedback and the many useful insights throughout this discussion.

--
Best regards,
Chengpeng Yan