Re: Removing unneeded self joins - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Removing unneeded self joins |
Date | |
Msg-id | CAKJS1f_cPDTRBTW4KpuD5fktVjPs2ugg704CJn=OxWXoUv-rDw@mail.gmail.com Whole thread Raw |
In response to | Re: Removing unneeded self joins (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>) |
Responses |
Re: Removing unneeded self joins
|
List | pgsql-hackers |
On Sat, 23 Mar 2019 at 03:39, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > The bug you mention later is an implementation bug that can be fixed (I > will expand on that below). Besides this, do you think current self-join > detection algorithm has fundamental correctness problems? I am not aware > of such problems and this algorithm reflects my understanding of what > constitutes a removable self-join, so it would be helpful if you could > explain what exactly is wrong with it. I did explain what is wrong with it, and also showed an example of why it is broken. I didn't see anything that looks fundamentally broken, just the implementation needs more work. > Your alternative idea sounds plausible, but I won't know it's correct > for sure until I implement it, which is a nontrivial amount of work. I > am also concerned that we will have to redo calculations similar to > innerrel_is_unique(), because having near-zero overhead is a hard > prerequisite for this patch, as was discussed upthread. Are you worried about bypassing the unique rel cache is going to be too costly? > > Here's an example of what can go wrong with your current code: > > This is a bug indeed. Unique index search is not exhaustive, so if many > indexes match the join quals, we might not find the same index for both > sides. I think this can be overcome if we switch to exhaustive search in > relation_has_unique_index_for, and then try to find matching indexes in > is_unique_self_join. Please see the new patch for the fix. I really don't think modifying relation_has_unique_index_for to collect details for up to 5 indexes is a good fix for this. It looks like it's still possible to trigger this, just the example would need to be more complex. Also, you've likely just more than doubled the runtime of a successful match in relation_has_unique_index_for(). Previously, on average we'd have found that match halfway through the list of indexes. Now it's most likely you'll end up looking at every index, even after a match has been found. That does not seem well aligned to keeping the CPU overhead for the patch low. What is wrong with just weeding out join quals that don't have the equivalent expression on either side before passing them to relation_has_unique_index_for()? That'll save you from getting false matches like I showed. To make that work it just seems mostly like you'd mostly just need to swap the order of operations in the patch, but you'd also likely end up needing to rip out all the UniqueRelInfo code, since I don't think that'll be needed any longer. Likely that means your entire patch would be limited to analyzejoins.c, although I'm unsure what of the eclass editing code should be moved into equivclass.c. > > I also think this way would give you the subquery GROUP BY / DISTINCT > > self join removal for just about free. > > > Could you explain how exactly we can generalize join removal to the > DISTINCT/GROUP BY case? I understand the removable self-joins as having > the same row on both sides of join, as identified by ctid, but I'm not > sure how to apply this to subqueries. Your earlier DISTINCT example > looked like it needed a different kind of join removal, with different > conditions for when it applies (please see my previous letter for details). Well, by removing the requirement that the unique proofs have to come from a unique index. I don't think you need to ensure this works for your patch, it would be nice if it did for free. I just don't think your implementation should block it from ever working. > Another thing I'd like to mention is that this patch splits in two > independent parts, the detection of self-joins and their removal. While > we are looking for agreement on the detection part, could you also > review the removal part? I'm sure it has its own share of problems. I'd rather focus on the detection method before reviewing the removal code. If there's some blocker in the detection code then the removal code is not useful. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: