Re: [PATCH] Keeps tracking the uniqueness with UniqueKey - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: [PATCH] Keeps tracking the uniqueness with UniqueKey |
Date | |
Msg-id | CAApHDvpx1qED1uLqubcKJ=oHatCMd7pTUKkdq0B72_08nbR3Hw@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Keeps tracking the uniqueness with UniqueKey (Andy Fan <zhihui.fan1213@gmail.com>) |
Responses |
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Re: [PATCH] Keeps tracking the uniqueness with UniqueKey |
List | pgsql-hackers |
On Fri, 3 Apr 2020 at 15:18, Andy Fan <zhihui.fan1213@gmail.com> wrote: > All above 4 item Done. Just to explain my view on this going forward for PG14. I do plan to do a more thorough review of this soon. I wasn't so keen on pursuing this for PG13 as the skip scans patch [1] needs to use the same infrastructure this patch has added and it does not, yet. The infrastructure (knowing the unique properties of a RelOptInfo), as provided by the patch Andy has been working on, which is based on my rough prototype version, I believe should be used for the skip scans patch as well. I understand that as skip scans currently stands, Jasper has done quite a bit of work to add the UniqueKeys, however, this was unfortunately based on some early description of UniqueKeys where I had thought that we could just store EquivalenceClasses. I no longer think that's the case, and I believe the implementation that we require is something more along the lines of Andy's latest version of the patch. However, I've not quite stared at it long enough to be highly confident in that. I'd like to strike up a bit of a plan to move both Andy's work and the Skip scans work forward for PG14. Here are my thoughts so far: 1. Revise v4 of remove DISTINCT patch to split the patch into two pieces. 0001 should add the UniqueKey code but not any additional planner smarts to use them (i.e remove GROUP BY / DISTINCT) elimination parts. Join removals and Unique joins should use UniqueKeys in this patch. 0002 should add back the GROUP BY / DISTINCT smarts and add whatever tests should be added for that and include updating existing expected results and modifying any tests which no longer properly test what they're meant to be testing. I've done this with the attached patch. 2. David / Jesper to look at 0001 and build or align the existing skip scans 0001 patch to make use of Andy's 0001 patch. This will require tagging UniqueKeys onto Paths, not just RelOptInfos, plus a bunch of other work. Clearly UniqueKeys must suit both needs and since we have two different implementations each providing some subset of the features, then clearly we're not yet ready to move both skip scans and this patch forward together. We need to align that and move both patches forward together. Hopefully, the attached 0001 patch helps move that along. While I'm here, a quick review of Andy's v4 patch. I didn't address any of this in the attached v5. These are only based on what I saw when shuffling some code around. It's not an in-depth review. 1. Out of date comment in join.sql -- join removal is not possible when the GROUP BY contains a column that is -- not in the join condition. (Note: as of 9.6, we notice that b.id is a -- primary key and so drop b.c_id from the GROUP BY of the resulting plan; -- but this happens too late for join removal in the outer plan level.) explain (costs off) select d.* from d left join (select d, c_id from b group by b.d, b.c_id) s on d.a = s.d; You've changed the GROUP BY clause so it does not include b.id, so the Note in the comment is now misleading. 2. I think 0002 is overly restrictive in its demands that parse->hasAggs must be false. We should be able to just use a Group Aggregate with unsorted input when the input_rel is unique on the GROUP BY clause. This will save on hashing and sorting. Basically similar to what we do for when a query contains aggregates without any GROUP BY. 3. I don't quite understand why you changed this to a right join: -- Test case where t1 can be optimized but not t2 explain (costs off) select t1.*,t2.x,t2.z -from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y +from t1 right join t2 on t1.a = t2.x and t1.b = t2.y Perhaps this change is left over from some previous version of the patch? [1] https://commitfest.postgresql.org/27/1741/
Attachment
pgsql-hackers by date: