Re: Properly pathify the union planner - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Properly pathify the union planner |
Date | |
Msg-id | CAApHDvoV4QgNsXsegaCJQFHK4v8-gE7_DzXmSMgm_Hg+C+4oYg@mail.gmail.com Whole thread Raw |
In response to | Properly pathify the union planner (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Properly pathify the union planner
Re: Properly pathify the union planner |
List | pgsql-hackers |
On Thu, 2 Nov 2023 at 12:42, David Rowley <dgrowleyml@gmail.com> wrote: > One part that still needs work is the EquivalanceClass building. > Because we only build the final targetlist for the Append after > planning all the append child queries, I ended up having to populate > the EquivalanceClasses backwards, i.e children first. add_eq_member() > determines if you're passing a child member by checking if parent != > NULL. Since I don't have a parent EquivalenceMember to pass, > em_is_child gets set wrongly, and that causes problems because > ec_has_const can get set to true when it shouldn't. This is a problem > as it can make a PathKey redundant when it isn't. I wonder if I'll > need to change the signature of add_eq_member() and add an "is_child" > bool to force the EM to be a child em... Needs more thought... I've spent more time working on this and I ended up solving the above problem by delaying the subquery path creation on the union children until after we've built the top-level targetlist. This allows the parent eclasses to be correctly added before adding members for the union children. (See build_setop_child_paths() in the patch) There's been quite a bit of progress in other areas too. Incremental sorts now work: # create table t1(a int primary key, b int not null); # create table t2(a int primary key, b int not null); # insert into t1 select x,x from generate_Series(1,1000000)x; # insert into t2 select x,x from generate_Series(1,1000000)x; # vacuum analyze t1,t2; # explain (costs off) select * from t1 union select * from t2; QUERY PLAN -------------------------------------------------- Unique -> Merge Append Sort Key: t1.a, t1.b -> Incremental Sort Sort Key: t1.a, t1.b Presorted Key: t1.a -> Index Scan using t1_pkey on t1 -> Incremental Sort Sort Key: t2.a, t2.b Presorted Key: t2.a -> Index Scan using t2_pkey on t2 (11 rows) However, I've not yet made the MergeAppend UNIONs work when the datatypes don't match on either side of the UNION. For now, the reason this does not work is due to convert_subquery_pathkeys() being unable to find the pathkey targets in the targetlist. The actual targets can't be found due to the typecast. I wondered if this could be fixed by adding an additional projection path to the subquery when the output columns don't match the setop->colTypes, but I'm a bit put off by the comment in transformSetOperationTree: > * For all non-UNKNOWN-type cases, we verify coercibility but we > * don't modify the child's expression, for fear of changing the > * child query's semantics. I assume that's worried about the semantics of things like WHERE clauses, so maybe the projection path in the subquery would be ok. I need to spend more time on that. Another problem I hit was add_path() pfreeing a Path that I needed. This was happening due to how I'm building the final paths in the subquery when setop_pathkeys are set. Because I want to always include the cheapest_input_path to allow that path to be used in hash-based UNIONs, I also want to provide sorted paths so that MergeAppend has something to work with. I found cases where I'd add_path() the cheapest_input_path to the final rel then also attempt to sort that path. Unfortunately, add_path() found the unsorted path and the sorted path fuzzily the same cost and opted to keep the sorted one due to it having better pathkeys. add_path() then pfree'd the cheapest_input_path which meant the Sort's subpath was gone which obviously caused issues in createplan.c. For now, as a temporary fix, I've just #ifdef'd out the code in add_path() that's pfreeing the old path. I have drafted a patch that refcounts Paths, but I'm unsure if that's the correct solution as I'm only maintaining the refcounts in add_path() and add_partial_path(). I think a true correct solution would bump the refcount when a path is used as some other path's subpath. That would mean having to recursively pfree paths up until we find one with a refcount>0. Seems a bit expensive for add_path() to do. I've attached the updated patch. This one is probably ready for someone to test out. There will be more work to do, however. David
Attachment
pgsql-hackers by date: