Re: Push down more full joins in postgres_fdw - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Push down more full joins in postgres_fdw |
Date | |
Msg-id | CAFjFpRcn7=DNOXm-PJ_jVDqAmghKVf6tApQC+_RgMZ8tOPExcA@mail.gmail.com Whole thread Raw |
In response to | Re: Push down more full joins in postgres_fdw (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: Push down more full joins in postgres_fdw
|
List | pgsql-hackers |
> >> 13. The comment below is missing the main purpose i.e. creating a a unique >> alias, in case the relation gets converted into a subquery. Lowest or >> highest >> relid will create a unique alias at given level of join and that would be >> more >> future proof. If we decide to consider paths for every join order, >> following >> comment will no more be true. >> + /* >> + * Set the relation index. This is defined as the position of this >> + * joinrel in the join_rel_list list plus the length of the rtable >> list. >> + * Note that since this joinrel is at the end of the list when we are >> + * called, we can get the position by list_length. >> + */ >> + fpinfo->relation_index = >> + list_length(root->parse->rtable) + >> list_length(root->join_rel_list); > > > I don't agree on that point. As I said before, the approach using the > lowest/highest relid would make a remote query having nested subqueries > difficult to read. And even if we decided to consider paths for every join > order, we could define the relation_index safely, by modifying > postgresGetForeignJoinPaths so that it (1) sets the relation_index the > proposed way at the first time it is called and (2) avoids setting the > relation_index after the first call, by checking to see > fpinfo->relation_index > 0. OK. Although, the alias which contains a number, which apparently doesn't show any relation with the relation (no pun intended :)) being deparsed, won't make much sense in the deparsed query. But I am fine leaving this to the committer's judgement. May be you want to add an assert here making sure that relation_index is set only once. Something like Assert(fpinfo->relation_index == 0) just before setting it. > >> 14. The function name talks about non-vars but the Assert few lines below >> asserts that every node there is a Var. Need better naming. >> +reltarget_has_non_vars(RelOptInfo *foreignrel) >> +{ >> + ListCell *lc; >> + >> + foreach(lc, foreignrel->reltarget->exprs) >> + { >> + Var *var = (Var *) lfirst(lc); >> + >> + Assert(IsA(var, Var)); >> And also an explanation for the claim >> + * Note: currently deparseExplicitTargetList can't properly handle such >> Vars. > > > Will remove this function for the reason described in #12. > >> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy. > > > Right. I think that not only avoids creating redundant data to find the > alias of a given Var node but also would be able to find it in optimal cost. > >> Instead of that, we should probably gather all the alias information once >> and >> store it in context as an array indexed by relid. While deparsing a Var we >> can >> get the targetlist and alias for a given relation by using var->varno as >> index. >> And then search for given Var node in the targetlist to deparse the column >> name >> by its position in the targetlist. For the relations that are not >> converted >> into subqueries, this array will not hold any information and the Var will >> be >> deparsed as it's being done today. > > > Sorry, I don't understand this part. How does that work when creating a > remote query having nested subqueries? Is that able to search for a given > Var node efficiently? For every relation that is deparsed as a subquery, we will create a separate context. Each such context will have an array described above. This array will contain the targetlist and aliases for all the relations, covered by that relation, that are required to be deparsed as subqueries. These arrays bubble up to topmost relation that is not required to be deparsed as subquery. When a relation is required to be deparsed as a subquery, its immediate upper relation sets the alias and targetlist chosen for that relation at all the indexes corresponding to all the base relations covered by that relation. For example, let's assume that a relation (1, 2, 3) is required to be deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5) (thus the other joining relation being (4,5)). While deparsing for relation (1,2,3,4,5), the context will contain a 5 element array, with positions 1, 2, 3 filled by same targetlist and alias whereas positions 4 and 5 will not be filled as those relations are not deparsed as subqueries. Let's assume in relation (1, 2, 3), (1, 3) in turn requires subquery but (2) does not. Thus the context created while deparsing (1, 2, 3) will have a 3 element array with positions 1 and 3 containing the same targetlist and alias, where as position 2 will be empty. When deparsing a Var node with varno = N and varattno = m, if the nth position in the array in the context is empty, that Var node will be deparsed as rN.<column name>. But if that position is has alias sZ, then we search for that Var node in the targetlist and if it's found at kth position in the targetlist, we will deparse it as sZ.ck. The search in the targetlist can be performed using tlist_member, and then fetching the position by TargetEntry::resno. This does not require any recursion and thus saves stack space and some CPU cycles required for recursion. I guess, the arrays need to be computed only once for any relation when the query for that relation is deparsed the first time. Thanks for considering other suggestions. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: