Re: Push down more full joins in postgres_fdw - Mailing list pgsql-hackers

From Etsuro Fujita
Subject Re: Push down more full joins in postgres_fdw
Date
Msg-id 5aa9ed42-a00f-be85-1c8d-4b10a0b12c7e@lab.ntt.co.jp
Whole thread Raw
In response to Re: Push down more full joins in postgres_fdw  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: Push down more full joins in postgres_fdw  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On 2016/10/25 18:58, Ashutosh Bapat wrote:

You wrote:
>>> 13. The comment below is missing the main purpose i.e. creating a a unique
>>> alias, in case the relation gets converted into a subquery. Lowest or
>>> highest
>>> relid will create a unique alias at given level of join and that would be
>>> more
>>> future proof. If we decide to consider paths for every join order,
>>> following
>>> comment will no more be true.
>>> +    /*
>>> +     * Set the relation index.  This is defined as the position of this
>>> +     * joinrel in the join_rel_list list plus the length of the rtable
>>> list.
>>> +     * Note that since this joinrel is at the end of the list when we are
>>> +     * called, we can get the position by list_length.
>>> +     */
>>> +    fpinfo->relation_index =
>>> +        list_length(root->parse->rtable) +
>>> list_length(root->join_rel_list);

I wrote:
>> I don't agree on that point.  As I said before, the approach using the
>> lowest/highest relid would make a remote query having nested subqueries
>> difficult to read.  And even if we decided to consider paths for every join
>> order, we could define the relation_index safely, by modifying
>> postgresGetForeignJoinPaths so that it (1) sets the relation_index the
>> proposed way at the first time it is called and (2) avoids setting the
>> relation_index after the first call, by checking to see
>> fpinfo->relation_index > 0.

> OK. Although, the alias which contains a number, which apparently
> doesn't show any relation with the relation (no pun intended :)) being
> deparsed, won't make much sense in the deparsed query. But I am fine
> leaving this to the committer's judgement.

Fine with me.

> May be you want to add an
> assert here making sure that relation_index is set only once.
> Something like Assert(fpinfo->relation_index == 0) just before setting
> it.

Will do.

>>> 15. While deparsing every Var, we are descending the RelOptInfo hierarchy.

>> Right.  I think that not only avoids creating redundant data to find the
>> alias of a given Var node but also would be able to find it in optimal cost.

>>> Instead of that, we should probably gather all the alias information once
>>> and
>>> store it in context as an array indexed by relid. While deparsing a Var we
>>> can
>>> get the targetlist and alias for a given relation by using var->varno as
>>> index.
>>> And then search for given Var node in the targetlist to deparse the column
>>> name
>>> by its position in the targetlist. For the relations that are not
>>> converted
>>> into subqueries, this array will not hold any information and the Var will
>>> be
>>> deparsed as it's being done today.

>> Sorry, I don't understand this part.  How does that work when creating a
>> remote query having nested subqueries?  Is that able to search for a given
>> Var node efficiently?

> For every relation that is deparsed as a subquery, we will create a
> separate context. Each such context will have an array described
> above. This array will contain the targetlist and aliases for all the
> relations, covered by that relation, that are required to be deparsed
> as subqueries. These arrays bubble up to topmost relation that is not
> required to be deparsed as subquery. When a relation is required to be
> deparsed as a subquery, its immediate upper relation sets the alias
> and targetlist chosen for that relation at all the indexes
> corresponding to all the base relations covered by that relation.

Thanks for the explanation!

> For
> example, let's assume that a relation (1, 2, 3) is required to be
> deparsed as subquery for an immediate upper relation (1, 2, 3, 4, 5)
> (thus the other joining relation being (4,5)). While deparsing for
> relation (1,2,3,4,5), the context will contain a 5 element array, with
> positions 1, 2, 3 filled by same targetlist and alias whereas
> positions 4 and 5 will not be filled as those relations are not
> deparsed as subqueries.

Sorry, I don't understand this.  In that case, the immediate upper 
relation (1, 2, 3, 4, 5) would need to fill the targetlist and aliases 
for *the join relation (1, 2, 3) somewhere*, not the targetlist and 
aliases for each of the component relations 1, 2, and 3, because the 
join relation is deparsed as a subquery.  Maybe I'm missing something, 
though.

> Let's assume in relation (1, 2, 3), (1, 3) in
> turn requires subquery but (2) does not. Thus the context created
> while deparsing (1, 2, 3) will have a 3 element array with positions 1
> and 3 containing the same targetlist and alias, where as position 2
> will be empty.

> When deparsing a Var node with varno = N and varattno =
> m, if the nth position in the array in the context is empty, that Var
> node will be deparsed as rN.<column name>.

What happens when deparsing eg, a Var with varno = 2 at the topmost 
relation (1, 2, 3, 4, 5)?  The second position of the array is empty, 
but the join relation (1, 2, 3) is deparsed as a subquery, so the Var 
should be deparsed as an alias of an output column of the subquery at 
the topmost relation, I think.

> But if that position is has
> alias sZ, then we search for that Var node in the targetlist and if
> it's found at kth position in the targetlist, we will deparse it as
> sZ.ck. The search in the targetlist can be performed using
> tlist_member, and then fetching the position by TargetEntry::resno.

> This does not require any recursion and thus saves stack space and
> some CPU cycles required for recursion.

Is that true?

> some CPU cycles required for recursion. I guess, the arrays need to be
> computed only once for any relation when the query for that relation
> is deparsed the first time.

Does this algorithm extend to the case where we consider paths for every 
join order?

> Thanks for considering other suggestions.

Your suggestions are really helpful to improve the patch.  Thanks!

Best regards,
Etsuro Fujita





pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Declarative partitioning - another take
Next
From: Peter Eisentraut
Date:
Subject: Re: macaddr 64 bit (EUI-64) datatype support