Thread: Maintain the pathkesy for subquery from outer side information

Maintain the pathkesy for subquery from outer side information

From
Andy Fan
Date:
When I am working on the UnqiueKey stuff, I find the following cases.

SELECT * FROM (SELECT * FROM t offset 0) v ORDER BY a;
// root->query_keys = A.  root->order_pathkeys = A
// Current: subroot->query_pathkeys = NIL.
// Expected:  subroot->xxxx_pathkeys = [A].

SELECT * FROM (SELECT * FROM t offset 0) v,  t2 WHERE t2.a = t.a;
// root->query_keys = NIL
// Current: subroot->query_keys = NIL
// Expected: subroot->xxx_pathkeys = A

To resolve this issue,  I want to add a root->outer_pathkeys which means the
interesting order from the outer side for a subquery. To cover the
cases like below

// root->query_keys = root->order_keys = b.
// Expected: subroot->xxx_pathkeys = (a)?  (b)?
SELECT * FROM (SELECT * FROM t offset 0) v,  t2
WHERE t2.a = t.a order by v.b;

the root->outer_pathkeys should be a list of lists.  in above case
subroot->outer_pathkeys should be [ [a], [b] ],  this value may be
checked at many
places, like pathkeys_useful_for_ordering, get_useful_pathkeys_for_relation,
build_index_paths and more. My list might be incomplete, but once we
have a new place to check and the data is maintained already, it would
be easy to
improve.  My thinking is we maintain the root->outer_pathkeys first, and then
improve the well known function as the first step. What do you think?


-- 
Best Regards
Andy Fan (https://www.aliyun.com/)



Re: Maintain the pathkesy for subquery from outer side information

From
Tom Lane
Date:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> When I am working on the UnqiueKey stuff, I find the following cases.
> SELECT * FROM (SELECT * FROM t offset 0) v ORDER BY a;
> // root->query_keys = A.  root->order_pathkeys = A
> // Current: subroot->query_pathkeys = NIL.
> // Expected:  subroot->xxxx_pathkeys = [A].

Why do you "expect" that?  I think pushing the outer ORDER BY past a
LIMIT is an unacceptable semantics change.

            regards, tom lane



Re: Maintain the pathkesy for subquery from outer side information

From
Andy Fan
Date:
On Sat, Jul 24, 2021 at 10:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andy Fan <zhihui.fan1213@gmail.com> writes:
> > When I am working on the UnqiueKey stuff, I find the following cases.
> > SELECT * FROM (SELECT * FROM t offset 0) v ORDER BY a;
> > // root->query_keys = A.  root->order_pathkeys = A
> > // Current: subroot->query_pathkeys = NIL.
> > // Expected:  subroot->xxxx_pathkeys = [A].
>
> Why do you "expect" that?  I think pushing the outer ORDER BY past a
> LIMIT is an unacceptable semantics change.
>
>                         regards, tom lane

I don't mean push down a "ORDER BY" clause to subquery,  I mean push
down an "interesting order" to subquery.   for example we have index t(a);
then SELECT * FROM (SELECT a FROM t OFFSET 0) v ORDER BY a;
In the current implementation, when we are planning the subuqery, planners
think the "pathkey a" is not interesting,  but it should be IIUC.



-- 
Best Regards
Andy Fan (https://www.aliyun.com/)



Re: Maintain the pathkesy for subquery from outer side information

From
Tom Lane
Date:
Andy Fan <zhihui.fan1213@gmail.com> writes:
> On Sat, Jul 24, 2021 at 10:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Why do you "expect" that?  I think pushing the outer ORDER BY past a
>> LIMIT is an unacceptable semantics change.

> I don't mean push down a "ORDER BY" clause to subquery,  I mean push
> down an "interesting order" to subquery.   for example we have index t(a);
> then SELECT * FROM (SELECT a FROM t OFFSET 0) v ORDER BY a;
> In the current implementation, when we are planning the subuqery, planners
> think the "pathkey a" is not interesting,  but it should be IIUC.

No, it should not be.

(1) We have long treated "OFFSET 0" as an optimization fence.  That means
that the outer query shouldn't affect what plan you get for the subquery.

(2) If you ignore point (1), you could argue that choosing a different
result order doesn't matter for this subquery.  However, it potentially
*does* matter for a large fraction of the cases in which we'd not have
flattened the subquery into the outer query.  In subqueries involving
things like volatile functions, aggregates, window functions, etc,
encouraging the sub-planner to pick a different result ordering could
lead to visibly different output.

I think that in cases where there's not a semantic hazard involved,
we'd usually have pulled up the subquery so that this is all moot
anyway.

            regards, tom lane



Re: Maintain the pathkesy for subquery from outer side information

From
Andy Fan
Date:
> I think that in cases where there's not a semantic hazard involved,
> we'd usually have pulled up the subquery so that this is all moot
> anyway.
>

I get your point with this statement. Things driven by this idea look
practical and lucky.  But for the UniqueKey stuff, we are not
that lucky.

SELECT pk FROM t;  -- Maintain the UniqueKey would be not necessary.

However

SELECT DISTINCT pk FROM (SELECT volatile_f(a), pk from t) WHERE ..;

Maintaining the UniqueKey in subquery is necessary since it is useful outside.

-- 
Best Regards
Andy Fan (https://www.aliyun.com/)