Re: Reference to parent query from ANY sublink - Mailing list pgsql-hackers

From Antonin Houska
Subject Re: Reference to parent query from ANY sublink
Date
Msg-id 52A77B9E.1020500@gmail.com
Whole thread Raw
In response to Re: Reference to parent query from ANY sublink  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Reference to parent query from ANY sublink
List pgsql-hackers
On 12/06/2013 03:33 PM, Kevin Grittner wrote:
> Antonin Houska <antonin.houska@gmail.com> wrote:
> 
>> SELECT *
>> FROM    tab1 a
>>      LEFT JOIN
>>      tab2 b
>>      ON a.i = ANY (
>>          SELECT  k
>>          FROM    tab3 c
>>          WHERE    k = a.i);
> 
> This query works with k in any or all tables, but the semantics
> certainly vary depending on where k happens to be.  It would help a
> lot if you showed SQL statements to create and populate the tables
> involved and/or if you qualified all referenced column names with
> the table alias to avoid ambiguity.

I used the DDLs attached (tables.ddl) for this query too, not only for
the queries in quaries.sql. Yes, if I had mentioned it and/or qualified
the 'k' column reference, it wouldn't have broken anything.

> If I assume that the k reference is supposed to be a column in
> tab3, what you have is a query where you always get all rows from
> tab1, and for each row from tab1 you either match it to all rows
> from tab2 or no rows from tab2 depending on whether the tab1 row
> has a match in tab3.

I concede this particular query is not useful. But the important thing
to consider here is which side of the LEFT JOIN the subquery references.

>> SELECT  *
>> FROM    tab1 a
>>      LEFT JOIN
>>      (
>>        SELECT *
>>        tab2 b
>>        SEMI JOIN
>>        (  SELECT  k
>>            FROM    tab3 c
>>            WHERE  k = a.i
>>        ) AS ANY_subquery
>>        ON a.i = ANY_subquery.k
>>      ) AS SJ_subquery
>>      ON true;
> 
> It is hard to see what you intend here, since this is not valid
> syntax.

This is what I - after having read the related source code - imagine to
happen internally when the ANY predicate of the first query is being
processed. In fact it should become something like this (also internal
stuff)

SELECT  *
FROM    tab1 a       LEFT JOIN       (         tab2 b         SEMI JOIN         (  SELECT  k             FROM    tab3 c
           WHERE  k = a.i         ) AS ANY_subquery         ON a.i = ANY_subquery.k)       ON true;
 

that is, SEMI JOIN node inserted into the tree rather than a subquery
(SJ_subquery). I posted the construct with SJ_subquery to show how I
thought about the problem: I thought it's safe (even though not
necessarily beautiful) to wrap the SEMI JOIN into the SJ_subquery and
let the existing infrastructure decide whether it's legal to turn it
into a join node. I concluded that the subquery's references to the tab1
ensure that SJ_subquery won't be flattened, so the patch does nothing if
such a reference exists.

> PostgreSQL supports semi-joins; but that is an implementation detail
> for the EXISTS or IN syntax.

... and for ANY, see subselect.c:convert_ANY_sublink_to_join()

> Could you clarify your intent?

To get rid of a subplan in some cases that require it so far: when the
subquery references table exactly 1 level higher (i.e. the immediate
parent query).

(I got the idea while reading the source code, as opposed to query
tuning.)

// Antonin Houska (Tony)

> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 




pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: ANALYZE sampling is too good
Next
From: Robert Haas
Date:
Subject: Re: stats for network traffic WIP