Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Removing unneeded self joins
Date
Msg-id 4bc7e3f9-d955-c213-dbcd-c41d4f83bbe6@iki.fi
Whole thread Raw
In response to Re: Removing unneeded self joins  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: Removing unneeded self joins  ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>)
List pgsql-hackers
On 28/11/2020 19:21, Andrey Lepikhov wrote:
> On 27.11.2020 21:49, Heikki Linnakangas wrote:
>> On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
>>> +            /*
>>> +             * Process restrictlist to seperate out the self join
>>> quals from
>>> +             * the other quals. e.g x = x goes to selfjoinquals and a
>>> = b to
>>> +             * otherjoinquals.
>>> +             */
>>> +            split_selfjoin_quals(root, restrictlist, &selfjoinquals,
>>> +                                 &otherjoinquals);
>>> +
>>> +            if (list_length(selfjoinquals) == 0)
>>> +            {
>>> +                /*
>>> +                 * Have a chance to remove join if target list
>>> contains vars from
>>> +                 * the only one relation.
>>> +                 */
>>
>> I don't understand the logic here. If 'selfjoinquals' is empty, it means
>> that there is no join qual between the two relations, right? How can we
>> ever remove the join in that case? And how does the target list affect
>> that? Can you give an example query of that?
> 
> Maybe it is a problem of variable naming. Following the idea of David
> Rowley, we split quals into two subsets: {x==x} and another, for example
> {x=y}.
> First set is an trivial case of self-join: if we have unique index on
> the attribute 'x', then this join is self-join.
> Second set is give us a chance: if right side is unique for right side
> of the qual and no vars from right side end up in the target list of the
> join, then this is a self-join case. Example:
> 
> CREATE TABLE a(x int, y int);
> CREATE UNIQUE INDEX ON a(x);
> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.x;  -- self-join
> CREATE UNIQUE INDEX ON a(y);
> SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y;  -- self-join too

The latter join is not "useless". The patch is returning incorrect 
result for that query:

> postgres=# insert into a values (1, 2);
> INSERT 0 1
> postgres=# insert into a values (2, 1);
> INSERT 0 1
> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- WRONG RESULT
>  x | y 
> ---+---
> (0 rows)
> 
> postgres=# set enable_self_join_removal=off;
> SET
> postgres=# SELECT a1.* FROM a a1, a a2 WHERE a1.x = a2.y; -- CORRECT RESULT
>  x | y 
> ---+---
>  1 | 2
>  2 | 1
> (2 rows)

- Heikki






pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Printing LSN made easy
Next
From: Paul A Jungwirth
Date:
Subject: Re: range_agg