Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table
Date
Msg-id tx2q6qi4hn2diocmnwfjo5xw2m56iabz3fyeimk7oeyymy3u4b@4ria2zwtz5mp
Whole thread Raw
In response to Do not scan index in right table if condition for left join evaluates to false using columns in left table  (Илья Жарков <izharkov1243@gmail.com>)
Responses Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table
Re: Do not scan index in right table if condition for left join evaluates to false using columns in left table
List pgsql-hackers
Hi,

On 2024-12-07 21:30:46 +0300, Илья Жарков wrote:
> Hi, could you help to understand why Postgres scans index in right table in
> the following case:
>
> CREATE TABLE parent (
> >   id integer PRIMARY KEY,
> >   dtype text
> > );
>
>
> CREATE TABLE child (
> >   id integer PRIMARY KEY
> > );
>
>
> INSERT INTO parent (id, dtype) values (1, 'A');
> > INSERT INTO child (id) values (1);
>
>
> EXPLAIN ANALYZE
> > SELECT *
> > FROM parent p
> > LEFT JOIN child c
> >   ON p.id = c.id
> >   AND p.dtype = 'B'
> > WHERE p.id = 1;
>
>
> Note that the only record in *parent *table has dtype == 'A', but the join
> condition has p.dtype = 'B'.
> The query plan still shows Index Only Scan on *child *table with loops=1.

The relevant difference between the inner and left join is that for the inner
join we push down the p.dtype = 'B'::text condition down. However, we do *not*
do so for outer joins.

Outer:
┌───────────────────────────────────────────────────┐
│                    QUERY PLAN                     │
├───────────────────────────────────────────────────┤
│ Nested Loop Left Join                             │
│   Join Filter: (p.dtype = 'B'::text)              │
│   ->  Index Scan using parent_pkey on parent p    │
│         Index Cond: (id = 1)                      │
│   ->  Index Only Scan using child_pkey on child c │
│         Index Cond: (id = 1)                      │
└───────────────────────────────────────────────────┘

Inner:
┌───────────────────────────────────────────────────┐
│                    QUERY PLAN                     │
├───────────────────────────────────────────────────┤
│ Nested Loop                                       │
│   ->  Index Scan using parent_pkey on parent p    │
│         Index Cond: (id = 1)                      │
│         Filter: (dtype = 'B'::text)               │
│   ->  Index Only Scan using child_pkey on child c │
│         Index Cond: (id = 1)                      │
└───────────────────────────────────────────────────┘


We *do* have code that recognizes the case where a clause in a join's ON only
references the nullable side. We however don't have code that recognizes the
same if it's the non-nullable side.

That's somewhat surprising, but it does kinda make sense: A after all, in a
query like yours, you could just have had the p.dtype = 'B' in the WHERE list,
rather than inside the join's ON. The same isn't true for the nullable side of
the join, as a condition for te nullable side in the WHERE clause breaks the
join's "outerness".

I.e. you can write your query as
  SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id WHERE p.id = 1 AND p.dtype = 'B';
in which case you get the expected query plan:

┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                         QUERY PLAN
     │
 

├─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Nested Loop Left Join  (cost=0.31..16.36 rows=1 width=40) (actual time=0.035..0.036 rows=0 loops=1)
     │
 
│   ->  Index Scan using parent_pkey on parent p  (cost=0.15..8.17 rows=1 width=36) (actual time=0.034..0.034 rows=0
loops=1)│
 
│         Index Cond: (id = 1)
     │
 
│         Filter: (dtype = 'B'::text)
     │
 
│         Rows Removed by Filter: 1
     │
 
│   ->  Index Only Scan using child_pkey on child c  (cost=0.15..8.17 rows=1 width=4) (never executed)
     │
 
│         Index Cond: (id = 1)
     │
 
│         Heap Fetches: 0
     │
 
│ Planning Time: 29.912 ms
     │
 
│ Execution Time: 0.095 ms
     │
 

└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘


ISTM that it shouldn't be expensive to recognize this type of join clause and
pushes them down. While it could be done by the query's author, it seems worth
handling this on our side.  But maybe I'm missing something here?


Greetings,

Andres Freund

PS:

Here's the repro in an easier to copy way:

CREATE TABLE parent (id integer PRIMARY KEY,dtype text not null);
CREATE TABLE child (id integer PRIMARY KEY);
INSERT INTO parent (id, dtype) values (1, 'A');
INSERT INTO child (id) values (1);

EXPLAIN SELECT * FROM parent p LEFT JOIN child c ON p.id = c.id AND p.dtype = 'B' WHERE p.id = 1;
EXPLAIN SELECT * FROM parent p JOIN child c ON p.id = c.id AND p.dtype = 'B' WHERE p.id = 1;



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Statistics Import and Export
Next
From: Tomas Vondra
Date:
Subject: Re: [PATCH] Fixed creation of empty .log files during log rotation