Re: IS NOT NULL and LEFT JOIN - Mailing list pgsql-performance

From Laurent Martelli
Subject Re: IS NOT NULL and LEFT JOIN
Date
Msg-id 5444D5FF.5070407@enercoop.org
Whole thread Raw
In response to Re: IS NOT NULL and LEFT JOIN  (David Rowley <dgrowleyml@gmail.com>)
Responses Re: IS NOT NULL and LEFT JOIN
List pgsql-performance
Hi David,

Do we agree that both queries are identical ? Since we join on c.user_info=u.id having u.id is not null or c.user_info is not null in the where clause is the same, isn't it ?

Since c.user_info=u.id the condition on u.id is not null does not use any *new* information from user_user_info.

Regards,
Laurent

Le 19/10/2014 10:41, David Rowley a écrit :
On Sun, Oct 19, 2014 at 5:10 PM, Laurent Martelli <laurent.martelli@enercoop.org> wrote:
Hello there,

I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.

I grant you that the query can be written without the JOIN on
user_user_info,
but it is generated like this by hibernate. Just changing the IS NOT NULL condition
to the other side of useless JOIN makes a big difference in the query plan :

-- THE BAD ONE : given the selectivity on c.name and c.email, barely more than one row will ever be returned

But it looks like you're ignoring the fact that the OR condition would force the query to match not only the user and the email, but also any row that finds a match in the user_user_info table, which going by the planner's estimates, that's every row in the contract_contract table. This is why the planner chooses a seqscan on the contract_contract table instead of using the index on lower(name).

Is it really your intention to get all rows that find a this martelli contract that has this email, and along with that, get every contract that has a not null user_info record?

I see that you have a foreign key on c.user_info to reference the user, so this should be matching everything with a non null user_info record.
 

explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or u.id is not null;
                                                           QUERY PLAN                                                          
--------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=1.83..2246.76 rows=59412 width=4012) (actual time=53.645..53.645 rows=0 loops=1)
   Hash Cond: (c.user_info = u.id)
   Filter: (((lower((c.name)::text) = 'martelli'::text) AND ((c.email)::text = 'dsfze@ezrfz.com'::text)) OR (u.id IS NOT NULL))
   Rows Removed by Filter: 58247
   ->  Seq Scan on contact_contact c  (cost=0.00..2022.12 rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
   ->  Hash  (cost=1.37..1.37 rows=37 width=8) (actual time=0.029..0.029 rows=37 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 2kB
         ->  Seq Scan on user_user_info u  (cost=0.00..1.37 rows=37 width=8) (actual time=0.004..0.015 rows=37 loops=1)
 Planning time: 0.790 ms
 Execution time: 53.712 ms

-- THE GOOD ONE (test IS NOT NULL on
contact0_.user_info instead of userinfo1_.id)
explain analyze select c.*         
       from contact_contact c        
       left outer join user_user_info u on c.user_info=u.id          
       left outer join contact_address a on c.address=a.id         
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or c.user_info is not null;
                                                                 QUERY PLAN                                                                
--------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on contact_contact c  (cost=8.60..16.41 rows=1 width=4012) (actual time=0.037..0.037 rows=0 loops=1)
   Recheck Cond: (((email)::text = 'dsfze@ezrfz.com'::text) OR (user_info IS NOT NULL))
   Filter: (((lower((name)::text) = 'martelli'::text) AND ((email)::text = 'dsfze@ezrfz.com'::text)) OR (user_info IS NOT NULL))
   ->  BitmapOr  (cost=8.60..8.60 rows=2 width=0) (actual time=0.034..0.034 rows=0 loops=1)
         ->  Bitmap Index Scan on idx_contact_email  (cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
               Index Cond: ((email)::text = 'dsfze@ezrfz.com'::text)
         ->  Bitmap Index Scan on contact_contact_user_info_idx  (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (user_info IS NOT NULL)
 Planning time: 0.602 ms
 Execution time: 0.118 ms


If you look closely at the 2nd query plan, you'll see that no joins are performed, and it's only the contract_contract table that's looked at. This is because PostgresSQL sees that none of the columns from the 2 tables which are being left joined to are used, and also that the columns that you're joining to on these tables are unique, therefore joining to them cannot duplicate any rows, and since these are left joined, if there was no matching row, then it wouldn't filter out rows from the contract_contract table, as it would with INNER JOINs. The planner sees that these left joins are pointless, so just removes them from the plan.

Regards

David Rowley

pgsql-performance by date:

Previous
From: Greg Stark
Date:
Subject: Re: Yet another abort-early plan disaster on 9.3
Next
From: Tom Lane
Date:
Subject: Re: IS NOT NULL and LEFT JOIN