Thread: IS NOT NULL and LEFT JOIN

IS NOT NULL and LEFT JOIN

From

Laurent Martelli

Date:

19 October 2014, 04:11:00

Hello there,

I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.

I grant you that the query can be written without the JOIN on user_user_info,
but it is generated like this by hibernate. Just changing the IS NOT NULL condition
to the other side of useless JOIN makes a big difference in the query plan :

-- THE BAD ONE : given the selectivity on c.name and c.email, barely more than one row will ever be returned
explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or u.id is not null;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1.83..2246.76 rows=59412 width=4012) (actual time=53.645..53.645 rows=0 loops=1)
   Hash Cond: (c.user_info = u.id)
   Filter: (((lower((c.name)::text) = 'martelli'::text) AND ((c.email)::text = 'dsfze@ezrfz.com'::text)) OR (u.id IS NOT NULL))
   Rows Removed by Filter: 58247
   -> Seq Scan on contact_contact c (cost=0.00..2022.12 rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
   -> Hash (cost=1.37..1.37 rows=37 width=8) (actual time=0.029..0.029 rows=37 loops=1)
         Buckets: 1024 Batches: 1 Memory Usage: 2kB
         -> Seq Scan on user_user_info u (cost=0.00..1.37 rows=37 width=8) (actual time=0.004..0.015 rows=37 loops=1)
Planning time: 0.790 ms
Execution time: 53.712 ms

-- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead of userinfo1_.id)
explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or c.user_info is not null;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on contact_contact c (cost=8.60..16.41 rows=1 width=4012) (actual time=0.037..0.037 rows=0 loops=1)
   Recheck Cond: (((email)::text = 'dsfze@ezrfz.com'::text) OR (user_info IS NOT NULL))
   Filter: (((lower((name)::text) = 'martelli'::text) AND ((email)::text = 'dsfze@ezrfz.com'::text)) OR (user_info IS NOT NULL))
   -> BitmapOr (cost=8.60..8.60 rows=2 width=0) (actual time=0.034..0.034 rows=0 loops=1)
         -> Bitmap Index Scan on idx_contact_email (cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
               Index Cond: ((email)::text = 'dsfze@ezrfz.com'::text)
         -> Bitmap Index Scan on contact_contact_user_info_idx (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (user_info IS NOT NULL)
Planning time: 0.602 ms
Execution time: 0.118 ms

My tables are as follow, and I use postgres 9.4 :
                                          Table « public.contact_contact »       Colonne         |            Type             | Modificateurs | Stockage | Cible de statistiques | Description 
------------------------+-----------------------------+---------------+----------+-----------------------+-------------id                     | bigint                      | non NULL      | plain    |                       | archived               | boolean                     |               | plain    |                       | version                | integer                     |               | plain    |                       | created_on             | timestamp without time zone |               | plain    |                       | updated_on             | timestamp without time zone |               | plain    |                       | actor_ref              | character varying(255)      |               | extended |                       | addressl1              | character varying(255)      |               | extended |                       | comment                | text                        |               | extended |                       | contact_partner_ok     | boolean                     |               | plain    |                       | date_of_birth          | date                        |               | plain    |                       | email                  | character varying(255)      |               | extended |                       | email_pro              | character varying(255)      |               | extended |                       | fax                    | character varying(255)      |               | extended |                       | first_name             | character varying(255)      |               | extended |                       | fixed_phone1           | character varying(255)      |               | extended |                       | fixed_phone2           | character varying(255)      |               | extended |                       | fixed_phone_pro        | character varying(255)      |               | extended |                       | import_key1            | character varying(255)      |               | extended |                       | import_key2            | character varying(255)      |               | extended |                       | koala_id               | character varying(255)      |               | extended |                       | mobile_phone_perso     | character varying(255)      |               | extended |                       | mobile_phone_pro       | character varying(255)      |               | extended |                       | name                   | character varying(255)      | non NULL      | extended |                       | ola_email              | character varying(255)      |               | extended |                       | ola_phone              | character varying(255)      |               | extended |                       | person_category_select | character varying(255)      |               | extended |                       | web_site               | character varying(255)      |               | extended |                       | year_of_birth          | integer                     |               | plain    |                       | created_by             | bigint                      |               | plain    |                       | updated_by             | bigint                      |               | plain    |                       | action_event_source    | bigint                      |               | plain    |                       | address                | bigint                      |               | plain    |                       | address_pro            | bigint                      |               | plain    |                       | jobtitle               | bigint                      |               | plain    |                       | merged_with            | bigint                      |               | plain    |                       | nationality_country    | bigint                      |               | plain    |                       | origin                 | bigint                      |               | plain    |                       | place_of_birth_address | bigint                      |               | plain    |                       | title                  | bigint                      |               | plain    |                       | user_info              | bigint                      |               | plain    |                       | import_origin          | character varying(255)      |               | extended |                       | duplicates             | bigint                      |               | plain    |                       | 
Index :   "contact_contact_pkey" PRIMARY KEY, btree (id)   "uk_bx19539x7h0y0w4p4uw9gnqbo" UNIQUE CONSTRAINT, btree (koala_id)   "uk_vg25de8jcu18m89o9dy2n4fe" UNIQUE CONSTRAINT, btree (import_key1)   "contact_contact_action_event_source_idx" btree (action_event_source)   "contact_contact_address_idx" btree (address)   "contact_contact_address_l1_idx" btree (addressl1)   "contact_contact_address_pro_idx" btree (address_pro)   "contact_contact_jobtitle_idx" btree (jobtitle)   "contact_contact_merged_with_idx" btree (merged_with)   "contact_contact_name_idx" btree (name)   "contact_contact_nationality_country_idx" btree (nationality_country)   "contact_contact_origin_idx" btree (origin)   "contact_contact_place_of_birth_address_idx" btree (place_of_birth_address)   "contact_contact_title_idx" btree (title)   "contact_contact_user_info_idx" btree (user_info)   "idx_contact_email" btree (email)   "idx_contact_lower_name" btree (lower(name::text))   "idx_contact_search_name" btree (lower(name::text), lower(first_name::text))
Contraintes de clés étrangères :   "fk_8dj7rw3jrdxk4vxbi6vony0ne" FOREIGN KEY (created_by) REFERENCES auth_user(id)   "fk_9s1dhwrvw6lq74fvty6oj2wc5" FOREIGN KEY (address_pro) REFERENCES contact_address(id)   "fk_9wjsgh8lt5ixbshx9pjwmjtk1" FOREIGN KEY (origin) REFERENCES crm_origin(id)   "fk_ad53x8tdando1w1jdlyxcop9v" FOREIGN KEY (duplicates) REFERENCES contact_contact(id)   "fk_edusucr1gdfj99vtm0a70gggg" FOREIGN KEY (title) REFERENCES contact_title(id)   "fk_g7u75rjd754m7evn2alckjvka" FOREIGN KEY (merged_with) REFERENCES contact_contact(id)   "fk_j72hkuq0337v6utjbf85hhvxg" FOREIGN KEY (action_event_source) REFERENCES crm_action_event_source(id)   "fk_k73mcu7swia6uf6qpp4v6lwxf" FOREIGN KEY (updated_by) REFERENCES auth_user(id)   "fk_mvpl7wudcdqgitmmsd900od97" FOREIGN KEY (place_of_birth_address) REFERENCES contact_address(id)   "fk_onriw4jpgeuvhfk827amxry8k" FOREIGN KEY (address) REFERENCES contact_address(id)   "fk_rpkvno8705gap9ejj4wnnb7hl" FOREIGN KEY (nationality_country) REFERENCES territory_country(id)   "fk_s9fsy33u5a9ke8wee9mc2vpsx" FOREIGN KEY (user_info) REFERENCES user_user_info(id)   "fk_t8uexb8lmgaftjsnn63eoty90" FOREIGN KEY (jobtitle) REFERENCES contact_jobtitle(id)

coopener=# \d+ user_user_info                                       Table « public.user_user_info »    Colonne     |            Type             | Modificateurs | Stockage | Cible de statistiques | Description 
-----------------+-----------------------------+---------------+----------+-----------------------+-------------id              | bigint                      | non NULL      | plain    |                       | archived        | boolean                     |               | plain    |                       | version         | integer                     |               | plain    |                       | created_on      | timestamp without time zone |               | plain    |                       | updated_on      | timestamp without time zone |               | plain    |                       | full_name       | character varying(255)      |               | extended |                       | import_key      | character varying(255)      |               | extended |                       | import_username | character varying(255)      |               | extended |                       | today           | timestamp without time zone |               | plain    |                       | user_system_ok  | boolean                     |               | plain    |                       | created_by      | bigint                      |               | plain    |                       | updated_by      | bigint                      |               | plain    |                       | active_company  | bigint                      |               | plain    |                       | agency          | bigint                      |               | plain    |                       | internal_user   | bigint                      | non NULL      | plain    |                       | 
Index :   "user_user_info_pkey" PRIMARY KEY, btree (id)   "uk_99o17944ddytysui6b05lxyb2" UNIQUE CONSTRAINT, btree (import_key)   "uk_cqgrw75h35ts19uixn03rkjsu" UNIQUE CONSTRAINT, btree (internal_user)   "uk_jtsvu4r7s12nnh9o2sloqyqv4" UNIQUE CONSTRAINT, btree (import_username)   "user_user_info_active_company_idx" btree (active_company)   "user_user_info_agency_idx" btree (agency)   "user_user_info_full_name_idx" btree (full_name)
Contraintes de clés étrangères :   "fk_cojxp4r7d8n2l135gy4xa4vak" FOREIGN KEY (active_company) REFERENCES contact_company(id)   "fk_cqgrw75h35ts19uixn03rkjsu" FOREIGN KEY (internal_user) REFERENCES auth_user(id)   "fk_k3riohsx7jrhxkxdmxyeqflq1" FOREIGN KEY (updated_by) REFERENCES auth_user(id)   "fk_r3e16hs6puibteaby3rk42yg0" FOREIGN KEY (created_by) REFERENCES auth_user(id)   "fk_t389sdkhi9owy0xbhec2nqp5w" FOREIGN KEY (agency) REFERENCES contact_agency(id)

coopener=# \d+ contact_address                                         Table « public.contact_address »      Colonne        |            Type             | Modificateurs | Stockage | Cible de statistiques | Description 
----------------------+-----------------------------+---------------+----------+-----------------------+-------------id                   | bigint                      | non NULL      | plain    |                       | archived             | boolean                     |               | plain    |                       | version              | integer                     |               | plain    |                       | created_on           | timestamp without time zone |               | plain    |                       | updated_on           | timestamp without time zone |               | plain    |                       | addressl2            | character varying(255)      |               | extended |                       | addressl3            | character varying(255)      |               | extended |                       | addressl4            | character varying(255)      |               | extended |                       | addressl5            | character varying(255)      |               | extended |                       | addressl6            | character varying(255)      |               | extended |                       | certified_ok         | boolean                     |               | plain    |                       | consumption_place_ok | boolean                     |               | plain    |                       | full_name            | character varying(255)      |               | extended |                       | insee_code           | character varying(255)      |               | extended |                       | koala_id             | character varying(255)      |               | extended |                       | created_by           | bigint                      |               | plain    |                       | updated_by           | bigint                      |               | plain    |                       | addressl7country     | bigint                      |               | plain    |                       | commune              | bigint                      |               | plain    |                       | 
Index :   "contact_address_pkey" PRIMARY KEY, btree (id)   "contact_address_address_l7_country_idx" btree (addressl7country)   "contact_address_commune_idx" btree (commune)   "contact_address_full_name_idx" btree (full_name)
Contraintes de clés étrangères :   "fk_4yx7nnewflhyjdm5tue5qntbg" FOREIGN KEY (commune) REFERENCES territory_commune(id)   "fk_5lwaygtve0ol8ma53picsdef" FOREIGN KEY (addressl7country) REFERENCES territory_country(id)   "fk_p9svu5ssynimpuu0is3j396lt" FOREIGN KEY (updated_by) REFERENCES auth_user(id)   "fk_rm0lcgnys2n97ad62jkm53qlt" FOREIGN KEY (created_by) REFERENCES auth_user(id)


Regards,
Laurent

Re: IS NOT NULL and LEFT JOIN

From

David Rowley

Date:

19 October 2014, 08:42:03

On Sun, Oct 19, 2014 at 5:10 PM, Laurent Martelli <laurent.martelli@enercoop.org> wrote:

Hello there,

I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.

I grant you that the query can be written without the JOIN on user_user_info,
but it is generated like this by hibernate. Just changing the IS NOT NULL condition
to the other side of useless JOIN makes a big difference in the query plan :

-- THE BAD ONE : given the selectivity on c.name and c.email, barely more than one row will ever be returned

But it looks like you're ignoring the fact that the OR condition would force the query to match not only the user and the email, but also any row that finds a match in the user_user_info table, which going by the planner's estimates, that's every row in the contract_contract table. This is why the planner chooses a seqscan on the contract_contract table instead of using the index on lower(name).

Is it really your intention to get all rows that find a this martelli contract that has this email, and along with that, get every contract that has a not null user_info record?

I see that you have a foreign key on c.user_info to reference the user, so this should be matching everything with a non null user_info record.

explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or u.id is not null;

                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1.83..2246.76 rows=59412 width=4012) (actual time=53.645..53.645 rows=0 loops=1)
   Hash Cond: (c.user_info = u.id)
   Filter: (((lower((c.name)::text) = 'martelli'::text) AND ((c.email)::text = 'dsfze@ezrfz.com'::text)) OR (u.id IS NOT NULL))
   Rows Removed by Filter: 58247
   -> Seq Scan on contact_contact c (cost=0.00..2022.12 rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
   -> Hash (cost=1.37..1.37 rows=37 width=8) (actual time=0.029..0.029 rows=37 loops=1)
         Buckets: 1024 Batches: 1 Memory Usage: 2kB
         -> Seq Scan on user_user_info u (cost=0.00..1.37 rows=37 width=8) (actual time=0.004..0.015 rows=37 loops=1)
Planning time: 0.790 ms
Execution time: 53.712 ms

-- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead of userinfo1_.id)
explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or c.user_info is not null;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on contact_contact c (cost=8.60..16.41 rows=1 width=4012) (actual time=0.037..0.037 rows=0 loops=1)
   Recheck Cond: (((email)::text = 'dsfze@ezrfz.com'::text) OR (user_info IS NOT NULL))
   Filter: (((lower((name)::text) = 'martelli'::text) AND ((email)::text = 'dsfze@ezrfz.com'::text)) OR (user_info IS NOT NULL))
   -> BitmapOr (cost=8.60..8.60 rows=2 width=0) (actual time=0.034..0.034 rows=0 loops=1)
         -> Bitmap Index Scan on idx_contact_email (cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
               Index Cond: ((email)::text = 'dsfze@ezrfz.com'::text)
         -> Bitmap Index Scan on contact_contact_user_info_idx (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (user_info IS NOT NULL)
Planning time: 0.602 ms
Execution time: 0.118 ms

If you look closely at the 2nd query plan, you'll see that no joins are performed, and it's only the contract_contract table that's looked at. This is because PostgresSQL sees that none of the columns from the 2 tables which are being left joined to are used, and also that the columns that you're joining to on these tables are unique, therefore joining to them cannot duplicate any rows, and since these are left joined, if there was no matching row, then it wouldn't filter out rows from the contract_contract table, as it would with INNER JOINs. The planner sees that these left joins are pointless, so just removes them from the plan.

Regards

David Rowley

Re: IS NOT NULL and LEFT JOIN

From

Laurent Martelli

Date:

20 October 2014, 09:29:44

Hi David,

Do we agree that both queries are identical ? Since we join on c.user_info=u.id having u.id is not null or c.user_info is not null in the where clause is the same, isn't it ?

Since c.user_info=u.id the condition on u.id is not null does not use any *new* information from user_user_info.

Regards,
Laurent

Le 19/10/2014 10:41, David Rowley a écrit :

On Sun, Oct 19, 2014 at 5:10 PM, Laurent Martelli <laurent.martelli@enercoop.org> wrote:
Hello there,

I have a strange query plan involving an IS NOT NULL and a LEFT JOIN.

I grant you that the query can be written without the JOIN on user_user_info,
but it is generated like this by hibernate. Just changing the IS NOT NULL condition
to the other side of useless JOIN makes a big difference in the query plan :

-- THE BAD ONE : given the selectivity on c.name and c.email, barely more than one row will ever be returned

But it looks like you're ignoring the fact that the OR condition would force the query to match not only the user and the email, but also any row that finds a match in the user_user_info table, which going by the planner's estimates, that's every row in the contract_contract table. This is why the planner chooses a seqscan on the contract_contract table instead of using the index on lower(name).

Is it really your intention to get all rows that find a this martelli contract that has this email, and along with that, get every contract that has a not null user_info record?

I see that you have a foreign key on c.user_info to reference the user, so this should be matching everything with a non null user_info record.

explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or u.id is not null;
                               QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Hash Left Join (cost=1.83..2246.76 rows=59412 width=4012) (actual time=53.645..53.645 rows=0 loops=1)
   Hash Cond: (c.user_info = u.id)
   Filter: (((lower((c.name)::text) = 'martelli'::text) AND ((c.email)::text = 'dsfze@ezrfz.com'::text)) OR (u.id IS NOT NULL))
   Rows Removed by Filter: 58247
   -> Seq Scan on contact_contact c (cost=0.00..2022.12 rows=59412 width=4012) (actual time=0.007..6.892 rows=58247 loops=1)
   -> Hash (cost=1.37..1.37 rows=37 width=8) (actual time=0.029..0.029 rows=37 loops=1)
         Buckets: 1024 Batches: 1 Memory Usage: 2kB
         -> Seq Scan on user_user_info u (cost=0.00..1.37 rows=37 width=8) (actual time=0.004..0.015 rows=37 loops=1)
Planning time: 0.790 ms
Execution time: 53.712 ms

-- THE GOOD ONE (test IS NOT NULL on contact0_.user_info instead of userinfo1_.id)
explain analyze select c.*
       from contact_contact c
       left outer join user_user_info u on c.user_info=u.id
       left outer join contact_address a on c.address=a.id
      where lower(c.name)='martelli'
        and c.email='dsfze@ezrfz.com' or c.user_info is not null;
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on contact_contact c (cost=8.60..16.41 rows=1 width=4012) (actual time=0.037..0.037 rows=0 loops=1)
   Recheck Cond: (((email)::text = 'dsfze@ezrfz.com'::text) OR (user_info IS NOT NULL))
   Filter: (((lower((name)::text) = 'martelli'::text) AND ((email)::text = 'dsfze@ezrfz.com'::text)) OR (user_info IS NOT NULL))
   -> BitmapOr (cost=8.60..8.60 rows=2 width=0) (actual time=0.034..0.034 rows=0 loops=1)
         -> Bitmap Index Scan on idx_contact_email (cost=0.00..4.30 rows=2 width=0) (actual time=0.027..0.027 rows=0 loops=1)
               Index Cond: ((email)::text = 'dsfze@ezrfz.com'::text)
         -> Bitmap Index Scan on contact_contact_user_info_idx (cost=0.00..4.30 rows=1 width=0) (actual time=0.007..0.007 rows=0 loops=1)
               Index Cond: (user_info IS NOT NULL)
Planning time: 0.602 ms
Execution time: 0.118 ms
If you look closely at the 2nd query plan, you'll see that no joins are performed, and it's only the contract_contract table that's looked at. This is because PostgresSQL sees that none of the columns from the 2 tables which are being left joined to are used, and also that the columns that you're joining to on these tables are unique, therefore joining to them cannot duplicate any rows, and since these are left joined, if there was no matching row, then it wouldn't filter out rows from the contract_contract table, as it would with INNER JOINs. The planner sees that these left joins are pointless, so just removes them from the plan.

Regards

David Rowley

Re: IS NOT NULL and LEFT JOIN

From

Tom Lane

Date:

20 October 2014, 13:58:16

Laurent Martelli <laurent.martelli@enercoop.org> writes:
> Do we agree that both queries are identical ?

No, they *aren't* identical.  Go consult any SQL reference.  Left join
conditions don't work the way you seem to be thinking: after the join,
the RHS column might be null, rather than equal to the LHS column.

            regards, tom lane

Re: IS NOT NULL and LEFT JOIN

From

Laurent Martelli

Date:

20 October 2014, 21:02:47

Le 20/10/2014 15:58, Tom Lane a écrit :
> Laurent Martelli <laurent.martelli@enercoop.org> writes:
>> Do we agree that both queries are identical ?
> No, they *aren't* identical.  Go consult any SQL reference.  Left join
> conditions don't work the way you seem to be thinking: after the join,
> the RHS column might be null, rather than equal to the LHS column.
Yes, I was wrong to assume that c.user_info=u.id because of the LEFT JOIN.

But since I only want rows where u.id IS NOT NULL, in any case I will
also have c.user_info IS NOT NULL.

Also, having a foreign key, if c.user_info is not null, it will have a
match in u. So in that case, either both c.user_info and c.id are null
in the result rows, or they are equal.

Regards,
Laurent

Re: IS NOT NULL and LEFT JOIN

From

David G Johnston

Date:

20 October 2014, 22:19:59

Laurent Martelli wrote
> Le 20/10/2014 15:58, Tom Lane a écrit :
>> Laurent Martelli <

> laurent.martelli@

> > writes:
>>> Do we agree that both queries are identical ?
>> No, they *aren't* identical.  Go consult any SQL reference.  Left join
>> conditions don't work the way you seem to be thinking: after the join,
>> the RHS column might be null, rather than equal to the LHS column.
> Yes, I was wrong to assume that c.user_info=u.id because of the LEFT JOIN.
>
> But since I only want rows where u.id IS NOT NULL, in any case I will
> also have c.user_info IS NOT NULL.
>
> Also, having a foreign key, if c.user_info is not null, it will have a
> match in u. So in that case, either both c.user_info and c.id are null
> in the result rows, or they are equal.

The planner only expends so much effort converting between equivalent query
forms.  By adding u.id IS NOT NULL you are saying that you really meant to
use INNER JOIN instead of LEFT JOIN but whether the planner can and/or does
act on that information in the WHERE clause to modify its joins is beyond my
knowledge.  It doesn't seem to and probably correctly isn't worth adding the
planner cycles to fix a poorly written/generated query on-the-fly.

Now that it has been pointed out that the two queries you supplied are
semantically different it is unclear what your point here is.  It is known
that Hibernate (and humans too) will generate sub-optimal plans that can be
rewritten using relational algebra and better optimized for having done so.
But such work takes resources that would be expended for every single query
while manually rewriting the sub-optimal query solves the problem
once-and-for-all.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/IS-NOT-NULL-and-LEFT-JOIN-tp5823591p5823737.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: IS NOT NULL and LEFT JOIN

From

David G Johnston

Date:

20 October 2014, 22:30:12

David G Johnston wrote
>
> Laurent Martelli wrote
>> Le 20/10/2014 15:58, Tom Lane a écrit :
>>> Laurent Martelli <

>> laurent.martelli@

>> > writes:
>>>> Do we agree that both queries are identical ?
>>> No, they *aren't* identical.  Go consult any SQL reference.  Left join
>>> conditions don't work the way you seem to be thinking: after the join,
>>> the RHS column might be null, rather than equal to the LHS column.
>> Yes, I was wrong to assume that c.user_info=u.id because of the LEFT
>> JOIN.
>>
>> But since I only want rows where u.id IS NOT NULL, in any case I will
>> also have c.user_info IS NOT NULL.
>>
>> Also, having a foreign key, if c.user_info is not null, it will have a
>> match in u. So in that case, either both c.user_info and c.id are null
>> in the result rows, or they are equal.
> The planner only expends so much effort converting between equivalent
> query forms.  By adding u.id IS NOT NULL you are saying that you really
> meant to use INNER JOIN instead of LEFT JOIN but whether the planner can
> and/or does act on that information in the WHERE clause to modify its
> joins is beyond my knowledge.  It doesn't seem to and probably correctly
> isn't worth adding the planner cycles to fix a poorly written/generated
> query on-the-fly.
>
>
> Now that it has been pointed out that the two queries you supplied are
> semantically different it is unclear what your point here is.  It is known
> that Hibernate (and humans too) will generate sub-optimal plans that can
> be rewritten using relational algebra and better optimized for having done
> so.  But such work takes resources that would be expended for every single
> query while manually rewriting the sub-optimal query solves the problem
> once-and-for-all.
>
> David J.

Didn't sound right what I wrote above...

The presence of the "OR" screws things up even further since it does force
the use of LEFT JOIN mechanics for the single case where the name and e-mail
match.

I would maybe try a UNION DISTINCT query instead of an OR clause if you want
to have a query that performs better than the Hibernate one...otherwise
others more knowledgeable than myself have not made any indication that the
planner is unintentionally deficient in its handling of your original query.

You may try posting your actual question, and not the SQL, and see if that
sparks any suggestions.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/IS-NOT-NULL-and-LEFT-JOIN-tp5823591p5823739.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: IS NOT NULL and LEFT JOIN

From

David Rowley

Date:

21 October 2014, 08:44:56

On Tue, Oct 21, 2014 at 2:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Laurent Martelli <laurent.martelli@enercoop.org> writes:
> Do we agree that both queries are identical ?

No, they *aren't* identical. Go consult any SQL reference. Left join
conditions don't work the way you seem to be thinking: after the join,
the RHS column might be null, rather than equal to the LHS column.

For what it's worth I'd say they are identical, at least, if you discount deferring foreign key constraints or also executing the query from within a volatile function which was called by a query which just updated the user_info table to break referential integrity.

The presence of the foreign key on contract_contract.user_info which references user_user_info.id means that any non-null contract_contract.user_info record must reference a valid user_user_info record, therefore the join is not required to prove that a non nulled user_info contract records match a user info record, therefore the join to check it exists is pretty much pointless in just about all cases that you're likely to care about.

Although, saying that I'm still a bit confused about the question. Are you asking if there's some way to get PostgreSQL to run the 1st query faster? Or are you asking if both queries are equivalent?

Regards

David Rowley

Re: IS NOT NULL and LEFT JOIN

From

"Laurent Martelli"

Date:

22 October 2014, 06:29:24

Le Mardi 21 Octobre 2014 10:44 CEST, David Rowley <dgrowleyml@gmail.com> a écrit:

> For what it's worth I'd say they are identical, at least, if you discount
> deferring  foreign key constraints or also executing the query from within
> a volatile function which was called by a query which just updated the
> user_info table to break referential integrity.

I must say I had not thought of that.

> The presence of the foreign key on contract_contract.user_info which
> references user_user_info.id means that any non-null
> contract_contract.user_info record must reference a valid user_user_info
> record, therefore the join is not required to prove that a non nulled
> user_info contract records match a user info record, therefore the join to
> check it exists is pretty much pointless in just about all cases that
> you're likely to care about.
>
> Although, saying that I'm still a bit confused about the question. Are you
> asking if there's some way to get PostgreSQL to run the 1st query faster?
> Or are you asking if both queries are equivalent?

I was asking for a way to make it run faster. Given that it returns at most a few rows found by an index, I was
thinkingit could be made to run faster. 

But I agree that the query is not well written (well generated by hibernate) considering the result I want.

Regards,
Laurent