Re: bug in query planning? - Mailing list pgsql-general

From Tom Lane
Subject Re: bug in query planning?
Date
Msg-id 12673.1072244711@sss.pgh.pa.us
Whole thread Raw
In response to Re: bug in query planning?  (DeJuan Jackson <djackson@speedfc.com>)
Responses Re: bug in query planning?
List pgsql-general
DeJuan Jackson <djackson@speedfc.com> writes:
> Query 1:
> SELECT COUNT(message_id)
>  FROM messages m
>       LEFT JOIN accounts a
>        ON  m.account_id::bigint = a.account_id::bigint
>  WHERE a.email = 'stevena@neosynapse.net';

> Query 2:
> SELECT COUNT(message_id)
>  FROM accounts a
>       LEFT JOIN messages m
>        ON  a.account_id::bigint = m.account_id::bigint
>  WHERE a.email = 'stevena@neosynapse.net';

> Query 3:
> SELECT COUNT(message_id)
>  FROM messages m, accounts a
>  WHERE m.account_id::bigint = a.account_id::bigint
>        AND a.email = 'stevena@neosynapse.net';

>  From what I can see they are not the same query and therefore shouldn't
> use the same plan.

Actually, queries 1 and 3 are equivalent, and I believe PG 7.4 will
recognize them as such.  The reason is that the WHERE clause "a.email =
'something'" cannot succeed when a.email is NULL; therefore, there is no
point in the JOIN being a LEFT JOIN --- any null-extended rows added by
the left join will be thrown away again by the WHERE clause.  We may as
well reduce the LEFT JOIN to a plain inner JOIN, whereupon query 1 is
obviously the same as query 3.  PG 7.4's optimizer can make exactly this
sequence of deductions.  The bit of knowledge it needs for this is that
the '=' operator involved is STRICT, ie, yields NULL for NULL input.
All the standard '=' operators are strict and are so marked in the
catalogs.  (If you are defining a user-defined type, don't forget to
mark your operators strict where applicable.)

I believe that query 2 is really equivalent to the others as well, but
proving it is more subtle.  The reason is that COUNT(message_id) does
not count rows where message_id is NULL, and so any null-extended rows
added by the LEFT JOIN won't be counted, and so we might as well reduce
the LEFT JOIN to a plain inner JOIN.  PG's optimizer will not recognize
this, however.  Possibly it could if anyone wanted to figure out how.
Right now we make very few assumptions about the behavior of aggregate
functions, but I think you could prove that this is safe based on the
behavior of nodeAgg.c for strict transition functions.  Next question
is whether the case would come up often enough to be worth testing
for ...

            regards, tom lane

pgsql-general by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: How to hide database structure
Next
From: Karam Chand
Date:
Subject: Where do I get Windows libpq and header files?