Re: Strange query behaviour - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Strange query behaviour
Date
Msg-id 87va2g30wc.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Strange query behaviour  (Isaac Morland <isaac.morland@gmail.com>)
Responses Re: Strange query behaviour  (Isaac Morland <isaac.morland@gmail.com>)
List pgsql-hackers
>>>>> "Isaac" == Isaac Morland <isaac.morland@gmail.com> writes:

 Isaac> So it is as if checking the whole tuple for NULL requires
 Isaac> reading the PDF bytea columns, but checking just the primary key
 Isaac> for NULL or even reading the lengths of the PDFs does not.

That is almost certainly exactly what happens. If the PDF columns are of
type bytea, or if they are of type text and the database encoding is
single-byte, then length() does not need to detoast the column in order
to get the size (the byte size of the toasted datum is stored in the
toast pointer).

However, constructing a whole-row datum does require detoasting any
externally-stored columns (this used not to be the case, but that caused
a lot of bugs).

 Isaac> For the moment I'm going to fix it by just using
 Isaac> "y.primary_key_column IS NULL" instead of "y IS NULL" where I
 Isaac> want to check whether I have a row from y corresponding to a
 Isaac> given row in x.

What you should actually use in these cases for your IS NULL check is
one of the columns of the join condition. That allows the planner to
detect that the query is in fact an anti-join, and optimize accordingly.

The other, and IMO better, way to write anti-join queries is to use an
explicit NOT EXISTS. (Note, do _not_ use NOT IN, since that has its own
issues with NULL handling.)

 Isaac> 1) when checking an entire row for null,

This isn't a very common operation and the SQL-standard semantics for it
are actually quite weird (for example, x IS NULL is not the opposite
condition to x IS NOT NULL). So I don't think we need to encourage it.

 Isaac> start with a primary key field or other NOT NULL field. In the
 Isaac> common case of checking what happened with a left join, this is
 Isaac> all that needs to be done - either there is a row, in which case
 Isaac> the field cannot be NULL, or there is no row and all the other
 Isaac> fields must also be NULL.

The planner can do even better than this if you apply the IS NULL test
_specifically to one of the join columns_. When a join condition is
strict (which it almost always is), then testing the column from the
nullable side is provably (to the planner) equivalent to testing the
existence of the matching row, which allows it to transform the join
from an outer join to an anti-join.

-- 
Andrew (irc:RhodiumToad)


pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Refactoring the checkpointer's fsync request queue
Next
From: Fabien COELHO
Date:
Subject: Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD