Strange query behaviour - Mailing list pgsql-hackers

From Isaac Morland
Subject Strange query behaviour
Date
Msg-id CAMsGm5fzAJZYQi83uu75Fy8MhckwhCHgAQcRSUHGXbmC3H441g@mail.gmail.com
Whole thread Raw
Responses Re: Strange query behaviour  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
I'm finding a massive difference in query execution time between two queries that should be identical:

Very slow:
select ... from x natural left join y where y is null

Fast:
select ... from x natural left join y where y.primary_key_column is null

A fact that I suspect is important is that y has a column whose contents is PDFs with a total size of  35608033659. However, I can query that size using a query that looks like this:

select sum (length (pdf_field_1) + length (pdf_field_2)) from y

This runs very fast (2.8ms for 2324 rows).

So it is as if checking the whole tuple for NULL requires reading the PDF bytea columns, but checking just the primary key for NULL or even reading the lengths of the PDFs does not.

For the moment I'm going to fix it by just using "y.primary_key_column IS NULL" instead of "y IS NULL" where I want to check whether I have a row from y corresponding to a given row in x. But this seems like strange behaviour. I can think of a couple of potential enhancements that this suggests:

1) when checking an entire row for null, start with a primary key field or other NOT NULL field. In the common case of checking what happened with a left join, this is all that needs to be done - either there is a row, in which case the field cannot be NULL, or there is no row and all the other fields must also be NULL.

2) when checking a field for NULL, is it really necessary to load the field contents? It feels like whether or not a value is NULL should be possible to determine without de-toasting (if I have the right terminology).

Any ideas anybody might have would be much appreciated.

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: COPY FROM WHEN condition
Next
From: Andrew Dunstan
Date:
Subject: Re: Thread-unsafe coding in ecpg