Thread: ORDER BY and NULLs

ORDER BY and NULLs

From
T E Schmitz
Date:
Hello,

I am using PostgreSQL 7.4.2 and, as I understand it, NULL values always 
sort last.

However, I have a table from which I select using two numerical sort keys 
"FROM" and "TO". "TO" might be NULL and I would like to display those 
rows first (without sorting the column in descending order).

Is there any way this can be achieved without inserting bogus values 
into that column?

-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: ORDER BY and NULLs

From
Date:
Use the coalesce() function.  (coalesce returns the first non-null value in its list)

Specifically

ORDER BY coalesce("TO", 0), "FROM"

If you have records whose "TO" value is LESS than 0, then you need to replace 0 with
something that sorts BEFORE the smallest value that your "TO" column can contain.
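
A minimal sketch of the COALESCE approach. The table name coalesce_demo and the sample rows are invented for illustration; only the column names "FROM" and "TO" come from the original question. It assumes every real "TO" value is greater than 0, so the sentinel 0 is safe.

```sql
-- Hypothetical table; only the column names "FROM" and "TO" come from the thread.
CREATE TABLE coalesce_demo (
    "FROM" integer,
    "TO"   integer
);

-- Single-row INSERTs are used so the script also runs on older releases.
INSERT INTO coalesce_demo ("FROM", "TO") VALUES (1, 10);
INSERT INTO coalesce_demo ("FROM", "TO") VALUES (2, NULL);
INSERT INTO coalesce_demo ("FROM", "TO") VALUES (3, 5);

-- NULL "TO" values are treated as 0 for sorting only; the stored data is unchanged.
-- Assumption: no real "TO" value is 0 or negative.
SELECT "FROM", "TO"
FROM coalesce_demo
ORDER BY coalesce("TO", 0), "FROM";
```

With these rows the row with the NULL "TO" comes first, followed by the rows with "TO" = 5 and "TO" = 10. If a real "TO" value could equal the sentinel, those rows would be interleaved with the NULL rows, which is why the sentinel has to be smaller than any real value.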

Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
Fax: (416) 441-9085





Re: ORDER BY and NULLs

From
Jean-Luc Lachance
Date:
select ... order by "FROM" is not null, "FROM";

If you have a large number of rows (with or without nulls) it is faster if 
you use a partial index.

create index ... on ...("FROM");
create index ... on ...("FROM") where "FROM" is null;
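
A minimal sketch of the boolean sort key, applied here to the "TO" column from the original question rather than to "FROM" as written above (that substitution is an assumption about what was intended). The table name bool_key_demo and the sample rows are hypothetical. The expression "TO" IS NOT NULL is false for NULL rows, and false sorts before true, so no sentinel value is needed.

```sql
-- Hypothetical table and rows, for illustration only.
CREATE TABLE bool_key_demo (
    "FROM" integer,
    "TO"   integer
);

INSERT INTO bool_key_demo ("FROM", "TO") VALUES (1, 10);
INSERT INTO bool_key_demo ("FROM", "TO") VALUES (2, NULL);
INSERT INTO bool_key_demo ("FROM", "TO") VALUES (3, -5);

-- First key: false (NULL rows) before true (non-NULL rows).
-- Second key: ordinary ascending order among the non-NULL values.
SELECT "FROM", "TO"
FROM bool_key_demo
ORDER BY ("TO" IS NOT NULL), "TO", "FROM";
```

Unlike the COALESCE variant, this works for negative values too: the NULL row comes first, then "TO" = -5, then "TO" = 10.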


JLL




Re: ORDER BY and NULLs

From
T E Schmitz
Date:
Hello Jean-Luc,
You must've been reading my mind. I was just wondering what to do about 
indexing on that particular table. I read somewhere that an Index is not 
going to improve the performance of an ORDER BY if the sort column 
contains NULLs because NULLs aren't indexed?

For the sake of the example I had simplified matters a wee bit. What I 
really have is:

SELECT * FROM PRODUCT ORDER BY NAME, "FROM", "TO", FROM2, TO2

FROM, TO, FROM2, TO2 might be NULL. If FROM is NULL, TO will be NULL. If 
FROM2 is NULL, TO2 will be NULL.

How would you index this table?

Kind regards,
Tarlika



Re: ORDER BY and NULLs

From
Tom Lane
Date:
T E Schmitz <mailreg@numerixtechnology.de> writes:
> You must've been reading my mind. I was just wondering what to do about 
> indexing on that particular table. I read somewhere that an Index is not 
> going to improve the performance of an ORDER BY if the sort column 
> contains NULLs because NULLs aren't indexed?

Whatever you were reading had it pretty badly garbled :-(

Btree indexes *do* store nulls, so the presence of nulls doesn't affect
whether they are usable for meeting an ORDER BY spec.  However the index
sort order does have to exactly match the ORDER BY list, and even then
it's not necessarily the case that the index is useful.  The brutal fact
is that seqscan-and-sort is generally faster than a full-table indexscan
for large tables anyway, unless the table is clustered or otherwise
roughly in order by the index.

If you are going to use an ORDER BY that involves COALESCE or NOT NULL
expressions, then the only way that it could be met with an index is to
create an expressional index on exactly that list of expressions.  For
instance

regression=# create table foo (f int, t int);
CREATE TABLE
regression=# explain select * from foo order by f, coalesce(t, -1);
                         QUERY PLAN
-------------------------------------------------------------
 Sort  (cost=69.83..72.33 rows=1000 width=8)
   Sort Key: f, COALESCE(t,-1)
   ->  Seq Scan on foo  (cost=0.00..20.00 rows=1000 width=8)
(3 rows)

regression=# create index fooi on foo (f, (coalesce(t, -1)));
CREATE INDEX
regression=# explain select * from foo order by f, coalesce(t, -1);
                             QUERY PLAN
--------------------------------------------------------------------
 Index Scan using fooi on foo  (cost=0.00..52.00 rows=1000 width=8)
(1 row)

regression=#

I'm a bit dubious that such an index would be worth its update costs,
given that it's likely to be no more than a marginal win for the query.
But try it and see.
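
One way to "try it and see", sketched under stated assumptions. The names foo_demo and foo_demo_idx are hypothetical stand-ins for the foo table and fooi index above, and the generated data is arbitrary. EXPLAIN ANALYZE runs the query and reports actual times; enable_sort is a planner setting that discourages (but does not forbid) explicit sorts, used here only to compare the two plan shapes.

```sql
-- Hypothetical copies of the foo table and fooi index, so this script runs on its own.
CREATE TABLE foo_demo (f int, t int);

-- Sample data: generate_series is available in PostgreSQL 8.0 and later, not in 7.4.
INSERT INTO foo_demo
SELECT g % 100, CASE WHEN g % 10 = 0 THEN NULL ELSE g END
FROM generate_series(1, 100000) AS g;

CREATE INDEX foo_demo_idx ON foo_demo (f, (coalesce(t, -1)));

-- Refresh planner statistics so the cost estimates reflect the loaded data.
ANALYZE foo_demo;

-- Plan chosen by the planner on its own, with actual run times.
EXPLAIN ANALYZE SELECT * FROM foo_demo ORDER BY f, coalesce(t, -1);

-- Discourage explicit sorts to see how the index-scan alternative performs.
SET enable_sort = off;
EXPLAIN ANALYZE SELECT * FROM foo_demo ORDER BY f, coalesce(t, -1);
RESET enable_sort;
```

Comparing the two reported run times on data that resembles the real table gives a rough idea of whether the index is worth maintaining; the numbers depend heavily on table size, physical row order, and caching.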


> Jean-Luc Lachance wrote:
>> If you have large amount of rows (with or without nulls) it is faster if 
>> use a partial index.

This advice seems entirely irrelevant to the problem of sorting the
whole table...
        regards, tom lane


Re: ORDER BY and NULLs

From
T E Schmitz
Date:
Hello Tom,

Tom Lane wrote:

> T E Schmitz <mailreg@numerixtechnology.de> writes:
> 
>>I read somewhere that an Index is not 
>>going to improve the performance of an ORDER BY if the sort column 
>>contains NULLs because NULLs aren't indexed?
> 
> Whatever you were reading had it pretty badly garbled :-(

I just dug out the PostgreSQL book again because I thought I might've 
garbled it:

Quote: "PostgreSQL will not index NULL values. Because an index will 
never include NULL values, it cannot be used to satisfy the ORDER BY 
clause of a query that returns all rows in a table."


> Btree indexes *do* store nulls, so the presence of nulls doesn't affect

Thank you for your explanations. At the moment the table has only 1300 
entries and any query is responsive. I'm just planning ahead...

-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: ORDER BY and NULLs

From
Greg Stark
Date:
T E Schmitz <mailreg@numerixtechnology.de> writes:

> I just dug out the PostgreSQL book again because I thought I might've garbled
> it:
> 
> Quote: "PostgreSQL will not index NULL values. Because an index will never
> include NULL values, it cannot be used to satisfy the ORDER BY clause of a
> query that returns all rows in a table."

You should just cross out that whole section. It's just flatly wrong. 

I had always assumed it was just people bringing assumptions over from Oracle
where it is true. Perhaps this book is to blame for some of the confusion.
Which book is it?

Postgres indexes NULLs. It can use them for ORDER BY clauses. 

Where it cannot use them is to satisfy "WHERE foo IS NULL" or "WHERE foo IS
NOT NULL" constraints though. That's an implementation detail, but it can be
worked around with partial indexes.
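
A minimal sketch of the partial-index workaround. The table name null_lookup_demo, the index name, and the id column are hypothetical; bar stands in for the nullable column, like the placeholder foo above.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE null_lookup_demo (
    id  integer,
    bar integer
);

-- Partial index that contains only the rows where bar IS NULL.
CREATE INDEX null_lookup_demo_bar_null
    ON null_lookup_demo (id)
    WHERE bar IS NULL;

-- The WHERE clause matches the index predicate, so the planner can
-- consider the partial index for this query.
SELECT id
FROM null_lookup_demo
WHERE bar IS NULL;
```

Whether the planner actually chooses the index depends on table size and statistics; EXPLAIN shows which plan it picked.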

-- 
greg



Re: ORDER BY and NULLs

From
Tom Lane
Date:
T E Schmitz <mailreg@numerixtechnology.de> writes:
> Tom Lane wrote:
>> Whatever you were reading had it pretty badly garbled :-(

> I just dug out the PostgreSQL book again because I thought I might've 
> garbled it:

> Quote: "PostgreSQL will not index NULL values. Because an index will 
> never include NULL values, it cannot be used to satisfy the ORDER BY 
> clause of a query that returns all rows in a table."

[ shrug ]  It's wrong on both counts, and has been since (checks CVS) 1997.
What book is that anyway?

There is a related statement that is still true: "WHERE x IS NULL"
(or NOT NULL) clauses are not indexscannable.  This is a shortcoming of
the planner-to-index-access-method interface, though, not a question of
whether the index can store NULLs.
        regards, tom lane


Re: ORDER BY and NULLs

From
T E Schmitz
Date:
Hello Greg,

Greg Stark wrote:

> T E Schmitz <mailreg@numerixtechnology.de> writes:
> 
>>Quote: "PostgreSQL will not index NULL values. Because an index will never
>>include NULL values, it cannot be used to satisfy the ORDER BY clause of a
>>query that returns all rows in a table."
> 
> 
> You should just cross out that whole section. It's just flatly wrong. 
> I had always assumed it was just people bringing assumptions over from Oracle
> where it is true. Perhaps this book is to blame for some of the confusion.
> Which book is it?

PostgreSQL by Korry Douglas + Susan Douglas, ISBN 0-7357-1257-3; Feb 2003

> Postgres indexes NULLs. It can use them for ORDER BY clauses. 
> 
> Where it cannot use them is to satisfy "WHERE foo IS NULL" or "WHERE foo IS
> NOT NULL" constraints though. That's an implementation detail, but it can be
> worked around with partial indexes.

The paragraph continues:
"If the SELECT command included the clause WHERE phone NOT NULL, 
PostgreSQL could use the index to satisfy the ORDER BY clause.
An index that covers optional (NOT NULL) columns will not be used to 
speed table joins either."

-- 

Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: ORDER BY and NULLs

From
Tom Lane
Date:
T E Schmitz <mailreg@numerixtechnology.de> writes:
> Greg Stark wrote:
>> Which book is it?

> PostgreSQL by Korry Douglas + Susan Douglas, ISBN 0-7357-1257-3; Feb 2003

Hmm, I've heard of that book but never seen it.  The authors are not
participants in the PG community --- AFAICT neither of them have ever
posted anything in the mailing lists.

> The paragraph continues:
> "If the SELECT command included the clause WHERE phone NOT NULL, 
> PostgreSQL could use the index to satisfy the ORDER BY clause.
> An index that covers optional (NOT NULL) columns will not be used to 
> speed table joins either."

My goodness, it seems to be a veritable fount of misinformation :-(

I wonder how much of this is stuff that is true for Oracle and they just
assumed it carried over?
        regards, tom lane


Re: ORDER BY and NULLs

From
T E Schmitz
Date:
Hello Tom,

Tom Lane wrote:
> T E Schmitz <mailreg@numerixtechnology.de> writes:
> 
>>Greg Stark wrote:
>>
>>>Which book is it?
> 
> 
>>PostgreSQL by Korry Douglas + Susan Douglas, ISBN 0-7357-1257-3; Feb 2003
> 
> 
> Hmm, I've heard of that book but never seen it.  The authors are not
> participants in the PG community --- AFAICT neither of them have ever
> posted anything in the mailing lists.
> 
> 
>>The paragraph continues:
>>"If the SELECT command included the clause WHERE phone NOT NULL, 
>>PostgreSQL could use the index to satisfy the ORDER BY clause.
>>An index that covers optional (NOT NULL) columns will not be used to 
>>speed table joins either."
> 
> 
> My goodness, it seems to be a veritable fount of misinformation :-(


Well, that's great. My knowledge of SQL is good enough to model a DB and 
do run of the mill queries; but when it comes to some fine details I 
rely on sensible input ;-)

Thanks for chipping in here and answering what I thought was a dummy 
question.

-- 


Kind Regards/Gruß,

Tarlika


Re: ORDER BY and NULLs

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > The paragraph continues:
> > "If the SELECT command included the clause WHERE phone NOT NULL, 
> > PostgreSQL could use the index to satisfy the ORDER BY clause.
> > An index that covers optional (NOT NULL) columns will not be used to 
> > speed table joins either."
> 
> My goodness, it seems to be a veritable fount of misinformation :-(
> 
> I wonder how much of this is stuff that is true for Oracle and they just
> assumed it carried over?

The first part is true for Oracle. You have to add the WHERE phone NOT NULL to
convince Oracle it can use an index. Or just make the column NOT NULL to begin
with I think.

However as far as I recall the second part is not true. Oracle is smart enough
to realize that an equijoin clause implies NOT NULL and therefore allows it to
use the index.

(This may have all changed in Oracle 9+. The last I saw of Oracle was 8i)

I wonder if they just tried explain on a bunch of queries and noticed that
postgres wasn't using an index for SELECT * FROM foo ORDER BY bar and came up
with explanations for the patterns they saw?

-- 
greg



Re: ORDER BY and NULLs

From
T E Schmitz
Date:
Hello,

Greg Stark wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> 
>>>The paragraph continues:
>>>"If the SELECT command included the clause WHERE phone NOT NULL, 
>>>PostgreSQL could use the index to satisfy the ORDER BY clause.
>>>An index that covers optional (NOT NULL) columns will not be used to 
>>>speed table joins either."
>>
>>My goodness, it seems to be a veritable fount of misinformation :-(
>>

> I wonder if they just tried explain on a bunch of queries and noticed that
> postgres wasn't using an index for SELECT * FROM foo ORDER BY bar and came up
> with explanations for the patterns they saw?

This paragraph was in the chapter about PostgreSQL indexing and the 
paragraph itself was entitled "Indexes and Null values".

-- 


Regards/Gruß,

Tarlika Elisabeth Schmitz


Re: ORDER BY and NULLs

From
Murphy Pope
Date:
> You should just cross out that whole section. It's just flatly wrong.
> 
> I had always assumed it was just people bringing assumptions over from
> Oracle where it is true. Perhaps this book is to blame for some of the
> confusion. Which book is it?
> 
> Postgres indexes NULLs. It can use them for ORDER BY clauses.

I know this is an old-ish topic, but the question keeps coming up and I see
different answers every time.

I think I found the definitive answer and it looks like everyone (Bruce,
Tom, the book) is half-right.  Maybe this should go in a FAQ or something
since there seems to be so much confusion.

From section 41.3 of the documentation - this section describes the pg_am
table:

> An index access method that supports multiple columns 
> (has amcanmulticol true) must  support indexing null 
> values in columns after the first, because the planner 
> will assume the index can be used for queries on just 
> the first column(s). For example, consider an index 
> on (a,b) and a query with WHERE a = 4. The system will 
> assume the index can be used to scan for rows 
> with a = 4, which is wrong if the index omits rows 
> where b is null. It is, however, OK to omit rows 
> where the first indexed column is null. (GiST 
> currently does so.) amindexnulls should be set true 
> only if the index access method indexes all rows, 
> including arbitrary combinations of null values.

Here's what I get when I look at pg_am:

select amname, amcanmulticol, amindexnulls from pg_am;
 amname | amcanmulticol | amindexnulls
--------+---------------+--------------
 rtree  | f             | f
 btree  | t             | t
 hash   | f             | f
 gist   | t             | f

So it looks like btree indexes will index completely-NULL values, but the
other types won't index a row where all of the index columns are NULL.

Am I reading that right?

It sounds like the explanation quoted from the book is correct for all types
except for btree?




Re: ORDER BY and NULLs

From
Murphy Pope
Date:
>> I just dug out the PostgreSQL book again because I thought I might've
>> garbled it:
>> 
>> Quote: "PostgreSQL will not index NULL values. Because an index will
>> never include NULL values, it cannot be used to satisfy the ORDER BY
>> clause of a query that returns all rows in a table."
> 
> You should just cross out that whole section. It's just flatly wrong.
> 
> I had always assumed it was just people bringing assumptions over from
> Oracle where it is true. Perhaps this book is to blame for some of the
> confusion. Which book is it?
> 
> Postgres indexes NULLs. It can use them for ORDER BY clauses.

Now I'm confused...  here's a quote from Bruce Momjian from Oct. 2003:

> To be specific, we do not do index NULL values in a column, but we
> easily index non-null values in the column.

And a comment from backend/access/gist/gist.c (appears a few times):

> GiST cannot index tuples with leading NULLs

So what's the story?  Do GiST indexes index NULLs? Do other index types
index NULLs? Is the comment wrong or am I misreading it?







Re: ORDER BY and NULLs

From
Murphy Pope
Date:
>> You should just cross out that whole section. It's just flatly wrong.
>> 
>> I had always assumed it was just people bringing assumptions over from
>> Oracle where it is true. Perhaps this book is to blame for some of the
>> confusion. Which book is it?
>> 
>> Postgres indexes NULLs. It can use them for ORDER BY clauses.
> 
> Now I'm confused...  

I think I found the definitive answer and it looks like everyone (Bruce,
Tom, the book) is half-right.  Maybe this should go in a FAQ or something
since there seems to be so much confusion. 

From section 41.3 of the documentation - this section describes the pg_am
table:

> An index access method that supports multiple columns 
> (has amcanmulticol true) must  support indexing null 
> values in columns after the first, because the planner 
> will assume the index can be used for queries on just 
> the first column(s). For example, consider an index 
> on (a,b) and a query with WHERE a = 4. The system will 
> assume the index can be used to scan for rows 
> with a = 4, which is wrong if the index omits rows 
> where b is null. It is, however, OK to omit rows 
> where the first indexed column is null. (GiST 
> currently does so.) amindexnulls should be set true 
> only if the index access method indexes all rows, 
> including arbitrary combinations of null values.

Here's what I get when I look at pg_am:

select amname, amcanmulticol, amindexnulls from pg_am;
 amname | amcanmulticol | amindexnulls
--------+---------------+--------------
 rtree  | f             | f
 btree  | t             | t
 hash   | f             | f
 gist   | t             | f

So it looks like btree indexes will index completely-NULL values, but the
other types won't index a row where all of the index columns are NULL.

Am I reading that right?

It sounds like the explanation quoted from the book is correct for all types
except for btree?