Thread: ORDER BY and NULLs
Hello, I am using PostgreSQL 7.4.2 and as I understand NULL values always sort last. However, I have a table from which select using two numerical sort keys "FROM" and "TO". "TO" might be NULL and I would like to display those rows first (without sorting the column in descending order). Is there any way this can be achieved without inserting bogus values into that column? -- Regards/Gruß, Tarlika Elisabeth Schmitz
Use the coalesce() function. (coalesce returns the first non-null value in its list) Specifically ORDER BY coalesce("TO", 0), "FROM" If you have records in "TO" column whose values is LESS then 0, then you need to replace 0 with something that sorts BEFORE the first most value that your TO result can return. Terry Fielder Manager Software Development and Deployment Great Gulf Homes / Ashton Woods Homes terry@greatgulfhomes.com Fax: (416) 441-9085 > -----Original Message----- > From: pgsql-sql-owner@postgresql.org > [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of T E Schmitz > Sent: Sunday, September 19, 2004 10:58 AM > To: pgsql-sql@postgresql.org > Subject: [SQL] ORDER BY and NULLs > > > Hello, > > I am using PostgreSQL 7.4.2 and as I understand NULL values > always sort > last. > > However, I have a table from which select using two numerical > sort keys > "FROM" and "TO". "TO" might be NULL and I would like to display those > rows first (without sorting the column in descending order). > > Is there any way this can be achieved without inserting bogus values > into that column? > > -- > > > Regards/Gruß, > > Tarlika Elisabeth Schmitz > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
select ... order by "FROM" is not null, "FROM"; If you have large amount of rows (with or without nulls) it is faster if use a partial index. create index ... on ...("FROM"); create index ... on ...("FROM") where "FROM" is null; JLL terry@ashtonwoodshomes.com wrote: > Use the coalesce() function. (coalesce returns the first non-null value in its list) > > Specifically > > ORDER BY coalesce("TO", 0), "FROM" > > If you have records in "TO" column whose values is LESS then 0, then you need to replace 0 with > something that sorts BEFORE the first most value that your TO result can return. > > Terry Fielder > Manager Software Development and Deployment > Great Gulf Homes / Ashton Woods Homes > terry@greatgulfhomes.com > Fax: (416) 441-9085 > > > >>-----Original Message----- >>From: pgsql-sql-owner@postgresql.org >>[mailto:pgsql-sql-owner@postgresql.org]On Behalf Of T E Schmitz >>Sent: Sunday, September 19, 2004 10:58 AM >>To: pgsql-sql@postgresql.org >>Subject: [SQL] ORDER BY and NULLs >> >> >>Hello, >> >>I am using PostgreSQL 7.4.2 and as I understand NULL values >>always sort >>last. >> >>However, I have a table from which select using two numerical >>sort keys >>"FROM" and "TO". "TO" might be NULL and I would like to display those >>rows first (without sorting the column in descending order). >> >>Is there any way this can be achieved without inserting bogus values >>into that column? >> >>-- >> >> >>Regards/Gruß, >> >>Tarlika Elisabeth Schmitz >> >>---------------------------(end of >>broadcast)--------------------------- >>TIP 4: Don't 'kill -9' the postmaster >> > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
Hello Jean-Luc, You must've been reading my mind. I was just wondering what to do about indexing on that particular table. I read somewhere that an Index is not going to improve the performance of an ORDER BY if the sort column contains NULLs because NULLs aren't indexed? For the sake of the example I had simplified matters a wee bit. What I really have is: SELECT * FROM PRODUCT ORDER BY NAME, FROM, TO, FROM2, TO2 FROM, TO, FROM2, TO2 might be NULL. If FROM is NULL, TO will be NULL. If FROM2 is NULL, TO2 will be NULL. How would you index this table? Kind regards, Tarlika Jean-Luc Lachance wrote: > select ... order by "FROM" is not null, "FROM"; > > If you have large amount of rows (with or without nulls) it is faster if > use a partial index. > > create index ... on ...("FROM"); > create index ... on ...("FROM") where "FROM" is null; > > > JLL > > > terry@ashtonwoodshomes.com wrote: > >> Use the coalesce() function. (coalesce returns the first non-null >> value in its list) >> >> Specifically >> >> ORDER BY coalesce("TO", 0), "FROM" >> >> If you have records in "TO" column whose values is LESS then 0, then >> you need to replace 0 with >> something that sorts BEFORE the first most value that your TO result >> can return. >> >> Terry Fielder >> Manager Software Development and Deployment >> Great Gulf Homes / Ashton Woods Homes >> terry@greatgulfhomes.com >> Fax: (416) 441-9085 >> >> >> >>> -----Original Message----- >>> From: pgsql-sql-owner@postgresql.org >>> [mailto:pgsql-sql-owner@postgresql.org]On Behalf Of T E Schmitz >>> Sent: Sunday, September 19, 2004 10:58 AM >>> To: pgsql-sql@postgresql.org >>> Subject: [SQL] ORDER BY and NULLs >>> >>> >>> Hello, >>> >>> I am using PostgreSQL 7.4.2 and as I understand NULL values >>> always sort >>> last. >>> >>> However, I have a table from which select using two numerical >>> sort keys >>> "FROM" and "TO". "TO" might be NULL and I would like to display those >>> rows first (without sorting the column in descending order). >>> >>> Is there any way this can be achieved without inserting bogus values >>> into that column? >>> >>> -- >>> >>> >>> Regards/Gruß, >>> >>> Tarlika Elisabeth Schmitz
T E Schmitz <mailreg@numerixtechnology.de> writes: > You must've been reading my mind. I was just wondering what to do about > indexing on that particular table. I read somewhere that an Index is not > going to improve the performance of an ORDER BY if the sort column > contains NULLs because NULLs aren't indexed? Whatever you were reading had it pretty badly garbled :-( Btree indexes *do* store nulls, so the presence of nulls doesn't affect whether they are usable for meeting an ORDER BY spec. However the index sort order does have to exactly match the ORDER BY list, and even then it's not necessarily the case that the index is useful. The brutal fact is that seqscan-and-sort is generally faster than a full-table indexscan for large tables anyway, unless the table is clustered or otherwise roughly in order by the index. If you are going to use an ORDER BY that involves COALESCE or NOT NULL expressions, then the only way that it could be met with an index is to create an expressional index on exactly that list of expressions. For instance regression=# create table foo (f int, t int); CREATE TABLE regression=# explain select * from foo order by f, coalesce(t, -1); QUERY PLAN -------------------------------------------------------------Sort (cost=69.83..72.33 rows=1000 width=8) Sort Key: f, COALESCE(t,-1) -> Seq Scan on foo (cost=0.00..20.00 rows=1000 width=8) (3 rows) regression=# create index fooi on foo (f, (coalesce(t, -1))); CREATE INDEX regression=# explain select * from foo order by f, coalesce(t, -1); QUERY PLAN --------------------------------------------------------------------Index Scan using fooi on foo (cost=0.00..52.00 rows=1000width=8) (1 row) regression=# I'm a bit dubious that such an index would be worth its update costs, given that it's likely to be no more than a marginal win for the query. But try it and see. > Jean-Luc Lachance wrote: >> If you have large amount of rows (with or without nulls) it is faster if >> use a partial index. This advice seems entirely irrelevant to the problem of sorting the whole table... regards, tom lane
Hello Tom, Tom Lane wrote: > T E Schmitz <mailreg@numerixtechnology.de> writes: > >>I read somewhere that an Index is not >>going to improve the performance of an ORDER BY if the sort column >>contains NULLs because NULLs aren't indexed? > > Whatever you were reading had it pretty badly garbled :-( I just dug out the PostgreSQL book again because I thought I might've garbled it: Quote: "PostgreSQL will not index NULL values. Because an index will never include NULL values, it cannot be used to satisfy the ORDER BY clause of a query that returns all rows in a table." > Btree indexes *do* store nulls, so the presence of nulls doesn't affect Thank you for your explanations. At the moment the table has only 1300 entries and any query is responsive. I'm just planning ahead... -- Regards/Gruß, Tarlika Elisabeth Schmitz
T E Schmitz <mailreg@numerixtechnology.de> writes: > I just dug out the PostgreSQL book again because I thought I might've garbled > it: > > Quote: "PostgreSQL will not index NULL values. Because an index will never > include NULL values, it cannot be used to satisfy the ORDER BY clause of a > query that returns all rows in a table." You should just cross out that whole section. It's just flatly wrong. I had always assumed it was just people bringing assumptions over from Oracle where it is true. Perhaps this book is to blame for some of the confusion. Which book is it? Postgres indexes NULLs. It can use them for ORDER BY clauses. Where it cannot use them is to satisfy "WHERE foo IS NULL" or "WHERE foo IS NOT NULL" constraints though. That's an implementation detail, but it can be worked around with partial indexes. -- greg
T E Schmitz <mailreg@numerixtechnology.de> writes: > Tom Lane wrote: >> Whatever you were reading had it pretty badly garbled :-( > I just dug out the PostgreSQL book again because I thought I might've > garbled it: > Quote: "PostgreSQL will not index NULL values. Because an index will > never include NULL values, it cannot be used to satisfy the ORDER BY > clause of a query that returns all rows in a table." [ shrug ] It's wrong on both counts, and has been since (checks CVS) 1997. What book is that anyway? There is a related statement that is still true: "WHERE x IS NULL" (or NOT NULL) clauses are not indexscannable. This is a shortcoming of the planner-to-index-access-method interface, though, not a question of whether the index can store NULLs. regards, tom lane
Hello Greg, Greg Stark wrote: > T E Schmitz <mailreg@numerixtechnology.de> writes: > >>Quote: "PostgreSQL will not index NULL values. Because an index will never >>include NULL values, it cannot be used to satisfy the ORDER BY clause of a >>query that returns all rows in a table." > > > You should just cross out that whole section. It's just flatly wrong. > I had always assumed it was just people bringing assumptions over from Oracle > where it is true. Perhaps this book is to blame for some of the confusion. > Which book is it? PostgreSQL by Korry Douglas + Susan Douglas, ISBN 0-7357-1257-3; Feb 2003 > Postgres indexes NULLs. It can use them for ORDER BY clauses. > > Where it cannot use them is to satisfy "WHERE foo IS NULL" or "WHERE foo IS > NOT NULL" constraints though. That's an implementation detail, but it can be > worked around with partial indexes. The paragraph continues: "If the SELECT command included the clause WHERE phone NOT NULL, PostgreSQL could use the index to satisfy the ORDER BY clause. An index that covers optional (NOT NULL) columns will not be used to speed table joins either." -- Regards/Gruß, Tarlika Elisabeth Schmitz
T E Schmitz <mailreg@numerixtechnology.de> writes: > Greg Stark wrote: >> Which book is it? > PostgreSQL by Korry Douglas + Susan Douglas, ISBN 0-7357-1257-3; Feb 2003 Hmm, I've heard of that book but never seen it. The authors are not participants in the PG community --- AFAICT neither of them have ever posted anything in the mailing lists. > The paragraph continues: > "If the SELECT command included the clause WHERE phone NOT NULL, > PostgreSQL could use the index to satisfy the ORDER BY clause. > An index that covers optional (NOT NULL) columns will not be used to > speed table joins either." My goodness, it seems to be a veritable fount of misinformation :-( I wonder how much of this is stuff that is true for Oracle and they just assumed it carried over? regards, tom lane
Hello Tom, Tom Lane wrote: > T E Schmitz <mailreg@numerixtechnology.de> writes: > >>Greg Stark wrote: >> >>>Which book is it? > > >>PostgreSQL by Korry Douglas + Susan Douglas, ISBN 0-7357-1257-3; Feb 2003 > > > Hmm, I've heard of that book but never seen it. The authors are not > participants in the PG community --- AFAICT neither of them have ever > posted anything in the mailing lists. > > >>The paragraph continues: >>"If the SELECT command included the clause WHERE phone NOT NULL, >>PostgreSQL could use the index to satisfy the ORDER BY clause. >>An index that covers optional (NOT NULL) columns will not be used to >>speed table joins either." > > > My goodness, it seems to be a veritable fount of misinformation :-( Well, that's great. My knowledge of SQL is good enough to model a DB and do run of the mill queries; but when it comes to some fine details I rely on sensible input ;-) Thanks for chipping in here and answering what I thought was a dummy question. -- Kind Regards/Gruß, Tarlika
Tom Lane <tgl@sss.pgh.pa.us> writes: > > The paragraph continues: > > "If the SELECT command included the clause WHERE phone NOT NULL, > > PostgreSQL could use the index to satisfy the ORDER BY clause. > > An index that covers optional (NOT NULL) columns will not be used to > > speed table joins either." > > My goodness, it seems to be a veritable fount of misinformation :-( > > I wonder how much of this is stuff that is true for Oracle and they just > assumed it carried over? The first part is true for Oracle. You have to add the WHERE phone NOT NULL to convince Oracle it can use an index. Or just make the column NOT NULL to begin with I think. However as far as I recall the second part is not true. Oracle is smart enough to realize that an equijoin clause implies NOT NULL and therefore allows it to use the index. (This may have all changed in Oracle 9+. The last I saw of Oracle was 8i) I wonder if they just tried explain on a bunch of queries and noticed that postgres wasn't using an index for SELECT * FROM foo ORDER BY bar and came up with explanations for the patterns they saw? -- greg
Hello, Greg Stark wrote: > Tom Lane <tgl@sss.pgh.pa.us> writes: > > >>>The paragraph continues: >>>"If the SELECT command included the clause WHERE phone NOT NULL, >>>PostgreSQL could use the index to satisfy the ORDER BY clause. >>>An index that covers optional (NOT NULL) columns will not be used to >>>speed table joins either." >> >>My goodness, it seems to be a veritable fount of misinformation :-( >> > I wonder if they just tried explain on a bunch of queries and noticed that > postgres wasn't using an index for SELECT * FROM foo ORDER BY bar and came up > with explanations for the patterns they saw? This paragraph was in the chapter about PostgreSQL indexing and the paragraph itself was entitled "Indexes and Null values". -- Regards/Gruß, Tarlika Elisabeth Schmitz
> You should just cross out that whole section. It's just flatly wrong. > > I had always assumed it was just people bringing assumptions over from > Oracle where it is true. Perhaps this book is to blame for some of the > confusion. Which book is it? > > Postgres indexes NULLs. It can use them for ORDER BY clauses. I know this is an old-ish topic, but the question keeps coming up and I see different answers every time. I think I found the definitive answer and it looks like everyone (Bruce, Tom, the book) is half-right. Maybe this should go in a FAQ or something since there seems to be so much confusion. From section 41.3 of the documentation - this section describes the pg_am table: > An index access method that supports multiple columns > (has amcanmulticol true) must support indexing null > values in columns after the first, because the planner > will assume the index can be used for queries on just > the first column(s). For example, consider an index > on (a,b) and a query with WHERE a = 4. The system will > assume the index can be used to scan for rows > with a = 4, which is wrong if the index omits rows > where b is null. It is, however, OK to omit rows > where the first indexed column is null. (GiST > currently does so.) amindexnulls should be set true > only if the index access method indexes all rows, > including arbitrary combinations of null values. Here's what I get when I look at pg_am: select amname, amcanmulticol, amindexnulls from pg_am; amname | amcanmulticol | amindexnulls --------+---------------+-------------- rtree | f | f btree | t | t hash | f | f gist | t | f So it looks like btree indexes will index completely-NULL values, but the other types won't index a row where all of the index columns are NULL. Am I reading that right? It sounds like the explanation quoted from the book is correct for all types except for btree?
>> I just dug out the PostgreSQL book again because I thought I might've >> garbled it: >> >> Quote: "PostgreSQL will not index NULL values. Because an index will >> never include NULL values, it cannot be used to satisfy the ORDER BY >> clause of a query that returns all rows in a table." > > You should just cross out that whole section. It's just flatly wrong. > > I had always assumed it was just people bringing assumptions over from > Oracle where it is true. Perhaps this book is to blame for some of the > confusion. Which book is it? > > Postgres indexes NULLs. It can use them for ORDER BY clauses. Now I'm confused... here's a quote from Bruce Momjian from Oct. 2003: > To be specific, we do not do index NULL values in a column, but we > easily index non-null values in the column. And a comment from backend/access/gist/gist.c (appears a few times): > GiST cannot index tuples with leading NULLs So what's the story? Do GiST indexes index NULLs? Do other index types index NULLs? Is the comment wrong or am I misreading it?
>> You should just cross out that whole section. It's just flatly wrong. >> >> I had always assumed it was just people bringing assumptions over from >> Oracle where it is true. Perhaps this book is to blame for some of the >> confusion. Which book is it? >> >> Postgres indexes NULLs. It can use them for ORDER BY clauses. > > Now I'm confused... I think I found the definitive answer and it looks like everyone (Bruce, Tom, the book) is half-right. Maybe this should go in a FAQ or something since there seems to be so much confusion. From section 41.3 of the documentation - this section describes the pg_am table: > An index access method that supports multiple columns > (has amcanmulticol true) must support indexing null > values in columns after the first, because the planner > will assume the index can be used for queries on just > the first column(s). For example, consider an index > on (a,b) and a query with WHERE a = 4. The system will > assume the index can be used to scan for rows > with a = 4, which is wrong if the index omits rows > where b is null. It is, however, OK to omit rows > where the first indexed column is null. (GiST > currently does so.) amindexnulls should be set true > only if the index access method indexes all rows, > including arbitrary combinations of null values. Here's what I get when I look at pg_am: select amname, amcanmulticol, amindexnulls from pg_am;amname | amcanmulticol | amindexnulls --------+---------------+--------------rtree | f | fbtree | t | thash | f | fgist | t | f So it looks like btree indexes will index completely-NULL values, but the other types won't index a row where all of the index columns are NULL. Am I reading that right? It sounds like the explanation quoted from the book is correct for all types except for btree?