Thread: tuple compare involving NULL
hi, i guess my problem reduces to the following question, but if there is not enough detail feel free to ask for more details. the following query returns true (as expected). =# SELECT (2,4,'a string') > (2,3,'another string'); but any comparison involving NULL also returns NULL (i also kind of expected that). e.g.: =# SELECT (2,NULL, 'a') > (1, NULL, 'b'); # no 1 =# SELECT (NULL, 2, 'a') > (NULL, 1, 'b'); # no 2 =# SELECT (NULL, 1, 'b') > (NULL, 2, 'a'); # no 3 does anyone knows a way to modify the queries to return true for number 1, true for 2 and false for 3, i.e. treat NULL in a tuple such that it compares smaller than anything not NULL? i can (of course) expand the tuple compare (a1, a2, a3) > (b1, b2, b3) to =# SELECT a1 > b1 or (a1 = b1 and (a2 > b2 or (a2 = b2 and a3 > b3)) and insert appropriate COALESCEs and IS NULLs and much conditional logic. but i really hope, there is a better way. thank you in advance, tobias florek
Tobias Florek <postgres@ibotty.net> wrote: > =# SELECT (2,NULL, 'a') > (1, NULL, 'b'); # no 1 > =# SELECT (NULL, 2, 'a') > (NULL, 1, 'b'); # no 2 > =# SELECT (NULL, 1, 'b') > (NULL, 2, 'a'); # no 3 > > does anyone knows a way to modify the queries to return true for > number 1, true for 2 and false for 3, i.e. treat NULL in a tuple > such that it compares smaller than anything not NULL? I think it might make sense for the first one to return true, although I would want to closely review the spec on that. Logically, I think it *should be* equivalent to: SELECT 2 >= 1 AND (2 > 1 OR (NULL >= NULL AND (NULL > NULL or 'a' > 'b'))); ... and *that* returns true. I don't think there's any hope for the other two without using COALESCE, although you could do that within the row value constructors: SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b'); -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tobias Florek wrote: > i guess my problem reduces to the following question, but if there is > not enough detail feel free to ask for more details. > > > the following query returns true (as expected). > > =# SELECT (2,4,'a string') > (2,3,'another string'); > > but any comparison involving NULL also returns NULL (i also kind of > expected that). e.g.: > > =# SELECT (2,NULL, 'a') > (1, NULL, 'b'); # no 1 > =# SELECT (NULL, 2, 'a') > (NULL, 1, 'b'); # no 2 > =# SELECT (NULL, 1, 'b') > (NULL, 2, 'a'); # no 3 > > does anyone knows a way to modify the queries to return true for number > 1, true for 2 and false for 3, i.e. treat NULL in a tuple such that it > compares smaller than anything not NULL? > > > i can (of course) expand the tuple compare (a1, a2, a3) > (b1, b2, b3) to > =# SELECT a1 > b1 > or (a1 = b1 and (a2 > b2 > or (a2 = b2 and a3 > b3)) > > and insert appropriate COALESCEs and IS NULLs and much conditional > logic. but i really hope, there is a better way. I can't think of any. There is IS DISTINCT FROM for "equality" involving NULLs, but that won't solve your problem. I'd say that you are basically trying to abuse NULL, which means something like "don't know" or "missing value", and that cannot be compared with other values. Something with COALESCE would maybe be the simplest workaround, like (COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000), COALESCE(d, -1000)) Yours, Laurenz Albe
Tobias Florek <postgres@ibotty.net> writes: > but any comparison involving NULL also returns NULL (i also kind of > expected that). e.g.: > =# SELECT (2,NULL, 'a') > (1, NULL, 'b'); # no 1 Really? I get regression=# SELECT (2,NULL, 'a') > (1, NULL, 'b'); ?column? ---------- t (1 row) > does anyone knows a way to modify the queries to return true for number > 1, true for 2 and false for 3, i.e. treat NULL in a tuple such that it > compares smaller than anything not NULL? No, it doesn't work like that exactly. But in this example, 2 > 1 is sufficient to determine the result, so the nulls need not be compared. By and large, though, I'd say that this question is an indicator that you're misusing NULL. Rethink your data design. regards, tom lane
hi and thank you all for your replies, (you are right, that i had the first example wrong.) unfortunately > SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b'); or > (COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000), COALESCE(d, -1000)) will only work for specific data types and not, say, integers or dates. maybe a little more context might be helpful. i am trying to have reasonable efficient paging. the query i am building looks like select t.* from table t, (select a, b from table where id = ) q where (q.a, t.b, t.id) > (t.a, q.b, q.id) order by t.a desc, t.b asc, t.id asc limit 10; where t is a table with column id (primary key, serial), a and b. that works fine and efficient (given an index (a,b) on t) without NULLs, but (predictably) not in the presence of NULLs. i would certainly like to handle that better, but i don't have any ideas besides manually expanding the tuple comparison. thank you so far, tobias florek
Tobias Florek wrote: > maybe a little more context might be helpful. i am trying to have > reasonable efficient paging. the query i am building looks like > > select t.* > from table t, > (select a, b from table where id = ) q > where (q.a, t.b, t.id) > (t.a, q.b, q.id) > order by t.a desc, t.b asc, t.id asc > limit 10; > > where t is a table with column id (primary key, serial), a and b. > > that works fine and efficient (given an index (a,b) on t) without NULLs, > but (predictably) not in the presence of NULLs. > > i would certainly like to handle that better, but i don't have any ideas > besides manually expanding the tuple comparison. That would probably make it harder to use a multicolumn index correctly. The best solution would probably be to set the relevant fields NOT NULL. NULLs usually make things harder on the database side. Yours, Laurenz Albe
Tobias Florek wrote > hi and thank you all for your replies, > > (you are right, that i had the first example wrong.) > > > unfortunately > >> SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b'); > > or > > > (COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000), > COALESCE(d, -1000)) > > will only work for specific data types and not, say, integers or dates. > > > maybe a little more context might be helpful. i am trying to have > reasonable efficient paging. the query i am building looks like > > select t.* > from table t, > (select a, b from table where id = ) q > where (q.a, t.b, t.id) > (t.a, q.b, q.id) > order by t.a desc, t.b asc, t.id asc > limit 10; > > where t is a table with column id (primary key, serial), a and b. > > that works fine and efficient (given an index (a,b) on t) without NULLs, > but (predictably) not in the presence of NULLs. > > i would certainly like to handle that better, but i don't have any ideas > besides manually expanding the tuple comparison. > > thank you so far, > tobias florek This isn't really useful context, to me at least. It tells me nothing of why you need an artificial PK or why you think a and b need to allow null. The where clause is odd with its mix of q and t in the same row value and q.id is technically broken though I know this is just an example. As much as arbitrary default values suck they are at least better than null in this regard. You can use coalesce for any data type. Dave -- View this message in context: http://postgresql.1045698.n5.nabble.com/tuple-compare-involving-NULL-tp5814686p5814716.html Sent from the PostgreSQL - novice mailing list archive at Nabble.com.
Tobias Florek wrote > hi and thank you all for your replies, > > (you are right, that i had the first example wrong.) > > > unfortunately > >> SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b'); > > or > > > (COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000), > COALESCE(d, -1000)) > > will only work for specific data types and not, say, integers or dates. > > > maybe a little more context might be helpful. i am trying to have > reasonable efficient paging. the query i am building looks like > > select t.* > from table t, > (select a, b from table where id = ) q > where (q.a, t.b, t.id) > (t.a, q.b, q.id) > order by t.a desc, t.b asc, t.id asc > limit 10; > > where t is a table with column id (primary key, serial), a and b. > > that works fine and efficient (given an index (a,b) on t) without NULLs, > but (predictably) not in the presence of NULLs. > > i would certainly like to handle that better, but i don't have any ideas > besides manually expanding the tuple comparison. > > thank you so far, > tobias florek This isn't really useful context, to me at least. It tells me nothing of why you need an artificial PK or why you think a and b need to allow null. The where clause is odd with its mix of q and t in the same row value and q.id is technically broken though I know this is just an example. As much as arbitrary default values suck they are at least better than null in this regard. You can use coalesce for any data type. Dave -- View this message in context: http://postgresql.1045698.n5.nabble.com/tuple-compare-involving-NULL-tp5814686p5814717.html Sent from the PostgreSQL - novice mailing list archive at Nabble.com.
Albe Laurenz <laurenz.albe@wien.gv.at> writes: > Tobias Florek wrote: >> maybe a little more context might be helpful. i am trying to have >> reasonable efficient paging. the query i am building looks like >> >> select t.* >> from table t, >> (select a, b from table where id = ) q >> where (q.a, t.b, t.id) > (t.a, q.b, q.id) >> order by t.a desc, t.b asc, t.id asc >> limit 10; >> >> where t is a table with column id (primary key, serial), a and b. >> >> that works fine and efficient (given an index (a,b) on t) without NULLs, >> but (predictably) not in the presence of NULLs. >> >> i would certainly like to handle that better, but i don't have any ideas >> besides manually expanding the tuple comparison. > That would probably make it harder to use a multicolumn index correctly. Yeah, if you change the WHERE condition at all, it will stop matching the multicolumn index. regards, tom lane
hi, > This isn't really useful context, to me at least. sorry to hear that. > It tells me nothing of why you need an artificial PK i don't understand what you mean with artificial. id _is_ the primary key. > or why you think a and b need to allow null. they do. i don't control the database layout and i have to support pagination involving NULLs in order by columns. > The where clause is odd with its mix of q and t in the same row value it is to allow ascending order on the first and descending order on the second. > and q.id is technically broken though I know this is just an > example. why is it? whenever (q.a, q.b) = (t.a, t.b) you need another (unique) key to seek to the next row. maybe i am missing things. > As muchas arbitrary default values suck they are at least better > than null in this regard. You can use coalesce for any data type. is there a special value (of every type) that is minimal for ever comparison (except with itself)? if so i could easily compare (COALESCE(t.a, MINIMAL VALUE), COALESCE(q.b, MINIMAL VALUE), t.id) thank you for your time, tobias florek
hi, > This isn't really useful context, to me at least. sorry to hear that. > It tells me nothing of why you need an artificial PK i don't understand what you mean with artificial. id _is_ the primary key. > or why you think a and b need to allow null. they do. i don't control the database layout and i have to support pagination involving NULLs in order by columns. > The where clause is odd with its mix of q and t in the same row value it is to allow ascending order on the first and descending order on the second. > and q.id is technically broken though I know this is just an > example. why is it? whenever (q.a, q.b) = (t.a, t.b) you need another (unique) key to seek to the next row. maybe i am missing things. > As muchas arbitrary default values suck they are at least better > than null in this regard. You can use coalesce for any data type. is there a special value (of every type) that is minimal for ever comparison (except with itself)? if so i could easily compare (COALESCE(t.a, MINIMAL VALUE), COALESCE(q.b, MINIMAL VALUE), t.id) thank you for your time, tobias florek
hi,sorry to hear that.This isn't really useful context, to me at least.i don't understand what you mean with artificial. id _is_ the primary key.It tells me nothing of why you need an artificial PK
It is defined as a serial so it is "made up" - i.e., not a fundamental part of the data. Ideally you'd have another unique index on this table as well.
they do. i don't control the database layout and i have to support pagination involving NULLs in order by columns.
> or why you think a and b need to allow null.
Fair enough - though this limits your options.
it is to allow ascending order on the first and descending order on the second. why is it? whenever (q.a, q.b) = (t.a, t.b) you need another (unique) key to seek to the next row. maybe i am missing things.The where clause is odd with its mix of q and t in the same row value
The query you provided did not have a column "q.id" output from the subquery is all. You did " ( SELECT a, b FROM ... WHERE id ) q "
As muchas arbitrary default values suck they are at least better
than null in this regard. You can use coalesce for any data type.
is there a special value (of every type) that is minimal for ever comparison (except with itself)? if so i could easily compare
(COALESCE(t.a, MINIMAL VALUE), COALESCE(q.b, MINIMAL VALUE), t.id)
Not as such but in most use cases there is a value you can choose that would have the same effect. If you have some control over the schema you can add a check constraint to ensure that our chosen special value is always less than the allowed values for the relevant column.
David J.
On Wed, Aug 13, 2014 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Albe Laurenz <laurenz.albe@wien.gv.at> writes: >> Tobias Florek wrote: >>> maybe a little more context might be helpful. i am trying to have >>> reasonable efficient paging. the query i am building looks like >>> >>> select t.* >>> from table t, >>> (select a, b from table where id = ) q >>> where (q.a, t.b, t.id) > (t.a, q.b, q.id) >>> order by t.a desc, t.b asc, t.id asc >>> limit 10; >>> >>> where t is a table with column id (primary key, serial), a and b. >>> >>> that works fine and efficient (given an index (a,b) on t) without NULLs, >>> but (predictably) not in the presence of NULLs. >>> >>> i would certainly like to handle that better, but i don't have any ideas >>> besides manually expanding the tuple comparison. > >> That would probably make it harder to use a multicolumn index correctly. > > Yeah, if you change the WHERE condition at all, it will stop matching > the multicolumn index. Right -- the current behavior (row-wise compairson) was implemented precisely because it matched multi column index behaviors so that easy and cheap paging was possible without using cursors. This is a very common way of accessing data for systems converted from the older ISAM style of record navigation. So OP's gripe is invalid on it's face, the server does exactly what he wants it to do. IMO, it's very underutilized technique. If you *had* (say, if using a database without row wise comparison support) to to expand to cascading boolean logic, you'd want to write it as: SELECT a1 >= b1 AND (a1 > b1 OR a2 >= b2) AND (a1 > b1 OR a2 > b2 OR a3 > b3) ORDER BY a1,a2,a3 LIMIT k; By having the AND logic on the outside, there is at least a chance of getting use of an index on 'a1'. This by the way has nothing to do with 'keys'. It's just a mechanic for fast paging through data using an index. merlin