Thread: tuple compare involving NULL

tuple compare involving NULL

From
Tobias Florek
Date:
hi,

i guess my problem reduces to the following question, but if there is
not enough detail feel free to ask for more details.


the following query returns true (as expected).

=# SELECT (2,4,'a string') > (2,3,'another string');

but any comparison involving NULL also returns NULL (i also kind of
expected that). e.g.:

=# SELECT (2,NULL, 'a')  > (1, NULL, 'b');  # no 1
=# SELECT (NULL, 2, 'a') > (NULL, 1, 'b');  # no 2
=# SELECT (NULL, 1, 'b') > (NULL, 2, 'a');  # no 3

does anyone knows a way to modify the queries to return true for number
1, true for 2 and false for 3, i.e. treat NULL in a tuple such that it
compares smaller than anything not NULL?


i can (of course) expand the tuple compare (a1, a2, a3) > (b1, b2, b3) to
=# SELECT a1 > b1
        or (a1 = b1 and (a2 > b2
                         or (a2 = b2 and a3 > b3))

and insert appropriate COALESCEs and IS NULLs and much conditional
logic. but i really hope, there is a better way.

thank you in advance,
  tobias florek


Re: tuple compare involving NULL

From
Kevin Grittner
Date:
Tobias Florek <postgres@ibotty.net> wrote:

> =# SELECT (2,NULL, 'a')  > (1, NULL, 'b');  # no 1
> =# SELECT (NULL, 2, 'a') > (NULL, 1, 'b');  # no 2
> =# SELECT (NULL, 1, 'b') > (NULL, 2, 'a');  # no 3
>
> does anyone knows a way to modify the queries to return true for
> number 1, true for 2 and false for 3, i.e. treat NULL in a tuple
> such that it compares smaller than anything not NULL?

I think it might make sense for the first one to return true,
although I would want to closely review the spec on that.
Logically, I think it *should be* equivalent to:

SELECT 2 >= 1 AND (2 > 1 OR (NULL >= NULL AND (NULL > NULL or 'a' > 'b')));

... and *that* returns true.  I don't think there's any hope for
the other two without using COALESCE, although you could do that
within the row value constructors:

SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b');

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: tuple compare involving NULL

From
Albe Laurenz
Date:
Tobias Florek wrote:
> i guess my problem reduces to the following question, but if there is
> not enough detail feel free to ask for more details.
> 
> 
> the following query returns true (as expected).
> 
> =# SELECT (2,4,'a string') > (2,3,'another string');
> 
> but any comparison involving NULL also returns NULL (i also kind of
> expected that). e.g.:
> 
> =# SELECT (2,NULL, 'a')  > (1, NULL, 'b');  # no 1
> =# SELECT (NULL, 2, 'a') > (NULL, 1, 'b');  # no 2
> =# SELECT (NULL, 1, 'b') > (NULL, 2, 'a');  # no 3
> 
> does anyone knows a way to modify the queries to return true for number
> 1, true for 2 and false for 3, i.e. treat NULL in a tuple such that it
> compares smaller than anything not NULL?
> 
> 
> i can (of course) expand the tuple compare (a1, a2, a3) > (b1, b2, b3) to
> =# SELECT a1 > b1
>         or (a1 = b1 and (a2 > b2
>                          or (a2 = b2 and a3 > b3))
> 
> and insert appropriate COALESCEs and IS NULLs and much conditional
> logic. but i really hope, there is a better way.

I can't think of any.
There is IS DISTINCT FROM for "equality" involving NULLs, but that won't solve
your problem.

I'd say that you are basically trying to abuse NULL, which means something
like "don't know" or "missing value", and that cannot be compared with other values.

Something with COALESCE would maybe be the simplest workaround, like

(COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000), COALESCE(d, -1000))

Yours,
Laurenz Albe

Re: tuple compare involving NULL

From
Tom Lane
Date:
Tobias Florek <postgres@ibotty.net> writes:
> but any comparison involving NULL also returns NULL (i also kind of
> expected that). e.g.:

> =# SELECT (2,NULL, 'a')  > (1, NULL, 'b');  # no 1

Really?  I get

regression=# SELECT (2,NULL, 'a')  > (1, NULL, 'b');
 ?column?
----------
 t
(1 row)

> does anyone knows a way to modify the queries to return true for number
> 1, true for 2 and false for 3, i.e. treat NULL in a tuple such that it
> compares smaller than anything not NULL?

No, it doesn't work like that exactly.  But in this example, 2 > 1 is
sufficient to determine the result, so the nulls need not be compared.

By and large, though, I'd say that this question is an indicator that
you're misusing NULL.  Rethink your data design.

            regards, tom lane


Re: tuple compare involving NULL

From
Tobias Florek
Date:
hi and thank you all for your replies,

(you are right, that i had the first example wrong.)


unfortunately

> SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b');

or

 > (COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000),
COALESCE(d, -1000))

will only work for specific data types and not, say, integers or dates.


maybe a little more context might be helpful. i am trying to have
reasonable efficient paging. the query i am building looks like

select t.*
   from table t,
        (select a, b from table where id = ) q
   where (q.a, t.b, t.id) > (t.a, q.b, q.id)
   order by t.a desc, t.b asc, t.id asc
   limit 10;

where t is a table with column id (primary key, serial), a and b.

that works fine and efficient (given an index (a,b) on t) without NULLs,
but (predictably) not in the presence of NULLs.

i would certainly like to handle that better, but i don't have any ideas
besides manually expanding the tuple comparison.

thank you so far,
  tobias florek


Re: tuple compare involving NULL

From
Albe Laurenz
Date:
Tobias Florek wrote:
> maybe a little more context might be helpful. i am trying to have
> reasonable efficient paging. the query i am building looks like
> 
> select t.*
>    from table t,
>         (select a, b from table where id = ) q
>    where (q.a, t.b, t.id) > (t.a, q.b, q.id)
>    order by t.a desc, t.b asc, t.id asc
>    limit 10;
> 
> where t is a table with column id (primary key, serial), a and b.
> 
> that works fine and efficient (given an index (a,b) on t) without NULLs,
> but (predictably) not in the presence of NULLs.
> 
> i would certainly like to handle that better, but i don't have any ideas
> besides manually expanding the tuple comparison.

That would probably make it harder to use a multicolumn index correctly.

The best solution would probably be to set the relevant fields NOT NULL.

NULLs usually make things harder on the database side.

Yours,
Laurenz Albe

Re: tuple compare involving NULL

From
David G Johnston
Date:
Tobias Florek wrote
> hi and thank you all for your replies,
>
> (you are right, that i had the first example wrong.)
>
>
> unfortunately
>
>> SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b');
>
> or
>
>  > (COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000),
> COALESCE(d, -1000))
>
> will only work for specific data types and not, say, integers or dates.
>
>
> maybe a little more context might be helpful. i am trying to have
> reasonable efficient paging. the query i am building looks like
>
> select t.*
>    from table t,
>         (select a, b from table where id = ) q
>    where (q.a, t.b, t.id) > (t.a, q.b, q.id)
>    order by t.a desc, t.b asc, t.id asc
>    limit 10;
>
> where t is a table with column id (primary key, serial), a and b.
>
> that works fine and efficient (given an index (a,b) on t) without NULLs,
> but (predictably) not in the presence of NULLs.
>
> i would certainly like to handle that better, but i don't have any ideas
> besides manually expanding the tuple comparison.
>
> thank you so far,
>   tobias florek

This isn't really useful context, to me at least.  It tells me nothing of
why you need an artificial PK or why you think a and b need to allow null.
The where clause is odd with its mix of q and t in the same row value and
q.id is technically broken though I know this is just an example.

As much as arbitrary default values suck they are at least better than null
in this regard.  You can use coalesce for any data type.

Dave






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/tuple-compare-involving-NULL-tp5814686p5814716.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: tuple compare involving NULL

From
David G Johnston
Date:
Tobias Florek wrote
> hi and thank you all for your replies,
>
> (you are right, that i had the first example wrong.)
>
>
> unfortunately
>
>> SELECT (coalesce(NULL, ''), 2, 'a') > (coalesce(NULL, ''), 1, 'b');
>
> or
>
>  > (COALESCE(a, -1000), COALESCE(b, -1000)) > (COALESCE(c, -1000),
> COALESCE(d, -1000))
>
> will only work for specific data types and not, say, integers or dates.
>
>
> maybe a little more context might be helpful. i am trying to have
> reasonable efficient paging. the query i am building looks like
>
> select t.*
>    from table t,
>         (select a, b from table where id = ) q
>    where (q.a, t.b, t.id) > (t.a, q.b, q.id)
>    order by t.a desc, t.b asc, t.id asc
>    limit 10;
>
> where t is a table with column id (primary key, serial), a and b.
>
> that works fine and efficient (given an index (a,b) on t) without NULLs,
> but (predictably) not in the presence of NULLs.
>
> i would certainly like to handle that better, but i don't have any ideas
> besides manually expanding the tuple comparison.
>
> thank you so far,
>   tobias florek

This isn't really useful context, to me at least.  It tells me nothing of
why you need an artificial PK or why you think a and b need to allow null.
The where clause is odd with its mix of q and t in the same row value and
q.id is technically broken though I know this is just an example.

As much as arbitrary default values suck they are at least better than null
in this regard.  You can use coalesce for any data type.

Dave






--
View this message in context:
http://postgresql.1045698.n5.nabble.com/tuple-compare-involving-NULL-tp5814686p5814717.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: tuple compare involving NULL

From
Tom Lane
Date:
Albe Laurenz <laurenz.albe@wien.gv.at> writes:
> Tobias Florek wrote:
>> maybe a little more context might be helpful. i am trying to have
>> reasonable efficient paging. the query i am building looks like
>>
>> select t.*
>> from table t,
>> (select a, b from table where id = ) q
>> where (q.a, t.b, t.id) > (t.a, q.b, q.id)
>> order by t.a desc, t.b asc, t.id asc
>> limit 10;
>>
>> where t is a table with column id (primary key, serial), a and b.
>>
>> that works fine and efficient (given an index (a,b) on t) without NULLs,
>> but (predictably) not in the presence of NULLs.
>>
>> i would certainly like to handle that better, but i don't have any ideas
>> besides manually expanding the tuple comparison.

> That would probably make it harder to use a multicolumn index correctly.

Yeah, if you change the WHERE condition at all, it will stop matching
the multicolumn index.

            regards, tom lane


Re: tuple compare involving NULL

From
Tobias Florek
Date:
hi,

 > This isn't really useful context, to me at least.

sorry to hear that.

 > It tells me nothing of why you need an artificial PK

i don't understand what you mean with artificial. id _is_ the primary key.


 > or why you think a and b need to allow null.

they do. i don't control the database layout and i have to support
pagination involving NULLs in order by columns.


 > The where clause is odd with its mix of q and t in the same row value

it is to allow ascending order on the first and descending order on the
second.


 > and q.id is technically broken though I know this is just an
 > example.

why is it? whenever (q.a, q.b) = (t.a, t.b) you need another (unique)
key to seek to the next row. maybe i am missing things.


 > As muchas arbitrary default values suck they are at least better
 > than null in this regard.  You can use coalesce for any data type.

is there a special value (of every type) that is minimal for ever
comparison (except with itself)? if so i could easily compare
  (COALESCE(t.a, MINIMAL VALUE), COALESCE(q.b, MINIMAL VALUE), t.id)

thank you for your time,
  tobias florek


Re: tuple compare involving NULL

From
Tob
Date:
hi,

> This isn't really useful context, to me at least.

sorry to hear that.

> It tells me nothing of why you need an artificial PK

i don't understand what you mean with artificial. id _is_ the primary key.


 > or why you think a and b need to allow null.

they do. i don't control the database layout and i have to support
pagination involving NULLs in order by columns.


> The where clause is odd with its mix of q and t in the same row value

it is to allow ascending order on the first and descending order on the
second.


> and q.id is technically broken though I know this is just an
> example.

why is it? whenever (q.a, q.b) = (t.a, t.b) you need another (unique)
key to seek to the next row. maybe i am missing things.


> As muchas arbitrary default values suck they are at least better
> than null in this regard.  You can use coalesce for any data type.

is there a special value (of every type) that is minimal for ever
comparison (except with itself)? if so i could easily compare
  (COALESCE(t.a, MINIMAL VALUE), COALESCE(q.b, MINIMAL VALUE), t.id)

thank you for your time,
  tobias florek


Re: tuple compare involving NULL

From
David Johnston
Date:
On Wed, Aug 13, 2014 at 8:24 AM, Tob <me@ibotty.net> wrote:
hi,


This isn't really useful context, to me at least.

sorry to hear that.


It tells me nothing of why you need an artificial PK

i don't understand what you mean with artificial. id _is_ the primary key.

 
​It is defined as a serial so it is "made up" - i.e., not a fundamental part of the data.  Ideally you'd have another unique index on this table as well.​

 

> or why you think a and b need to allow null.

they do. i don't control the database layout and i have to support pagination involving NULLs in order by columns.


​Fair enough - though this limits your options.


The where clause is odd with its mix of q and t in the same row value

it is to allow ascending order on the first and descending order on the second.



and q.id is technically broken though I know this is just an
example.

why is it? whenever (q.a, q.b) = (t.a, t.b) you need another (unique) key to seek to the next row. maybe i am missing things.


​The query you provided did not have a column "q.id" output from the subquery is all.  You did " ( SELECT a, b FROM ... WHERE id ) q​ "


As muchas arbitrary default values suck they are at least better

than null in this regard.  You can use coalesce for any data type.

is there a special value (of every type) that is minimal for ever comparison (except with itself)? if so i could easily compare
 (COALESCE(t.a, MINIMAL VALUE), COALESCE(q.b, MINIMAL VALUE), t.id)



​Not as such but in most use cases there is a value you can choose that would have the same effect.  If you have some control over the schema you can add a check constraint to ensure that our chosen special value is always less than the allowed values for the relevant column.

David J.​

Re: tuple compare involving NULL

From
Merlin Moncure
Date:
On Wed, Aug 13, 2014 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Albe Laurenz <laurenz.albe@wien.gv.at> writes:
>> Tobias Florek wrote:
>>> maybe a little more context might be helpful. i am trying to have
>>> reasonable efficient paging. the query i am building looks like
>>>
>>> select t.*
>>> from table t,
>>> (select a, b from table where id = ) q
>>> where (q.a, t.b, t.id) > (t.a, q.b, q.id)
>>> order by t.a desc, t.b asc, t.id asc
>>> limit 10;
>>>
>>> where t is a table with column id (primary key, serial), a and b.
>>>
>>> that works fine and efficient (given an index (a,b) on t) without NULLs,
>>> but (predictably) not in the presence of NULLs.
>>>
>>> i would certainly like to handle that better, but i don't have any ideas
>>> besides manually expanding the tuple comparison.
>
>> That would probably make it harder to use a multicolumn index correctly.
>
> Yeah, if you change the WHERE condition at all, it will stop matching
> the multicolumn index.

Right -- the current behavior (row-wise compairson) was implemented
precisely because it matched multi column index behaviors so that easy
and cheap paging was possible without using cursors.  This is a very
common way of accessing data for systems converted from the older ISAM
style of record navigation.  So OP's gripe is invalid on it's face,
the server does exactly what he wants it to do.  IMO, it's very
underutilized technique.

If you *had* (say, if using a database without row wise comparison
support) to to expand to cascading boolean logic, you'd want to write
it as:

SELECT
  a1 >= b1
  AND (a1 > b1 OR a2 >= b2)
  AND (a1 >  b1 OR a2 > b2 OR a3 > b3)
ORDER BY a1,a2,a3  LIMIT k;

By having the AND logic on the outside, there is at least a chance of
getting use of an index on 'a1'.

This by the way has nothing to do with 'keys'.  It's just a mechanic
for fast paging through data using an index.

merlin