Thread: null and =

null and =

From
Slavica Stefic
Date:
Hello,

is null = null true or also null ?

more precisely I have this kind of situation in a mission critical
project and I'm,sadly, not an expert in SQL.
But until now I used null values with a specific meaning in my database,
and I didn't knew that
I would come in this situation:

=>create table dummy (a int, b int);
insert into dummy values (1);
insert into dummy values (2);
insert into dummy values (3);


--- this work as expected
=>select * from dummy where a = 1 and a in (select a from dummy where a
!= 3 );
a|b
-+-
1|
(1 row)

--- this one also
=> select a from dummy where a = 1 intersect select a from dummy where a
!= 3 ;
a
-
1
(1 row)

---- !!!!!!!!
=> select a,b from dummy where a = 1 intersect select a,b from dummy
where a != 3 ;
a|b
-+-
(0 rows)

it appears that two null records are not equal if they are both null.
 I tried also
select b = b from dummy where b is null;
and I get three empty rows.

First question: is this correct? is this SQL conformant?
        2:  if I change the sources for the operator to compare nulls as
they where values
                will it have too many side effects?

one possibility I have is to create a new type with a custum operator
"=" for each field type I use
in this compond filter query but I'd like to know if there are other
solutions before I start to do
this long coding.


I would appreciate very much and kind of help.
Thanks in advance

Marko Mikulicic


Re: [GENERAL] null and =

From
Herbert Liechti
Date:
Slavica Stefic wrote:


> is null = null true or also null?

A null value represents either the absence of data or
an unknown value. A null value is not
the same as a zero or a blank.

Thus two tuples with null values are never the same and can not
be joined.

> it appears that two null records are not equal if they are both null.
> First question: is this correct? is this SQL conformant?

yes

>         2:  if I change the sources for the operator to compare nulls as
> they where values
>                 will it have too many side effects?

Don't do that. You will get a desaster ;-)

Herbie

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Herbert Liechti                     E-Mail: Herbert.Liechti@thinx.ch
ThinX networked business services        Stahlrain 10, CH-5200 Brugg
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Re: [GENERAL] null and =

From
Mike Mascari
Date:
Slavica Stefic wrote:

> Hello,
>
> is null = null true or also null ?

NULL = NULL is also NULL (or more explicity, UNKNOWN, implemented as NULL).
Since NULL means "unknown/not applicable" we don't know whether two
"unknowns" are, in fact, equal, and that is what the standard says - at
least according to Joe Celko...

>
> more precisely I have this kind of situation in a mission critical
> project and I'm,sadly, not an expert in SQL.
> But until now I used null values with a specific meaning in my database,
> and I didn't knew that
> I would come in this situation:
>
> =>create table dummy (a int, b int);
> insert into dummy values (1);
> insert into dummy values (2);
> insert into dummy values (3);
>
> --- this work as expected
> =>select * from dummy where a = 1 and a in (select a from dummy where a
> != 3 );
> a|b
> -+-
> 1|
> (1 row)
>
> --- this one also
> => select a from dummy where a = 1 intersect select a from dummy where a
> != 3 ;
> a
> -
> 1
> (1 row)
>
> ---- !!!!!!!!
> => select a,b from dummy where a = 1 intersect select a,b from dummy
> where a != 3 ;
> a|b
> -+-
> (0 rows)

I would avoid using the INTERSECT/EXCEPT code since the query rewriter
rewrites these to IN clauses which cannot use indexes. As soon as the tables
grow beyond more than a couple hundred rows, the statment becomes unusable.
Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test
against the criteria for which you are searching:

SELECT t1.a, t1.b FROM dummy t1
WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a)
....

then, if you need a comparison of the entire row in the correlated subquery,
you could use a clause such as

SELECT t1.a, t1.b FROM dummy t1
WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a AND
t1.b IS NULL and t2.b IS NULL);

Hope that helps,

Mike



Re: [GENERAL] null and =

From
Slavica Stefic
Date:
Mike Mascari wrote:

>
> I would avoid using the INTERSECT/EXCEPT code since the query rewriter
> rewrites these to IN clauses which cannot use indexes. As soon as the tables
> grow beyond more than a couple hundred rows, the statment becomes unusable.
> Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test
> against the criteria for which you are searching:
>
> SELECT t1.a, t1.b FROM dummy t1
> WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a)
> ....
>
> then, if you need a comparison of the entire row in the correlated subquery,
> you could use a clause such as
>
> SELECT t1.a, t1.b FROM dummy t1
> WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a AND
> t1.b IS NULL and t2.b IS NULL);
>
> Hope that helps,
>
> Mike

Yes, thanks.
But there is still a problem.  I'm generating queries from user selection and
linking them in a serie.
Using intersect this was very simple (yes, it was slow,but flexible). Using the
form you proposed queries
has to be nested and in every level the table should have an unique name and a
t1.a = t2.a .. tn-1.a = tn.a
has to be generated, which is not a big problem but I have really too much of
this "query building blocks"
and rewrite them to coform the new model will be painful and buggy (many
queries are too long to fit in
a row, and what you don't see well you don't write well :-) ). I hoped that
there was a "plugin" solution
but Murphy was right :-)

Another question: how does the rewriter rewrite the intersect/except query if
it has many fields?
as far as I know the IN clause accepts only subqueries wich return one column
only.
Can I someway (assuming that performance is not a point) force INTERSECT
rewriter to generate code
that confront only the primary key and not all the fields?

Thanks.

Marko Mikulicic



Re: [GENERAL] null and =

From
Lincoln Yeoh
Date:
At 04:59 PM 05-12-1999 -0500, Mike Mascari wrote:
>Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test
>against the criteria for which you are searching:
>
>SELECT t1.a, t1.b FROM dummy t1
>WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a)
>....
>

I've got two tables, the first table contains "user accounts" and I'd like
to insert rows in the second table if necessary (if the userids don't exist
in the second table).

My query to find out which ids need to be inserted is something like this:
select usr_id,usr_login from users where usr_id not in (select userid from
wm_accounts);

(first table = users, second table = wm_accounts)

Is there a way of doing this efficiently and effectively? I cannot use
rules because I need to insert some extra nondatabase data into the second
table.

I would prefer a query which can work if the second table is null. The
problem with other queries I thought of was they would return null if the
second table is null.

I could in theory modify the program which inserts rows to the first table
and make it insert rows to the second table, however I'd like to keep a
functional and logical separation between the two programs (they aren't
supposed to be doing each others jobs).

Any suggestions?

Thanks,
Link.



Re: [GENERAL] null and =

From
Mike Mascari
Date:
Lincoln Yeoh wrote:

> At 04:59 PM 05-12-1999 -0500, Mike Mascari wrote:
> >Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test
> >against the criteria for which you are searching:
> >
> >SELECT t1.a, t1.b FROM dummy t1
> >WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a)
> >....
> >
>
> I've got two tables, the first table contains "user accounts" and I'd like
> to insert rows in the second table if necessary (if the userids don't exist
> in the second table).
>
> My query to find out which ids need to be inserted is something like this:
> select usr_id,usr_login from users where usr_id not in (select userid from
> wm_accounts);
>
> (first table = users, second table = wm_accounts)
>
> Is there a way of doing this efficiently and effectively? I cannot use
> rules because I need to insert some extra nondatabase data into the second
> table.
>

I would just do:

SELECT users.usr_id, users.usr_login FROM users WHERE NOT EXISTS
 (SELECT wm_accounts.usr_id FROM wm_accounts WHERE
  wm_accounts.usr_id = users.usr_id);

>
> I would prefer a query which can work if the second table is null. The
> problem with other queries I thought of was they would return null if the
> second table is null.
>

I'm not sure what you mean here. Do you mean the table doesn't exist, is empty,
or that the usr_id field is NULL?

> I could in theory modify the program which inserts rows to the first table
> and make it insert rows to the second table, however I'd like to keep a
> functional and logical separation between the two programs (they aren't
> supposed to be doing each others jobs).
>

What happens if a record is added to wm_accounts? Do you want a record to also
be created in users as well, or is users a superset with respect to wm_accounts
and usr_id?

>
> Any suggestions?
>
> Thanks,
> Link.
>




Re: [GENERAL] null and =

From
Lincoln Yeoh
Date:
At 10:39 PM 05-12-1999 -0500, Mike Mascari wrote:
>I would just do:
>
>SELECT users.usr_id, users.usr_login FROM users WHERE NOT EXISTS
> (SELECT wm_accounts.usr_id FROM wm_accounts WHERE
>  wm_accounts.usr_id = users.usr_id);

OK, my current query is
select usr_id,usr_login from users where usr_id not in (select userid from
wm_accounts);

Your query on small test tables (after vacuum analyze):
Seq Scan on users  (cost=1.83 rows=25 width=16)

  SubPlan
    ->  Seq Scan on wm_accounts  (cost=1.33 rows=2 width=4)

My query:
Seq Scan on users  (cost=1.83 rows=25 width=16)

  SubPlan
    ->  Seq Scan on wm_accounts  (cost=1.33 rows=10 width=4)

What does rows mean? But it looks like your query is better :). Don't fully
understand why tho. Would it work if wm_accounts is empty?

>> I would prefer a query which can work if the second table is null. The
>> problem with other queries I thought of was they would return null if the
>> second table is null.
>>
>
>I'm not sure what you mean here. Do you mean the table doesn't exist, is
empty,
>or that the usr_id field is NULL?

I meant the table is empty.

>What happens if a record is added to wm_accounts? Do you want a record to
also
>be created in users as well, or is users a superset with respect to
wm_accounts
>and usr_id?

Users is a superset.

Thanks for your suggestion,

Link.


Re: [GENERAL] null and =

From
Mike Mascari
Date:
Lincoln Yeoh wrote:

> At 10:39 PM 05-12-1999 -0500, Mike Mascari wrote:
> >I would just do:
> >
> >SELECT users.usr_id, users.usr_login FROM users WHERE NOT EXISTS
> > (SELECT wm_accounts.usr_id FROM wm_accounts WHERE
> >  wm_accounts.usr_id = users.usr_id);
>
> OK, my current query is
> select usr_id,usr_login from users where usr_id not in (select userid from
> wm_accounts);
>
> Your query on small test tables (after vacuum analyze):
> Seq Scan on users  (cost=1.83 rows=25 width=16)
>
>   SubPlan
>     ->  Seq Scan on wm_accounts  (cost=1.33 rows=2 width=4)
>
> My query:
> Seq Scan on users  (cost=1.83 rows=25 width=16)
>
>   SubPlan
>     ->  Seq Scan on wm_accounts  (cost=1.33 rows=10 width=4)
>
> What does rows mean? But it looks like your query is better :). Don't fully
> understand why tho. Would it work if wm_accounts is empty?

Yes, certainly. If you envision what the backend is doing, for NOT EXISTS, its
something like this:

for each users record
    perform an index or sequential lookup in wm_accounts for a matching usr_id:
        found:        continue
        not found:  output usr_id
next users record

If, however, you use the NOT IN clause, it looks more like this:

for each users record
    for each wm_accounts record
        if users.usr_id = wm_accounts.usr_id, continue to next users record
    next wm_accounts record
    output usr_id
next users record

At least with the EXISTS/NOT EXISTS method, you give the backend the opportunity
to use indexes on the correlated table. Most commercial databases will
instantiate a temporary table when processing IN clauses and will rewrite the
query as an EXISTS (or DISTINCT join/outer join). PostgreSQL doesn't do that at
the moment. What indexes do you have on users and wm_accounts? You should have
one on usr_id of both. It may simply be that the optimizer isn't using indexes
since the number of rows is small. Here is a pseudo-equivalent explain plan from
a production database:

explain select webuser from webusers where not exists (
select permitbuy.webuser from permitbuy where webusers.webuser =
permitbuy.webuser);

NOTICE:  QUERY PLAN:

Seq Scan on webusers  (cost=7.78 rows=145 width=12)
  SubPlan
    ->  Index Scan using k_permitbuy1 on permitbuy  (cost=4.36 rows=48 width=12)

EXPLAIN

vs. using IN:

explain select webuser from webusers where webuser not in (select webuser from
permitbuy);

NOTICE:  QUERY PLAN:

Seq Scan on webusers  (cost=7.78 rows=145 width=12)
  SubPlan
    ->  Seq Scan on permitbuy  (cost=32.05 rows=759 width=12)

EXPLAIN

Hope that helps,

Mike



timezone correction.

From
Nikos Mouat
Date:
Is this a bug?

bleah=> select ('Mon Nov 29 21:47:04 PST 1999'::timestamp);
?column?
----------------------
1999-11-30 00:47:04-05
(1 row)

bleah=> select ('Mon Nov 29 21:47:04 PST 1999'::timestamp)::date;
  ?column?
----------
11-29-1999
(1 row)

bleah=>