Thread: null and =
Hello, is null = null true or also null ? more precisely I have this kind of situation in a mission critical project and I'm,sadly, not an expert in SQL. But until now I used null values with a specific meaning in my database, and I didn't knew that I would come in this situation: =>create table dummy (a int, b int); insert into dummy values (1); insert into dummy values (2); insert into dummy values (3); --- this work as expected =>select * from dummy where a = 1 and a in (select a from dummy where a != 3 ); a|b -+- 1| (1 row) --- this one also => select a from dummy where a = 1 intersect select a from dummy where a != 3 ; a - 1 (1 row) ---- !!!!!!!! => select a,b from dummy where a = 1 intersect select a,b from dummy where a != 3 ; a|b -+- (0 rows) it appears that two null records are not equal if they are both null. I tried also select b = b from dummy where b is null; and I get three empty rows. First question: is this correct? is this SQL conformant? 2: if I change the sources for the operator to compare nulls as they where values will it have too many side effects? one possibility I have is to create a new type with a custum operator "=" for each field type I use in this compond filter query but I'd like to know if there are other solutions before I start to do this long coding. I would appreciate very much and kind of help. Thanks in advance Marko Mikulicic
Slavica Stefic wrote: > is null = null true or also null? A null value represents either the absence of data or an unknown value. A null value is not the same as a zero or a blank. Thus two tuples with null values are never the same and can not be joined. > it appears that two null records are not equal if they are both null. > First question: is this correct? is this SQL conformant? yes > 2: if I change the sources for the operator to compare nulls as > they where values > will it have too many side effects? Don't do that. You will get a desaster ;-) Herbie -- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Herbert Liechti E-Mail: Herbert.Liechti@thinx.ch ThinX networked business services Stahlrain 10, CH-5200 Brugg ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Slavica Stefic wrote: > Hello, > > is null = null true or also null ? NULL = NULL is also NULL (or more explicity, UNKNOWN, implemented as NULL). Since NULL means "unknown/not applicable" we don't know whether two "unknowns" are, in fact, equal, and that is what the standard says - at least according to Joe Celko... > > more precisely I have this kind of situation in a mission critical > project and I'm,sadly, not an expert in SQL. > But until now I used null values with a specific meaning in my database, > and I didn't knew that > I would come in this situation: > > =>create table dummy (a int, b int); > insert into dummy values (1); > insert into dummy values (2); > insert into dummy values (3); > > --- this work as expected > =>select * from dummy where a = 1 and a in (select a from dummy where a > != 3 ); > a|b > -+- > 1| > (1 row) > > --- this one also > => select a from dummy where a = 1 intersect select a from dummy where a > != 3 ; > a > - > 1 > (1 row) > > ---- !!!!!!!! > => select a,b from dummy where a = 1 intersect select a,b from dummy > where a != 3 ; > a|b > -+- > (0 rows) I would avoid using the INTERSECT/EXCEPT code since the query rewriter rewrites these to IN clauses which cannot use indexes. As soon as the tables grow beyond more than a couple hundred rows, the statment becomes unusable. Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test against the criteria for which you are searching: SELECT t1.a, t1.b FROM dummy t1 WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a) .... then, if you need a comparison of the entire row in the correlated subquery, you could use a clause such as SELECT t1.a, t1.b FROM dummy t1 WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a AND t1.b IS NULL and t2.b IS NULL); Hope that helps, Mike
Mike Mascari wrote: > > I would avoid using the INTERSECT/EXCEPT code since the query rewriter > rewrites these to IN clauses which cannot use indexes. As soon as the tables > grow beyond more than a couple hundred rows, the statment becomes unusable. > Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test > against the criteria for which you are searching: > > SELECT t1.a, t1.b FROM dummy t1 > WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a) > .... > > then, if you need a comparison of the entire row in the correlated subquery, > you could use a clause such as > > SELECT t1.a, t1.b FROM dummy t1 > WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a AND > t1.b IS NULL and t2.b IS NULL); > > Hope that helps, > > Mike Yes, thanks. But there is still a problem. I'm generating queries from user selection and linking them in a serie. Using intersect this was very simple (yes, it was slow,but flexible). Using the form you proposed queries has to be nested and in every level the table should have an unique name and a t1.a = t2.a .. tn-1.a = tn.a has to be generated, which is not a big problem but I have really too much of this "query building blocks" and rewrite them to coform the new model will be painful and buggy (many queries are too long to fit in a row, and what you don't see well you don't write well :-) ). I hoped that there was a "plugin" solution but Murphy was right :-) Another question: how does the rewriter rewrite the intersect/except query if it has many fields? as far as I know the IN clause accepts only subqueries wich return one column only. Can I someway (assuming that performance is not a point) force INTERSECT rewriter to generate code that confront only the primary key and not all the fields? Thanks. Marko Mikulicic
At 04:59 PM 05-12-1999 -0500, Mike Mascari wrote: >Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test >against the criteria for which you are searching: > >SELECT t1.a, t1.b FROM dummy t1 >WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a) >.... > I've got two tables, the first table contains "user accounts" and I'd like to insert rows in the second table if necessary (if the userids don't exist in the second table). My query to find out which ids need to be inserted is something like this: select usr_id,usr_login from users where usr_id not in (select userid from wm_accounts); (first table = users, second table = wm_accounts) Is there a way of doing this efficiently and effectively? I cannot use rules because I need to insert some extra nondatabase data into the second table. I would prefer a query which can work if the second table is null. The problem with other queries I thought of was they would return null if the second table is null. I could in theory modify the program which inserts rows to the first table and make it insert rows to the second table, however I'd like to keep a functional and logical separation between the two programs (they aren't supposed to be doing each others jobs). Any suggestions? Thanks, Link.
Lincoln Yeoh wrote: > At 04:59 PM 05-12-1999 -0500, Mike Mascari wrote: > >Instead, I would use a correlated subquery with an EXISTS/NOT EXISTS test > >against the criteria for which you are searching: > > > >SELECT t1.a, t1.b FROM dummy t1 > >WHERE EXISTS (SELECT t2.a FROM dummy t2 WHERE t1.a = t2.a) > >.... > > > > I've got two tables, the first table contains "user accounts" and I'd like > to insert rows in the second table if necessary (if the userids don't exist > in the second table). > > My query to find out which ids need to be inserted is something like this: > select usr_id,usr_login from users where usr_id not in (select userid from > wm_accounts); > > (first table = users, second table = wm_accounts) > > Is there a way of doing this efficiently and effectively? I cannot use > rules because I need to insert some extra nondatabase data into the second > table. > I would just do: SELECT users.usr_id, users.usr_login FROM users WHERE NOT EXISTS (SELECT wm_accounts.usr_id FROM wm_accounts WHERE wm_accounts.usr_id = users.usr_id); > > I would prefer a query which can work if the second table is null. The > problem with other queries I thought of was they would return null if the > second table is null. > I'm not sure what you mean here. Do you mean the table doesn't exist, is empty, or that the usr_id field is NULL? > I could in theory modify the program which inserts rows to the first table > and make it insert rows to the second table, however I'd like to keep a > functional and logical separation between the two programs (they aren't > supposed to be doing each others jobs). > What happens if a record is added to wm_accounts? Do you want a record to also be created in users as well, or is users a superset with respect to wm_accounts and usr_id? > > Any suggestions? > > Thanks, > Link. >
At 10:39 PM 05-12-1999 -0500, Mike Mascari wrote: >I would just do: > >SELECT users.usr_id, users.usr_login FROM users WHERE NOT EXISTS > (SELECT wm_accounts.usr_id FROM wm_accounts WHERE > wm_accounts.usr_id = users.usr_id); OK, my current query is select usr_id,usr_login from users where usr_id not in (select userid from wm_accounts); Your query on small test tables (after vacuum analyze): Seq Scan on users (cost=1.83 rows=25 width=16) SubPlan -> Seq Scan on wm_accounts (cost=1.33 rows=2 width=4) My query: Seq Scan on users (cost=1.83 rows=25 width=16) SubPlan -> Seq Scan on wm_accounts (cost=1.33 rows=10 width=4) What does rows mean? But it looks like your query is better :). Don't fully understand why tho. Would it work if wm_accounts is empty? >> I would prefer a query which can work if the second table is null. The >> problem with other queries I thought of was they would return null if the >> second table is null. >> > >I'm not sure what you mean here. Do you mean the table doesn't exist, is empty, >or that the usr_id field is NULL? I meant the table is empty. >What happens if a record is added to wm_accounts? Do you want a record to also >be created in users as well, or is users a superset with respect to wm_accounts >and usr_id? Users is a superset. Thanks for your suggestion, Link.
Lincoln Yeoh wrote: > At 10:39 PM 05-12-1999 -0500, Mike Mascari wrote: > >I would just do: > > > >SELECT users.usr_id, users.usr_login FROM users WHERE NOT EXISTS > > (SELECT wm_accounts.usr_id FROM wm_accounts WHERE > > wm_accounts.usr_id = users.usr_id); > > OK, my current query is > select usr_id,usr_login from users where usr_id not in (select userid from > wm_accounts); > > Your query on small test tables (after vacuum analyze): > Seq Scan on users (cost=1.83 rows=25 width=16) > > SubPlan > -> Seq Scan on wm_accounts (cost=1.33 rows=2 width=4) > > My query: > Seq Scan on users (cost=1.83 rows=25 width=16) > > SubPlan > -> Seq Scan on wm_accounts (cost=1.33 rows=10 width=4) > > What does rows mean? But it looks like your query is better :). Don't fully > understand why tho. Would it work if wm_accounts is empty? Yes, certainly. If you envision what the backend is doing, for NOT EXISTS, its something like this: for each users record perform an index or sequential lookup in wm_accounts for a matching usr_id: found: continue not found: output usr_id next users record If, however, you use the NOT IN clause, it looks more like this: for each users record for each wm_accounts record if users.usr_id = wm_accounts.usr_id, continue to next users record next wm_accounts record output usr_id next users record At least with the EXISTS/NOT EXISTS method, you give the backend the opportunity to use indexes on the correlated table. Most commercial databases will instantiate a temporary table when processing IN clauses and will rewrite the query as an EXISTS (or DISTINCT join/outer join). PostgreSQL doesn't do that at the moment. What indexes do you have on users and wm_accounts? You should have one on usr_id of both. It may simply be that the optimizer isn't using indexes since the number of rows is small. Here is a pseudo-equivalent explain plan from a production database: explain select webuser from webusers where not exists ( select permitbuy.webuser from permitbuy where webusers.webuser = permitbuy.webuser); NOTICE: QUERY PLAN: Seq Scan on webusers (cost=7.78 rows=145 width=12) SubPlan -> Index Scan using k_permitbuy1 on permitbuy (cost=4.36 rows=48 width=12) EXPLAIN vs. using IN: explain select webuser from webusers where webuser not in (select webuser from permitbuy); NOTICE: QUERY PLAN: Seq Scan on webusers (cost=7.78 rows=145 width=12) SubPlan -> Seq Scan on permitbuy (cost=32.05 rows=759 width=12) EXPLAIN Hope that helps, Mike
Is this a bug? bleah=> select ('Mon Nov 29 21:47:04 PST 1999'::timestamp); ?column? ---------------------- 1999-11-30 00:47:04-05 (1 row) bleah=> select ('Mon Nov 29 21:47:04 PST 1999'::timestamp)::date; ?column? ---------- 11-29-1999 (1 row) bleah=>