Thread: negative queries puzzle

negative queries puzzle

From
Jinn Koriech
Date:
hi all,

here's a query i've never been able to improve:

i have an old data set and a new data set - in this case uk postcodes
with eastings and northings.  i want to extract the new and changed
postcodes from the new set.  to get the changed entries i use a join and
it works okay:

SELECT n.postcode, n.easting, n.northing FROM v_postcode_new n,
v_postcode_old o WHERE n.postcode = o.postcode AND (n.easting <>
o.lattitude OR n.northing <> o.longitude);


but then to get the entirely new items out i use a sub query which takes
for ever

SELECT DISTINCT * FROM v_postcode_new WHERE postcode NOT IN ( SELECT
postcode FROM v_postcode_old ) ORDER BY postcode ASC;

does anyone know of a quicker way to accomplish this?  i guess there
must be some cleaver way around it, but it's beyond me.

thanks,
jinn


Re: negative queries puzzle

From
Stephan Szabo
Date:
On 31 Jul 2002, Jinn Koriech wrote:

> hi all,
>
> here's a query i've never been able to improve:
>
> i have an old data set and a new data set - in this case uk postcodes
> with eastings and northings.  i want to extract the new and changed
> postcodes from the new set.  to get the changed entries i use a join and
> it works okay:
>
> SELECT n.postcode, n.easting, n.northing FROM v_postcode_new n,
> v_postcode_old o WHERE n.postcode = o.postcode AND (n.easting <>
> o.lattitude OR n.northing <> o.longitude);
>
>
> but then to get the entirely new items out i use a sub query which takes
> for ever
>
> SELECT DISTINCT * FROM v_postcode_new WHERE postcode NOT IN ( SELECT
> postcode FROM v_postcode_old ) ORDER BY postcode ASC;
>
> does anyone know of a quicker way to accomplish this?  i guess there
> must be some cleaver way around it, but it's beyond me.

Hmm, a couple of possible other queries:

-- Do you really need the distinct?
select distinct * from v_postcode_new where not exists (select * from v_postcode_old where v_postcode_old.postcode=
                             v_postcode_new.postcode);
 

Or maybe (just thought of this, think it should work, but
am not entirely sure)
select distinct v_postcode_new.* fromv_postcode_new left outer join v_postcode_old using(postcode)where
v_postcode_old.postcodeis null;
 



Re: negative queries puzzle

From
"Christopher Kings-Lynne"
Date:
> but then to get the entirely new items out i use a sub query which takes
> for ever
>
> SELECT DISTINCT * FROM v_postcode_new WHERE postcode NOT IN ( SELECT
> postcode FROM v_postcode_old ) ORDER BY postcode ASC;

NOT IN is known to be very, very slow in Postgres.  Use NOT EXISTS instead:

SELECT DISTINCT * FROM v_postcode_new vpn WHERE NOT EXISTS (SELECT TRUE FROM
v_postcode_old vpo WHERE vpo.postcode=vpn.postcode) ORDER BY postcode ASC;

Chris



Re: negative queries puzzle

From
Ludwig Lim
Date:
--- Jinn Koriech <lists@idealint.co.uk> wrote:
> hi all,

> but then to get the entirely new items out i use a
> sub query which takes
> for ever
> 
> SELECT DISTINCT * FROM v_postcode_new WHERE postcode
> NOT IN ( SELECT
> postcode FROM v_postcode_old ) ORDER BY postcode
> ASC;
> 
> does anyone know of a quicker way to accomplish
> this?  
 Try using the "NOT EXIST" clause instead of the "NOT
IN". The "EXIST" clause utilizes the index while the
"IN" does not utilizes index (i.e. uses sequential
scan therefore it is much slower).
  SELECT DISTINCT *  FROM v_postcode_new  WHERE NOT EXIST( SELECT postcode                   FROM v_postcode_old
          WHERE v_postcode_new.postcode =                         v_postcode_old.postcode)  ORDER BY postcode ASC;
 



ludwig.

__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com


Re: negative queries puzzle

From
Gunther Schadow
Date:
Another option is to try an outer join from the new to the old
table and then select those rows that carry NULL in the
columns from the old table.

regards
-Gunther

Ludwig Lim wrote:

> --- Jinn Koriech <lists@idealint.co.uk> wrote:
> 
>>hi all,
>>
> 
>>but then to get the entirely new items out i use a
>>sub query which takes
>>for ever
>>
>>SELECT DISTINCT * FROM v_postcode_new WHERE postcode
>>NOT IN ( SELECT
>>postcode FROM v_postcode_old ) ORDER BY postcode
>>ASC;
>>
>>does anyone know of a quicker way to accomplish
>>this?  
>>
> 
>   Try using the "NOT EXIST" clause instead of the "NOT
> IN". The "EXIST" clause utilizes the index while the
> "IN" does not utilizes index (i.e. uses sequential
> scan therefore it is much slower).
> 
>    SELECT DISTINCT *
>    FROM v_postcode_new
>    WHERE NOT EXIST( SELECT postcode
>                     FROM v_postcode_old
>                     WHERE v_postcode_new.postcode =
>                           v_postcode_old.postcode)
>    ORDER BY postcode ASC;
> 
> 
>  
> 
> ludwig.
> 
> __________________________________________________
> Do You Yahoo!?
> Yahoo! Health - Feel better, live better
> http://health.yahoo.com
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> 


-- 
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org