Thread: negative queries puzzle
hi all, here's a query i've never been able to improve: i have an old data set and a new data set - in this case uk postcodes with eastings and northings. i want to extract the new and changed postcodes from the new set. to get the changed entries i use a join and it works okay: SELECT n.postcode, n.easting, n.northing FROM v_postcode_new n, v_postcode_old o WHERE n.postcode = o.postcode AND (n.easting <> o.lattitude OR n.northing <> o.longitude); but then to get the entirely new items out i use a sub query which takes for ever SELECT DISTINCT * FROM v_postcode_new WHERE postcode NOT IN ( SELECT postcode FROM v_postcode_old ) ORDER BY postcode ASC; does anyone know of a quicker way to accomplish this? i guess there must be some cleaver way around it, but it's beyond me. thanks, jinn
On 31 Jul 2002, Jinn Koriech wrote: > hi all, > > here's a query i've never been able to improve: > > i have an old data set and a new data set - in this case uk postcodes > with eastings and northings. i want to extract the new and changed > postcodes from the new set. to get the changed entries i use a join and > it works okay: > > SELECT n.postcode, n.easting, n.northing FROM v_postcode_new n, > v_postcode_old o WHERE n.postcode = o.postcode AND (n.easting <> > o.lattitude OR n.northing <> o.longitude); > > > but then to get the entirely new items out i use a sub query which takes > for ever > > SELECT DISTINCT * FROM v_postcode_new WHERE postcode NOT IN ( SELECT > postcode FROM v_postcode_old ) ORDER BY postcode ASC; > > does anyone know of a quicker way to accomplish this? i guess there > must be some cleaver way around it, but it's beyond me. Hmm, a couple of possible other queries: -- Do you really need the distinct? select distinct * from v_postcode_new where not exists (select * from v_postcode_old where v_postcode_old.postcode= v_postcode_new.postcode); Or maybe (just thought of this, think it should work, but am not entirely sure) select distinct v_postcode_new.* fromv_postcode_new left outer join v_postcode_old using(postcode)where v_postcode_old.postcodeis null;
> but then to get the entirely new items out i use a sub query which takes > for ever > > SELECT DISTINCT * FROM v_postcode_new WHERE postcode NOT IN ( SELECT > postcode FROM v_postcode_old ) ORDER BY postcode ASC; NOT IN is known to be very, very slow in Postgres. Use NOT EXISTS instead: SELECT DISTINCT * FROM v_postcode_new vpn WHERE NOT EXISTS (SELECT TRUE FROM v_postcode_old vpo WHERE vpo.postcode=vpn.postcode) ORDER BY postcode ASC; Chris
--- Jinn Koriech <lists@idealint.co.uk> wrote: > hi all, > but then to get the entirely new items out i use a > sub query which takes > for ever > > SELECT DISTINCT * FROM v_postcode_new WHERE postcode > NOT IN ( SELECT > postcode FROM v_postcode_old ) ORDER BY postcode > ASC; > > does anyone know of a quicker way to accomplish > this? Try using the "NOT EXIST" clause instead of the "NOT IN". The "EXIST" clause utilizes the index while the "IN" does not utilizes index (i.e. uses sequential scan therefore it is much slower). SELECT DISTINCT * FROM v_postcode_new WHERE NOT EXIST( SELECT postcode FROM v_postcode_old WHERE v_postcode_new.postcode = v_postcode_old.postcode) ORDER BY postcode ASC; ludwig. __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com
Another option is to try an outer join from the new to the old table and then select those rows that carry NULL in the columns from the old table. regards -Gunther Ludwig Lim wrote: > --- Jinn Koriech <lists@idealint.co.uk> wrote: > >>hi all, >> > >>but then to get the entirely new items out i use a >>sub query which takes >>for ever >> >>SELECT DISTINCT * FROM v_postcode_new WHERE postcode >>NOT IN ( SELECT >>postcode FROM v_postcode_old ) ORDER BY postcode >>ASC; >> >>does anyone know of a quicker way to accomplish >>this? >> > > Try using the "NOT EXIST" clause instead of the "NOT > IN". The "EXIST" clause utilizes the index while the > "IN" does not utilizes index (i.e. uses sequential > scan therefore it is much slower). > > SELECT DISTINCT * > FROM v_postcode_new > WHERE NOT EXIST( SELECT postcode > FROM v_postcode_old > WHERE v_postcode_new.postcode = > v_postcode_old.postcode) > ORDER BY postcode ASC; > > > > > ludwig. > > __________________________________________________ > Do You Yahoo!? > Yahoo! Health - Feel better, live better > http://health.yahoo.com > > ---------------------------(end of broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org > -- Gunther Schadow, M.D., Ph.D. gschadow@regenstrief.org Medical Information Scientist Regenstrief Institute for Health Care Adjunct Assistant Professor Indiana University School of Medicine tel:1(317)630-7960 http://aurora.regenstrief.org