Thread: EXCEPT Queries
I have a query that I must dedup using data in another table. This can be done quiet easily using an EXCEPT clause as long as both queries return the same result set, the problem with that is I only want to dedup based on a single columns value not dedup based on entire rows. The only other way I can see of doing this is using a NOT IN () clause, this seems to take much more time and resources than an EXCEPT statement. Is there any way to quickly dedup two tables based on only one row? Here is an example of what I'm currently doing. TABLE "A" "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" -------------------------------------- BOB | 000001 JOE | 000002 TABLE "B" "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" -------------------------------------- BOB | 000001 To dedup table "A" using the data in table "B" I could use the following, except that the dedup takes place on the whole row when I only want it to take place on the "ACCOUNT_NUMBER" column. SELECT "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" FROM "A" EXCEPT SELECT "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER" FROM "B" I could use this but it takes WAY to long when both result sets contain more than a few hundred records. It seems to increase exponentially the more records you add to each result set. The data sets I'm working with contain anywhere from 0 to 500000 records. SELECT "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" FROM "A" WHERE "A"."ACCOUNT_NUMBER" NOT IN ( SELECT "B"."ACCOUNT_NUMBER" FROM "B" ) -- Mark Mitchell www.lapcrew.com Registered Linux User # 299662 Mandrake 8.0 , RedHat 8.0
Mark, > SELECT > "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" > FROM "A" > WHERE > "A"."ACCOUNT_NUMBER" > NOT IN > ( > SELECT > "B"."ACCOUNT_NUMBER" > FROM "B" > ) Use "WHERE NOT EXISTS" instead. See the documentation on SELECT for how to write this clause. -- Josh Berkus Aglio Database Solutions San Francisco
On Wed, 19 Feb 2003, Mark Mitchell wrote: > I have a query that I must dedup using data in another table. > This can be done quiet easily using an EXCEPT clause as long as both > queries return the same result set, the problem with that is I only want > to dedup based on a single columns value not dedup based on entire rows. > The only other way I can see of doing this is using a NOT IN () clause, > this seems to take much more time and resources than an EXCEPT > statement. Is there any way to quickly dedup two tables based on only > one row? > > Here is an example of what I'm currently doing. > > TABLE "A" > "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" > -------------------------------------- > BOB | 000001 > JOE | 000002 > > TABLE "B" > "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" > -------------------------------------- > BOB | 000001 > > To dedup table "A" using the data in table "B" I could use the > following, except that the dedup takes place on the whole row when I > only want it to take place on the "ACCOUNT_NUMBER" column. > > SELECT > "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" > FROM "A" > EXCEPT > SELECT > "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER" > FROM "B" > > > I could use this but it takes WAY to long when both result sets contain > more than a few hundred records. It seems to increase exponentially the > more records you add to each result set. The data sets I'm working with > contain anywhere from 0 to 500000 records. > > SELECT > "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" > FROM "A" > WHERE > "A"."ACCOUNT_NUMBER" > NOT IN > ( > SELECT > "B"."ACCOUNT_NUMBER" > FROM "B" > ) > Big sub queries are a bad idea. Since we know where the number is going to be why does this not work? SELECT a.subscriber_name, a.account_number FROM a WHERE EXISTS (SELECT b.account_number FROM b WHERE b.account_number=a.account_number); it should be a lot quicker if your account_number columns are indexed. I hope that helps Peter Childs
On Thu, 20 Feb 2003 01:22:33 -0500, Mark Mitchell wrote: > Here is an example of what I'm currently doing. > > TABLE "A" > "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" > -------------------------------------- BOB | 000001 JOE > | 000002 > > TABLE "B" > "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER" > -------------------------------------- BOB | 000001 > > To dedup table "A" using the data in table "B" I could use the > following, except that the dedup takes place on the whole row when I > only want it to take place on the "ACCOUNT_NUMBER" column. > > SELECT > "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" FROM "A" EXCEPT > SELECT > "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER" FROM "B" > > How about a SELECT DISTINCT ON? SELECT DISTINCT ON (account_number) subscriber_name, account_number FROM (SELECT 1 AS sort_order, subscriber_name, account_number FROM "A"UNIONSELECT 2, subscriber_name, account_number FROM "B"ORDERBY sort_order) as tmp ORDER BY account_number; (Untested, but it follows a pattern I've learned.) -- Jeff Boes vox 269.226.9550 ext 24 Database Engineer fax 269.349.9076 Nexcerpt, Inc. http://www.nexcerpt.com ...Nexcerpt... Extend your Expertise