Thread: EXCEPT Queries

EXCEPT Queries

From

Mark Mitchell

Date:

20 February 2003, 01:21:39

I have a query that I must dedup using data in another table.
This can be done quiet easily using an EXCEPT clause as long as both 
queries return the same result set, the problem with that is I only want 
to dedup based on a single columns value not dedup based on entire rows. 
The only other way I can see of doing this is using a NOT IN () clause, 
this seems to take much more time and resources than an EXCEPT 
statement. Is there any way to quickly dedup two tables based on only 
one row?

Here is an example of what I'm currently doing.

TABLE "A"
"SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
--------------------------------------
BOB               |  000001
JOE               |  000002

TABLE "B"
"SUBSCRIBER_NAME" |  "ACCOUNT_NUMBER"
--------------------------------------
BOB               |   000001

To dedup table "A" using the data in table "B" I could use the 
following, except that the dedup takes place on the whole row when I 
only want it to take place on the "ACCOUNT_NUMBER" column.

SELECT  
"A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
FROM "A"
EXCEPT
SELECT  
"B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER"
FROM "B"


I could use this but it takes WAY to long when both result sets contain 
more than a few hundred records. It seems to increase exponentially the 
more records you add to each result set. The data sets I'm working with 
contain anywhere from 0 to 500000 records.

SELECT  
"A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
FROM "A"
WHERE
"A"."ACCOUNT_NUMBER"
NOT IN
( SELECT   "B"."ACCOUNT_NUMBER" FROM "B"
)



-- 
Mark Mitchell
www.lapcrew.com
Registered Linux User # 299662
Mandrake 8.0 , RedHat 8.0

Re: EXCEPT Queries

From

Josh Berkus

Date:

20 February 2003, 01:46:40

Mark,

> SELECT
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> WHERE
> "A"."ACCOUNT_NUMBER"
> NOT IN
> (
>   SELECT
>   "B"."ACCOUNT_NUMBER"
>   FROM "B"
> )

Use "WHERE NOT EXISTS" instead.   See the documentation on SELECT for how to
write this clause.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: EXCEPT Queries

From

Peter Childs

Date:

20 February 2003, 03:29:23

On Wed, 19 Feb 2003, Mark Mitchell wrote:

> I have a query that I must dedup using data in another table.
> This can be done quiet easily using an EXCEPT clause as long as both 
> queries return the same result set, the problem with that is I only want 
> to dedup based on a single columns value not dedup based on entire rows. 
> The only other way I can see of doing this is using a NOT IN () clause, 
> this seems to take much more time and resources than an EXCEPT 
> statement. Is there any way to quickly dedup two tables based on only 
> one row?
> 
> Here is an example of what I'm currently doing.
> 
> TABLE "A"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> --------------------------------------
> BOB               |  000001
> JOE               |  000002
> 
> TABLE "B"
> "SUBSCRIBER_NAME" |  "ACCOUNT_NUMBER"
> --------------------------------------
> BOB               |   000001
> 
> To dedup table "A" using the data in table "B" I could use the 
> following, except that the dedup takes place on the whole row when I 
> only want it to take place on the "ACCOUNT_NUMBER" column.
> 
> SELECT  
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> EXCEPT
> SELECT  
> "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER"
> FROM "B"
> 
> 
> I could use this but it takes WAY to long when both result sets contain 
> more than a few hundred records. It seems to increase exponentially the 
> more records you add to each result set. The data sets I'm working with 
> contain anywhere from 0 to 500000 records.
> 
> SELECT  
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> WHERE
> "A"."ACCOUNT_NUMBER"
> NOT IN
> (
>   SELECT  
>   "B"."ACCOUNT_NUMBER"
>   FROM "B"
> )
> 
Big sub queries are a bad idea. Since we know where the number is 
going to be why does this not work?

SELECT a.subscriber_name, a.account_number FROM a WHERE 
EXISTS (SELECT b.account_number FROM b WHERE 
b.account_number=a.account_number);
it should be a lot quicker if your account_number columns are 
indexed.
I hope that helps

Peter Childs

Re: EXCEPT Queries

From

Jeff Boes

Date:

20 February 2003, 20:36:27

On Thu, 20 Feb 2003 01:22:33 -0500, Mark Mitchell wrote:

> Here is an example of what I'm currently doing.
>
> TABLE "A"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> -------------------------------------- BOB               |  000001 JOE
>             |  000002
>
> TABLE "B"
> "SUBSCRIBER_NAME" |  "ACCOUNT_NUMBER"
> -------------------------------------- BOB               |   000001
>
> To dedup table "A" using the data in table "B" I could use the
> following, except that the dedup takes place on the whole row when I
> only want it to take place on the "ACCOUNT_NUMBER" column.
>
> SELECT
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER" FROM "A" EXCEPT
> SELECT
> "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER" FROM "B"
>
>

How about a SELECT DISTINCT ON?

SELECT DISTINCT ON (account_number)
subscriber_name, account_number
FROM
(SELECT 1 AS sort_order, subscriber_name, account_number FROM "A"UNIONSELECT 2, subscriber_name, account_number FROM
"B"ORDERBY sort_order) as tmp 
ORDER BY account_number;

(Untested, but it follows a pattern I've learned.)

--
Jeff Boes                                      vox 269.226.9550 ext 24
Database Engineer                                     fax 269.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com          ...Nexcerpt... Extend your Expertise