On Wed, 19 Feb 2003, Mark Mitchell wrote:
> I have a query that I must dedup using data in another table.
> This can be done quiet easily using an EXCEPT clause as long as both
> queries return the same result set, the problem with that is I only want
> to dedup based on a single columns value not dedup based on entire rows.
> The only other way I can see of doing this is using a NOT IN () clause,
> this seems to take much more time and resources than an EXCEPT
> statement. Is there any way to quickly dedup two tables based on only
> one row?
>
> Here is an example of what I'm currently doing.
>
> TABLE "A"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> --------------------------------------
> BOB | 000001
> JOE | 000002
>
> TABLE "B"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> --------------------------------------
> BOB | 000001
>
> To dedup table "A" using the data in table "B" I could use the
> following, except that the dedup takes place on the whole row when I
> only want it to take place on the "ACCOUNT_NUMBER" column.
>
> SELECT
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> EXCEPT
> SELECT
> "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER"
> FROM "B"
>
>
> I could use this but it takes WAY to long when both result sets contain
> more than a few hundred records. It seems to increase exponentially the
> more records you add to each result set. The data sets I'm working with
> contain anywhere from 0 to 500000 records.
>
> SELECT
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> WHERE
> "A"."ACCOUNT_NUMBER"
> NOT IN
> (
> SELECT
> "B"."ACCOUNT_NUMBER"
> FROM "B"
> )
>
Big sub queries are a bad idea. Since we know where the number is
going to be why does this not work?
SELECT a.subscriber_name, a.account_number FROM a WHERE
EXISTS (SELECT b.account_number FROM b WHERE
b.account_number=a.account_number);
it should be a lot quicker if your account_number columns are
indexed.
I hope that helps
Peter Childs