Re: EXCEPT Queries - Mailing list pgsql-sql

From Peter Childs
Subject Re: EXCEPT Queries
Date
Msg-id Pine.LNX.4.44.0302200822130.5033-100000@RedDragon.Childs
Whole thread Raw
In response to EXCEPT Queries  (Mark Mitchell <mark@lapcrew.com>)
List pgsql-sql
On Wed, 19 Feb 2003, Mark Mitchell wrote:

> I have a query that I must dedup using data in another table.
> This can be done quiet easily using an EXCEPT clause as long as both 
> queries return the same result set, the problem with that is I only want 
> to dedup based on a single columns value not dedup based on entire rows. 
> The only other way I can see of doing this is using a NOT IN () clause, 
> this seems to take much more time and resources than an EXCEPT 
> statement. Is there any way to quickly dedup two tables based on only 
> one row?
> 
> Here is an example of what I'm currently doing.
> 
> TABLE "A"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> --------------------------------------
> BOB               |  000001
> JOE               |  000002
> 
> TABLE "B"
> "SUBSCRIBER_NAME" |  "ACCOUNT_NUMBER"
> --------------------------------------
> BOB               |   000001
> 
> To dedup table "A" using the data in table "B" I could use the 
> following, except that the dedup takes place on the whole row when I 
> only want it to take place on the "ACCOUNT_NUMBER" column.
> 
> SELECT  
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> EXCEPT
> SELECT  
> "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER"
> FROM "B"
> 
> 
> I could use this but it takes WAY to long when both result sets contain 
> more than a few hundred records. It seems to increase exponentially the 
> more records you add to each result set. The data sets I'm working with 
> contain anywhere from 0 to 500000 records.
> 
> SELECT  
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> WHERE
> "A"."ACCOUNT_NUMBER"
> NOT IN
> (
>   SELECT  
>   "B"."ACCOUNT_NUMBER"
>   FROM "B"
> )
> 
Big sub queries are a bad idea. Since we know where the number is 
going to be why does this not work?

SELECT a.subscriber_name, a.account_number FROM a WHERE 
EXISTS (SELECT b.account_number FROM b WHERE 
b.account_number=a.account_number);
it should be a lot quicker if your account_number columns are 
indexed.
I hope that helps

Peter Childs



pgsql-sql by date:

Previous
From: Josh Berkus
Date:
Subject: Re: EXCEPT Queries
Next
From: Stephen.Thompson@bmwfin.com
Date:
Subject: Re: VIEW or Stored Proc - Is this even possible?