Home > mailing lists

Re: EXCEPT Queries - Mailing list pgsql-sql

From	Peter Childs
Subject	Re: EXCEPT Queries
Date	February 20, 2003 06:29:23
Msg-id	Pine.LNX.4.44.0302200822130.5033-100000@RedDragon.Childs Whole thread Raw
In response to	EXCEPT Queries (Mark Mitchell <mark@lapcrew.com>)
List	pgsql-sql

Tree view

On Wed, 19 Feb 2003, Mark Mitchell wrote:

> I have a query that I must dedup using data in another table.
> This can be done quiet easily using an EXCEPT clause as long as both 
> queries return the same result set, the problem with that is I only want 
> to dedup based on a single columns value not dedup based on entire rows. 
> The only other way I can see of doing this is using a NOT IN () clause, 
> this seems to take much more time and resources than an EXCEPT 
> statement. Is there any way to quickly dedup two tables based on only 
> one row?
> 
> Here is an example of what I'm currently doing.
> 
> TABLE "A"
> "SUBSCRIBER_NAME" | "ACCOUNT_NUMBER"
> --------------------------------------
> BOB               |  000001
> JOE               |  000002
> 
> TABLE "B"
> "SUBSCRIBER_NAME" |  "ACCOUNT_NUMBER"
> --------------------------------------
> BOB               |   000001
> 
> To dedup table "A" using the data in table "B" I could use the 
> following, except that the dedup takes place on the whole row when I 
> only want it to take place on the "ACCOUNT_NUMBER" column.
> 
> SELECT  
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> EXCEPT
> SELECT  
> "B"."SUBSCRIBER_NAME" , "B"."ACCOUNT_NUMBER"
> FROM "B"
> 
> 
> I could use this but it takes WAY to long when both result sets contain 
> more than a few hundred records. It seems to increase exponentially the 
> more records you add to each result set. The data sets I'm working with 
> contain anywhere from 0 to 500000 records.
> 
> SELECT  
> "A"."SUBSCRIBER_NAME" , "A"."ACCOUNT_NUMBER"
> FROM "A"
> WHERE
> "A"."ACCOUNT_NUMBER"
> NOT IN
> (
>   SELECT  
>   "B"."ACCOUNT_NUMBER"
>   FROM "B"
> )
> 
Big sub queries are a bad idea. Since we know where the number is 
going to be why does this not work?

SELECT a.subscriber_name, a.account_number FROM a WHERE 
EXISTS (SELECT b.account_number FROM b WHERE 
b.account_number=a.account_number);
it should be a lot quicker if your account_number columns are 
indexed.
I hope that helps

Peter Childs

pgsql-sql by date:

From: Josh Berkus
Date: 20 February 2003, 04:46:40
Subject: Re: EXCEPT Queries

From: Stephen.Thompson@bmwfin.com
Date: 20 February 2003, 08:18:48
Subject: Re: VIEW or Stored Proc - Is this even possible?

Re: EXCEPT Queries - Mailing list pgsql-sql

Previous

Next