Thread: filtering

filtering

From
"Kevin B."
Date:
Hi,

I have a 14 million row table with one index on two fields one is a varchar
the other is a date.  The combination of the two makes the row unique.

Data
-----------------
name  date    ... other fields
a     1/1/01
a     1/2/01
a     1/3/01
b     1/1/01
b     1/2/01
d     1/1/01
d     1/2/01

I have a table with just the names.  each name occurs once.
UName
---------
name
a
b
c
d

I've tried a number of queries to find which name is in UName but not in
Data.   However, they are all taking too long (more than 30 minutes - but
the hard drive is a slow 4200rpm IDE....).

What is the quickest query to get the result that I want?  Also, should I
put another index on the Data table for "name" only?

Thanks










Re: filtering

From
Bruno Wolff III
Date:
On Thu, Dec 09, 2004 at 10:25:25 -0500, "Kevin B." <db@ke5in.com> wrote:
> Hi,
> 
> I have a 14 million row table with one index on two fields one is a varchar
> the other is a date.  The combination of the two makes the row unique.
> 
> Data
> -----------------
> name  date    ... other fields
> a     1/1/01
> a     1/2/01
> a     1/3/01
> b     1/1/01
> b     1/2/01
> d     1/1/01
> d     1/2/01
> 
> I have a table with just the names.  each name occurs once.
> UName
> ---------
> name
> a
> b
> c
> d
> 
> I've tried a number of queries to find which name is in UName but not in
> Data.   However, they are all taking too long (more than 30 minutes - but
> the hard drive is a slow 4200rpm IDE....).
> 
> What is the quickest query to get the result that I want?  Also, should I
> put another index on the Data table for "name" only?

It might help if you showed us the explain analyze results from your
attempts.

(All of the suggestions below assuming there aren't any NULL names.)

I think the straight forward way to do this is something like:

SELECT name FROM uname WHERE NOT EXITS (   SELECT name FROM data WHERE uname.name = data.name )
;

If you are using 7.4 or later, you might try using NOT IN. (This does
not run efficiently in earlier versions of postgres.)

SELECT name FROM uname WHERE name NOT IN (SELECT name FROM data);

It is also possible to use set subtraction to get the result, but I doubt
this will be faster than using NOT EXISTS. (Using GROUP BY eliminate
duplicates allows the use of a hash aggregate plan if there aren't too
many unique names.)

SELECT name FROM uname
EXCEPT
SELECT name FROM data GROUP BY name
;