Thread: How would I write this query...

How would I write this query...

From
"Jim Fitzgerald"
Date:
Hi -

  I have two tables, one of them has names of people and an associated
integer ID.  The other table is a list of the people (from the first table)
by their ID number that have signed up for a class.  How would I write a
query that would list all the people from the first table that do not have
any entries in the second table?   Basically, I want a listing of all my
people who have not signed up for the class.

Thanks!
-Jim



Re: How would I write this query...

From
Michael Glaesemann
Date:
On May 1, 2006, at 3:32 , Jim Fitzgerald wrote:

>   I have two tables, one of them has names of people and an associated
> integer ID.  The other table is a list of the people (from the
> first table)
> by their ID number that have signed up for a class.  How would I
> write a
> query that would list all the people from the first table that do
> not have
> any entries in the second table?   Basically, I want a listing of
> all my
> people who have not signed up for the class.

What you want is sometimes called a semi-difference. Here's how I'd
do it:

select person_id, person_name
from persons
natural join (
     select person_id
     from persons
     except
     select person_id
     from class_signups
     ) not_yet_signed_up

where persons is your first table and class_signups is your second
table.

Hope this helps!

Michael Glaesemann
grzm seespotcode net




Re: How would I write this query...

From
Ragnar
Date:
On sun, 2006-04-30 at 11:32 -0700, Jim Fitzgerald wrote:
>   I have two tables, one of them has names of people and an associated
> integer ID.  The other table is a list of the people (from the first table)
> by their ID number that have signed up for a class.  How would I write a
> query that would list all the people from the first table that do not have
> any entries in the second table?   Basically, I want a listing of all my
> people who have not signed up for the class.

try

SELECT * FROM people LEFT JOIN classes using (peopleid)
         WHERE classes.classid IS NULL;


gnari



Re: How would I write this query...

From
"Guy Rouillier"
Date:
Jim Fitzgerald wrote:
> Hi -
>
>   I have two tables, one of them has names of people and an associated
> integer ID.  The other table is a list of the people (from the first
> table) by their ID number that have signed up for a class.  How would
> I write a query that would list all the people from the first table
> that do not have any entries in the second table?   Basically, I want
> a listing of all my people who have not signed up for the class.

select *
from people
where id not in
(
select id
from class_registration
)

--
Guy Rouillier


Re: How would I write this query...

From
"John D. Burger"
Date:
> select *
> from people
> where id not in
> (
> select id
> from class_registration
> )

In my experience, queries like the OUTER LEFT JOIN version posted
earlier are usually much more efficient than NOT IN queries like the
above.  The planner seems to be pretty smart about turning (positive)
IN queries into joins, but NOT IN queries usually turn into nested
table scans, in my experience.

- John D. Burger
   MITRE


Re: How would I write this query...

From
"Guy Rouillier"
Date:
John D. Burger wrote:
>> select *
>> from people
>> where id not in
>> (
>> select id
>> from class_registration
>> )
>
> In my experience, queries like the OUTER LEFT JOIN version posted
> earlier are usually much more efficient than NOT IN queries like the
> above.  The planner seems to be pretty smart about turning (positive)
> IN queries into joins, but NOT IN queries usually turn into nested
> table scans, in my experience.

Interesting, I am aware that each DBMS query optimizer does better with
some expressions, and worse with others. When I was a DB2 DBA, DB2 would
change from release to release the expressions it most preferred.

I imagine the above formulation is what many people would try initially,
until they encounter experiences such as yours.  I checked the TO-DO
list and I don't see anything pending to address this.  Bruce and/or
Tom, are there any far-off intentions to do anything to improve "not in"
execution?  Or perhaps to rewrite it to an equivalent expression that
already works well?

--
Guy Rouillier


Re: How would I write this query...

From
Alban Hertroys
Date:
Guy Rouillier wrote:
> Jim Fitzgerald wrote:
>
>>Hi -
>>
>>  I have two tables, one of them has names of people and an associated
>>integer ID.  The other table is a list of the people (from the first
>>table) by their ID number that have signed up for a class.  How would
>>I write a query that would list all the people from the first table
>>that do not have any entries in the second table?   Basically, I want
>>a listing of all my people who have not signed up for the class.
>
>
> select *
> from people
> where id not in
> (
> select id
> from class_registration
> )

Wouldn't a NOT EXISTS be faster? After all, the current record can be
disposed of as soon as there's any reference to it from class_registration.

For example:
select *
   from people
  where not exists (
    select 1
      from class_registration
     where id = people.id
  );

It may be faster to use * or a specific column name in the subquery
instead of the constant value 1. EXPLAIN ANALYZE will tell ;)

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: How would I write this query...

From
Martijn van Oosterhout
Date:
On Mon, May 01, 2006 at 05:23:41PM -0400, John D. Burger wrote:
> In my experience, queries like the OUTER LEFT JOIN version posted
> earlier are usually much more efficient than NOT IN queries like the
> above.  The planner seems to be pretty smart about turning (positive)
> IN queries into joins, but NOT IN queries usually turn into nested
> table scans, in my experience.

That's because they're not equivalent. IN/NOT IN have special semantics
w.r.t. NULLs that make them a bit more difficult to optimise. OUTER
JOINs on the other hand is easier since in a join condition anything =
NULL evaluates to NULL -> FALSE.

I think there's been some discussion about teaching the planner about
columns that cannot be NULL (like primary keys) thus allowing it to
perform this transformation safely. I don't know if anyone has done it
though...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: How would I write this query...

From
"John D. Burger"
Date:
Martijn van Oosterhout wrote:

> That's because they're not equivalent. IN/NOT IN have special semantics
> w.r.t. NULLs that make them a bit more difficult to optimise. OUTER
> JOINs on the other hand is easier since in a join condition anything =
> NULL evaluates to NULL -> FALSE.

Which is why Hash IN Joins were added, presumably.  But there's nothing
analogous for NOT IN, I guess, perhaps there can't be.

> I think there's been some discussion about teaching the planner about
> columns that cannot be NULL (like primary keys) thus allowing it to
> perform this transformation safely. I don't know if anyone has done it
> though...

Yeah, I've noticed cases where I've thought "Ah, the planner doesn't
know that column can't be null".  Similarly, it has seemed to me that
knowing that a column was UNIQUE could have made for a better plan,
although I can't think of any examples off-hand.  Maybe where I saw it
using a Hash aggregate on a unique column, and I thought it could just
use the index, although that may not make sense either.

- John D. Burger
   MITRE