Thread: How would I write this query...
Hi - I have two tables, one of them has names of people and an associated integer ID. The other table is a list of the people (from the first table) by their ID number that have signed up for a class. How would I write a query that would list all the people from the first table that do not have any entries in the second table? Basically, I want a listing of all my people who have not signed up for the class. Thanks! -Jim
On May 1, 2006, at 3:32 , Jim Fitzgerald wrote: > I have two tables, one of them has names of people and an associated > integer ID. The other table is a list of the people (from the > first table) > by their ID number that have signed up for a class. How would I > write a > query that would list all the people from the first table that do > not have > any entries in the second table? Basically, I want a listing of > all my > people who have not signed up for the class. What you want is sometimes called a semi-difference. Here's how I'd do it: select person_id, person_name from persons natural join ( select person_id from persons except select person_id from class_signups ) not_yet_signed_up where persons is your first table and class_signups is your second table. Hope this helps! Michael Glaesemann grzm seespotcode net
On sun, 2006-04-30 at 11:32 -0700, Jim Fitzgerald wrote: > I have two tables, one of them has names of people and an associated > integer ID. The other table is a list of the people (from the first table) > by their ID number that have signed up for a class. How would I write a > query that would list all the people from the first table that do not have > any entries in the second table? Basically, I want a listing of all my > people who have not signed up for the class. try SELECT * FROM people LEFT JOIN classes using (peopleid) WHERE classes.classid IS NULL; gnari
Jim Fitzgerald wrote: > Hi - > > I have two tables, one of them has names of people and an associated > integer ID. The other table is a list of the people (from the first > table) by their ID number that have signed up for a class. How would > I write a query that would list all the people from the first table > that do not have any entries in the second table? Basically, I want > a listing of all my people who have not signed up for the class. select * from people where id not in ( select id from class_registration ) -- Guy Rouillier
> select * > from people > where id not in > ( > select id > from class_registration > ) In my experience, queries like the OUTER LEFT JOIN version posted earlier are usually much more efficient than NOT IN queries like the above. The planner seems to be pretty smart about turning (positive) IN queries into joins, but NOT IN queries usually turn into nested table scans, in my experience. - John D. Burger MITRE
John D. Burger wrote: >> select * >> from people >> where id not in >> ( >> select id >> from class_registration >> ) > > In my experience, queries like the OUTER LEFT JOIN version posted > earlier are usually much more efficient than NOT IN queries like the > above. The planner seems to be pretty smart about turning (positive) > IN queries into joins, but NOT IN queries usually turn into nested > table scans, in my experience. Interesting, I am aware that each DBMS query optimizer does better with some expressions, and worse with others. When I was a DB2 DBA, DB2 would change from release to release the expressions it most preferred. I imagine the above formulation is what many people would try initially, until they encounter experiences such as yours. I checked the TO-DO list and I don't see anything pending to address this. Bruce and/or Tom, are there any far-off intentions to do anything to improve "not in" execution? Or perhaps to rewrite it to an equivalent expression that already works well? -- Guy Rouillier
Guy Rouillier wrote: > Jim Fitzgerald wrote: > >>Hi - >> >> I have two tables, one of them has names of people and an associated >>integer ID. The other table is a list of the people (from the first >>table) by their ID number that have signed up for a class. How would >>I write a query that would list all the people from the first table >>that do not have any entries in the second table? Basically, I want >>a listing of all my people who have not signed up for the class. > > > select * > from people > where id not in > ( > select id > from class_registration > ) Wouldn't a NOT EXISTS be faster? After all, the current record can be disposed of as soon as there's any reference to it from class_registration. For example: select * from people where not exists ( select 1 from class_registration where id = people.id ); It may be faster to use * or a specific column name in the subquery instead of the constant value 1. EXPLAIN ANALYZE will tell ;) -- Alban Hertroys alban@magproductions.nl magproductions b.v. T: ++31(0)534346874 F: ++31(0)534346876 M: I: www.magproductions.nl A: Postbus 416 7500 AK Enschede // Integrate Your World //
On Mon, May 01, 2006 at 05:23:41PM -0400, John D. Burger wrote: > In my experience, queries like the OUTER LEFT JOIN version posted > earlier are usually much more efficient than NOT IN queries like the > above. The planner seems to be pretty smart about turning (positive) > IN queries into joins, but NOT IN queries usually turn into nested > table scans, in my experience. That's because they're not equivalent. IN/NOT IN have special semantics w.r.t. NULLs that make them a bit more difficult to optimise. OUTER JOINs on the other hand is easier since in a join condition anything = NULL evaluates to NULL -> FALSE. I think there's been some discussion about teaching the planner about columns that cannot be NULL (like primary keys) thus allowing it to perform this transformation safely. I don't know if anyone has done it though... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Martijn van Oosterhout wrote: > That's because they're not equivalent. IN/NOT IN have special semantics > w.r.t. NULLs that make them a bit more difficult to optimise. OUTER > JOINs on the other hand is easier since in a join condition anything = > NULL evaluates to NULL -> FALSE. Which is why Hash IN Joins were added, presumably. But there's nothing analogous for NOT IN, I guess, perhaps there can't be. > I think there's been some discussion about teaching the planner about > columns that cannot be NULL (like primary keys) thus allowing it to > perform this transformation safely. I don't know if anyone has done it > though... Yeah, I've noticed cases where I've thought "Ah, the planner doesn't know that column can't be null". Similarly, it has seemed to me that knowing that a column was UNIQUE could have made for a better plan, although I can't think of any examples off-hand. Maybe where I saw it using a Hash aggregate on a unique column, and I thought it could just use the index, although that may not make sense either. - John D. Burger MITRE