Thread: Negating the list of selected rows of a join
Hello, I want to list the rows of a table with a text field whose values do not exist in a similar field of another table. Basically what I want to get is negated results of a join. Lets say the tables table_a and table_b have the field name. table_a table_b name age name ----- --- ----- Peter 27 Paul Paul 42 Mary 20 If I asked for a join like this: SELECT table_a.name,table_a.age FROM table_a,table_b WHERE table_a.name=table_b.name I would get: name age ----- --- Paul 42 But I want the opposite. I tried a non-equi join like this: SELECT table_a.name,table_a.age FROM table_a,table_b WHERE table_a.name<>table_b.name and I got: name age ----- --- Peter 27 Mary 20 It worked except for the case when table_b is empty. In this case the nothing was returned. Is this the expected behaviour or is it a bug in PostgreSQL? How can I make a query that works the way I want all the time, even for the case when table_b is empty? Regards, Manuel Lemos E-mail: mlemos@acm.org URL: http://www.e-na.net/the_author.html PGP key: finger://mlemos@zeus.ci.ua.pt --
Manuel Lemos wrote: > How can I make a query that works the way I want all the time, even for the > case when table_b is empty? SELECT table_a.name, table_a.age FROM table_a WHERE NOT EXISTS ( SELECT 'x' FROM table_b WHERE table_b.name = table_a.name ); Hope this will do the trick. :) Clark Evans
Manuel Lemos <mlemos@acm.org> wrote: > I want to list the rows of a table with a text field whose values do not > exist in a similar field of another table. Basically what I want to get > is negated results of a join. [...] > It worked except for the case when table_b is empty. In this case the > nothing was returned. Is this the expected behaviour or is it a bug in > PostgreSQL? If you list two (or more) tables in the 'from' clause of a select (that is, if you do a 'join'), a result table is built, in which each row of the first table is combined with each row from (all) the other table(s). To clarify, do simply SELECT table_a.name,table_b.name FROM table_a,table_b; on your table. When one of the tables has no rows, all the rows from the other(s) are combined with *nothing*; this gives nothing! ('combined' may be the wrong word; it's like a multiplication, and people speak of a 'Cartesian product' of the tables) The 'where' clause can restrict the rows of the result table to something useful, e.g., you can restrict to 'table_a.name = table_b.name'. A feature that probably will help you is the construction of a so-called 'sub-select' in the where clause: SELECT name FROM table_a WHERE name NOT IN (SELECT name FROM table_b); Hope it helps! Ulf -- ====================================================================== Ulf Mehlig <umehlig@zmt.uni-bremen.de> Center for Tropical Marine Ecology/ZMT, Bremen, Germany ----------------------------------------------------------------------
Clark Evans <clark.evans@manhattanproject.com>: > Manuel Lemos wrote: > > How can I make a query that works the way I want all the time, even for the > > case when table_b is empty? > > SELECT table_a.name, table_a.age > FROM table_a > WHERE NOT EXISTS ( > SELECT 'x' > FROM table_b > WHERE table_b.name = table_a.name > ); > > Hope this will do the trick. Maybe not -- doesn't that mean, that the query won't return a single row in case there is *any* pair of equal names in both tables?! Have a nice (sun)day, Ulf -- ====================================================================== Ulf Mehlig <umehlig@zmt.uni-bremen.de> Center for Tropical Marine Ecology/ZMT, Bremen, Germany ----------------------------------------------------------------------
Manuel Lemos wrote: > > I want to list the rows of a table with a text field whose values do not > exist in a similar field of another table. Basically what I want to get > is negated results of a join. Ulf Mehlig wrote: > SELECT name FROM table_a > WHERE name NOT IN (SELECT name FROM table_b); Clark Evans wrote: > SELECT table_a.name, table_a.age > FROM table_a > WHERE NOT EXISTS ( > SELECT 'x' > FROM table_b > WHERE table_b.name = table_a.name > ); I'm not sure about how well PostgreSQL handles these two. I'd try them both with your data set. If table_b is small (less than a few thousand rows) then Ulf's approach would work best. However, if table_b is large (more than a thousand) then I think the other approach may work better if table_b.name is indexed. Clark
Clark Evans <clark.evans@manhattanproject.com>/Ulf Mehlig <umehlig@pandora3.uni-bremen.de> wrote: > > > Hope this will do the trick. > > > > Maybe not -- doesn't that mean, that the query won't return a > > single row in case there is *any* pair of equal names in both > > tables?! > > No. It's a correlated sub-query. It's actually much more efficient > with large tables than the other approach (which has to put > the entire result set of table_b on the heap before it can > process table_a). Your approach is, however, much better (by a > large factor) if table_a is very large and table_b is very small, > since you don't have to hit the index for table_b on every > row of table_a... Yes, it *will* do the trick!! Sorry, Clark, I misinterpreted your sub-query ... have to read more carefully ... Thanks for your correction! Ulf -- ====================================================================== Ulf Mehlig <umehlig@zmt.uni-bremen.de> Center for Tropical Marine Ecology/ZMT, Bremen, Germany ----------------------------------------------------------------------