Thread: One to many query question
Hi, I'm having trouble with what I think should be an easy query. For simplicity, I will use a CD database as an example. Each CD may have multiple genres. Here's some sample data: Artist Title Genres ---------------------- ------------------------------ ---------------- Miles Davis Some Kind of Blue Jazz Metallica Ride the Lightning Rock Chemical Brothers Surrender Electronic Radiohead OK Computer Rock, Electronic For simplicities sake, let's ignore normalization on artist and genre, and say the tables look like: CREATE TABLE cd (id integer unique,artist varchar(25),title varchar(25) ); CREATE TABLE cd_genres (cd_id integer,genre varchar(25) ); How do I write a query to find all CDs that are NOT Rock? A co-worker showed me the following query: SELECT cd.*, rock.genre AS rock, jazz.genre AS jazz, electronic.genre AS electronic FROM cd LEFT JOIN cd_genres rock ON (cd.id = rock.cd_id AND rock.genre = 'Rock') LEFT JOIN cd_genres jazz ON (cd.id =jazz.cd_id AND jazz.genre = 'Jazz') LEFT JOIN cd_genres electronic ON (cd.id = electronic.cd_id AND electronic.genre ='Electronic'); This produces the following results, which seems to essentially de-normalize the data: id | artist | title | rock | jazz | electronic ----+-------------------+--------------------+------+------+------------ 1 | Miles Davis | Some Kind of Blue | | Jazz | 2 | Metallica | Ride the Lightning | Rock | | 3 | Chemical Brothers | Surrender | | | Electronic 4 | Radiohead | OK Computer | Rock | | Electronic (4 rows) Then to filter out those NOT Rock, I can add a: WHERE rock.genre IS NULL While, this *does* work, I have this feeling there is a better way (and I'm not sure of the performance). If I add more genres, I have to add more LEFT JOINs. I *could* actually create a column per genre, but this means adding and removing genres requires an alter table. And I'd rather actually normalize further such that the list of genres is in its *own* table. Any thoughts? I'm pretty much a SQL newbie, so pointers to good books or articles would also be helpful. Thanks! -Dave
On Wed, 2003-07-30 at 12:35, Dave Dribin wrote: > CREATE TABLE cd ( > id integer unique, > artist varchar(25), > title varchar(25) > ); > > CREATE TABLE cd_genres ( > cd_id integer, > genre varchar(25) > ); I think you've got this backwards. There is no advantage in the above table's over simply having a genre varchar(25) in the cd table. You really want: CREATE TABLE genre (genre_id serial,genre varchar(25) ); CREATE TABLE cd (cd_id integer unique,artist varchar(25),title varchar(25), genre_id varchar(25) references genre (genre_id) ); > How do I write a query to find all CDs that are NOT Rock? A co-worker > showed me the following query: Now the query is simple: SELECT cd.*, genre.genre FROM cd, genre WHERE cd.genre_id = genre.genre_id AND genre.genre != 'Rock'; Hope that helps, Eric
On Wed, Jul 30, 2003 at 01:11:35PM -0700, Eric Clark wrote: > On Wed, 2003-07-30 at 12:35, Dave Dribin wrote: > > CREATE TABLE cd ( > > id integer unique, > > artist varchar(25), > > title varchar(25) > > ); > > > > CREATE TABLE cd_genres ( > > cd_id integer, > > genre varchar(25) > > ); > > I think you've got this backwards. There is no advantage in the above > table's over simply having a genre varchar(25) in the cd table. > > You really want: > > CREATE TABLE genre ( > genre_id serial, > genre varchar(25) > ); > > CREATE TABLE cd ( > cd_id integer unique, > artist varchar(25), > title varchar(25), > genre_id varchar(25) references genre (genre_id) > ); This doesn't allow multiple genre's per CD, though, does it? A CD can only have 1 genre_id. I would like the ability to have multiple genres, in which case a third table is necessary: CREATE TABLE cd_genres ( cd_id integer, genre_id integer ); cd_id references cd.id and genre_id references genre.genre_id. This still requires the complex LEFT JOIN query from my first post, too, I think, *plus* an extra join between cd_genres and genre. -Dave
On Wednesday 30 July 2003 20:35, Dave Dribin wrote: > Hi, I'm having trouble with what I think should be an easy query. For > simplicity, I will use a CD database as an example. Each CD may have > multiple genres. Here's some sample data: > > Artist Title Genres > ---------------------- ------------------------------ ---------------- > Miles Davis Some Kind of Blue Jazz > Metallica Ride the Lightning Rock > Chemical Brothers Surrender Electronic > Radiohead OK Computer Rock, Electronic > > For simplicities sake, let's ignore normalization on artist and genre, > and say the tables look like: > > CREATE TABLE cd ( > id integer unique, > artist varchar(25), > title varchar(25) > ); > > CREATE TABLE cd_genres ( > cd_id integer, > genre varchar(25) > ); > > How do I write a query to find all CDs that are NOT Rock? A co-worker > showed me the following query: Basically you need to find all the CDs that ARE "rock" and subtract that set from the set of all CDs. You could use: SELECT id,artist,title FROM cd WHERE NOT EXIST (SELECT 1 FROM cd_genres WHERE cd_id=id AND genre='Rock'); or SELECT id,artist,title,cd_id FROM cd LEFT JOIN (SELECT cd_id FROM cd_genres WHERE genre='Rock') AS rock_cds ON cd_id=id WHERE cd_id IS NULL; or an EXCEPT clause etc. See which you like better. -- Richard Huxton Archonet Ltd
Dave Dribin wrote: >Hi, I'm having trouble with what I think should be an easy query. For >simplicity, I will use a CD database as an example. Each CD may have >multiple genres. Here's some sample data: > >Artist Title Genres >---------------------- ------------------------------ ---------------- >Miles Davis Some Kind of Blue Jazz >Metallica Ride the Lightning Rock >Chemical Brothers Surrender Electronic >Radiohead OK Computer Rock, Electronic > >For simplicities sake, let's ignore normalization on artist and genre, >and say the tables look like: > >CREATE TABLE cd ( > id integer unique, > artist varchar(25), > title varchar(25) >); > >CREATE TABLE cd_genres ( > cd_id integer, > genre varchar(25) >); > >How do I write a query to find all CDs that are NOT Rock? > What aboutselect * from cd where not exists (select 1 from cd_genres where cd_id = cd.id and genre='Rock')? Dima
> On Wed, Jul 30, 2003 at 01:11:35PM -0700, Eric Clark wrote: > > On Wed, 2003-07-30 at 12:35, Dave Dribin wrote: > > > CREATE TABLE cd ( > > > id integer unique, > > > artist varchar(25), > > > title varchar(25) > > > ); > > > > > > CREATE TABLE cd_genres ( > > > cd_id integer, > > > genre varchar(25) > > > ); > > > > I think you've got this backwards. There is no advantage in the above > > table's over simply having a genre varchar(25) in the cd table. > > > > You really want: > > > > CREATE TABLE genre ( > > genre_id serial, > > genre varchar(25) > > ); > > > > CREATE TABLE cd ( > > cd_id integer unique, > > artist varchar(25), > > title varchar(25), > > genre_id varchar(25) references genre (genre_id) > > ); > > This doesn't allow multiple genre's per CD, though, does it? A CD > can only have 1 genre_id. I would like the ability to have multiple > genres, in which case a third table is necessary: > > CREATE TABLE cd_genres ( > cd_id integer, > genre_id integer > ); > > cd_id references cd.id and genre_id references genre.genre_id. > > This still requires the complex LEFT JOIN query from my first post, > too, I think, *plus* an extra join between cd_genres and genre. > > -Dave What you may be looking for is a not exists subselect. Im not sure if this quite fits your example.. but maybe it will give you some ideas... SELECT cd.*, rock.genre AS rock, jazz.genre AS jazz, electronic.genre AS electronic FROM cd LEFT JOIN cd_genres jazz ON (cd.id = jazz.cd_id AND jazz.genre = 'Jazz') LEFT JOIN cd_genres electronic ON (cd.id= electronic.cd_id AND electronic.genre = 'Electronic'); WHERE NOT EXISTS (SELECT cd.id FROM cd join cd_genres rockON (cd.id = rock.cd_id AND rock.genre = 'Rock')) This is quite fast in postgres unless configured wrong.. be sure to join your subselect to your outer query. Hope that helps Chad
> This doesn't allow multiple genre's per CD, though, does it? A CD > can only have 1 genre_id. I would like the ability to have multiple > genres, in which case a third table is necessary: > > CREATE TABLE cd_genres ( > cd_id integer, > genre_id integer > ); > > cd_id references cd.id and genre_id references genre.genre_id. > > This still requires the complex LEFT JOIN query from my first post, > too, I think, *plus* an extra join between cd_genres and genre. Sorry, the cd_genre table would be the way to do it. This was a fairly complex problem so I created the tables in a test database and wrote a few queries that I think solve the problem for you, depending on how you want select to return the genre list. Here's a couple queries that will only get cd's that are not part of Rock. SELECT c.* FROM cd AS c WHERE 'Rock' NOT IN (SELECT g.genre FROM genre AS g, cd_genre AS cg WHERE g.genre_id = cg.genre_id AND cg.cd_id = c.cd_id); SELECT c.* FROM cd AS c WHERE NOT EXISTS (SELECT NULL FROM genre AS g, cd_genre AS cg WHERE g.genre_id = cg.genre_id AND cg.cd_id = c.cd_id AND g.genre = 'Rock'); The second one should be faster. The next query will show all cd's that not exclusively Rock. (OK Computer should be in the result) SELECT c.* FROM cd AS c, cd_genre AS cg, genre AS g WHERE c.cd_id = cg.cd_id AND cg.genre_id = g.genre_id AND g.genre != 'Rock'; If you add g.genre to any of the above queries you will get one row per cd+genre combination. I dont know of any way to make that a delimited list other than writing a function. So I wrote one for fun. The argument is the cd_id. CREATE OR REPLACE FUNCTION genre_list (integer) RETURNS TEXT AS ' DECLARE cdid ALIAS FOR $1; return_val TEXT; r RECORD; BEGIN FOR r IN SELECT g.genre FROM genre AS g, cd_genre AS cg WHERE g.genre_id = cg.genre_idAND cg.cd_id = cdid LOOP IF return_val IS NULL THEN return_val := r.genre; ELSE return_val := return_val || '', '' || r.genre; END IF; END LOOP; RETURN return_val; END ' LANGUAGE 'plpgsql' STABLE RETURNS NULL ON NULL INPUT SECURITY INVOKER; Now I see that while composing this email some others have replied with more simple solutions. For your case I prefer the added tables as they enforce the possible list of genre's. Its also handy to keep them seperate to get the list of genre's to display in a UI. Eric ps: aliasing all the table names is just my habit, do it however you see fit. I also dont like to make my table names plural, its implied.
On Wed, Jul 30, 2003 at 05:26:23PM -0400, Dmitry Tkach wrote: > >How do I write a query to find all CDs that are NOT Rock? > > > What about > select * from cd where not exists (select 1 from cd_genres where cd_id > = cd.id and genre='Rock')? Thanks everyone! This did indeed work, and it does seem clearer. I never knew about EXISTS before. How portable is this? I'm interested in supporting PostgreSQL and MS SQL Server for now, and possibly Oracle and MySQL in the future. -Dave
> > On Wed, Jul 30, 2003 at 05:26:23PM -0400, Dmitry Tkach wrote: > > >How do I write a query to find all CDs that are NOT Rock? > > > > > What about > > select * from cd where not exists (select 1 from cd_genres where cd_id > > = cd.id and genre='Rock')? > > Thanks everyone! This did indeed work, and it does seem clearer. I > never knew about EXISTS before. How portable is this? I'm interested > in supporting PostgreSQL and MS SQL Server for now, and possibly > Oracle and MySQL in the future. > It should be portable completely. It's SQL standard. Regards, Christoph
On Wed, Jul 30, 2003 at 02:35:20PM -0500, Dave Dribin wrote: > Hi, I'm having trouble with what I think should be an easy query. For > simplicity, I will use a CD database as an example. Each CD may have > multiple genres. Here's some sample data: > > Artist Title Genres > ---------------------- ------------------------------ ---------------- > Miles Davis Some Kind of Blue Jazz > Metallica Ride the Lightning Rock > Chemical Brothers Surrender Electronic > Radiohead OK Computer Rock, Electronic > > For simplicities sake, let's ignore normalization on artist and genre, > and say the tables look like: > > CREATE TABLE cd ( > id integer unique, > artist varchar(25), > title varchar(25) > ); > > CREATE TABLE cd_genres ( > cd_id integer, > genre varchar(25) > ); Assuming that each CD can have several rows in cd_genres, SELECT id, artist, title FROM cd WHERE id NOT IN ( SELECT cd_id FROM cd_genres WHERE genre = 'Rock' ); will do what you want. Your co-worker is perhaps used to certain lesser databases which don't support subselects... Richard
Richard: Ned Lily and I have been sending you e-mails for 3 weeks, and you have not responded. Since you've posted to the SQL list in that time, I can only think that you're not getting our e-mails. I'm hoping that by sending you via the SQL list, you'll receive the e-mail. Please contact us. cc: to me at josh at postgresql.org just in case. Thanks! -- Josh Berkus Aglio Database Solutions San Francisco