Thread: One to many query question

One to many query question

From
Dave Dribin
Date:
Hi, I'm having trouble with what I think should be an easy query.  For
simplicity, I will use a CD database as an example.  Each CD may have
multiple genres.  Here's some sample data:

Artist                  Title                           Genres
----------------------  ------------------------------  ----------------
Miles Davis        Some Kind of Blue        Jazz
Metallica        Ride the Lightning        Rock
Chemical Brothers    Surrender            Electronic
Radiohead        OK Computer            Rock, Electronic

For simplicities sake, let's ignore normalization on artist and genre,
and say the tables look like:

CREATE TABLE cd (id integer unique,artist varchar(25),title varchar(25)
);

CREATE TABLE cd_genres (cd_id integer,genre varchar(25)
);

How do I write a query to find all CDs that are NOT Rock?  A co-worker
showed me the following query:

SELECT cd.*, rock.genre AS rock, jazz.genre AS jazz, electronic.genre AS electronic
FROM cd LEFT JOIN cd_genres rock ON   (cd.id = rock.cd_id AND rock.genre = 'Rock') LEFT JOIN cd_genres jazz ON   (cd.id
=jazz.cd_id AND jazz.genre = 'Jazz') LEFT JOIN cd_genres electronic ON   (cd.id = electronic.cd_id AND electronic.genre
='Electronic');
 

This produces the following results, which seems to essentially
de-normalize the data:
id |      artist       |       title        | rock | jazz | electronic
----+-------------------+--------------------+------+------+------------ 1 | Miles Davis       | Some Kind of Blue  |
  | Jazz | 2 | Metallica         | Ride the Lightning | Rock |      | 3 | Chemical Brothers | Surrender          |
|     | Electronic 4 | Radiohead         | OK Computer        | Rock |      | Electronic
 
(4 rows)

Then to filter out those NOT Rock, I can add a:
 WHERE rock.genre IS NULL

While, this *does* work, I have this feeling there is a better way
(and I'm not sure of the performance).  If I add more genres, I have
to add more LEFT JOINs.  I *could* actually create a column per genre,
but this means adding and removing genres requires an alter table.
And I'd rather actually normalize further such that the list of genres
is in its *own* table.

Any thoughts?  I'm pretty much a SQL newbie, so pointers to good books
or articles would also be helpful.

Thanks!

-Dave


Re: One to many query question

From
Eric Clark
Date:
On Wed, 2003-07-30 at 12:35, Dave Dribin wrote: 
> CREATE TABLE cd (
>     id integer unique,
>     artist varchar(25),
>     title varchar(25)
> );
> 
> CREATE TABLE cd_genres (
>     cd_id integer,
>     genre varchar(25)
> );

I think you've got this backwards.  There is no advantage in the above
table's over simply having a genre varchar(25) in the cd table.

You really want:

CREATE TABLE genre (genre_id serial,genre varchar(25)
);

CREATE TABLE cd (cd_id integer unique,artist varchar(25),title varchar(25),       genre_id varchar(25) references genre
(genre_id)
);

> How do I write a query to find all CDs that are NOT Rock?  A co-worker
> showed me the following query:

Now the query is simple:

SELECT cd.*, genre.genre FROM cd, genre WHERE cd.genre_id =
genre.genre_id AND genre.genre != 'Rock';

Hope that helps,
Eric



Re: One to many query question

From
Dave Dribin
Date:
On Wed, Jul 30, 2003 at 01:11:35PM -0700, Eric Clark wrote:
> On Wed, 2003-07-30 at 12:35, Dave Dribin wrote: 
> > CREATE TABLE cd (
> >     id integer unique,
> >     artist varchar(25),
> >     title varchar(25)
> > );
> > 
> > CREATE TABLE cd_genres (
> >     cd_id integer,
> >     genre varchar(25)
> > );
> 
> I think you've got this backwards.  There is no advantage in the above
> table's over simply having a genre varchar(25) in the cd table.
> 
> You really want:
> 
> CREATE TABLE genre (
>     genre_id serial,
>     genre varchar(25)
> );
> 
> CREATE TABLE cd (
>     cd_id integer unique,
>     artist varchar(25),
>     title varchar(25),
>         genre_id varchar(25) references genre (genre_id)
> );

This doesn't allow multiple genre's per CD, though, does it?  A CD
can only have 1 genre_id.  I would like the ability to have multiple
genres, in which case a third table is necessary:

CREATE TABLE cd_genres (   cd_id integer,   genre_id integer
);

cd_id references cd.id and genre_id references genre.genre_id.

This still requires the complex LEFT JOIN query from my first post,
too, I think, *plus* an extra join between cd_genres and genre.

-Dave


Re: One to many query question

From
Richard Huxton
Date:
On Wednesday 30 July 2003 20:35, Dave Dribin wrote:
> Hi, I'm having trouble with what I think should be an easy query.  For
> simplicity, I will use a CD database as an example.  Each CD may have
> multiple genres.  Here's some sample data:
>
> Artist                  Title                           Genres
> ----------------------  ------------------------------  ----------------
> Miles Davis        Some Kind of Blue        Jazz
> Metallica        Ride the Lightning        Rock
> Chemical Brothers    Surrender            Electronic
> Radiohead        OK Computer            Rock, Electronic
>
> For simplicities sake, let's ignore normalization on artist and genre,
> and say the tables look like:
>
> CREATE TABLE cd (
>     id integer unique,
>     artist varchar(25),
>     title varchar(25)
> );
>
> CREATE TABLE cd_genres (
>     cd_id integer,
>     genre varchar(25)
> );
>
> How do I write a query to find all CDs that are NOT Rock?  A co-worker
> showed me the following query:

Basically you need to find all the CDs that ARE "rock" and subtract that set 
from the set of all CDs.

You could use:

SELECT id,artist,title FROM cd WHERE NOT EXIST 
(SELECT 1 FROM cd_genres WHERE cd_id=id AND genre='Rock');

or

SELECT id,artist,title,cd_id
FROM cd
LEFT JOIN  (SELECT cd_id FROM cd_genres WHERE genre='Rock') AS rock_cds 
ON cd_id=id
WHERE cd_id IS NULL;

or an EXCEPT clause etc.

See which you like better.

--  Richard Huxton Archonet Ltd


Re: One to many query question

From
Dmitry Tkach
Date:
Dave Dribin wrote:

>Hi, I'm having trouble with what I think should be an easy query.  For
>simplicity, I will use a CD database as an example.  Each CD may have
>multiple genres.  Here's some sample data:
>
>Artist                  Title                           Genres
>----------------------  ------------------------------  ----------------
>Miles Davis        Some Kind of Blue        Jazz
>Metallica        Ride the Lightning        Rock
>Chemical Brothers    Surrender            Electronic
>Radiohead        OK Computer            Rock, Electronic
>
>For simplicities sake, let's ignore normalization on artist and genre,
>and say the tables look like:
>
>CREATE TABLE cd (
>    id integer unique,
>    artist varchar(25),
>    title varchar(25)
>);
>
>CREATE TABLE cd_genres (
>    cd_id integer,
>    genre varchar(25)
>);
>
>How do I write a query to find all CDs that are NOT Rock? 
>
What aboutselect * from cd where not exists (select 1 from cd_genres where cd_id 
= cd.id and genre='Rock')?

Dima




Re: One to many query question

From
"Chad Thompson"
Date:
> On Wed, Jul 30, 2003 at 01:11:35PM -0700, Eric Clark wrote:
> > On Wed, 2003-07-30 at 12:35, Dave Dribin wrote:
> > > CREATE TABLE cd (
> > > id integer unique,
> > > artist varchar(25),
> > > title varchar(25)
> > > );
> > >
> > > CREATE TABLE cd_genres (
> > > cd_id integer,
> > > genre varchar(25)
> > > );
> >
> > I think you've got this backwards.  There is no advantage in the above
> > table's over simply having a genre varchar(25) in the cd table.
> >
> > You really want:
> >
> > CREATE TABLE genre (
> > genre_id serial,
> > genre varchar(25)
> > );
> >
> > CREATE TABLE cd (
> > cd_id integer unique,
> > artist varchar(25),
> > title varchar(25),
> >         genre_id varchar(25) references genre (genre_id)
> > );
>
> This doesn't allow multiple genre's per CD, though, does it?  A CD
> can only have 1 genre_id.  I would like the ability to have multiple
> genres, in which case a third table is necessary:
>
> CREATE TABLE cd_genres (
>     cd_id integer,
>     genre_id integer
> );
>
> cd_id references cd.id and genre_id references genre.genre_id.
>
> This still requires the complex LEFT JOIN query from my first post,
> too, I think, *plus* an extra join between cd_genres and genre.
>
> -Dave

What you may be looking for is a not exists subselect.

Im not sure if this quite fits your example.. but maybe it will give you
some ideas...

SELECT cd.*, rock.genre AS rock, jazz.genre AS jazz, electronic.genre AS electronic
FROM cd LEFT JOIN cd_genres jazz ON   (cd.id = jazz.cd_id AND jazz.genre = 'Jazz') LEFT JOIN cd_genres electronic ON
(cd.id= electronic.cd_id AND electronic.genre = 'Electronic'); WHERE NOT EXISTS (SELECT cd.id FROM cd join cd_genres
rockON (cd.id = rock.cd_id AND
 
rock.genre = 'Rock'))

This is quite fast in postgres unless configured wrong.. be sure to join
your subselect to your outer query.

Hope that helps
Chad





Re: One to many query question

From
Eric Clark
Date:
> This doesn't allow multiple genre's per CD, though, does it?  A CD
> can only have 1 genre_id.  I would like the ability to have multiple
> genres, in which case a third table is necessary:
> 
> CREATE TABLE cd_genres (
>     cd_id integer,
>     genre_id integer
> );
> 
> cd_id references cd.id and genre_id references genre.genre_id.
> 
> This still requires the complex LEFT JOIN query from my first post,
> too, I think, *plus* an extra join between cd_genres and genre.

Sorry, the cd_genre table would be the way to do it.  This was a fairly
complex problem so I created the tables in a test database and wrote a
few queries that I think solve the problem for you, depending on how you
want select to return the genre list.

Here's a couple queries that will only get cd's that are not part of
Rock.

SELECT c.*
FROM cd AS c
WHERE 'Rock' NOT IN (SELECT g.genre FROM genre AS g, cd_genre AS cg
WHERE g.genre_id = cg.genre_id AND cg.cd_id = c.cd_id);

SELECT c.*
FROM cd AS c
WHERE NOT EXISTS (SELECT NULL FROM genre AS g, cd_genre AS cg WHERE
g.genre_id = cg.genre_id AND cg.cd_id = c.cd_id AND g.genre = 'Rock');

The second one should be faster.  The next query will show all cd's that
not exclusively Rock. (OK Computer should be in the result)

SELECT c.*
FROM cd AS c, cd_genre AS cg, genre AS g
WHERE c.cd_id = cg.cd_id
AND cg.genre_id = g.genre_id
AND g.genre != 'Rock';

If you add g.genre to any of the above queries you will get one row per
cd+genre combination.  I dont know of any way to make that a delimited
list other than writing a function.  So I wrote one for fun.  The
argument is the cd_id.

CREATE OR REPLACE FUNCTION genre_list (integer) RETURNS TEXT AS '
DECLARE   cdid ALIAS FOR $1;   return_val TEXT;   r RECORD;
BEGIN   FOR r IN SELECT g.genre               FROM genre AS g, cd_genre AS cg              WHERE g.genre_id =
cg.genre_idAND cg.cd_id = cdid LOOP       IF return_val IS NULL THEN           return_val := r.genre;       ELSE
  return_val := return_val || '', '' || r.genre;       END IF;   END LOOP;   RETURN return_val;
 
END
' LANGUAGE 'plpgsql' STABLE RETURNS NULL ON NULL INPUT SECURITY INVOKER;

Now I see that while composing this email some others have replied with
more simple solutions.  For your case I prefer the added tables as they
enforce the possible list of genre's.  Its also handy to keep them
seperate to get the list of genre's to display in a UI.

Eric

ps: aliasing all the table names is just my habit, do it however you see
fit.  I also dont like to make my table names plural, its implied.




Re: One to many query question

From
Dave Dribin
Date:
On Wed, Jul 30, 2003 at 05:26:23PM -0400, Dmitry Tkach wrote:
> >How do I write a query to find all CDs that are NOT Rock? 
> >
> What about
> select * from cd where not exists (select 1 from cd_genres where cd_id 
> = cd.id and genre='Rock')?

Thanks everyone!  This did indeed work, and it does seem clearer.  I
never knew about EXISTS before.  How portable is this?  I'm interested
in supporting PostgreSQL and MS SQL Server for now, and possibly
Oracle and MySQL in the future.

-Dave


Re: One to many query question

From
Christoph Haller
Date:
>
> On Wed, Jul 30, 2003 at 05:26:23PM -0400, Dmitry Tkach wrote:
> > >How do I write a query to find all CDs that are NOT Rock?
> > >
> > What about
> > select * from cd where not exists (select 1 from cd_genres where
cd_id
> > = cd.id and genre='Rock')?
>
> Thanks everyone!  This did indeed work, and it does seem clearer.  I
> never knew about EXISTS before.  How portable is this?  I'm interested

> in supporting PostgreSQL and MS SQL Server for now, and possibly
> Oracle and MySQL in the future.
>
It should be portable completely. It's SQL standard.
Regards, Christoph




Re: One to many query question

From
Richard Poole
Date:
On Wed, Jul 30, 2003 at 02:35:20PM -0500, Dave Dribin wrote:
> Hi, I'm having trouble with what I think should be an easy query.  For
> simplicity, I will use a CD database as an example.  Each CD may have
> multiple genres.  Here's some sample data:
> 
> Artist                  Title                           Genres
> ----------------------  ------------------------------  ----------------
> Miles Davis        Some Kind of Blue        Jazz
> Metallica        Ride the Lightning        Rock
> Chemical Brothers    Surrender            Electronic
> Radiohead        OK Computer            Rock, Electronic
> 
> For simplicities sake, let's ignore normalization on artist and genre,
> and say the tables look like:
> 
> CREATE TABLE cd (
>     id integer unique,
>     artist varchar(25),
>     title varchar(25)
> );
> 
> CREATE TABLE cd_genres (
>     cd_id integer,
>     genre varchar(25)
> );

Assuming that each CD can have several rows in cd_genres,

SELECT id, artist, title
FROM cd
WHERE id NOT IN ( SELECT cd_id FROM cd_genres WHERE genre = 'Rock' );

will do what you want. Your co-worker is perhaps used to certain lesser
databases which don't support subselects...

Richard


OFF-TOPIC: Richard Huxton, Please Contact Us!

From
Josh Berkus
Date:
Richard:

Ned Lily and I have been sending you e-mails for 3 weeks, and you have not 
responded.   Since you've posted to the SQL list in that time, I can only 
think that you're not getting our e-mails.  I'm hoping that by sending you 
via the SQL list, you'll receive the e-mail.

Please contact us.   cc: to me at josh at postgresql.org just in case.

Thanks!

-- 
Josh Berkus
Aglio Database Solutions
San Francisco