Thread: [GENERAL] Tricky SQL query (tried [SQL])

[GENERAL] Tricky SQL query (tried [SQL])

From
stuart@ludwig.ucl.ac.uk (Stuart Rison)
Date:
Dear All,

I'm having a lot of trouble figuring out a good SQL query for the situation
below.

Consider the following table:

dev_brecard=> select * from test order by person;
person|fruit
- ------+---------
lucy  |mandarins
lucy  |tomatoes
lucy  |pears
lucy  |oranges
lucy  |apples
peter |pears
peter |apples
peter |oranges
peter |prunes
robert|figs
robert|dates
stuart|apples
stuart|pears
stuart|prunes
stuart|bananas
stuart|kumquats
(16 rows)

(code for creating and populating table is in a PS at the end of this posting)

You can assume that the table is appropriately normalised and that there is
a composite primary key for it (i.e. each COMBINATION of person and fruit
will appear only once and neither of the fields can be NULL)

How do I select from all person who like 'pears' and 'apples' (in this
case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in this
case, lucy and peter)?

I re-read my SQL books but I am still somewhat stumped.  Things I could
think of for that sort of query:

1) Select all persons who like 'pears'; Select all persons who like
'apples'; Select all persons who like 'oranges'; Calculate the INTERSECTION
of these sets.  But does postgreSQL have a INTERSECTION operator?

2) Use nested subselects:

Select person from test where fruit='pears' and person in (
        Select person from test where fruit='apples' and person in (
                Select person from test where fruit='oranges'
        )
)

But how efficient will this be if I start looking for 6 or seven fruits in
a table with hundreds of entries?

3) Am I storing this sort of data in to wrong kind of form (should I
somehow denormalise?  if so, how?)?

Any suggestions????

thanks for any help out there!

Stuart.

PS.  Code to cut and paste for table:

create table test (person    varchar(25), fruit     varchar(25));
insert into test values ('stuart','apples');
insert into test values ('stuart','pears');
insert into test values ('stuart','bananas');
insert into test values ('stuart','kumquats');
insert into test values ('peter','oranges');
insert into test values ('peter','prunes');
insert into test values ('lucy','mandarins');
insert into test values ('lucy','tomatoes');
insert into test values ('peter','apples');
insert into test values ('lucy','apples');
insert into test values ('peter','pears');
insert into test values ('lucy','pears');
insert into test values ('lucy','oranges');
insert into test values ('stuart','prunes');
insert into test values ('robert','figs');
insert into test values ('robert','dates');

+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+



Re: [GENERAL] Tricky SQL query (tried [SQL])

From
Dustin Sallings
Date:
On Wed, 13 Jan 1999, Stuart Rison wrote:

# dev_brecard=> select * from test order by person;
# person|fruit
# - ------+---------
# lucy  |mandarins
# lucy  |tomatoes
# lucy  |pears
# lucy  |oranges
# lucy  |apples
# peter |pears
# peter |apples
# peter |oranges
# peter |prunes
# robert|figs
# robert|dates
# stuart|apples
# stuart|pears
# stuart|prunes
# stuart|bananas
# stuart|kumquats
# (16 rows)

# You can assume that the table is appropriately normalised and that there is
# a composite primary key for it (i.e. each COMBINATION of person and fruit
# will appear only once and neither of the fields can be NULL)

    Actually, it would be normalized a little better if you weren't
replicating person names and fruit names for every row.

# How do I select from all person who like 'pears' and 'apples' (in this
# case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in
# this case, lucy and peter)?

    I got this to work:

    select distinct person from test
        where likes(person, 'oranges')
        and likes(person, 'apples')
        and likes(person, 'pears')

    Where likes is defined as follows:

create function likes(text, text) returns bool as
'
declare
        ret bool;
        cnt integer;
begin
        select count(*) into cnt from test where person = $1 and fruit = $2;
        if cnt = 0 then
                ret=0;
        else
                ret=1;
        end if;
        return(ret);
end;
' language 'plpgsql';


# I re-read my SQL books but I am still somewhat stumped.  Things I could
# think of for that sort of query:
#
# 1) Select all persons who like 'pears'; Select all persons who like
# 'apples'; Select all persons who like 'oranges'; Calculate the INTERSECTION
# of these sets.  But does postgreSQL have a INTERSECTION operator?
#
# 2) Use nested subselects:
#
# Select person from test where fruit='pears' and person in (
#         Select person from test where fruit='apples' and person in (
#                 Select person from test where fruit='oranges'
#         )
# )
#
# But how efficient will this be if I start looking for 6 or seven fruits in
# a table with hundreds of entries?
#
# 3) Am I storing this sort of data in to wrong kind of form (should I
# somehow denormalise?  if so, how?)?
#
# Any suggestions????
#
# thanks for any help out there!
#
# Stuart.
#
# PS.  Code to cut and paste for table:
#
# create table test (person    varchar(25), fruit     varchar(25));
# insert into test values ('stuart','apples');
# insert into test values ('stuart','pears');
# insert into test values ('stuart','bananas');
# insert into test values ('stuart','kumquats');
# insert into test values ('peter','oranges');
# insert into test values ('peter','prunes');
# insert into test values ('lucy','mandarins');
# insert into test values ('lucy','tomatoes');
# insert into test values ('peter','apples');
# insert into test values ('lucy','apples');
# insert into test values ('peter','pears');
# insert into test values ('lucy','pears');
# insert into test values ('lucy','oranges');
# insert into test values ('stuart','prunes');
# insert into test values ('robert','figs');
# insert into test values ('robert','dates');
#
# +-------------------------+--------------------------------------+
# | Stuart Rison            | Ludwig Institute for Cancer Research |
# +-------------------------+ 91 Riding House Street               |
# | Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
# | Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
# +-------------------------+--------------------------------------+
#
#
#
#

--
SA, beyond.com           My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


RE: [GENERAL] Tricky SQL query (tried [SQL])

From
"Jackson, DeJuan"
Date:
IMHO the best table layout for your data would be:
 DROP TABLE person;
 DROP TABLE fruit;
 DROP TABLE person_fruit;
 CREATE TABLE person (id SERIAL PRIMARY KEY, name TEXT);
 CREATE TABLE fruit (id SERIAL PRIMARY KEY, name TEXT);
 CREATE TABLE person_fruit (p_id INT, f_id INT);
 CREATE UNIQUE INDEX pk_person_fruit ON person_fruit (p_id, f_id);
 INSERT INTO person (name) VALUES ('lucy');
 INSERT INTO person (name) VALUES ('peter');
 INSERT INTO person (name) VALUES ('robert');
 INSERT INTO person (name) VALUES ('stuart');
 INSERT INTO fruit (name) VALUES ('mandarins');
 INSERT INTO fruit (name) VALUES ('tomatoes');
 INSERT INTO fruit (name) VALUES ('pears');
 INSERT INTO fruit (name) VALUES ('oranges');
 INSERT INTO fruit (name) VALUES ('apples');
 INSERT INTO fruit (name) VALUES ('prunes');
 INSERT INTO fruit (name) VALUES ('figs');
 INSERT INTO fruit (name) VALUES ('dates');
 INSERT INTO fruit (name) VALUES ('bananas');
 INSERT INTO fruit (name) VALUES ('kumquats');
 INSERT INTO person_fruit (p_id, f_id) VALUES (1,1);
 INSERT INTO person_fruit (p_id, f_id) VALUES (1,2);
 INSERT INTO person_fruit (p_id, f_id) VALUES (1,3);
 INSERT INTO person_fruit (p_id, f_id) VALUES (1,4);
 INSERT INTO person_fruit (p_id, f_id) VALUES (1,5);
 INSERT INTO person_fruit (p_id, f_id) VALUES (2,3);
 INSERT INTO person_fruit (p_id, f_id) VALUES (2,5);
 INSERT INTO person_fruit (p_id, f_id) VALUES (2,4);
 INSERT INTO person_fruit (p_id, f_id) VALUES (2,6);
 INSERT INTO person_fruit (p_id, f_id) VALUES (3,7);
 INSERT INTO person_fruit (p_id, f_id) VALUES (3,8);
 INSERT INTO person_fruit (p_id, f_id) VALUES (4,5);
 INSERT INTO person_fruit (p_id, f_id) VALUES (4,3);
 INSERT INTO person_fruit (p_id, f_id) VALUES (4,6);
 INSERT INTO person_fruit (p_id, f_id) VALUES (4,9);
 INSERT INTO person_fruit (p_id, f_id) VALUES (4,10);
 SELECT p.name AS person, f.name AS fruit
   FROM person p, fruit f, person_fruit pf
  WHERE p.id = pf.p_id AND f.id = pf.f_id
  ORDER BY p.name;  --your 1st select;

 SELECT DISTINCT p.name FROM person p
  WHERE EXISTS(SELECT 1
                 FROM person_fruit pf
                WHERE pf.p_id = p.id AND
                      EXISTS(SELECT 1
                               FROM fruit f
                              WHERE f.id = pf.f_id AND
                                    f.name IN ('pears', 'apples',
'oranges'))); --Answer to your actual question

I think that this is 4th(or 5th) Normal Form (never had a DB class :^P).
Hope this helps,
    -DEJ

> -----Original Message-----
> From: Dustin Sallings [mailto:dustin@spy.net]
> Sent: Wednesday, January 13, 1999 11:48 AM
> To: Stuart Rison
> Cc: pgsql-general@postgreSQL.org
> Subject: Re: [GENERAL] Tricky SQL query (tried [SQL])
>
>
> On Wed, 13 Jan 1999, Stuart Rison wrote:
>
> # dev_brecard=> select * from test order by person;
> # person|fruit
> # - ------+---------
> # lucy  |mandarins
> # lucy  |tomatoes
> # lucy  |pears
> # lucy  |oranges
> # lucy  |apples
> # peter |pears
> # peter |apples
> # peter |oranges
> # peter |prunes
> # robert|figs
> # robert|dates
> # stuart|apples
> # stuart|pears
> # stuart|prunes
> # stuart|bananas
> # stuart|kumquats
> # (16 rows)
>
> # You can assume that the table is appropriately normalised
> and that there is
> # a composite primary key for it (i.e. each COMBINATION of
> person and fruit
> # will appear only once and neither of the fields can be NULL)
>
>     Actually, it would be normalized a little better if you weren't
> replicating person names and fruit names for every row.
>
> # How do I select from all person who like 'pears' and
> 'apples' (in this
> # case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in
> # this case, lucy and peter)?
>
>     I got this to work:
>
>     select distinct person from test
>         where likes(person, 'oranges')
>         and likes(person, 'apples')
>         and likes(person, 'pears')
>
>     Where likes is defined as follows:
>
> create function likes(text, text) returns bool as
> '
> declare
>         ret bool;
>         cnt integer;
> begin
>         select count(*) into cnt from test where person = $1
> and fruit = $2;
>         if cnt = 0 then
>                 ret=0;
>         else
>                 ret=1;
>         end if;
>         return(ret);
> end;
> ' language 'plpgsql';
>
>
> # I re-read my SQL books but I am still somewhat stumped.
> Things I could
> # think of for that sort of query:
> #
> # 1) Select all persons who like 'pears'; Select all persons who like
> # 'apples'; Select all persons who like 'oranges'; Calculate
> the INTERSECTION
> # of these sets.  But does postgreSQL have a INTERSECTION operator?
> #
> # 2) Use nested subselects:
> #
> # Select person from test where fruit='pears' and person in (
> #         Select person from test where fruit='apples' and person in (
> #                 Select person from test where fruit='oranges'
> #         )
> # )
> #
> # But how efficient will this be if I start looking for 6 or
> seven fruits in
> # a table with hundreds of entries?
> #
> # 3) Am I storing this sort of data in to wrong kind of form (should I
> # somehow denormalise?  if so, how?)?
> #
> # Any suggestions????
> #
> # thanks for any help out there!
> #
> # Stuart.
> #
> # PS.  Code to cut and paste for table:
> #
> # create table test (person    varchar(25), fruit     varchar(25));
> # insert into test values ('stuart','apples');
> # insert into test values ('stuart','pears');
> # insert into test values ('stuart','bananas');
> # insert into test values ('stuart','kumquats');
> # insert into test values ('peter','oranges');
> # insert into test values ('peter','prunes');
> # insert into test values ('lucy','mandarins');
> # insert into test values ('lucy','tomatoes');
> # insert into test values ('peter','apples');
> # insert into test values ('lucy','apples');
> # insert into test values ('peter','pears');
> # insert into test values ('lucy','pears');
> # insert into test values ('lucy','oranges');
> # insert into test values ('stuart','prunes');
> # insert into test values ('robert','figs');
> # insert into test values ('robert','dates');
> #
> # +-------------------------+--------------------------------------+
> # | Stuart Rison            | Ludwig Institute for Cancer Research |
> # +-------------------------+ 91 Riding House Street               |
> # | Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
> # | Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
> # +-------------------------+--------------------------------------+
> #
> #
> #
> #
>
> --
> SA, beyond.com           My girlfriend asked me which one I
> like better.
> pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
> |    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65
> 51 98 D8 BE
> L_______________________ I hope the answer won't upset her.
> ____________
>
>