Thread: [GENERAL] Tricky SQL query (tried [SQL])
Dear All, I'm having a lot of trouble figuring out a good SQL query for the situation below. Consider the following table: dev_brecard=> select * from test order by person; person|fruit - ------+--------- lucy |mandarins lucy |tomatoes lucy |pears lucy |oranges lucy |apples peter |pears peter |apples peter |oranges peter |prunes robert|figs robert|dates stuart|apples stuart|pears stuart|prunes stuart|bananas stuart|kumquats (16 rows) (code for creating and populating table is in a PS at the end of this posting) You can assume that the table is appropriately normalised and that there is a composite primary key for it (i.e. each COMBINATION of person and fruit will appear only once and neither of the fields can be NULL) How do I select from all person who like 'pears' and 'apples' (in this case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in this case, lucy and peter)? I re-read my SQL books but I am still somewhat stumped. Things I could think of for that sort of query: 1) Select all persons who like 'pears'; Select all persons who like 'apples'; Select all persons who like 'oranges'; Calculate the INTERSECTION of these sets. But does postgreSQL have a INTERSECTION operator? 2) Use nested subselects: Select person from test where fruit='pears' and person in ( Select person from test where fruit='apples' and person in ( Select person from test where fruit='oranges' ) ) But how efficient will this be if I start looking for 6 or seven fruits in a table with hundreds of entries? 3) Am I storing this sort of data in to wrong kind of form (should I somehow denormalise? if so, how?)? Any suggestions???? thanks for any help out there! Stuart. PS. Code to cut and paste for table: create table test (person varchar(25), fruit varchar(25)); insert into test values ('stuart','apples'); insert into test values ('stuart','pears'); insert into test values ('stuart','bananas'); insert into test values ('stuart','kumquats'); insert into test values ('peter','oranges'); insert into test values ('peter','prunes'); insert into test values ('lucy','mandarins'); insert into test values ('lucy','tomatoes'); insert into test values ('peter','apples'); insert into test values ('lucy','apples'); insert into test values ('peter','pears'); insert into test values ('lucy','pears'); insert into test values ('lucy','oranges'); insert into test values ('stuart','prunes'); insert into test values ('robert','figs'); insert into test values ('robert','dates'); +-------------------------+--------------------------------------+ | Stuart Rison | Ludwig Institute for Cancer Research | +-------------------------+ 91 Riding House Street | | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. | | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk | +-------------------------+--------------------------------------+
On Wed, 13 Jan 1999, Stuart Rison wrote: # dev_brecard=> select * from test order by person; # person|fruit # - ------+--------- # lucy |mandarins # lucy |tomatoes # lucy |pears # lucy |oranges # lucy |apples # peter |pears # peter |apples # peter |oranges # peter |prunes # robert|figs # robert|dates # stuart|apples # stuart|pears # stuart|prunes # stuart|bananas # stuart|kumquats # (16 rows) # You can assume that the table is appropriately normalised and that there is # a composite primary key for it (i.e. each COMBINATION of person and fruit # will appear only once and neither of the fields can be NULL) Actually, it would be normalized a little better if you weren't replicating person names and fruit names for every row. # How do I select from all person who like 'pears' and 'apples' (in this # case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in # this case, lucy and peter)? I got this to work: select distinct person from test where likes(person, 'oranges') and likes(person, 'apples') and likes(person, 'pears') Where likes is defined as follows: create function likes(text, text) returns bool as ' declare ret bool; cnt integer; begin select count(*) into cnt from test where person = $1 and fruit = $2; if cnt = 0 then ret=0; else ret=1; end if; return(ret); end; ' language 'plpgsql'; # I re-read my SQL books but I am still somewhat stumped. Things I could # think of for that sort of query: # # 1) Select all persons who like 'pears'; Select all persons who like # 'apples'; Select all persons who like 'oranges'; Calculate the INTERSECTION # of these sets. But does postgreSQL have a INTERSECTION operator? # # 2) Use nested subselects: # # Select person from test where fruit='pears' and person in ( # Select person from test where fruit='apples' and person in ( # Select person from test where fruit='oranges' # ) # ) # # But how efficient will this be if I start looking for 6 or seven fruits in # a table with hundreds of entries? # # 3) Am I storing this sort of data in to wrong kind of form (should I # somehow denormalise? if so, how?)? # # Any suggestions???? # # thanks for any help out there! # # Stuart. # # PS. Code to cut and paste for table: # # create table test (person varchar(25), fruit varchar(25)); # insert into test values ('stuart','apples'); # insert into test values ('stuart','pears'); # insert into test values ('stuart','bananas'); # insert into test values ('stuart','kumquats'); # insert into test values ('peter','oranges'); # insert into test values ('peter','prunes'); # insert into test values ('lucy','mandarins'); # insert into test values ('lucy','tomatoes'); # insert into test values ('peter','apples'); # insert into test values ('lucy','apples'); # insert into test values ('peter','pears'); # insert into test values ('lucy','pears'); # insert into test values ('lucy','oranges'); # insert into test values ('stuart','prunes'); # insert into test values ('robert','figs'); # insert into test values ('robert','dates'); # # +-------------------------+--------------------------------------+ # | Stuart Rison | Ludwig Institute for Cancer Research | # +-------------------------+ 91 Riding House Street | # | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. | # | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk | # +-------------------------+--------------------------------------+ # # # # -- SA, beyond.com My girlfriend asked me which one I like better. pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 51 98 D8 BE L_______________________ I hope the answer won't upset her. ____________
IMHO the best table layout for your data would be: DROP TABLE person; DROP TABLE fruit; DROP TABLE person_fruit; CREATE TABLE person (id SERIAL PRIMARY KEY, name TEXT); CREATE TABLE fruit (id SERIAL PRIMARY KEY, name TEXT); CREATE TABLE person_fruit (p_id INT, f_id INT); CREATE UNIQUE INDEX pk_person_fruit ON person_fruit (p_id, f_id); INSERT INTO person (name) VALUES ('lucy'); INSERT INTO person (name) VALUES ('peter'); INSERT INTO person (name) VALUES ('robert'); INSERT INTO person (name) VALUES ('stuart'); INSERT INTO fruit (name) VALUES ('mandarins'); INSERT INTO fruit (name) VALUES ('tomatoes'); INSERT INTO fruit (name) VALUES ('pears'); INSERT INTO fruit (name) VALUES ('oranges'); INSERT INTO fruit (name) VALUES ('apples'); INSERT INTO fruit (name) VALUES ('prunes'); INSERT INTO fruit (name) VALUES ('figs'); INSERT INTO fruit (name) VALUES ('dates'); INSERT INTO fruit (name) VALUES ('bananas'); INSERT INTO fruit (name) VALUES ('kumquats'); INSERT INTO person_fruit (p_id, f_id) VALUES (1,1); INSERT INTO person_fruit (p_id, f_id) VALUES (1,2); INSERT INTO person_fruit (p_id, f_id) VALUES (1,3); INSERT INTO person_fruit (p_id, f_id) VALUES (1,4); INSERT INTO person_fruit (p_id, f_id) VALUES (1,5); INSERT INTO person_fruit (p_id, f_id) VALUES (2,3); INSERT INTO person_fruit (p_id, f_id) VALUES (2,5); INSERT INTO person_fruit (p_id, f_id) VALUES (2,4); INSERT INTO person_fruit (p_id, f_id) VALUES (2,6); INSERT INTO person_fruit (p_id, f_id) VALUES (3,7); INSERT INTO person_fruit (p_id, f_id) VALUES (3,8); INSERT INTO person_fruit (p_id, f_id) VALUES (4,5); INSERT INTO person_fruit (p_id, f_id) VALUES (4,3); INSERT INTO person_fruit (p_id, f_id) VALUES (4,6); INSERT INTO person_fruit (p_id, f_id) VALUES (4,9); INSERT INTO person_fruit (p_id, f_id) VALUES (4,10); SELECT p.name AS person, f.name AS fruit FROM person p, fruit f, person_fruit pf WHERE p.id = pf.p_id AND f.id = pf.f_id ORDER BY p.name; --your 1st select; SELECT DISTINCT p.name FROM person p WHERE EXISTS(SELECT 1 FROM person_fruit pf WHERE pf.p_id = p.id AND EXISTS(SELECT 1 FROM fruit f WHERE f.id = pf.f_id AND f.name IN ('pears', 'apples', 'oranges'))); --Answer to your actual question I think that this is 4th(or 5th) Normal Form (never had a DB class :^P). Hope this helps, -DEJ > -----Original Message----- > From: Dustin Sallings [mailto:dustin@spy.net] > Sent: Wednesday, January 13, 1999 11:48 AM > To: Stuart Rison > Cc: pgsql-general@postgreSQL.org > Subject: Re: [GENERAL] Tricky SQL query (tried [SQL]) > > > On Wed, 13 Jan 1999, Stuart Rison wrote: > > # dev_brecard=> select * from test order by person; > # person|fruit > # - ------+--------- > # lucy |mandarins > # lucy |tomatoes > # lucy |pears > # lucy |oranges > # lucy |apples > # peter |pears > # peter |apples > # peter |oranges > # peter |prunes > # robert|figs > # robert|dates > # stuart|apples > # stuart|pears > # stuart|prunes > # stuart|bananas > # stuart|kumquats > # (16 rows) > > # You can assume that the table is appropriately normalised > and that there is > # a composite primary key for it (i.e. each COMBINATION of > person and fruit > # will appear only once and neither of the fields can be NULL) > > Actually, it would be normalized a little better if you weren't > replicating person names and fruit names for every row. > > # How do I select from all person who like 'pears' and > 'apples' (in this > # case, lucy, peter and stuart) or 'pears', 'apples' and 'oranges' (in > # this case, lucy and peter)? > > I got this to work: > > select distinct person from test > where likes(person, 'oranges') > and likes(person, 'apples') > and likes(person, 'pears') > > Where likes is defined as follows: > > create function likes(text, text) returns bool as > ' > declare > ret bool; > cnt integer; > begin > select count(*) into cnt from test where person = $1 > and fruit = $2; > if cnt = 0 then > ret=0; > else > ret=1; > end if; > return(ret); > end; > ' language 'plpgsql'; > > > # I re-read my SQL books but I am still somewhat stumped. > Things I could > # think of for that sort of query: > # > # 1) Select all persons who like 'pears'; Select all persons who like > # 'apples'; Select all persons who like 'oranges'; Calculate > the INTERSECTION > # of these sets. But does postgreSQL have a INTERSECTION operator? > # > # 2) Use nested subselects: > # > # Select person from test where fruit='pears' and person in ( > # Select person from test where fruit='apples' and person in ( > # Select person from test where fruit='oranges' > # ) > # ) > # > # But how efficient will this be if I start looking for 6 or > seven fruits in > # a table with hundreds of entries? > # > # 3) Am I storing this sort of data in to wrong kind of form (should I > # somehow denormalise? if so, how?)? > # > # Any suggestions???? > # > # thanks for any help out there! > # > # Stuart. > # > # PS. Code to cut and paste for table: > # > # create table test (person varchar(25), fruit varchar(25)); > # insert into test values ('stuart','apples'); > # insert into test values ('stuart','pears'); > # insert into test values ('stuart','bananas'); > # insert into test values ('stuart','kumquats'); > # insert into test values ('peter','oranges'); > # insert into test values ('peter','prunes'); > # insert into test values ('lucy','mandarins'); > # insert into test values ('lucy','tomatoes'); > # insert into test values ('peter','apples'); > # insert into test values ('lucy','apples'); > # insert into test values ('peter','pears'); > # insert into test values ('lucy','pears'); > # insert into test values ('lucy','oranges'); > # insert into test values ('stuart','prunes'); > # insert into test values ('robert','figs'); > # insert into test values ('robert','dates'); > # > # +-------------------------+--------------------------------------+ > # | Stuart Rison | Ludwig Institute for Cancer Research | > # +-------------------------+ 91 Riding House Street | > # | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. | > # | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk | > # +-------------------------+--------------------------------------+ > # > # > # > # > > -- > SA, beyond.com My girlfriend asked me which one I > like better. > pub 1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net> > | Key fingerprint = 87 02 57 08 02 D0 DA D6 C8 0F 3E 65 > 51 98 D8 BE > L_______________________ I hope the answer won't upset her. > ____________ > >