Thread: [GENERAL][SQL] 'denormalising' with a select
Hi there, This was posted to SQL where it 'truly' belongs but I got no answwer and since it has a bit of database design in it (and a lot more people seem to read [GENERAL]) I thought I'd try it here. Consider a table like this: brecard_id |code ----------------+---- IEGA18051999006 |COME IPHA04031999004 |CRIB IPHA04031999005 |COME IPHA04031999005 |CRIB IPHA26021999006 |SOLI IPHA26021999010 |COME IPHA26021999010 |SOLI ISTL04031999001 |CRIB IUCH03031999003 |COME IUCH03031999003 |CRIB IUCH03031999003 |MICR IUCH03031999003 |SOLI each combination of id and code is unique (they form a composite primary key) but any brecard_id could have 1 or more codes associated with it (theoretically with no upper boundary but let us say a maximum of 5 codes). Is there a SELECT which will turn each of the codes for one brecard_id into a column... ie. brecard_id |code1|code2|code3|code4|code5 ----------------+-----+-----+-----+-----+----- IEGA18051999006 |COME | | | | IPHA04031999004 |CRIB | | | | IPHA04031999005 |COME |CRIB | | | IPHA26021999006 |SOLI | | | | IPHA26021999010 |COME |SOLI | | | ISTL04031999001 |CRIB | | | | IUCH03031999003 |COME |CRIB |MICR | | IUCH03031999003 |SOLI | | | | and here a a few more brainteasers for you gurus out there... 1) I'm actually not fussed about the order the codes appear in the columns, but let's say the order mattered, would this affect the SELECT(s)? 2) Would it make the query easier if I knew the maximum number of codes one brecard_id could have? 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could you write a 'generalised' query which could cope with tables having variable 'maximum' numbers of codes associated with each brecard_id? For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL and have started playing around with that but I'd hate to re-invent the wheel! regards, Stuart. +-------------------------+--------------------------------------+ | Stuart Rison | Ludwig Institute for Cancer Research | +-------------------------+ 91 Riding House Street | | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. | | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk | +-------------------------+--------------------------------------+
I would write a function that returns a text containing a list of all codes with the brecard_id. Here is an example of the select statement: select distinct brecard_id || get_codes(brecard_id) from table_name; Here is an example of the function (not syntactically correct): create function get_codes(int4) returns text as ' declare rec record; rc text; sep text; begin rc := ''''; sep := ''''; for each rec in select distinct code from table_name where brecard_id = $1; rc := (rc || sep) || rec.code; -- parens are not needed in version 6.5 sep := '', ''; end for return rc; end' language plpsql I am not sure how get the codes to become separate fields in the select statement. If a code_list field will work instead of separate fields for each code then the above example should work great. I have also done a similar thing using aggregates but the example is more complicated. Using an aggregate function will perform better than the above example. Thanks, Michael > -----Original Message----- > From: Stuart Rison [SMTP:stuart@ludwig.ucl.ac.uk] > Sent: Tuesday, June 01, 1999 6:54 AM > To: pgsql-general@postgreSQL.org > Subject: [GENERAL][SQL] 'denormalising' with a select > > Hi there, > > This was posted to SQL where it 'truly' belongs but I got no answwer and > since it has a bit of database design in it (and a lot more people seem to > read [GENERAL]) I thought I'd try it here. > > Consider a table like this: > > brecard_id |code > ----------------+---- > IEGA18051999006 |COME > IPHA04031999004 |CRIB > IPHA04031999005 |COME > IPHA04031999005 |CRIB > IPHA26021999006 |SOLI > IPHA26021999010 |COME > IPHA26021999010 |SOLI > ISTL04031999001 |CRIB > IUCH03031999003 |COME > IUCH03031999003 |CRIB > IUCH03031999003 |MICR > IUCH03031999003 |SOLI > > each combination of id and code is unique (they form a composite primary > key) > but any brecard_id could have 1 or more codes associated with it > (theoretically with no upper boundary but let us say a maximum of 5 > codes). > > Is there a SELECT which will turn each of the codes for one brecard_id > into > a column... ie. > > brecard_id |code1|code2|code3|code4|code5 > ----------------+-----+-----+-----+-----+----- > IEGA18051999006 |COME | | | | > IPHA04031999004 |CRIB | | | | > IPHA04031999005 |COME |CRIB | | | > IPHA26021999006 |SOLI | | | | > IPHA26021999010 |COME |SOLI | | | > ISTL04031999001 |CRIB | | | | > IUCH03031999003 |COME |CRIB |MICR | | > IUCH03031999003 |SOLI | | | | > > and here a a few more brainteasers for you gurus out there... > > 1) I'm actually not fussed about the order the codes appear in the > columns, > but let's say the order mattered, would this affect the SELECT(s)? > 2) Would it make the query easier if I knew the maximum number of codes > one > brecard_id could have? > 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could > you > write a 'generalised' query which could cope with tables having variable > 'maximum' numbers of codes associated with each brecard_id? > > For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL > and > have started playing around with that but I'd hate to re-invent the wheel! > > regards, > > Stuart. > > +-------------------------+--------------------------------------+ > | Stuart Rison | Ludwig Institute for Cancer Research | > +-------------------------+ 91 Riding House Street | > | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. | > | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk | > +-------------------------+--------------------------------------+
Here is a version using aggregation: create function get_codes_agf(text, text) returns text as ' begin if (length($1) > 0) then return $1 || '', '' || $2 else return $2; end if; end' language 'plpsql'; CREATE AGGREGATE get_codes_ag ( sfunc1 = get_codes_agf, basetype = text, stype1 = text, initcond1 = '' ); select brecard_id, get_codes_ag(code) from table_name group by brecard_id; > > -----Original Message----- > > From: Stuart Rison [SMTP:stuart@ludwig.ucl.ac.uk] > > Sent: Tuesday, June 01, 1999 6:54 AM > > To: pgsql-general@postgreSQL.org > > Subject: [GENERAL][SQL] 'denormalising' with a select > > > > Hi there, > > > > This was posted to SQL where it 'truly' belongs but I got no answwer and > > since it has a bit of database design in it (and a lot more people seem > to > > read [GENERAL]) I thought I'd try it here. > > > > Consider a table like this: > > > > brecard_id |code > > ----------------+---- > > IEGA18051999006 |COME > > IPHA04031999004 |CRIB > > IPHA04031999005 |COME > > IPHA04031999005 |CRIB > > IPHA26021999006 |SOLI > > IPHA26021999010 |COME > > IPHA26021999010 |SOLI > > ISTL04031999001 |CRIB > > IUCH03031999003 |COME > > IUCH03031999003 |CRIB > > IUCH03031999003 |MICR > > IUCH03031999003 |SOLI > > > > each combination of id and code is unique (they form a composite primary > > key) > > but any brecard_id could have 1 or more codes associated with it > > (theoretically with no upper boundary but let us say a maximum of 5 > > codes). > > > > Is there a SELECT which will turn each of the codes for one brecard_id > > into > > a column... ie. > > > > brecard_id |code1|code2|code3|code4|code5 > > ----------------+-----+-----+-----+-----+----- > > IEGA18051999006 |COME | | | | > > IPHA04031999004 |CRIB | | | | > > IPHA04031999005 |COME |CRIB | | | > > IPHA26021999006 |SOLI | | | | > > IPHA26021999010 |COME |SOLI | | | > > ISTL04031999001 |CRIB | | | | > > IUCH03031999003 |COME |CRIB |MICR | | > > IUCH03031999003 |SOLI | | | | > > > > and here a a few more brainteasers for you gurus out there... > > > > 1) I'm actually not fussed about the order the codes appear in the > > columns, > > but let's say the order mattered, would this affect the SELECT(s)? > > 2) Would it make the query easier if I knew the maximum number of codes > > one > > brecard_id could have? > > 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could > > you > > write a 'generalised' query which could cope with tables having variable > > 'maximum' numbers of codes associated with each brecard_id? > > > > For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL > > and > > have started playing around with that but I'd hate to re-invent the > wheel! > > > > regards, > > > > Stuart. > > > > +-------------------------+--------------------------------------+ > > | Stuart Rison | Ludwig Institute for Cancer Research | > > +-------------------------+ 91 Riding House Street | > > | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. | > > | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk | > > +-------------------------+--------------------------------------+
A synopsis of 'the top-tastic answers' I got from Micheal and Herouth to my posting (copied at the end of the e-mail). 1) by SQL Herouth wrote: >The question is, why would you want to do something like that? It doesn't >add any more information than your basic table has. It is merely a >different way of presenting the same information. > >Do you simply want to display it this way? What do you want to do with it >on the frontend? Very good point Herouth, I was indeed just trying to present the information in a different format rather than actually requiring the codes to be available as seperate fields in a new table. >It's not impossible. You can do this with a five-way self-join and a lot of > <> clauses in the where. Out of curiosity, I tried it and, as Herouth points out, the 'general formula' (here illustrated to get three codes into seperate fields) is: SELECT DISTINCT t1.brecard_id, t1.code as code1, t2.code as code2, t3.code as code3 -- etc. FROM benign_pathologies t1, benign_pathologies t2, benign_pathologies t3 --etc. WHERE t1.code<>t2.code AND t2.code<>t3.code AND t3.code<>t1.code -- continue the inequalities for all possible pairs so that 3 tables yield 3 inequalities, 4 table yield 6 inequalities, 5 tables yield 10 inequalities etc. AND t1.brecard_id=t2.brecard_id AND t2.brecard_id=t3.brecard_id AND t3.brecard_id=t1.brecard_id -- continue all equijoins such that each table is equi-joined to the next one except for the last table which is joined to the first table ; >I don't think there is a way you can do the above in SQL without knowing >the maximum number of codes in advance. I'm pretty sure you are correct, certainly not a non-procedural way. The other limitation with this SELECT is that it will ONLY select brecard_id's which have X or more codes associated with it (where X is the number of tables in your target list) and if there are more the X codes, the 'extra' ones will be ignored (but not in a consistant fashion). and of course, it is very (and exponentially) inefficient! The other two suggestion were a procedural method (Michael) and using aggregates (Michael and Herouth) 2) procedurally: from Michael: >I would write a function that returns a text containing a list of all codes >with the brecard_id. using his suggestion which needed only minimal changes, I got: create function get_codes(bpchar) returns text as ' declare rec record; rc text; sep text; begin rc := ''''; sep := ''''; for rec in select distinct code from benign_pathologies where brecard_id = $1 LOOP rc := (rc || sep) || rec.code; -- parens are not needed in version 6.5 sep := '', ''; end LOOP; return rc; end; ' language 'plpgsql'; followed by: SELECT DISTINCT brecard_id, get_codes(brecard_id) from benign_pathologies; Which I recon would not be too dificult to adapt to INSERT rec.code(s) as seperate fields into a temp table. 3) using aggregates (and again very minimally adapted from Micheal): CREATE FUNCTION get_codes_agf(bpchar, bpchar) returns bpchar as ' begin IF (length($1) > 0) THEN return ($1 || '', '') || $2; ELSE return $2; END IF; END;' language 'plpgsql'; CREATE AGGREGATE get_codes_ag ( sfunc1 = get_codes_agf, basetype = bpchar, stype1 = bpchar, initcond1 = '' ); I just could not believe how elegant this last solution was, of course, because at no point in the definitions does a table name appear, the solution is general too! Finally, and for those of you who put up with all my woffle, is it the 'done thing' in these mailing lists to present a synopsis of answers to your postings or did I just bore everyone and waste loads of bandwidth? Regards, Stuart. >> > -----Original Message----- >> > From: Stuart Rison [SMTP:stuart@ludwig.ucl.ac.uk] >> > Sent: Tuesday, June 01, 1999 6:54 AM >> > To: pgsql-general@postgreSQL.org >> > Subject: [GENERAL][SQL] 'denormalising' with a select >> > >> > Hi there, >> > >> > This was posted to SQL where it 'truly' belongs but I got no answwer and >> > since it has a bit of database design in it (and a lot more people seem >> to >> > read [GENERAL]) I thought I'd try it here. >> > >> > Consider a table like this: >> > >> > brecard_id |code >> > ----------------+---- >> > IEGA18051999006 |COME >> > IPHA04031999004 |CRIB >> > IPHA04031999005 |COME >> > IPHA04031999005 |CRIB >> > IPHA26021999006 |SOLI >> > IPHA26021999010 |COME >> > IPHA26021999010 |SOLI >> > ISTL04031999001 |CRIB >> > IUCH03031999003 |COME >> > IUCH03031999003 |CRIB >> > IUCH03031999003 |MICR >> > IUCH03031999003 |SOLI >> > >> > each combination of id and code is unique (they form a composite primary >> > key) >> > but any brecard_id could have 1 or more codes associated with it >> > (theoretically with no upper boundary but let us say a maximum of 5 >> > codes). >> > >> > Is there a SELECT which will turn each of the codes for one brecard_id >> > into >> > a column... ie. >> > >> > brecard_id |code1|code2|code3|code4|code5 >> > ----------------+-----+-----+-----+-----+----- >> > IEGA18051999006 |COME | | | | >> > IPHA04031999004 |CRIB | | | | >> > IPHA04031999005 |COME |CRIB | | | >> > IPHA26021999006 |SOLI | | | | >> > IPHA26021999010 |COME |SOLI | | | >> > ISTL04031999001 |CRIB | | | | >> > IUCH03031999003 |COME |CRIB |MICR | | >> > IUCH03031999003 |SOLI | | | | >> > >> > and here a a few more brainteasers for you gurus out there... >> > >> > 1) I'm actually not fussed about the order the codes appear in the >> > columns, >> > but let's say the order mattered, would this affect the SELECT(s)? >> > 2) Would it make the query easier if I knew the maximum number of codes >> > one >> > brecard_id could have? >> > 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could >> > you >> > write a 'generalised' query which could cope with tables having variable >> > 'maximum' numbers of codes associated with each brecard_id? >> > >> > For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL >> > and >> > have started playing around with that but I'd hate to re-invent the >> wheel! >> > >> > regards, >> > >> > Stuart. +-------------------------+--------------------------------------+ | Stuart Rison | Ludwig Institute for Cancer Research | +-------------------------+ 91 Riding House Street | | Tel. (0171) 878 4041 | London, W1P 8BT, UNITED KINGDOM. | | Fax. (0171) 878 4040 | stuart@ludwig.ucl.ac.uk | +-------------------------+--------------------------------------+