Thread: [GENERAL][SQL] 'denormalising' with a select

[GENERAL][SQL] 'denormalising' with a select

From

Stuart Rison

Date:

01 June 1999, 08:56:14

Hi there,

This was posted to SQL where it 'truly' belongs but I got no answwer and
since it has a bit of database design in it (and a lot more people seem to
read [GENERAL]) I thought I'd try it here.

Consider a table like this:

brecard_id      |code
----------------+----
IEGA18051999006 |COME
IPHA04031999004 |CRIB
IPHA04031999005 |COME
IPHA04031999005 |CRIB
IPHA26021999006 |SOLI
IPHA26021999010 |COME
IPHA26021999010 |SOLI
ISTL04031999001 |CRIB
IUCH03031999003 |COME
IUCH03031999003 |CRIB
IUCH03031999003 |MICR
IUCH03031999003 |SOLI

each combination of id and code is unique (they form a composite primary key)
but any brecard_id could have 1 or more codes associated with it
(theoretically with no upper boundary but let us say a maximum of 5 codes).

Is there a SELECT which will turn each of the codes for one brecard_id into
a column... ie.

brecard_id      |code1|code2|code3|code4|code5
----------------+-----+-----+-----+-----+-----
IEGA18051999006 |COME |     |     |     |
IPHA04031999004 |CRIB |     |     |     |
IPHA04031999005 |COME |CRIB |     |     |
IPHA26021999006 |SOLI |     |     |     |
IPHA26021999010 |COME |SOLI |     |     |
ISTL04031999001 |CRIB |     |     |     |
IUCH03031999003 |COME |CRIB |MICR |     |
IUCH03031999003 |SOLI |     |     |     |

and here a a few more brainteasers for you gurus out there...

1) I'm actually not fussed about the order the codes appear in the columns,
but let's say the order mattered, would this affect the SELECT(s)?
2) Would it make the query easier if I knew the maximum number of codes one
brecard_id could have?
3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could you
write a 'generalised' query which could cope with tables having variable
'maximum' numbers of codes associated with each brecard_id?

For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL and
have started playing around with that but I'd hate to re-invent the wheel!

regards,

Stuart.

+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+

RE: [GENERAL][SQL] 'denormalising' with a select

From

Michael J Davis

Date:

01 June 1999, 09:23:27

I would write a function that returns a text containing a list of all codes
with the brecard_id.  Here is an example of the select statement:

    select distinct brecard_id || get_codes(brecard_id) from table_name;

Here is an example of the function (not syntactically correct):

create function get_codes(int4) returns text as '
declare
    rec record;
    rc text;
    sep text;
begin
    rc := '''';
    sep := '''';
    for each rec in select distinct code from table_name where
brecard_id = $1;
        rc := (rc || sep) || rec.code;  -- parens are not needed in
version 6.5
        sep := '', '';
    end for
    return rc;
end' language plpsql

I am not sure how get the codes to become separate fields in the select
statement.  If a code_list field will work instead of separate fields for
each code then the above example should work great.  I have also done a
similar thing using aggregates but the example is more complicated.  Using
an aggregate function will perform better than the above example.

Thanks, Michael

> -----Original Message-----
> From:    Stuart Rison [SMTP:stuart@ludwig.ucl.ac.uk]
> Sent:    Tuesday, June 01, 1999 6:54 AM
> To:    pgsql-general@postgreSQL.org
> Subject:    [GENERAL][SQL] 'denormalising' with a select
>
> Hi there,
>
> This was posted to SQL where it 'truly' belongs but I got no answwer and
> since it has a bit of database design in it (and a lot more people seem to
> read [GENERAL]) I thought I'd try it here.
>
> Consider a table like this:
>
> brecard_id      |code
> ----------------+----
> IEGA18051999006 |COME
> IPHA04031999004 |CRIB
> IPHA04031999005 |COME
> IPHA04031999005 |CRIB
> IPHA26021999006 |SOLI
> IPHA26021999010 |COME
> IPHA26021999010 |SOLI
> ISTL04031999001 |CRIB
> IUCH03031999003 |COME
> IUCH03031999003 |CRIB
> IUCH03031999003 |MICR
> IUCH03031999003 |SOLI
>
> each combination of id and code is unique (they form a composite primary
> key)
> but any brecard_id could have 1 or more codes associated with it
> (theoretically with no upper boundary but let us say a maximum of 5
> codes).
>
> Is there a SELECT which will turn each of the codes for one brecard_id
> into
> a column... ie.
>
> brecard_id      |code1|code2|code3|code4|code5
> ----------------+-----+-----+-----+-----+-----
> IEGA18051999006 |COME |     |     |     |
> IPHA04031999004 |CRIB |     |     |     |
> IPHA04031999005 |COME |CRIB |     |     |
> IPHA26021999006 |SOLI |     |     |     |
> IPHA26021999010 |COME |SOLI |     |     |
> ISTL04031999001 |CRIB |     |     |     |
> IUCH03031999003 |COME |CRIB |MICR |     |
> IUCH03031999003 |SOLI |     |     |     |
>
> and here a a few more brainteasers for you gurus out there...
>
> 1) I'm actually not fussed about the order the codes appear in the
> columns,
> but let's say the order mattered, would this affect the SELECT(s)?
> 2) Would it make the query easier if I knew the maximum number of codes
> one
> brecard_id could have?
> 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could
> you
> write a 'generalised' query which could cope with tables having variable
> 'maximum' numbers of codes associated with each brecard_id?
>
> For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL
> and
> have started playing around with that but I'd hate to re-invent the wheel!
>
> regards,
>
> Stuart.
>
> +-------------------------+--------------------------------------+
> | Stuart Rison            | Ludwig Institute for Cancer Research |
> +-------------------------+ 91 Riding House Street               |
> | Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
> | Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
> +-------------------------+--------------------------------------+

RE: [GENERAL][SQL] 'denormalising' with a select

From

Michael J Davis

Date:

01 June 1999, 11:32:44

Here is a version using aggregation:

    create function get_codes_agf(text, text) returns text as '
    begin
        if (length($1) > 0) then
            return $1 || '', '' || $2
        else
            return $2;
        end if;
    end' language 'plpsql';

    CREATE AGGREGATE get_codes_ag (
        sfunc1 = get_codes_agf,
        basetype = text,
        stype1 = text,
        initcond1 = ''
        );


select brecard_id, get_codes_ag(code) from table_name group by brecard_id;



> > -----Original Message-----
> > From:    Stuart Rison [SMTP:stuart@ludwig.ucl.ac.uk]
> > Sent:    Tuesday, June 01, 1999 6:54 AM
> > To:    pgsql-general@postgreSQL.org
> > Subject:    [GENERAL][SQL] 'denormalising' with a select
> >
> > Hi there,
> >
> > This was posted to SQL where it 'truly' belongs but I got no answwer and
> > since it has a bit of database design in it (and a lot more people seem
> to
> > read [GENERAL]) I thought I'd try it here.
> >
> > Consider a table like this:
> >
> > brecard_id      |code
> > ----------------+----
> > IEGA18051999006 |COME
> > IPHA04031999004 |CRIB
> > IPHA04031999005 |COME
> > IPHA04031999005 |CRIB
> > IPHA26021999006 |SOLI
> > IPHA26021999010 |COME
> > IPHA26021999010 |SOLI
> > ISTL04031999001 |CRIB
> > IUCH03031999003 |COME
> > IUCH03031999003 |CRIB
> > IUCH03031999003 |MICR
> > IUCH03031999003 |SOLI
> >
> > each combination of id and code is unique (they form a composite primary
> > key)
> > but any brecard_id could have 1 or more codes associated with it
> > (theoretically with no upper boundary but let us say a maximum of 5
> > codes).
> >
> > Is there a SELECT which will turn each of the codes for one brecard_id
> > into
> > a column... ie.
> >
> > brecard_id      |code1|code2|code3|code4|code5
> > ----------------+-----+-----+-----+-----+-----
> > IEGA18051999006 |COME |     |     |     |
> > IPHA04031999004 |CRIB |     |     |     |
> > IPHA04031999005 |COME |CRIB |     |     |
> > IPHA26021999006 |SOLI |     |     |     |
> > IPHA26021999010 |COME |SOLI |     |     |
> > ISTL04031999001 |CRIB |     |     |     |
> > IUCH03031999003 |COME |CRIB |MICR |     |
> > IUCH03031999003 |SOLI |     |     |     |
> >
> > and here a a few more brainteasers for you gurus out there...
> >
> > 1) I'm actually not fussed about the order the codes appear in the
> > columns,
> > but let's say the order mattered, would this affect the SELECT(s)?
> > 2) Would it make the query easier if I knew the maximum number of codes
> > one
> > brecard_id could have?
> > 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could
> > you
> > write a 'generalised' query which could cope with tables having variable
> > 'maximum' numbers of codes associated with each brecard_id?
> >
> > For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL
> > and
> > have started playing around with that but I'd hate to re-invent the
> wheel!
> >
> > regards,
> >
> > Stuart.
> >
> > +-------------------------+--------------------------------------+
> > | Stuart Rison            | Ludwig Institute for Cancer Research |
> > +-------------------------+ 91 Riding House Street               |
> > | Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
> > | Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
> > +-------------------------+--------------------------------------+

RE: [GENERAL][SQL] 'denormalising' with a select

From

Stuart Rison

Date:

02 June 1999, 11:18:36

A synopsis of 'the top-tastic answers' I got from Micheal and Herouth to my
posting (copied at the end of the e-mail).

1) by SQL

Herouth wrote:

>The question is, why would you want to do something like that? It doesn't
>add any more information than your basic table has. It is merely a
>different way of presenting the same information.
>
>Do you simply want to display it this way? What do you want to do with it
>on the frontend?

Very good point Herouth, I was indeed just trying to present the
information in a different format rather than actually requiring the codes
to be available as seperate fields in a new table.

>It's not impossible. You can do this with a five-way self-join and a lot of
> <> clauses in the where.

Out of curiosity, I tried it and, as Herouth points out, the 'general
formula' (here illustrated to get three codes into seperate fields) is:

SELECT DISTINCT t1.brecard_id, t1.code as code1, t2.code as code2, t3.code
as code3 -- etc.
FROM benign_pathologies t1, benign_pathologies t2, benign_pathologies t3 --etc.
WHERE t1.code<>t2.code
AND t2.code<>t3.code
AND t3.code<>t1.code
-- continue the inequalities for all possible pairs so that 3 tables yield
3 inequalities, 4 table yield 6 inequalities, 5 tables yield 10
inequalities etc.
AND t1.brecard_id=t2.brecard_id
AND t2.brecard_id=t3.brecard_id
AND t3.brecard_id=t1.brecard_id
-- continue all equijoins such that each table is equi-joined to the next
one except for the last table which is joined to the first table
;

>I don't think there is a way you can do the above in SQL without knowing
>the maximum number of codes in advance.

I'm pretty sure you are correct, certainly not a non-procedural way.

The other limitation with this SELECT is that it will ONLY select
brecard_id's which have X or more codes associated with it (where X is the
number of tables in your target list) and if there are more the X codes,
the 'extra' ones will be ignored (but not in a consistant fashion).

and of course, it is very (and exponentially) inefficient!

The other two suggestion were a procedural method (Michael) and using
aggregates (Michael and Herouth)

2) procedurally:

from Michael:

>I would write a function that returns a text containing a list of all codes
>with the brecard_id.

using his suggestion which needed only minimal changes, I got:

create function get_codes(bpchar) returns text as
'
declare
        rec record;
        rc text;
        sep text;
begin
        rc := '''';
        sep := '''';
        for rec in select distinct code from benign_pathologies where
brecard_id = $1 LOOP
                rc := (rc || sep) || rec.code;  -- parens are not needed in
version 6.5
                sep := '', '';
        end LOOP;
        return rc;
end;
'
language 'plpgsql';

followed by:

SELECT DISTINCT brecard_id, get_codes(brecard_id) from benign_pathologies;

Which I recon would not be too dificult to adapt to INSERT rec.code(s) as
seperate fields into a temp table.

3) using aggregates (and again very minimally adapted from Micheal):

CREATE FUNCTION get_codes_agf(bpchar, bpchar) returns bpchar as '
begin
    IF (length($1) > 0) THEN
        return ($1 || '', '') || $2;
        ELSE
        return $2;
        END IF;
END;'
language 'plpgsql';

CREATE AGGREGATE get_codes_ag (
    sfunc1 = get_codes_agf,
    basetype = bpchar,
    stype1 = bpchar,
    initcond1 = ''
);

I just could not believe how elegant this last solution was, of course,
because at no point in the definitions does a table name appear, the
solution is general too!

Finally, and for those of you who put up with all my woffle, is it the
'done thing' in these mailing lists to present a synopsis of answers to
your postings or did I just bore everyone and waste loads of bandwidth?

Regards,

Stuart.

>> > -----Original Message-----
>> > From:    Stuart Rison [SMTP:stuart@ludwig.ucl.ac.uk]
>> > Sent:    Tuesday, June 01, 1999 6:54 AM
>> > To:    pgsql-general@postgreSQL.org
>> > Subject:    [GENERAL][SQL] 'denormalising' with a select
>> >
>> > Hi there,
>> >
>> > This was posted to SQL where it 'truly' belongs but I got no answwer and
>> > since it has a bit of database design in it (and a lot more people seem
>> to
>> > read [GENERAL]) I thought I'd try it here.
>> >
>> > Consider a table like this:
>> >
>> > brecard_id      |code
>> > ----------------+----
>> > IEGA18051999006 |COME
>> > IPHA04031999004 |CRIB
>> > IPHA04031999005 |COME
>> > IPHA04031999005 |CRIB
>> > IPHA26021999006 |SOLI
>> > IPHA26021999010 |COME
>> > IPHA26021999010 |SOLI
>> > ISTL04031999001 |CRIB
>> > IUCH03031999003 |COME
>> > IUCH03031999003 |CRIB
>> > IUCH03031999003 |MICR
>> > IUCH03031999003 |SOLI
>> >
>> > each combination of id and code is unique (they form a composite primary
>> > key)
>> > but any brecard_id could have 1 or more codes associated with it
>> > (theoretically with no upper boundary but let us say a maximum of 5
>> > codes).
>> >
>> > Is there a SELECT which will turn each of the codes for one brecard_id
>> > into
>> > a column... ie.
>> >
>> > brecard_id      |code1|code2|code3|code4|code5
>> > ----------------+-----+-----+-----+-----+-----
>> > IEGA18051999006 |COME |     |     |     |
>> > IPHA04031999004 |CRIB |     |     |     |
>> > IPHA04031999005 |COME |CRIB |     |     |
>> > IPHA26021999006 |SOLI |     |     |     |
>> > IPHA26021999010 |COME |SOLI |     |     |
>> > ISTL04031999001 |CRIB |     |     |     |
>> > IUCH03031999003 |COME |CRIB |MICR |     |
>> > IUCH03031999003 |SOLI |     |     |     |
>> >
>> > and here a a few more brainteasers for you gurus out there...
>> >
>> > 1) I'm actually not fussed about the order the codes appear in the
>> > columns,
>> > but let's say the order mattered, would this affect the SELECT(s)?
>> > 2) Would it make the query easier if I knew the maximum number of codes
>> > one
>> > brecard_id could have?
>> > 3) (this one for true Wizards -and Sorceresses, Herouth ;)- only) Could
>> > you
>> > write a 'generalised' query which could cope with tables having variable
>> > 'maximum' numbers of codes associated with each brecard_id?
>> >
>> > For the life of me I don't know how you'd do it in SQL, I have PL/pgSQL
>> > and
>> > have started playing around with that but I'd hate to re-invent the
>> wheel!
>> >
>> > regards,
>> >
>> > Stuart.
+-------------------------+--------------------------------------+
| Stuart Rison            | Ludwig Institute for Cancer Research |
+-------------------------+ 91 Riding House Street               |
| Tel. (0171) 878 4041    | London, W1P 8BT, UNITED KINGDOM.     |
| Fax. (0171) 878 4040    | stuart@ludwig.ucl.ac.uk              |
+-------------------------+--------------------------------------+