Re: Flattening a kind of 'dynamic' table - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: Flattening a kind of 'dynamic' table
Msg-id 6EE64EF3AB31D5448D0007DD34EEB3412A75F0@Herge.rcsinc.local
In response to Flattening a kind of 'dynamic' table  (Alexandre Leclerc <alexandre.leclerc@gmail.com>)
Responses Re: Flattening a kind of 'dynamic' table
List pgsql-performance
Alexandre wrote:
> On Thu, 27 Jan 2005 10:44:45 -0500, Merlin Moncure
> <merlin.moncure@rcsonline.com> wrote:
> > Alexandre wrote:
> > ok, you have a couple of different options here.  The first thing
> > that jumps out at me is to cheat using arrays.
> > Let's start with the normalized result set.
> >
> > select product_id, department_id, sum(req_time)
> > from design.product_department_time
> > group by product_id, department_id;
> >
> > product_id | department_id | sum
> > -----------+---------------+------
> >        924 | a             | 6000
> >        924 | c             | 1575
> >        907 | a             | 1500
> > [...]
>
> Hello Merlin,
>
> First of all, thanks for your time. Yes, this is exactly what I'm
> doing right now (if I understand your point correctly). All records in
> design.product_department_time are unique for each (product_id,
> department_id) combo, and 0/null values are not possible. This is the
> first listing you have.

Right.  Expanding departments into columns is basically a dead end.
First of all, SQL is not really designed to do this, and second of all
(comments continued below):

>  product_id |  a   |  c   |  d
> ------------+------+------+------
>         906 | 3000 |      |
>         906 |      | 3000 |
>         906 |      |      | 1935
>         907 | 1500 |      |
>         907 |      | 1500 |
>         907 |      |      | 4575
>         924 | 6000 |      |
>         924 |      | 1575 |

The above table is more expensive to group than the normalized version
above because it is much, much longer.  This will get worse and worse as
you add more departments.  So, whatever you end up doing, I'd advise
against expanding rows from a table into columns of a result except for
very, very special cases.  This is not one of those cases.
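
To make that concrete: collapsing the sparse form back to one row per
product means a second aggregation pass.  A minimal sketch, assuming the
sparse result above were wrapped in a subquery or view called
sparse_times (a name invented here for illustration):

    select product_id, sum(a) as a, sum(c) as c, sum(d) as d
    from sparse_times
    group by product_id;

That extra pass runs over a set that grows with every department you
add, which is exactly the cost described above.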

> I did not use arrays because I didn't think about it, and I don't
> know if this is still the most efficient way. My software will have to
> work out the data, unless the array expands into proper columns. But
> I'm not an expert at all. I try to do good DB design, but sometimes
> that makes the data more complicated to work with.

Arrays are a quick'n'dirty way to de-normalize a result set.  In my
view, de-normalization is o.k. for result sets *only*.  Generally, it
is inappropriate to de-normalize any persistent object in the database,
such as a view or (especially) a table.  De-normalizing sets can
sometimes simplify client-side coding or provide a performance benefit
at the query stage (or slow it down, so be careful!).
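
For example, the ARRAY(subquery) constructor can fold each product's
department times into a single row.  A minimal sketch against
design.product_department_time, using the column names from the
listings above:

    select p.product_id,
           array(select d.department_id
                 from design.product_department_time d
                 where d.product_id = p.product_id
                 group by d.department_id
                 order by d.department_id) as departments,
           array(select sum(d.req_time)
                 from design.product_department_time d
                 where d.product_id = p.product_id
                 group by d.department_id
                 order by d.department_id) as req_times
    from (select distinct product_id
          from design.product_department_time) p;

Each product comes back as one row with two parallel arrays, e.g.
924 | {a,c} | {6000,1575}, which the client can zip back together.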

> And I also have a jobs table which has one product_id attached to one
> job with the required quantity to produce. So I must show the user
> how much time this will require by department for each job. :) This
> is a nice report, but I don't want to kill the database each time the
> user wants to see it.

You always have the option to do this in code.  This basically means
ordering the result set and writing a nested loop to pass over the data.
If you happen to be using a report engine (and it sounds like you are),
some engines can simplify this via a grouping criterion, some can't.

If parsing an array string is a pain, I happen to have a C++ class handy
that can compose/decompose a PostgreSQL array string if:
a: no more than 1 dimension and
b: array bounds are known
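
For reference, this is the text format such a parser has to handle; a
one-dimensional array comes back from the server as a brace-delimited
literal:

    select array[6000, 1575];
    -- the server renders this as: {6000,1575}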

Let me know if you need it and I'll send it over.
Merlin
