Thread: Denormalization question, history+ current

Denormalization question, history+ current

From
eric soroos
Date:
Hi,

I've got a system where I want to store a full history in multiple rows
as well as a current view which has the current values in one record. I
know that all of the data that I want is in the event history, it's just
somewhat involved to get it from the event history.

I have an inheritance model where:

abstract_core -+- history_foo (4 more colums than core)
                \-- current_foo (same columns as core)

Where abstract_core is a large and unknown set of columns.
(currently
low hundreds in most cases)

What I currently do from an outside app:

insert into history_foo (column1, ...) values (bar, ...);
update current_foo set column1=bar, ... where pkey=a;

What I'd like to do is insert, then have a trigger/rule assemble an
update statement for the current_foo table, only updating the columns
that
are non-null and in the current_foo table.

I can live with adding the 4 extra columns from history_foo to
current_foo, they won't make logical sense, but I can hide them from
those who would be confused with well placed views.

Any clues for me on how to get pg to do this?

eric




Re: Denormalization question, history+ current

From
Tom Lane
Date:
eric soroos <eric-psql@soroos.net> writes:
> What I currently do from an outside app:

> insert into history_foo (column1, ...) values (bar, ...);
> update current_foo set column1=bar, ... where pkey=a;

> What I'd like to do is insert, then have a trigger/rule assemble an
> update statement for the current_foo table, only updating the columns
> that are non-null and in the current_foo table.

Is it critical (or even desirable?) for your app to control the values
being inserted into the extra columns of the history table?  Or are
those columns well-defined values like timestamps?

The way I'd be inclined to do this is to have the app manipulate
current_foo directly, not touching the history table, and then let
a rule or trigger insert into the history table.

A simple rule would be along the lines of

ON INSERT TO current DO
INSERT INTO history SELECT new.*, current_timestamp, current_user;

You might get better performance with a trigger, however.

            regards, tom lane

Re: Denormalization question, history+ current

From
"Josh Berkus"
Date:
Eric,

I don't think I can help you.  Your application is well beyond the
bounds of generic advice.

-Josh Berkus

> > Your question is somewhat involved.  I'm confused.  What's
> "abstract
> > core" and why does it have hundreds of columns (IMHO, any table
> with
> > more than a few dozen columns is probably a result of poor
> > normalization)?  Why do your column names have to be dynamic?  I
> think
> > we need more detail, with at least partial table definitions.
>
> Abstract core is an abstract base class table where all of the
> columns
> that are user dependent are managed for the other tables that require
>
> this set of fields. There are at least a few that aren't shown in the
>
> previous message.
>
> Essentially, what I have is:
>
> Event = set of field/value pairs, some system defined, most user
> defined.
> History = ordered sequence of events for a pkey.
> Current View = Current value of all name value pairs for one primary
> key.
>
> The possible set of fields are defined by users of the system. That's
>
> the killer.
>
> Events end up having tens of columns with data and the rest null.
>  The
> current view for an active key will approach having all of the fields
> filled
> as time goes by.
>
> This could be stored as a series of triples, eventId:field:value. In
> fact, I
> have a sample implementation of this where I can generate anything I
> would need from this, at the cost of most interesting queries being 3
>
> or 4 way self joins.  For a small system, I'd be seeing  millions of
> rows
> in this model, and performance is not acceptable.    I can get around
>
> that by wasting (a bunch of) space.
>
> I'm at the point where I know that the many column model is the worst
>
> option, except for all the others.
>
> eric
>
>
>

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco