Thread: Tagging rows into collections?
I've got what I think is a fairly simple SQL question. First, some
background: I'm building a database to log events produced by some
scientific instruments. The PostgreSQL database is accessed by multiple
Java programs using JDBC.

An event has: timestamp, event_name, list_of_attributes. The
list_of_attributes are simple (string) name,value pairs. An event may
have 1 or more attributes in its list (no upper limit is imposed by the
system). Insertion performance is critical - selection performance
*much* less so.

I wrote a simple test version using a single table with columns:

  lab.devel.eventdb=# \d events_table
           Table "events_table"
   Attribute |           Type           |   Modifier
  -----------+--------------------------+---------------
   time      | timestamp with time zone | default now()
   event     | character varying(64)    | not null
   name      | character varying(64)    | not null
   svalue    | character varying(80)    |

The Java code accepts events (via CORBA) and then splits each event into
multiple rows in this table (one row per attribute in the
list_of_attributes). This works fine for insertion - it's simple and
fast enough with transactions.

However, although selection performance isn't a priority, the ability to
reconstruct the events from the database is needed, and the simple table
above doesn't provide enough information to do so. (The resolution of
the timestamp field isn't enough to distinguish separate events that
have the same name.)

Is there an obvious way to solve this (preserving insert performance as
much as possible) at the database level? Or should I bite the bullet and
solve it at the Java level? I'm partial to a database-level solution
because there are multiple event sources (Java apps distributed across a
network).
I'm leaning toward two tables: one with rows of "timestamp,event" plus
an internal tag that can be used to identify the matching "name,svalue"
rows in a second table. But I'm new enough to SQL to be uncertain of the
best way to set this up and still get good insert performance (and
whether this would really be better than some approach using a single
table).

Thanks for any guidance!

Steve
--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
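[Editor's note: a minimal sketch of the two-table layout described
above. All table and column names are illustrative, and SQLite via
Python's sqlite3 stands in here for PostgreSQL/JDBC; the idea carries
over directly.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One row per event; the integer primary key serves as the internal tag.
cur.execute("""CREATE TABLE events (
    id    INTEGER PRIMARY KEY,
    time  TEXT DEFAULT CURRENT_TIMESTAMP,
    event TEXT NOT NULL)""")

# One row per attribute, tagged with the owning event's id.
cur.execute("""CREATE TABLE attributes (
    event_id INTEGER NOT NULL REFERENCES events(id),
    name     TEXT NOT NULL,
    svalue   TEXT)""")

# Insert one event with two attributes inside a single transaction.
cur.execute("INSERT INTO events (event) VALUES (?)", ("dome.shutter",))
tag = cur.lastrowid
cur.executemany(
    "INSERT INTO attributes (event_id, name, svalue) VALUES (?, ?, ?)",
    [(tag, "state", "open"), (tag, "actuator", "A3")])
conn.commit()

# Reconstruct the event: all attribute rows sharing the tag belong
# together, regardless of timestamp collisions.
rows = cur.execute(
    "SELECT name, svalue FROM attributes WHERE event_id = ?",
    (tag,)).fetchall()
print(rows)
```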
Steve Wampler <swampler@noao.edu> writes:
> An event has: timestamp,event_name,list_of_attributes
> The list_of_attributes are simple (string) name,value pairs.
>
> However, although selection performance isn't a priority, the
> ability to reconstruct the events from the database is needed
> and the above simple table doesn't provide enough information
> to do so. (The resolution on the timestamp field isn't
> enough to distinguish separate events that have the same name.)

What PG version are you using? In 7.2 the default timestamp resolution
is microseconds, rather than seconds. That might be enough to fix your
problem.

If not, your two-table approach sounds reasonable. You could stick
with one table and use arrays for the name/value columns, but that
will make searches harder.

			regards, tom lane
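[Editor's note: the single-table alternative Tom mentions would use
PostgreSQL array columns for name/value. As a rough sketch of the
trade-off he describes (SQLite has no array type, so a JSON-encoded
column stands in here; names are illustrative):]

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Single table; the whole attribute list is packed into one column
# (a stand-in for PostgreSQL array columns).
cur.execute("CREATE TABLE events (time TEXT, event TEXT, attrs TEXT)")

attrs = [("state", "open"), ("actuator", "A3")]
cur.execute("INSERT INTO events (event, attrs) VALUES (?, ?)",
            ("dome.shutter", json.dumps(attrs)))
conn.commit()

# Reconstructing an event is trivial: one row holds everything...
row = cur.execute("SELECT attrs FROM events").fetchone()
unpacked = [tuple(a) for a in json.loads(row[0])]

# ...but finding events by attribute value means unpacking every row,
# which is Tom's point about searches being harder.
```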
Tom Lane wrote:
> Steve Wampler <swampler@noao.edu> writes:
> > An event has: timestamp,event_name,list_of_attributes
> > The list_of_attributes are simple (string) name,value pairs.
> >
> > However, although selection performance isn't a priority, the
> > ability to reconstruct the events from the database is needed
> > and the above simple table doesn't provide enough information
> > to do so. (The resolution on the timestamp field isn't
> > enough to distinguish separate events that have the same name.)
>
> What PG version are you using? In 7.2 the default timestamp resolution
> is microseconds, rather than seconds. That might be enough to fix your
> problem.

Still 7.1. I'd rather not rely on the timestamp resolution, though that
would probably work with 7.2's resolution.

> If not, your two-table approach sounds reasonable. You could stick
> with one table and use arrays for the name/value columns, but that
> will make searches harder.

I'll try the two-table approach - thanks!

How can I associate <timestamp,event_name> entries in the first table
with <name,value> entries in the second? Would using the OID work - and,
if so, how? (Since insertion performance is important, I'd like to avoid
having to do an [insert <timestamp,event_name>; query-to-get-id;
insert <name,value>] sequence.) Is there a way in SQL to set up the
linkage between a row in the first table and (multiple) rows in the
second table?

Thanks!
--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
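[Editor's note: one way to avoid the query-back-for-the-id step is to
fetch the tag once up front and reuse it in every insert - in PostgreSQL
that would be a single SELECT nextval('event_seq') per event. A sketch
of the pattern, with the sequence emulated client-side and all names
illustrative:]

```python
import itertools
import sqlite3

# Client-side stand-in for a PostgreSQL sequence: one nextval() call
# yields the tag, which is then used in *both* tables' inserts, so no
# follow-up query is needed to discover the id.
sequence = itertools.count(1)

def nextval():
    # In PostgreSQL: SELECT nextval('event_seq')
    return next(sequence)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER, time TEXT, event TEXT)")
cur.execute(
    "CREATE TABLE attributes (event_id INTEGER, name TEXT, svalue TEXT)")

# One event, one tag, one transaction.
tag = nextval()
cur.execute("INSERT INTO events (id, event) VALUES (?, ?)",
            (tag, "dome.shutter"))
cur.executemany(
    "INSERT INTO attributes (event_id, name, svalue) VALUES (?, ?, ?)",
    [(tag, "state", "open"), (tag, "actuator", "A3")])
conn.commit()
```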
On Wednesday 19 Jun 2002 10:19 pm, Tom Lane wrote:
> Steve Wampler <swampler@noao.edu> writes:
> > An event has: timestamp,event_name,list_of_attributes
> > The list_of_attributes are simple (string) name,value pairs.
> >
> > However, although selection performance isn't a priority, the
> > ability to reconstruct the events from the database is needed
> > and the above simple table doesn't provide enough information
> > to do so. (The resolution on the timestamp field isn't
> > enough to distinguish separate events that have the same name.)
>
> What PG version are you using? In 7.2 the default timestamp resolution
> is microseconds, rather than seconds. That might be enough to fix your
> problem.

Still doesn't *guarantee* uniqueness though, just makes it less likely.

> If not, your two-table approach sounds reasonable. You could stick
> with one table and use arrays for the name/value columns, but that
> will make searches harder.

How about using a sequence to generate unique numbers for you? Looks
like a SERIAL type won't be much use, but a sequence can be used without
tying it to a field.

One thing to be careful of: if you have multiple clients inserting, the
numbers won't necessarily be in order. That is, client 1 might insert
10,11,12,13 and client 2 20,21,22,23, but in time-order they might be
10,11,20,12,22,23,13. This is because each client may get a batch of
numbers to use (for efficiency reasons). Be aware that I'm not 100%
certain on that last sentence.

- Richard Huxton
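[Editor's note: Richard's point about out-of-order numbers can be
illustrated with a toy model of sequence caching. This simulates the
batch-allocation behaviour he describes; it is not PostgreSQL's actual
implementation, and whether batching occurs depends on the sequence's
cache setting.]

```python
# Each client grabs a block of numbers from the sequence, so inserts
# interleaved in time carry ids that are not in time order.
class CachedSequence:
    def __init__(self, cache=10):
        self.cache = cache       # numbers handed out per client fetch
        self.next_block = 1      # start of the next unallocated block
        self.pools = {}          # per-client unused numbers

    def nextval(self, client):
        pool = self.pools.get(client)
        if not pool:
            # Client's pool is empty: reserve a fresh block for it.
            pool = list(range(self.next_block,
                              self.next_block + self.cache))
            self.next_block += self.cache
            self.pools[client] = pool
        return pool.pop(0)

seq = CachedSequence(cache=10)
# Time-ordered inserts from two clients: c1, c1, c2, c1, c2.
order = [seq.nextval(c) for c in ["c1", "c1", "c2", "c1", "c2"]]
print(order)  # c2 draws from a later block, so ids are out of order
```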