Thread: Tagging rows into collections?
I've got what I think is a fairly simple SQL question. First, some
background: I'm building a database to log events produced by some
scientific instruments. The PostgreSQL database is accessed by multiple
Java programs using JDBC.

An event has: timestamp, event_name, list_of_attributes. The
list_of_attributes are simple (string) name,value pairs. An event may
have 1 or more attributes in its list (no upper limit is imposed by the
system). Insertion performance is critical - selection performance
*much* less so.

I wrote a simple test version using a single table with columns:

  lab.devel.eventdb=# \d events_table
           Table "events_table"
   Attribute |           Type           |   Modifier
  -----------+--------------------------+---------------
   time      | timestamp with time zone | default now()
   event     | character varying(64)    | not null
   name      | character varying(64)    | not null
   svalue    | character varying(80)    |

The Java code accepts events (via CORBA) and then splits each event into
multiple rows in this table (one row per attribute in the
list_of_attributes). This works fine for insertion - it's simple and
fast enough with transactions.

However, although selection performance isn't a priority, the ability to
reconstruct the events from the database is needed, and the simple table
above doesn't provide enough information to do so. (The resolution of
the timestamp field isn't enough to distinguish separate events that
have the same name.)

Is there an obvious way to solve this (preserving insert performance as
much as possible) at the database level? Or should I bite the bullet and
solve it at the Java level? I'm partial to a database-level solution
because there are multiple event sources (Java apps distributed across a
network).
I'm leaning toward two tables: one with rows of "timestamp,event" plus
an internal tag that can be used to identify the matching "name,svalue"
rows in a second table. But I'm new enough to SQL to be uncertain of the
best way to set this up and still get good insert performance (and
whether this would really be better than some approach using a single
table).

Thanks for any guidance!

Steve
--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
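[Editor's note: a minimal sketch of the two-table layout described
above. All table and column names are illustrative, and SQLite via
Python's sqlite3 stands in here for PostgreSQL/JDBC; the idea carries
over directly.]

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One row per event; the integer primary key serves as the internal tag.
cur.execute("""CREATE TABLE events (
    id    INTEGER PRIMARY KEY,
    time  TEXT DEFAULT CURRENT_TIMESTAMP,
    event TEXT NOT NULL)""")

# One row per attribute, tagged with the owning event's id.
cur.execute("""CREATE TABLE attributes (
    event_id INTEGER NOT NULL REFERENCES events(id),
    name     TEXT NOT NULL,
    svalue   TEXT)""")

# Insert one event with two attributes inside a single transaction.
cur.execute("INSERT INTO events (event) VALUES (?)", ("dome.shutter",))
tag = cur.lastrowid
cur.executemany(
    "INSERT INTO attributes (event_id, name, svalue) VALUES (?, ?, ?)",
    [(tag, "state", "open"), (tag, "actuator", "A3")])
conn.commit()

# Reconstruct the event: all attribute rows sharing the tag belong
# together, regardless of timestamp collisions.
rows = cur.execute(
    "SELECT name, svalue FROM attributes WHERE event_id = ?",
    (tag,)).fetchall()
print(rows)
```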
Steve Wampler <swampler@noao.edu> writes:
> An event has: timestamp,event_name,list_of_attributes
> The list_of_attributes are simple (string) name,value pairs.
>
> However, although selection performance isn't a priority, the
> ability to reconstruct the events from the database is needed
> and the above simple table doesn't provide enough information
> to do so. (The resolution on the timestamp field isn't
> enough to distinguish separate events that have the same name.)

What PG version are you using? In 7.2 the default timestamp resolution
is microseconds, rather than seconds. That might be enough to fix your
problem.

If not, your two-table approach sounds reasonable. You could stick
with one table and use arrays for the name/value columns, but that
will make searches harder.

			regards, tom lane
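[Editor's note: the single-table alternative Tom mentions would use
PostgreSQL array columns for name/value. As a rough sketch of the
trade-off he describes (SQLite has no array type, so a JSON-encoded
column stands in here; names are illustrative):]

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Single table; the whole attribute list is packed into one column
# (a stand-in for PostgreSQL array columns).
cur.execute("CREATE TABLE events (time TEXT, event TEXT, attrs TEXT)")

attrs = [("state", "open"), ("actuator", "A3")]
cur.execute("INSERT INTO events (event, attrs) VALUES (?, ?)",
            ("dome.shutter", json.dumps(attrs)))
conn.commit()

# Reconstructing an event is trivial: one row holds everything...
row = cur.execute("SELECT attrs FROM events").fetchone()
unpacked = [tuple(a) for a in json.loads(row[0])]

# ...but finding events by attribute value means unpacking every row,
# which is Tom's point about searches being harder.
```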
Tom Lane wrote:
> Steve Wampler <swampler@noao.edu> writes:
> > An event has: timestamp,event_name,list_of_attributes
> > The list_of_attributes are simple (string) name,value pairs.
> >
> > However, although selection performance isn't a priority, the
> > ability to reconstruct the events from the database is needed
> > and the above simple table doesn't provide enough information
> > to do so. (The resolution on the timestamp field isn't
> > enough to distinguish separate events that have the same name.)
>
> What PG version are you using? In 7.2 the default timestamp resolution
> is microseconds, rather than seconds. That might be enough to fix your
> problem.

Still 7.1. I'd rather not rely on the timestamp resolution, though that
would probably work with 7.2's resolution.

> If not, your two-table approach sounds reasonable. You could stick
> with one table and use arrays for the name/value columns, but that
> will make searches harder.

I'll try the two-table approach - thanks!

How can I associate <timestamp,event_name> entries in the first table
with <name,value> entries in the second? Would using the OID work - and,
if so, how? (Since insertion performance is important, I'd like to avoid
having to do an [insert <timestamp,event_name>; query-to-get-id;
insert <name,value>] sequence.) Is there a way in SQL to set up the
linkage between a row in the first table and (multiple) rows in the
second table?

Thanks!
--
Steve Wampler -- swampler@noao.edu
The gods that smiled on your birth are now laughing out loud.
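[Editor's note: one way to avoid the query-back-for-the-id step is to
fetch the tag once up front and reuse it in every insert - in PostgreSQL
that would be a single SELECT nextval('event_seq') per event. A sketch
of the pattern, with the sequence emulated client-side and all names
illustrative:]

```python
import itertools
import sqlite3

# Client-side stand-in for a PostgreSQL sequence: one nextval() call
# yields the tag, which is then used in *both* tables' inserts, so no
# follow-up query is needed to discover the id.
sequence = itertools.count(1)

def nextval():
    # In PostgreSQL: SELECT nextval('event_seq')
    return next(sequence)

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (id INTEGER, time TEXT, event TEXT)")
cur.execute(
    "CREATE TABLE attributes (event_id INTEGER, name TEXT, svalue TEXT)")

# One event, one tag, one transaction.
tag = nextval()
cur.execute("INSERT INTO events (id, event) VALUES (?, ?)",
            (tag, "dome.shutter"))
cur.executemany(
    "INSERT INTO attributes (event_id, name, svalue) VALUES (?, ?, ?)",
    [(tag, "state", "open"), (tag, "actuator", "A3")])
conn.commit()
```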
On Wednesday 19 Jun 2002 10:19 pm, Tom Lane wrote:
> Steve Wampler <swampler@noao.edu> writes:
> > An event has: timestamp,event_name,list_of_attributes
> > The list_of_attributes are simple (string) name,value pairs.
> >
> > However, although selection performance isn't a priority, the
> > ability to reconstruct the events from the database is needed
> > and the above simple table doesn't provide enough information
> > to do so. (The resolution on the timestamp field isn't
> > enough to distinguish separate events that have the same name.)
>
> What PG version are you using? In 7.2 the default timestamp resolution
> is microseconds, rather than seconds. That might be enough to fix your
> problem.

Still doesn't *guarantee* uniqueness though, just makes it less likely.

> If not, your two-table approach sounds reasonable. You could stick
> with one table and use arrays for the name/value columns, but that
> will make searches harder.

How about using a sequence to generate unique numbers for you? Looks
like a SERIAL type won't be much use, but a sequence can be used without
tying it to a field.

One thing to be careful of: if you have multiple clients inserting, the
numbers won't necessarily be in order. That is, client 1 might insert
10,11,12,13 and client 2 20,21,22,23, but in time-order they might be
10,11,20,12,22,23,13. This is because each client may get a batch of
numbers to use (for efficiency reasons). Be aware that I'm not 100%
certain on that last sentence.

- Richard Huxton
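[Editor's note: Richard's point about out-of-order numbers can be
illustrated with a toy model of sequence caching. This simulates the
batch-allocation behaviour he describes; it is not PostgreSQL's actual
implementation, and whether batching occurs depends on the sequence's
cache setting.]

```python
# Each client grabs a block of numbers from the sequence, so inserts
# interleaved in time carry ids that are not in time order.
class CachedSequence:
    def __init__(self, cache=10):
        self.cache = cache       # numbers handed out per client fetch
        self.next_block = 1      # start of the next unallocated block
        self.pools = {}          # per-client unused numbers

    def nextval(self, client):
        pool = self.pools.get(client)
        if not pool:
            # Client's pool is empty: reserve a fresh block for it.
            pool = list(range(self.next_block,
                              self.next_block + self.cache))
            self.next_block += self.cache
            self.pools[client] = pool
        return pool.pop(0)

seq = CachedSequence(cache=10)
# Time-ordered inserts from two clients: c1, c1, c2, c1, c2.
order = [seq.nextval(c) for c in ["c1", "c1", "c2", "c1", "c2"]]
print(order)  # c2 draws from a later block, so ids are out of order
```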