Thread: Table design question...
Hi, all!

Ok, I've banged my head on this and would like to hear some opinions from the list (I'm just short of trying it, though I don't have the hardware yet).

We have a long-running data logging application, where we essentially get data records at varying speeds from a facility. The data can be treated as arrays of floats, arrays of strings or binary dumps, where each element in the arrays represents one value. Additionally, there is the possibility of having a varying number of 'multisampler' values in one packet, where each value is itself an array. Note that within one packet the arrays need not be of the same length!

So this gives me at least two tables for the data: one that can handle the 'scalar' data (float, string, binary) and one that needs an array for every value (floats only). All data will have two time stamps (facility and receive times), as data may be sent off-line to the logger.

So, all this boils down to the simple question: is it better (in terms of indices) to have a separate table for the time stamps and join the data to it via a foreign key id field, or to have both timestamps in each data table? And what if I were to create more data tables?

Please keep in mind that I expect multi-million records in each table ;-)

Thx,
Joerg
--
Leading SW developer - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW: http://www.sea-gmbh.com
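For concreteness, a minimal sketch of the two layouts under discussion (all table and column names here are illustrative, not from an actual schema):

    -- Variant (A): timestamps factored out into their own table;
    -- data rows reference them via a foreign key
    CREATE TABLE times (
        id   serial PRIMARY KEY,
        scet timestamp without time zone,  -- facility time
        recv timestamp without time zone   -- receive time
    );

    CREATE TABLE data_scalar_a (
        times_id int4 REFERENCES times(id),
        name_id  int4,
        floatval float
    );

    -- Variant (B): both timestamps repeated in every data table
    CREATE TABLE data_scalar_b (
        scet     timestamp without time zone,
        recv     timestamp without time zone,
        name_id  int4,
        floatval float
    );

Variant (A) stores each pair of timestamps once per packet but costs a join on every export; variant (B) denormalises the timestamps and fattens the rows, but keeps range queries single-table.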
Thank you for your help,

On Tuesday 06 August 2002 11:46, you wrote:
> On Tuesday 06 Aug 2002 10:17 am, Joerg Hessdoerfer wrote:
>
> [ data logging with variable formats for data some of which might be
> arrays for a single item - nice :-/ ]

Yep. This is what it boils down to - and no one can tell me in advance what the data type for each entry really is... it can even change mid-operation (now how's that for design!).

> > So, all this boils down to the simple question: Is it better (in terms
> > of indices) to have a separate table for the time stamps and join the
> > data to it via a foreign key id field, or have both timestamps in each
> > data table? What if I would create more data tables?
>
> First instinct is to have timestamps, data-type and reference in one
> table and separate tables for each data-type for the data.
> Second instinct - have one table with null columns for unused types and
> take a hit on the wasted space.
>
> But - it really depends on how you're going to use the data. Will you be
> listing all entries between time X and Y or viewing entries of a
> specific type? Do you have any feel for what the common operations are
> going to be?
>
> - Richard Huxton

Well, the access is mainly 'export dumps' in a special format, where varying data items need to be exported from between two time stamps. That's where my idea of using the arrays came from. I'd have something like

CREATE TABLE data_float(
    times_id int4 references times(id),
    names    int4[],
    values   float[]
);

where names would index into a table with the name->id mapping, because I could throw out all unneeded data items after the SELECT, thus having much less data overhead (and far fewer records - we're talking ~500 data items per packet, varying, of course ;-).

But is the above better than

CREATE TABLE data_float(
    scet   timestamp without time zone,
    recv   timestamp without time zone,
    names  int4[],
    values float[]
);

given that I have at least two data tables (to accommodate the 'array in one value' data items, which are much less common - 1/250, roughly)?

Thanks,
Joerg
--
Leading SW developer - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW: http://www.sea-gmbh.com
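A sketch of the name->id lookup and a typical export query for the variant (A) layout, assuming the times table from the earlier sketch. Note that VALUES is a reserved word in SQL, so the array column is renamed to vals here; the timestamp bounds are placeholders:

    -- Maps the numeric item ids stored in the names[] arrays to item names
    CREATE TABLE item_names (
        id   int4 PRIMARY KEY,
        name text UNIQUE
    );

    -- Variant (A) array table, one row per packet
    CREATE TABLE data_float (
        times_id int4 REFERENCES times(id),
        names    int4[],
        vals     float[]
    );

    -- Typical export: all float packets between two facility timestamps
    SELECT t.scet, t.recv, d.names, d.vals
    FROM   data_float d
    JOIN   times t ON t.id = d.times_id
    WHERE  t.scet BETWEEN '2002-08-01' AND '2002-08-06';

Filtering the unwanted items out of the arrays would then happen in the export code after the SELECT, as described above.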
On Tuesday 06 Aug 2002 11:35 am, Joerg Hessdoerfer wrote:
> Thank you for your help,
>
> On Tuesday 06 August 2002 11:46, you wrote:
> > On Tuesday 06 Aug 2002 10:17 am, Joerg Hessdoerfer wrote:
> >
> > [ data logging with variable formats for data some of which might be
> > arrays for a single item - nice :-/ ]
>
> Yep. This is what it boils down to - and no one can tell me in advance
> what the data type for each entry really is... it can even change
> mid-operation (now how's that for design!).
>
> Well, the access is mainly 'export dumps' in a special format, where
> varying data items need to be exported from between two time stamps.
> That's where my idea of using the arrays came from. I'd have something
> like
>
> CREATE TABLE data_float(
>     times_id int4 references times(id),
>     names    int4[],
>     values   float[]
> );
>
> But is the above better than
>
> CREATE TABLE data_float(
>     scet   timestamp without time zone,
>     recv   timestamp without time zone,
>     names  int4[],
>     values float[]
> );
>
> given that I have at least two data tables (to accommodate the 'array in
> one value' data items, which are much less common - 1/250, roughly)?

If your exports are all of one type (i.e. all arrays, all integer or all float) I'd go for (B), since that makes things nice and easy. On the other hand, if you want to export different types in one query you'll need to do a union, unless you put all possible fields in one table:

CREATE TABLE data_all (
    scet       timestamp,
    recv       timestamp,
    data_type  char(1) CHECK (data_type IN ('i','f','a')),
    intval     int4,
    floatval   float,
    arr_names  int4[],
    arr_values int4[]
);

If you go the route of (A) or (B), I'd run a few tests to see which is more efficient. You can always hide the implementation details behind a view and some functions.

- Richard Huxton
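Richard's closing suggestion might look like this in practice; a sketch assuming the data_all table above, with the per-type table names in the UNION being hypothetical:

    -- A view hides the combined-table layout from the export code
    CREATE VIEW float_data AS
        SELECT scet, recv, floatval
        FROM   data_all
        WHERE  data_type = 'f';

    -- With separate per-type tables instead, a UNION presents them as one
    -- time-ordered stream (both table names here are hypothetical)
    SELECT scet, recv, floatval AS val FROM data_float_scalar
    UNION ALL
    SELECT scet, recv, intval   AS val FROM data_int_scalar
    ORDER BY scet, recv;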
Hi again,

On Tuesday 06 August 2002 14:59, you wrote:
[...]
> If your exports are all of one type (i.e. all arrays, all integer or all
> float) I'd go for (B), since that makes things nice and easy. On the
> other hand, if you want to export different types in one query you'll
> need to do a union, unless you put all possible fields in one table:
>
> CREATE TABLE data_all (
>     scet       timestamp,
>     recv       timestamp,
>     data_type  char(1) CHECK (data_type IN ('i','f','a')),
>     intval     int4,
>     floatval   float,
>     arr_names  int4[],
>     arr_values int4[]
> );
>
> If you go the route of (A) or (B), I'd run a few tests to see which is
> more efficient. You can always hide the implementation details behind a
> view and some functions.
>
> - Richard Huxton
[...]

Yes, I'm currently going the testing route - my initial idea was also something like your table def above, but it rules itself out (I really think 500,000,000 rows is a bit too much ;-). Don't get me wrong, I think a 'big iron' could manage that easily, but I need to make do with a quite common PeeCee (running Linux, that is)...

Thanks again for your input,
Joerg
--
Leading SW developer - S.E.A GmbH
Mail: joerg.hessdoerfer@sea-gmbh.com
WWW: http://www.sea-gmbh.com
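Whichever layout wins the tests, exports between two timestamps over multi-million-row tables will hinge on an index on the timestamp column; a minimal sketch, reusing the table names from the earlier illustrative schemas:

    -- Variant (A): the timestamps live in the times table, so index there
    CREATE INDEX times_scet_idx ON times (scet);

    -- Variant (B): each data table carries its own timestamps
    CREATE INDEX data_scalar_b_scet_idx ON data_scalar_b (scet);

    -- EXPLAIN should then show an index scan, not a sequential scan
    EXPLAIN SELECT * FROM data_scalar_b
    WHERE scet BETWEEN '2002-08-01' AND '2002-08-06';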