newbie table design question - Mailing list pgsql-general

From: Andrew Smith
Subject: newbie table design question
Msg-id: 300497a30905310854j5d34e9e8h3d805ffb5f160da@mail.gmail.com
Responses: Re: newbie table design question  (Tom Lane <tgl@sss.pgh.pa.us>)
           Re: newbie table design question  (Sam Mason <sam@samason.me.uk>)
           Re: newbie table design question  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
List: pgsql-general
Hi all,

I'm a beginner when it comes to PostgreSQL, and I have a table design question about a project I'm currently working on.  I have 1500 data items that need to be copied every minute from an external system into my database.  Each item has a timestamp, an identifier and a value.  For example:

12/10/2008 05:00       ID_ABC      14
12/10/2008 05:01       ID_ABC      17
12/10/2008 05:02       ID_ABC      13

Pretty simple stuff.  The current solution (implemented in SQL Server a few years ago) looks like this (an approximation in PostgreSQL syntax):

CREATE TABLE "DataImport"
(
  "DataImportID" serial NOT NULL PRIMARY KEY,
  "Time" timestamp without time zone NOT NULL,
  "ID_ABC" integer NOT NULL,
  "ID_DEF" integer NOT NULL,
  "ID_HIJ" integer NOT NULL,
  -- etc., one integer column per data item
);

While this design means only 1440 records in the table per day (one row per minute), I don't like it.  One problem is that every time a data item is added to or removed from the import process, the structure of the table needs to be altered.  I don't know what sort of overhead that involves in the PostgreSQL world, but my feeling is that once I define the structure of the table, I don't want to be messing with it very often, if at all.
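For example, each change would mean DDL along these lines (column names hypothetical), and I gather the NOT NULL / DEFAULT part can force a rewrite of the whole table:

ALTER TABLE "DataImport" ADD COLUMN "ID_XYZ" integer NOT NULL DEFAULT 0;  -- new data item
ALTER TABLE "DataImport" DROP COLUMN "ID_DEF";                            -- retired data item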

My initial thought for the design of the new solution looks like this:

CREATE TABLE "DataImport"
(
  "DataImportID" serial NOT NULL PRIMARY KEY,
  "Time" timestamp without time zone NOT NULL,
  "Identifier" text NOT NULL,
  "Value" integer NOT NULL
);
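Under this design the import would add 1500 rows per minute, i.e. something like this (values illustrative):

INSERT INTO "DataImport" ("Time", "Identifier", "Value")
VALUES ('2008-10-12 05:00', 'ID_ABC', 14),
       ('2008-10-12 05:00', 'ID_DEF', 9),
       ('2008-10-12 05:00', 'ID_HIJ', 27);  -- ...and so on for all 1500 identifiers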

Users will then run regular queries against this data (say, a few hundred times per day), such as:

SELECT "Time", "Value" FROM "DataImport" WHERE "Identifier" = 'ID_ABC' AND "Time" between '2008-11-07' and '2008-11-11';
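I'm assuming I'd back that query with a composite index, something like:

CREATE INDEX "DataImport_Identifier_Time_idx" ON "DataImport" ("Identifier", "Time");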

My concern is that 1500 values * 1440 minutes per day = 2,160,000 records.  Add this up over the course of a month (the length of time I need to keep the data in this table) and I'll have around 65 million records being stored in there.
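My rough plan for enforcing the one-month window is a periodic purge along these lines, though I have no idea how well a bulk delete behaves at this scale:

DELETE FROM "DataImport" WHERE "Time" < now() - interval '1 month';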

I guess my question is: is my approach reasonable?  I haven't dealt with tables of this size before (using any DBMS) - should I expect really slow queries due to the sheer number of records present?  Is there some better way I should be structuring my imported data?  All suggestions welcome.

Cheers,

Andrew

