Re: High Frequency Inserts to Postgres Database vs Writing to a File - Mailing list pgsql-performance

From Anj Adu
Subject Re: High Frequency Inserts to Postgres Database vs Writing to a File
Msg-id f2fd819a0911040858w103c90cbi63d2fda3b814707f@mail.gmail.com
In response to High Frequency Inserts to Postgres Database vs Writing to a File  (Jay Manni <JManni@FireEye.com>)
List pgsql-performance
> I have an application wherein a process needs to read data from a stream and
> store the records for further analysis and reporting. The data in the stream
> is in the form of variable length records with clearly defined fields – so
> it can be stored in a database or in a file. The only caveat is that the
> rate of records coming in the stream could be several 1000 records a second.
> The design choice I am faced with currently is whether to use a postgres
> database or a flat file for this purpose. My application already maintains a
> postgres (8.3.4) database for other reasons – so it seemed like the
> straightforward thing to do. However I am concerned about the performance
> overhead of writing several 1000 records a second to the database. The same
> database is being used simultaneously for other activities as well and I do
> not want those to be adversely affected by this operation (especially the
> query times). The advantage of running complex queries to mine the data in
> various different ways is very appealing but the performance concerns are
> making me wonder if just using a flat file to store the data would be a
> better approach.
>
>
>
> Anybody have any experience in high frequency writes to a postgres database?


As mentioned earlier in this thread, make sure your hardware can
scale. You may hit a "monolithic hardware" wall and have to
distribute your data across multiple boxes, with your application
managing the distribution and access. A RAID 10 storage
architecture (since fast writes are critical) on a multi-core box
(preferably 8 cores) with fast SCSI disks (15K rpm) may be a good
starting point.

We have a similar requirement and we scale by distributing the data
across multiple boxes. This is key.
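
To make "the application manages the distribution" concrete, here is a
minimal sketch of one common way to do it: hash a record key and use
the result to pick a box. This is not necessarily how our setup works.
The host names and the "source_id" key field are made up for
illustration.

```python
import hashlib

# Hypothetical host names; nothing in this thread names the actual boxes.
SHARDS = ["db0.example.internal", "db1.example.internal", "db2.example.internal"]

def shard_for(key, shards=SHARDS):
    """Map a record key to one box using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]

def group_by_shard(records, key_field="source_id"):
    """Group incoming records by target box so each box gets one batch."""
    batches = {}
    for rec in records:
        target = shard_for(str(rec[key_field]))
        batches.setdefault(target, []).append(rec)
    return batches
```

Note that plain modulo hashing remaps most keys when a box is added, so
the number of boxes is worth deciding up front, or an explicit lookup
table of key ranges to boxes can be used instead.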

If you need to run complex queries, plan on aggregation strategies
(processes that aggregate and optimize the data storage to facilitate
faster access).
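
As a rough illustration of an aggregation process, the sketch below
rolls raw records up into per-minute counts and byte totals per source,
which could then be stored in a much smaller summary table. The record
layout (timestamp, source, size) is an assumption for the example, not
something taken from the original poster's stream.

```python
from collections import defaultdict
from datetime import datetime

def rollup_per_minute(records):
    """Aggregate (timestamp, source, size) tuples into per-minute totals.

    Returns {(minute_start, source): (record_count, total_size)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for ts, source, size in records:
        # Truncate the timestamp to the start of its minute.
        bucket = ts.replace(second=0, microsecond=0)
        entry = totals[(bucket, source)]
        entry[0] += 1
        entry[1] += size
    return {key: tuple(value) for key, value in totals.items()}
```

Reports that only need trends can then query the summary rows rather
than scanning every raw record.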

Partitioning is key. You will need to purge old data at some point.
Without partitions, you will run into trouble with the time taken to
delete old data as well as with the availability of disk space.
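
A minimal sketch of what this can look like with one partition per day,
using the inheritance-based partitioning available in 8.3 (child tables
with CHECK constraints). The helper only generates SQL text; the parent
table name "events", the column "recorded_at" and the daily granularity
are assumptions for the example.

```python
from datetime import date, timedelta

PARENT = "events"          # hypothetical parent table name
TS_COLUMN = "recorded_at"  # hypothetical timestamp column

def partition_name(day):
    """Name of the child table holding one day of data."""
    return f"{PARENT}_{day:%Y%m%d}"

def create_partition_sql(day):
    """DDL for a child table constrained to a single day."""
    next_day = day + timedelta(days=1)
    return (
        f"CREATE TABLE {partition_name(day)} ("
        f"CHECK ({TS_COLUMN} >= DATE '{day.isoformat()}' "
        f"AND {TS_COLUMN} < DATE '{next_day.isoformat()}')"
        f") INHERITS ({PARENT});"
    )

def purge_sql(existing_days, today, keep_days):
    """DROP statements for partitions older than the retention window."""
    cutoff = today - timedelta(days=keep_days)
    return [
        f"DROP TABLE {partition_name(d)};"
        for d in sorted(existing_days)
        if d < cutoff
    ]
```

Dropping a child table removes a whole day at once and returns the disk
space, instead of a long-running DELETE that leaves dead rows behind.
With this scheme, inserts still have to be routed to the right child,
either by the application or by a trigger on the parent.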

These are just guidelines for a big warehouse-style database.
