Re: High Frequency Inserts to Postgres Database vs Writing to a File - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: High Frequency Inserts to Postgres Database vs Writing to a File
Date
Msg-id b42b73150911040559w660d1ad2w38a014fbc34e877f@mail.gmail.com
In response to High Frequency Inserts to Postgres Database vs Writing to a File  (Jay Manni <JManni@FireEye.com>)
Responses Re: High Frequency Inserts to Postgres Database vs Writing to a File  (Craig Ringer <craig@postnewspapers.com.au>)
List pgsql-performance
On Tue, Nov 3, 2009 at 10:12 PM, Jay Manni <JManni@fireeye.com> wrote:
> Hi:
>
> I have an application wherein a process needs to read data from a stream and
> store the records for further analysis and reporting. The data in the stream
> is in the form of variable length records with clearly defined fields – so
> it can be stored in a database or in a file. The only caveat is that the
> rate of records coming in the stream could be several 1000 records a second.

How well Postgres handles this is going to depend primarily on two things:
*) Your hardware
*) The mechanism you use to insert the data into the database

Postgres can handle multiple 1000 inserts/sec, but your hardware most
likely can't handle multiple 1000 transactions/sec if fsync is on.  You
definitely want to batch the inserts into the database somehow, so that
something accumulates the data (could be a simple file) and flushes it
into the database.  The 'flush' should ideally use COPY, but a
multi-row INSERT is OK too.  Try to avoid inserting one row at a time,
even inside a transaction.
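To make the batching idea concrete, here is a minimal sketch (not from
the original post) in Python with psycopg2, assuming a hypothetical
events(ts, payload) table; it accumulates records in memory and flushes
each batch with a single COPY:

import io
import psycopg2

BATCH_SIZE = 5000   # flush every N records instead of one transaction per row

def flush(conn, rows):
    # write the accumulated rows with a single COPY ... FROM STDIN
    if not rows:
        return
    buf = io.StringIO()
    for ts, payload in rows:
        # COPY text format: tab-separated columns, one row per line
        # (real code would escape tabs/newlines/backslashes in payload)
        buf.write("%s\t%s\n" % (ts, payload))
    buf.seek(0)
    with conn.cursor() as cur:
        cur.copy_from(buf, 'events', columns=('ts', 'payload'))
    conn.commit()
    del rows[:]

def consume(stream, conn):
    # accumulate records from the incoming stream, flushing in batches
    pending = []
    for record in stream:              # record -> (ts, payload)
        pending.append(record)
        if len(pending) >= BATCH_SIZE:
            flush(conn, pending)
    flush(conn, pending)               # final partial batch

# usage: consume(incoming_stream, psycopg2.connect("dbname=..."))

The point is that each COPY commits thousands of rows under a single
fsync, which keeps the transaction rate within what the hardware can
actually sustain.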

If you are bulk inserting 1000+ records/sec all day long, make sure
you have provisioned enough storage for this (that's 86M records/day),
and you should immediately start thinking about partitioning and
rotating the log table (if you log to the database, partition/rotate
is basically already baked in anyways).
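For the partition/rotate side, a rotation job might look something like
the sketch below (hypothetical names again; note that declarative
partitioning is PostgreSQL 10+ syntax, well after this thread, and on
older releases the same idea is done with table inheritance and
constraint exclusion):

import datetime
import psycopg2

def create_next_partition(conn):
    # assumes the parent was created as:
    #   CREATE TABLE events (ts timestamptz NOT NULL, payload text)
    #     PARTITION BY RANGE (ts);
    start = datetime.date.today() + datetime.timedelta(days=1)
    end = start + datetime.timedelta(days=1)
    name = "events_%s" % start.strftime("%Y%m%d")
    with conn.cursor() as cur:
        cur.execute(
            "CREATE TABLE IF NOT EXISTS " + name +
            " PARTITION OF events FOR VALUES FROM (%s) TO (%s)",
            (str(start), str(end)))
    conn.commit()

# run daily from cron; dropping an old partition (DROP TABLE events_...)
# is then a cheap way to age out data compared to a huge DELETE.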

The effects on other users of the database are really hard to predict
-- it's going to depend on how many resources you have (cpu and
especially disk) to direct towards the loading and how the database is
being used.  I expect it shouldn't be too bad unless your database is
already i/o loaded.  The good news is that testing this is relatively
easy: you can simulate the load, run it during typical use, and see how
it affects other users.  Standard o/s tools (iostat, top) and the
database log with log_min_duration_statement are going to be a big help
here.   If you start seeing big leaps in iowait corresponding with
unexpectedly lagging queries in your app, you probably should think
about scrapping the idea.
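As a sketch of what that load test could look like (again hypothetical,
reusing the batched loader above), generate synthetic records at a
fixed rate and watch iostat/top plus the slow-query log while normal
traffic runs; setting log_min_duration_statement = 200 in
postgresql.conf, for example, logs any statement that takes longer than
200 ms:

import time
import datetime

TARGET_RATE = 2000        # synthetic rows per second
DURATION_SECS = 600       # run the test for ten minutes

def synthetic_stream():
    # yield roughly TARGET_RATE fake records per second
    deadline = time.time() + DURATION_SECS
    while time.time() < deadline:
        second_started = time.time()
        for i in range(TARGET_RATE):
            yield (datetime.datetime.now(), "synthetic record %d" % i)
        # sleep off whatever is left of this second to hold the rate steady
        time.sleep(max(0.0, 1.0 - (time.time() - second_started)))

# feed it through the batched loader sketched earlier:
#   consume(synthetic_stream(), conn)
# while watching iowait in iostat and slow statements in the postgres log.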

merlin
