Re: 1 or 2 servers for large DB scenario. - Mailing list pgsql-performance

From Matthew
Subject Re: 1 or 2 servers for large DB scenario.
Date
Msg-id Pine.LNX.4.64.0801251645110.4642@aragorn.flymine.org
Whole thread Raw
In response to 1 or 2 servers for large DB scenario.  (David Brain <dbrain@bandwidth.com>)
List pgsql-performance
On Fri, 25 Jan 2008, David Brain wrote:
> We currently have one large DB (~1.2TB on disk), that essentially consists of
> 1 table with somewhere in the order of 500 million rows , this database has
> daily inserts as well as being used for some semi-data mining type
> operations, so there are a fairly large number of indices on the table.  The
> hardware storing this DB (a software RAID6) array seems to be very IO bound
> for writes and this is restricting our insert performance to ~50TPS.

As you have such a complex insert procedure, I'm not so surprised that you
are getting this kind of performance. Your average discs will do something
like 200 seeks per second, so if you are having to perform four seeks per
transaction, that would explain it. Remember, on software RAID 6 (without
a battery backed up cache) all the discs will probably need to participate
in each transaction.

Your suggestion of splitting the data seems hinged around having a smaller
table resulting in quicker SELECTs - it might be worth doing an experiment
to see whether this is actually the case. My guess is that you may not
actually get much of an improvement.

So, my suggestion would be to:
1. Make sure the server has plenty of RAM, so hopefully a lot of the
    SELECT traffic hits the cache.
2. Upgrade your disc system to hardware RAID, with a battery-backed-up
    cache. This will enable the writes to occur immediately without having
    to wait for the discs each time. RAID 6 sounds fine, as long as there
    is a battery-backed-up cache in there somewhere. Without that, it can
    be a little crippled.

We don't actually have that much information on how much time Postgres is
spending on each of the different activities, but the above is probably a
good place to start.

Hope that helps,

Matthew

pgsql-performance by date:

Previous
From: Greg Smith
Date:
Subject: Re: 1 or 2 servers for large DB scenario.
Next
From: Matthew
Date:
Subject: Re: 1 or 2 servers for large DB scenario.