Thread: 1 or 2 servers for large DB scenario.
Hi,

I'd appreciate some assistance in working through what would be the optimal configuration for the following situation.

We currently have one large DB (~1.2TB on disk) that essentially consists of one table with somewhere on the order of 500 million rows. This database takes daily inserts as well as being used for some semi-data-mining type operations, so there are a fairly large number of indices on the table. The hardware storing this DB (a software RAID 6 array) seems to be very IO-bound for writes, and this is restricting our insert performance to ~50 TPS.

As we need to achieve significantly faster insert performance, I have been considering splitting the table into 'new' and 'old' data, making inserts into the 'new' table (which will also help, as populating the table really involves one insert, an update, and some selects), then moving the data over to the 'old' DB on a periodic basis. There would be new hardware involved; I'm thinking of HW RAID 10 to improve the write performance.

The question really is: is it best to use two separate servers and databases (and have to come up with some copy process to move data from one to another), or to just add the faster storage hardware to the existing server and create a new tablespace for the 'new data' table on that hardware? Doing this would enable us to move data more easily from new to old (we can't use partitioning because there is some logic involved in deciding when things need to be moved to 'old'). Are there any global resources that make just adding the faster storage to the existing box a bad idea (the WAL log springs to mind, although that could be moved too) and that would make adding an additional server a better idea instead?

Also, are there any settings that might boost our TPS on the existing hardware (fsync=off isn't an option.. (-: )? I have already significantly increased the various buffers, but this was mainly to improve select performance.

Version of PostgreSQL is 8.2.3.

Thanks,

David.
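[For reference, the single-server variant David describes - a new tablespace on the faster RAID 10 array, with the 'new' table living there and rows migrated to 'old' periodically - would look roughly like the following. This is a minimal sketch: the tablespace name, path, table definitions, and the age-based cutoff are all illustrative stand-ins; the thread says the real migration logic is more involved.]

    -- One-time setup: a tablespace on the new RAID 10 volume
    CREATE TABLESPACE fast_disks LOCATION '/mnt/raid10/pgdata';

    -- The hot table takes the insert/update/select traffic on fast storage
    CREATE TABLE readings_new (
        id       bigint PRIMARY KEY,
        recorded timestamp NOT NULL,
        payload  text
    ) TABLESPACE fast_disks;

    -- Periodic migration to the 'old' table; the age cutoff here stands
    -- in for the application's own logic.  Production code would key the
    -- batch explicitly so concurrent inserts can't slip in between the
    -- INSERT and the DELETE.
    BEGIN;
    INSERT INTO readings_old
        SELECT * FROM readings_new
        WHERE recorded < now() - interval '7 days';
    DELETE FROM readings_new
        WHERE recorded < now() - interval '7 days';
    COMMIT;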
On Fri, 25 Jan 2008, David Brain wrote:

> The hardware storing this DB (a software RAID 6 array) seems to be very
> IO-bound for writes, and this is restricting our insert performance to
> ~50 TPS.

If you're seeing <100 TPS, you should consider whether it's because you're limited by how fast WAL commits can make it to disk. If you really want good insert performance, there is no substitute for getting a disk controller with a good battery-backed cache to work around that. You could just put the WAL xlog directory on a RAID-1 pair of disks to accelerate that; you don't have to move the whole database to a new controller.

> Also, are there any settings that might boost our TPS on the existing
> hardware (fsync=off isn't an option..)?

Having a good write cache lets you run almost as fast as when fsync is off.

> Version of PostgreSQL is 8.2.3.

I'm hearing an echo here... 8.2 versions before 8.2.4 have a bug related to statistics that can limit performance in several situations. You should consider an upgrade just to remove that as a potential contributor to your issues. To do a quick check on whether this is impacting things, run top, press "c" to show the full process lines, and note whether the statistics collector process is taking up a significant amount of CPU time. If it is, you're definitely being nailed by the bug, and you really need that upgrade.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
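[Both halves of Greg's advice can be sanity-checked from psql before buying any hardware. A quick sketch; the expected values come from the discussion above, not from any particular installation:]

    SHOW fsync;        -- should remain 'on', per the durability requirement
    SELECT version();  -- anything reporting 8.2.3 or earlier in the 8.2
                       -- line still carries the statistics bug Greg describes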
On Fri, 25 Jan 2008, David Brain wrote:

> We currently have one large DB (~1.2TB on disk) that essentially consists
> of one table with somewhere on the order of 500 million rows. This
> database takes daily inserts as well as being used for some
> semi-data-mining type operations, so there are a fairly large number of
> indices on the table. The hardware storing this DB (a software RAID 6
> array) seems to be very IO-bound for writes, and this is restricting our
> insert performance to ~50 TPS.

As you have such a complex insert procedure, I'm not so surprised that you are getting this kind of performance. Your average discs will do something like 200 seeks per second, so if you are having to perform four seeks per transaction, that would explain it. Remember, on software RAID 6 (without a battery-backed cache) all the discs will probably need to participate in each transaction.

Your suggestion of splitting the data seems hinged around having a smaller table resulting in quicker SELECTs - it might be worth doing an experiment to see whether this is actually the case. My guess is that you may not actually get much of an improvement.

So, my suggestion would be to:

1. Make sure the server has plenty of RAM, so hopefully a lot of the SELECT traffic hits the cache.

2. Upgrade your disc system to hardware RAID, with a battery-backed cache. This will enable the writes to occur immediately without having to wait for the discs each time. RAID 6 sounds fine, as long as there is a battery-backed cache in there somewhere. Without that, it can be a little crippled.

We don't actually have that much information on how much time Postgres is spending on each of the different activities, but the above is probably a good place to start.

Hope that helps,

Matthew
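[Spelling out Matthew's arithmetic: at roughly 200 seeks per second per drive, and about four seeks per transaction (e.g. the insert, the update, and the associated index writes), the ceiling is]

    200 seeks/sec / 4 seeks/transaction = ~50 transactions/sec

[which lines up with the ~50 TPS David reports, so a seek-bound write path is a consistent explanation.]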
On Fri, 25 Jan 2008, Greg Smith wrote:

> If you're seeing <100 TPS, you should consider whether it's because
> you're limited by how fast WAL commits can make it to disk. If you
> really want good insert performance, there is no substitute for getting
> a disk controller with a good battery-backed cache to work around that.
> You could just put the WAL xlog directory on a RAID-1 pair of disks to
> accelerate that; you don't have to move the whole database to a new
> controller.

Hey, you *just* beat me to it.

Yes, that's quite right. My suggestion was to move the whole thing, but Greg is correct - you only need to put the WAL on a cached disc system. That'd be quite a bit cheaper, I'd imagine.

Another case of that small SSD drive being useful, I think.

Matthew
Matthew wrote:

> On Fri, 25 Jan 2008, Greg Smith wrote:
>> If you're seeing <100 TPS, you should consider whether it's because
>> you're limited by how fast WAL commits can make it to disk. [...]
>
> Hey, you *just* beat me to it.
>
> Yes, that's quite right. My suggestion was to move the whole thing, but
> Greg is correct - you only need to put the WAL on a cached disc system.
> That'd be quite a bit cheaper, I'd imagine.
>
> Another case of that small SSD drive being useful, I think.

PostgreSQL 8.3 will have an "asynchronous commit" feature, which should eliminate that bottleneck without new hardware, if you can accept the loss of the last few transaction commits in case of sudden power loss:

http://www.postgresql.org/docs/8.3/static/wal-async-commit.html

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
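[The 8.3 feature Heikki links to is controlled by the synchronous_commit setting, and it can be changed per transaction, so only the bulk-insert path has to accept the small window of possible loss. A sketch, assuming 8.3 and the illustrative table from earlier:]

    BEGIN;
    -- Only this transaction commits asynchronously; the rest of the
    -- system keeps fully synchronous, durable commits.
    SET LOCAL synchronous_commit TO off;
    INSERT INTO readings_new VALUES (42, now(), 'sample row');
    COMMIT;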
Hi David,

I have been running a few tests with 8.2.4, and here is what I have seen:

If fsync=off is not an option (and it should not be an option :-) ), then the commit_delay=10 setting seems to help a lot in my OLTP runs. Granted, it will delay your transactions a bit, but the gain is big, considering that under high load the WAL writes end up being bigger writes; I got a good boost in performance due to that change (IIRC it was about 6-10%, depending on load and contention). So that might help out.

Curiously, I did spend some time on why it helps with write contention. At least on Solaris, my observation is that the WAL writes then end up being bigger than 8K (the block size). This meant an overall reduction in IOPS on the filesystem that's holding the logs, and hence more IOPS capacity available to do more log writes. (Using an EAStress type of benchmark, it ended up doing somewhere between 128KB-256KB writes on the logs, which was pretty fascinating, since the benchmark does drive a fair amount of WAL writes, and without commit_delay the disks were pretty saturated quickly.)

Also, if the load is high, then the delay in transactions is pretty much non-existent (at least that's what I observed with commit_delay=10 and commit_siblings left at the default).

Of course, as already replied, 8.3's async commit helps on top of commit_delay, so that's an option if the potential loss of a few transactions is acceptable.

-Jignesh

David Brain wrote:
> Hi,
>
> I'd appreciate some assistance in working through what would be the
> optimal configuration for the following situation.
>
> We currently have one large DB (~1.2TB on disk) that essentially
> consists of one table with somewhere on the order of 500 million rows.
> [...]
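[commit_delay is an ordinary run-time setting, so Jignesh's values can be trialled per session before touching the server config. A sketch using the numbers from his post; the gain he saw was workload-dependent, so measure before adopting it:]

    -- Per-session trial:
    SET commit_delay = 10;   -- microseconds to wait before flushing WAL,
                             -- giving other commits a chance to piggyback
    -- commit_siblings left at its default, as in the post above

    -- If it helps, set it cluster-wide in postgresql.conf:
    --   commit_delay = 10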
On Jan 25, 2008 11:36 AM, David Brain <dbrain@bandwidth.com> wrote:
> I'd appreciate some assistance in working through what would be the
> optimal configuration for the following situation.
>
> We currently have one large DB (~1.2TB on disk) that essentially
> consists of one table with somewhere on the order of 500 million rows.
> [...]
>
> Also, are there any settings that might boost our TPS on the existing
> hardware (fsync=off isn't an option.. (-: )? I have already
> significantly increased the various buffers, but this was mainly to
> improve select performance.

I would (amalgamating suggestions from others and adding my own):

*) Get off RAID 6 ASAP. RAID 6 is wild wild west in database terms.

*) Partition this table. If you have a lot of indexes on the table, you might be running into random read problems. I'm not a huge fan of partitioning in most cases, but your case passes the smell test. Unique constraints are a problem, so partition wisely. (A sketch of what this looks like on 8.2 follows below.)

*) Move the WAL to separate device(s). You could see as much as double the TPS, but probably less than that. A single 15k drive will do, or two in a RAID 1. Contrary to the others, I would advise _against_ an SSD for the WAL... WAL writing is mostly sequential, and an SSD is unlikely to help (where an SSD is most likely to pay off is in the database volume, for faster random reads... likely not cost-effective).

*) And, for heaven's sake, if there is any way for you to normalize your database into more than one table, do so :-)

merlin
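[On 8.2, the partitioning Merlin suggests is done with table inheritance plus CHECK constraints, with constraint_exclusion enabled so the planner skips children that can't match. A minimal sketch with illustrative names; it sidesteps the unique-constraint problem he mentions by declaring no cross-partition keys, and a real setup would also need a rule, trigger, or application logic to route inserts to the right child:]

    CREATE TABLE readings (
        id       bigint NOT NULL,
        recorded timestamp NOT NULL,
        payload  text
    );

    CREATE TABLE readings_2008_01 (
        CHECK (recorded >= '2008-01-01' AND recorded < '2008-02-01')
    ) INHERITS (readings);

    CREATE TABLE readings_2008_02 (
        CHECK (recorded >= '2008-02-01' AND recorded < '2008-03-01')
    ) INHERITS (readings);

    -- With constraint_exclusion = on in postgresql.conf, a query such as
    --   SELECT count(*) FROM readings WHERE recorded >= '2008-02-10';
    -- scans only the children whose CHECK constraints can satisfy it.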