Thread: two disks - best way to use them?
I installed another drive in my linux pc in an attempt to improve performance
on a large COPY to a table with a geometry index.
Based on previous discussion, it seems there are three things competing for the hard
drive:
1) the input data file
2) the pg table
3) the WAL
What is the best way to distribute these among two drives? From Tom’s comments
I would think that the pg table and the WAL should be separate. Does it matter where
the input data is?
At 01:58 PM 12/2/2005, Rick Schumeyer wrote:
> I installed another drive in my linux pc in an attempt to improve
> performance on a large COPY to a table with a geometry index.
>
> Based on previous discussion, it seems there are three things
> competing for the hard drive:
>
> 1) the input data file
> 2) the pg table
> 3) the WAL
>
> What is the best way to distribute these among two drives? From
> Tom's comments I would think that the pg table and the WAL should be
> separate. Does it matter where the input data is?

Best is to have 3 HD or HD sets, one for each of the above.

With only 2, and assuming the input file is too large to fit
completely into RAM at once, I'd test to see whether:
a= input on one + pg table & WAL on the other, or
b= WAL on one + pg table & input file on the other
is best.

If the input file can be made 100% RAM resident, then use
c= pg table on one + WAL and input file on the other.

The big goal here is to minimize HD head seeks.

Ron
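[Since the workload in question is a single large COPY, one simple way to compare layouts a, b and c is to time the same load under each arrangement. This is only a sketch; the database, table and file names are placeholders, not anything from the thread:]

    # run once per layout, against a freshly truncated copy of the table
    time psql -d testdb -c "\copy mytable from '/path/to/input.dat'"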
Rick Schumeyer wrote:
> 1) the input data file
> 2) the pg table
> 3) the WAL

And the filesystem journal, especially if you have not set the
"noatime" mount option. The WAL and the filesystem journal both like
to sync.

IMHO: on the first disk (a RAID mirror :)) I place /, pg_table, and
the filesystem journal of the second disk. On the second, /var and
the pg tables. Thus the first disk gets synced from time to time, and
the second does not.

--
Olleg Samoylov
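[For reference, the two things being described here look roughly like this on ext3. The device and volume names are only placeholders, and moving the ext3 journal onto the other disk is an optional extra step; treat this as a sketch rather than a tested recipe:]

    # /etc/fstab: mount the data filesystem with noatime so plain reads
    # don't generate inode-update writes
    /dev/vgraida/pgsql  /var/lib/postgresql  ext3  rw,noatime  0 2

    # optionally, give the data filesystem an external ext3 journal
    # that lives on the other disk (done at mkfs time here)
    mke2fs -O journal_dev /dev/sdb1
    mkfs.ext3 -J device=/dev/sdb1 /dev/sda1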
Ron wrote:
> At 01:58 PM 12/2/2005, Rick Schumeyer wrote:
>
>> I installed another drive in my linux pc in an attempt to improve
>> performance on a large COPY to a table with a geometry index.
>>
>> Based on previous discussion, it seems there are three things
>> competing for the hard drive:
>>
>> 1) the input data file
>> 2) the pg table
>> 3) the WAL
>>
>> What is the best way to distribute these among two drives? From
>> Tom's comments I would think that the pg table and the WAL should be
>> separate. Does it matter where the input data is?
>
> Best is to have 3 HD or HD sets, one for each of the above.
>
> With only 2, and assuming the input file is too large to fit
> completely into RAM at once, I'd test to see whether:
> a= input on one + pg table & WAL on the other, or
> b= WAL on one + pg table & input file on the other
> is best.
>
> If the input file can be made 100% RAM resident, then use
> c= pg table on one + WAL and input file on the other.
>
> The big goal here is to minimize HD head seeks.

(noob question incoming)

Section 26.4 WAL Internals
http://www.postgresql.org/docs/8.1/interactive/wal-internals.html

This seems to be the applicable chapter. They talk about creating a
symlink for the data/pg_xlog folder to point at another disk set.

If I have (2) RAID1 sets with LVM2, can I instead create a logical
volume on the 2nd disk set and just mount data/pg_xlog to point at the
logical volume on the 2nd disk set?

For example, I have an LVM on my primary mirror called 'pgsql'. And
I've created a 2nd LVM on my secondary mirror called 'pgxlog'. These
are mounted as:

/dev/vgraida/pgsql on /var/lib/postgresql type ext3 (rw,noatime)

/dev/vgraidb/pgxlog on /var/lib/postgresql/data/pg_xlog type ext3
(rw,noatime)

From the application's P.O.V., it's the same thing, right? (It seems
to be working, I'm just trying to double-check that I'm not missing
something.)
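[For what it's worth, the symlink from the docs and the direct mount described above end up in the same place. A rough sketch of both, reusing the volume names from this message; the init-script and mount paths are examples, and the server should be stopped either way:]

    # stop the server before either change
    /etc/init.d/postgresql stop

    # either: the symlink approach from section 26.4 of the docs,
    # assuming the second disk set is mounted at /mnt/disk2
    mv /var/lib/postgresql/data/pg_xlog /mnt/disk2/pg_xlog
    ln -s /mnt/disk2/pg_xlog /var/lib/postgresql/data/pg_xlog

    # or: mount the logical volume directly over data/pg_xlog, after
    # copying the existing WAL segments onto it (the mount hides
    # whatever is underneath the mount point)
    mount -o noatime /dev/vgraidb/pgxlog /var/lib/postgresql/data/pg_xlog

    /etc/init.d/postgresql start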
On Mon, 5 Dec 2005, Thomas Harold wrote:

> (noob question incoming)
>
> Section 26.4 WAL Internals
> http://www.postgresql.org/docs/8.1/interactive/wal-internals.html
>
> This seems to be the applicable chapter. They talk about creating a
> symlink for the data/pg_xlog folder to point at another disk set.
>
> If I have (2) RAID1 sets with LVM2, can I instead create a logical
> volume on the 2nd disk set and just mount data/pg_xlog to point at
> the logical volume on the 2nd disk set?
>
> For example, I have an LVM on my primary mirror called 'pgsql'. And
> I've created a 2nd LVM on my secondary mirror called 'pgxlog'. These
> are mounted as:
>
> /dev/vgraida/pgsql on /var/lib/postgresql type ext3 (rw,noatime)
>
> /dev/vgraidb/pgxlog on /var/lib/postgresql/data/pg_xlog type ext3
> (rw,noatime)
>
> From the application's P.O.V., it's the same thing, right? (It seems
> to be working, I'm just trying to double-check that I'm not missing
> something.)

The application can't tell the difference, but the reason for
separating them isn't for the application; it's so that different
pieces of hardware can work on different things without having to
bounce back and forth between them.

Using the same drives with LVM doesn't achieve this goal.

The problem is that the WAL is doing a LOT of writes, and postgres
waits until each write is completed before going on to the next thing
(for safety). If a disk is dedicated to the WAL then the head doesn't
move much. If the disk is used for other things as well, then the
heads have to move across the disk surface between the WAL and where
the data is. This drastically slows down the number of items that can
go into the WAL, and therefore slows down the entire system.

This slowdown isn't even something as simple as cutting your speed in
half (half the time spent working on the WAL, half spent on the data
itself); it's more like 10% spent on the WAL, 10% spent on the data,
and 80% moving back and forth between them (I am probably wrong on the
exact numbers, but it is something similarly drastic).

This is also the reason why it's so good to have a filesystem journal
on a different drive.

David Lang
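[One way to actually watch this seek contention while the load runs, assuming the sysstat package is installed:]

    # extended per-device stats every 5 seconds; a drive that stays near
    # saturation with high service times while the CPU sits in iowait is
    # the seek-bound disk
    iostat -x 5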
David Lang wrote:
> The application can't tell the difference, but the reason for
> separating them isn't for the application; it's so that different
> pieces of hardware can work on different things without having to
> bounce back and forth between them.
>
> Using the same drives with LVM doesn't achieve this goal.
>
> The problem is that the WAL is doing a LOT of writes, and postgres
> waits until each write is completed before going on to the next thing
> (for safety). If a disk is dedicated to the WAL then the head doesn't
> move much. If the disk is used for other things as well, then the
> heads have to move across the disk surface between the WAL and where
> the data is. This drastically slows down the number of items that can
> go into the WAL, and therefore slows down the entire system.
>
> This slowdown isn't even something as simple as cutting your speed in
> half (half the time spent working on the WAL, half spent on the data
> itself); it's more like 10% spent on the WAL, 10% spent on the data,
> and 80% moving back and forth between them (I am probably wrong on
> the exact numbers, but it is something similarly drastic).

Yeah, I don't think I was clear about the config. It's (4) disks set
up as a pair of RAID1 sets. My original config was pgsql on the first
RAID set (data and WAL). I'm now experimenting with putting the
data/pg_xlog folder on the 2nd set of disks.

Under the old setup (everything on the original RAID1 set, in a
dedicated 32GB LVM volume), I was seeing 80-90% wait percentages in
"top". My understanding is that this is an indicator of an overloaded
/ bottlenecked disk system. This was while doing massive inserts into
a test table (millions of narrow rows). I'm waiting to see what
happens once I have data/pg_xlog on the 2nd disk set.

Thanks for the input.
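[For anyone reproducing that layout, carving a pg_xlog volume out of the second RAID1 set might look like the following. The md device name, volume group name, and size are examples only, not the exact setup from this thread:]

    pvcreate /dev/md1                  # the second RAID1 array
    vgcreate vgraidb /dev/md1
    lvcreate -L 8G -n pgxlog vgraidb   # the WAL needs far less than this
    mkfs.ext3 /dev/vgraidb/pgxlog
    mount -o noatime /dev/vgraidb/pgxlog /var/lib/postgresql/data/pg_xlog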
On Mon, 5 Dec 2005, Thomas Harold wrote:

> Yeah, I don't think I was clear about the config. It's (4) disks set
> up as a pair of RAID1 sets. My original config was pgsql on the first
> RAID set (data and WAL). I'm now experimenting with putting the
> data/pg_xlog folder on the 2nd set of disks.
>
> Under the old setup (everything on the original RAID1 set, in a
> dedicated 32GB LVM volume), I was seeing 80-90% wait percentages in
> "top". My understanding is that this is an indicator of an
> overloaded / bottlenecked disk system. This was while doing massive
> inserts into a test table (millions of narrow rows). I'm waiting to
> see what happens once I have data/pg_xlog on the 2nd disk set.

In that case you logically have two disks, so see the post from Ron
earlier in this thread.

David Lang
David Lang wrote:
> In that case you logically have two disks, so see the post from Ron
> earlier in this thread.

And it's a very nice performance gain. Percent spent waiting according
to "top" is down around 10-20% instead of 80-90%. While I'm not
prepared to benchmark, database performance is way up. The client
machines that are writing the data are running closer to 100% CPU
(before they were well below 50% CPU utilization).
At 12:52 AM 12/6/2005, Thomas Harold wrote:
> David Lang wrote:
>
>> In that case you logically have two disks, so see the post from Ron
>> earlier in this thread.
>
> And it's a very nice performance gain. Percent spent waiting
> according to "top" is down around 10-20% instead of 80-90%. While
> I'm not prepared to benchmark, database performance is way up. The
> client machines that are writing the data are running closer to 100%
> CPU (before they were well below 50% CPU utilization).

For accuracy's sake, which exact config did you finally use?

How did you choose the config you finally used? Did you test the
three options or just pick one?

Ron
Ron wrote:
> For accuracy's sake, which exact config did you finally use?
>
> How did you choose the config you finally used? Did you test the
> three options or just pick one?

(Note: I'm not the original poster.)

I just picked the option of putting the data/pg_xlog directory (WAL)
on a 2nd set of spindles. That was the easiest thing for me to change
on this test box.

The test server is simply a Gentoo box running software RAID and LVM2.
The primary disk set is 2x7200RPM 300GB drives and the secondary disk
set is 2x5400RPM 300GB drives. Brand new install of PGSQL 8.1, with
mostly default settings (I changed FSM pages to a higher value,
max_fsm_pages = 150000). PGSQL was given its own 32GB ext3 LVM volume
on the primary disk set (2x7200RPM). Originally, all files were on the
primary disk.

The task at hand was inserting a large quantity of ~45-byte rows
(according to "vacuum verbose"), on the order of millions of records
per table. There was a unique key and a unique index. Test clients
were accessing the database via ODBC / ADO and doing the inserts in a
fairly brute-force mode (attempt the insert, calling .CancelUpdate if
it fails).

When the tables were under 2 million rows, performance was okay. At
one point, I had a 1.8GHz P4, a dual Opteron 246, and an Opteron 148
running at nearly 100% CPU, processing and doing inserts into the
database. So I had 4 clients running, performing inserts to 4 separate
tables in the same database. The P4 ran at about half the throughput
of the Opterons (client-bound due to the code that generated row data
prior to the insert), so I'd score my throughput as roughly 3.3-3.4,
where 1.0 would be full utilization of the Opteron 148 box.

However, once the tables started getting above ~2 million rows,
performance took a nose dive. CPU utilization on the 4 client CPUs
dropped into the basement (5-20% CPU) and I had to back off on the
number of clients, so throughput had dropped down to around 0.25 or
so. The linux box was spending nearly all of its time waiting on the
primary disks.

Moving the data/pg_xlog (WAL) to the 2nd set of disks (2x5400RPM) in
the test server made a dramatic difference for this mass insert. I'm
running the P4 (100% CPU) and the Opteron 148 (~80% CPU) at the
moment. While it's not up to full speed, a throughput of ~1.3 is a lot
better than the ~0.25 that I was getting prior. (The two tables
currently being written have over 5 million rows each. One table has
~16 million rows.) Wait percentage in "top" is only running 20-30%
(dipping as low as 10%).

I haven't pushed this new setup hard enough to determine where the
upper limit for throughput is. It's very much a niche test (millions
of inserts of narrow rows into multiple tables using fairly brain-dead
code). But it gives me data points on which to base the purchase of
the production box. The original plan was a simple RAID1 setup (2
spindles), but this tells me it's better to order 4 spindles and set
it up as a pair of RAID1 sets. Whether 4 spindles is better as two
separate RAID1 arrays, or configured as a single RAID1+0 array...
dunno. Our application is typically more limited by insert speed than
read speed (so I'm leaning towards separate RAID arrays).

I'm sure there's also more tuning that could be done to the PGSQL
database (in the configuration file). Also, the code is throwaway code
that isn't the most elegant.
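[On the "more tuning in the configuration file" point: the only change made above was max_fsm_pages. The handful of 8.1 settings people usually look at next for a write-heavy load are sketched below; the values are purely illustrative, not recommendations for this particular box:]

    # postgresql.conf (8.1) -- the one change already made above
    max_fsm_pages = 150000

    # commonly raised for bulk-insert workloads (illustrative values only)
    shared_buffers = 20000         # in 8kB buffers
    wal_buffers = 64               # in 8kB buffers
    checkpoint_segments = 32       # fewer, larger checkpoints
    maintenance_work_mem = 262144  # in kB; helps index builds and vacuum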
On Tue, 6 Dec 2005, Thomas Harold wrote:

> Ron wrote:
>
>> For accuracy's sake, which exact config did you finally use?
>>
>> How did you choose the config you finally used? Did you test the
>> three options or just pick one?
>
> (Note: I'm not the original poster.)
>
> I just picked the option of putting the data/pg_xlog directory (WAL)
> on a 2nd set of spindles. That was the easiest thing for me to change
> on this test box.
>
> The test server is simply a Gentoo box running software RAID and
> LVM2. The primary disk set is 2x7200RPM 300GB drives and the
> secondary disk set is 2x5400RPM 300GB drives. Brand new install of
> PGSQL 8.1, with mostly default settings (I changed FSM pages to a
> higher value, max_fsm_pages = 150000). PGSQL was given its own 32GB
> ext3 LVM volume on the primary disk set (2x7200RPM). Originally, all
> files were on the primary disk.

The WAL is more sensitive to drive speeds than the data is, so you may
pick up a little more performance by switching the WAL to the 7200 RPM
drives instead of the 5400 RPM drives.

If you see a noticeable difference with this, consider buying a pair
of smaller but faster drives (10k or 15k RPM drives, or a solid-state
drive).

You can test this (with significant data risk) by putting the WAL on a
ramdisk and seeing what your performance looks like.

David Lang
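[A sketch of that ramdisk test using tmpfs. As noted above, this is strictly a throwaway benchmark: a crash or power loss takes the WAL, and effectively the cluster, with it. The paths and size are examples:]

    /etc/init.d/postgresql stop
    mkdir -p /mnt/walram
    mount -t tmpfs -o size=512m tmpfs /mnt/walram
    cp -a /var/lib/postgresql/data/pg_xlog/. /mnt/walram/
    mv /var/lib/postgresql/data/pg_xlog /var/lib/postgresql/data/pg_xlog.disk
    ln -s /mnt/walram /var/lib/postgresql/data/pg_xlog
    /etc/init.d/postgresql start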