Thread: further testing on IDE drives

further testing on IDE drives

From: "scott.marlowe"
I was testing to get some idea of how to speed up pgbench with IDE
drives and the write caching turned off in Linux (i.e. hdparm -W0
/dev/hdx).

The only parameter that seemed to make a noticeable difference was setting
wal_sync_method = open_sync.  With it set to either fsync or fdatasync,
the speed with pgbench -c 5 -t 1000 ran from 11 to 17 tps.  With open_sync
it jumped to the range of 45 to 52 tps.  With the write cache on I was
getting 280 to 320 tps, so instead of being 20 to 30 times slower, I'm
only about 5 times slower, which is much better.
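
For reference, the whole test boils down to this (a sketch; the device
and database names are placeholders, and pgbench is the contrib version):

    # turn off the drive's write cache (as root; /dev/hdc is a placeholder)
    hdparm -W0 /dev/hdc

    # in postgresql.conf, then restart the postmaster:
    #   wal_sync_method = open_sync        # or fsync / fdatasync

    # initialize, then run 5 clients doing 1000 transactions each
    pgbench -i -s 10 pgbench
    pgbench -c 5 -t 1000 pgbench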

Now I'm off to start a "pgbench -c 10 -t 10000", pull the power cord,
and see if the data gets corrupted with write caching turned on, i.e.
whether my hard drives can write at least some of their cache out during
spin down.




Re: further testing on IDE drives

From: "scott.marlowe"
On Thu, 2 Oct 2003, scott.marlowe wrote:

> I was testing to get some idea of how to speed up pgbench with IDE
> drives and the write caching turned off in Linux (i.e. hdparm -W0
> /dev/hdx).
>
> The only parameter that seemed to make a noticeable difference was setting
> wal_sync_method = open_sync.  With it set to either fsync or fdatasync,
> the speed with pgbench -c 5 -t 1000 ran from 11 to 17 tps.  With open_sync
> it jumped to the range of 45 to 52 tps.  With the write cache on I was
> getting 280 to 320 tps, so instead of being 20 to 30 times slower, I'm
> only about 5 times slower, which is much better.
>
> Now I'm off to start a "pgbench -c 10 -t 10000", pull the power cord,
> and see if the data gets corrupted with write caching turned on, i.e.
> whether my hard drives can write at least some of their cache out during
> spin down.

OK, back from testing.

Information:  Dual PIV system with a pair of 80 gig IDE drives, model
number: ST380023A (Seagate).  The file system is ext3 and is on a
separate drive from the OS.

These drives DO NOT flush their write cache when they lose power.
Testing was done by issuing 'hdparm -W0 /dev/hdx' or 'hdparm -W1
/dev/hdx', where x is the real drive letter.  Then I'd issue a 'pgbench
-c 50 -t 100000000' command, wait for a few minutes, then pull the power
cord.
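
In script form, each run looked something like this (just a sketch;
device and database names are placeholders, and the cord-pull is of
course manual):

    hdparm -W1 /dev/hdc                    # or -W0 for the cache-off runs
    pgbench -c 50 -t 100000000 pgbench &   # long-running load
    sleep 300                              # let it run a few minutes
    # ...pull the power cord here, then power back up and check that the
    # postmaster recovers and the accounts table is intact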

I'm running a stock Red Hat Linux 9 install, kernel 2.4.20-8smp.

Pulling the plug three times with 'hdparm -W0 /dev/hdx' resulted in a
machine that would boot up, recover via the journal, and bring the
database back up within about 30 seconds with all the accounts still
intact.

Switching the caching back on with 'hdparm -W1 /dev/hdx' and doing the
same 'pgbench -c 50 -t 100000000' resulted in a corrupted database each
time.

Also, I tried each of the following sync methods with write caching
turned off: fsync, fdatasync, and open_sync.  Each survived a power-off
test with no corruption of the database.  fsync and fdatasync resulted
in 11 to 17 tps with 'pgbench -c 5 -t 500' while open_sync resulted in
45 to 55 tps, as mentioned in the previous post.

I'd be interested in hearing from other folks which sync method works
for them, and whether there are any IDE drives out there that can flush
their cache to the platters on power-off when caching is enabled.
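
If you want to check what your drive currently reports, something like
this should do it (a sketch; newer hdparm versions print the setting
when -W is given with no value, and the device name is a placeholder):

    hdparm -W /dev/hdc                           # current write-cache setting
    hdparm -I /dev/hdc | grep -i 'write cache'   # feature list entry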


Re: further testing on IDE drives

From: Bruce Momjian
scott.marlowe wrote:
> I was testing to get some idea of how to speed up pgbench with IDE
> drives and the write caching turned off in Linux (i.e. hdparm -W0
> /dev/hdx).
>
> The only parameter that seemed to make a noticeable difference was setting
> wal_sync_method = open_sync.  With it set to either fsync or fdatasync,
> the speed with pgbench -c 5 -t 1000 ran from 11 to 17 tps.  With open_sync
> it jumped to the range of 45 to 52 tps.  With the write cache on I was
> getting 280 to 320 tps, so instead of being 20 to 30 times slower, I'm
> only about 5 times slower, which is much better.
>
> Now I'm off to start a "pgbench -c 10 -t 10000", pull the power cord,
> and see if the data gets corrupted with write caching turned on, i.e.
> whether my hard drives can write at least some of their cache out during
> spin down.

Is this a reason we should switch to open_sync as a default, if it is
available, rather than fsync?  I think we are doing a single write before
fsync a lot more often than we are doing multiple writes before fsync.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: further testing on IDE drives

From: Bruce Momjian
How did this drive come configured by default?  Write-cache disabled?

---------------------------------------------------------------------------

scott.marlowe wrote:
> On Thu, 2 Oct 2003, scott.marlowe wrote:
>
> > [...]
>
> OK, back from testing.
>
> Information:  Dual PIV system with a pair of 80 gig IDE drives, model
> number: ST380023A (Seagate).  The file system is ext3 and is on a
> separate drive from the OS.
>
> These drives DO NOT flush their write cache when they lose power.
> Testing was done by issuing 'hdparm -W0 /dev/hdx' or 'hdparm -W1
> /dev/hdx', where x is the real drive letter.  Then I'd issue a 'pgbench
> -c 50 -t 100000000' command, wait for a few minutes, then pull the power
> cord.
>
> I'm running a stock Red Hat Linux 9 install, kernel 2.4.20-8smp.
>
> Pulling the plug three times with 'hdparm -W0 /dev/hdx' resulted in a
> machine that would boot up, recover via the journal, and bring the
> database back up within about 30 seconds with all the accounts still
> intact.
>
> Switching the caching back on with 'hdparm -W1 /dev/hdx' and doing the
> same 'pgbench -c 50 -t 100000000' resulted in a corrupted database each
> time.
>
> Also, I tried each of the following sync methods with write caching
> turned off: fsync, fdatasync, and open_sync.  Each survived a power-off
> test with no corruption of the database.  fsync and fdatasync resulted
> in 11 to 17 tps with 'pgbench -c 5 -t 500' while open_sync resulted in
> 45 to 55 tps, as mentioned in the previous post.
>
> I'd be interested in hearing from other folks which sync method works
> for them, and whether there are any IDE drives out there that can flush
> their cache to the platters on power-off when caching is enabled.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: further testing on IDE drives

From: "scott.marlowe"
Nope, write-cache enabled by default.

On Thu, 9 Oct 2003, Bruce Momjian wrote:

> How did this drive come configured by default?  Write-cache disabled?
>
> [...]


Re: further testing on IDE drives

From: "scott.marlowe"
On Thu, 9 Oct 2003, Bruce Momjian wrote:

> scott.marlowe wrote:
> > [...]
>
> Is this a reason we should switch to open_sync as a default, if it is
> available, rather than fsync?  I think we are doing a single write before
> fsync a lot more often than we are doing multiple writes before fsync.

Sounds reasonable to me.  Are there many / any scenarios where a plain
fsync would be faster than open_sync?


Re: further testing on IDE drives

From: Bruce Momjian
scott.marlowe wrote:
> On Thu, 9 Oct 2003, Bruce Momjian wrote:
>
> > [...]
> >
> > Is this a reason we should switch to open_sync as a default, if it is
> > available, rather than fsync?  I think we are doing a single write before
> > fsync a lot more often than we are doing multiple writes before fsync.
>
> Sounds reasonable to me.  Are there many / any scenarios where a plain
> fsync would be faster than open_sync?

Yes.  If you were doing multiple WAL writes before transaction fsync,
you would be fsyncing every write, rather than doing two writes and
fsync'ing them both.  I wonder if larger transactions would find
open_sync slower?
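
The same effect is visible outside Postgres.  As a rough analogue (a
sketch assuming a GNU dd new enough to support oflag=sync and
conv=fsync; the output path is a placeholder):

    # per-write sync: each 8k block hits disk before the next write,
    # roughly what open_sync does to every WAL write
    time dd if=/dev/zero of=/tmp/ddtest bs=8k count=1000 oflag=sync

    # batched: write all blocks through the OS cache, then one fsync at
    # the end, roughly what write, write, fsync does
    time dd if=/dev/zero of=/tmp/ddtest bs=8k count=1000 conv=fsync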

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: further testing on IDE drives

From: Josh Berkus
Bruce,

> Yes.  If you were doing multiple WAL writes before transaction fsync,
> you would be fsyncing every write, rather than doing two writes and
> fsync'ing them both.  I wonder if larger transactions would find
> open_sync slower?

Want me to test?  I've got an IDE-based test machine here, and the TPCC
databases.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: further testing on IDE drives

From: "scott.marlowe"
On Fri, 10 Oct 2003, Josh Berkus wrote:

> Bruce,
>
> > Yes.  If you were doing multiple WAL writes before transaction fsync,
> > you would be fsyncing every write, rather than doing two writes and
> > fsync'ing them both.  I wonder if larger transactions would find
> > open_sync slower?
>
> Want me to test?  I've got an IDE-based test machine here, and the TPCC
> databases.

Just make sure the drive's write cache is disabled.


Re: further testing on IDE drives

From: Bruce Momjian
Josh Berkus wrote:
> Bruce,
>
> > Yes.  If you were doing multiple WAL writes before transaction fsync,
> > you would be fsyncing every write, rather than doing two writes and
> > fsync'ing them both.  I wonder if larger transactions would find
> > open_sync slower?
>
> Want me to test?  I've got an IDE-based test machine here, and the TPCC
> databases.

I would be interested to see if wal_sync_method = fsync is slower than
wal_sync_method = open_sync.  How often are we doing more than one write
before an fsync anyway?
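
Something like this would cycle through the methods (an untested sketch;
paths and the database name are placeholders, and it assumes
postgresql.conf has an uncommented wal_sync_method line):

    for m in fsync fdatasync open_sync; do
        sed -i "s/^wal_sync_method.*/wal_sync_method = $m/" $PGDATA/postgresql.conf
        pg_ctl restart -w -D $PGDATA
        echo "=== $m ==="
        pgbench -c 5 -t 500 pgbench
    done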

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: further testing on IDE drives

From: Josh Berkus
Bruce,

> I would be interested to see if wal_sync_method = fsync is slower than
> wal_sync_method = open_sync.  How often are we doing more than one write
> before an fsync anyway?

OK.  I'll see if I can get to it around the other stuff I have to do
this weekend.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: further testing on IDE drives

From: Vivek Khera
>>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes:

>> Sounds reasonable to me.  Are there many / any scenarios where a plain
>> fsync would be faster than open_sync?

BM> Yes.  If you were doing multiple WAL writes before transaction fsync,
BM> you would be fsyncing every write, rather than doing two writes and
BM> fsync'ing them both.  I wonder if larger transactions would find
BM> open_sync slower?

Consider loading a large database from a backup dump: one big
transaction during the COPY.  I don't know what implications that has in
this scenario, though.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: further testing on IDE drives

From: Bruce Momjian
Vivek Khera wrote:
> >>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
> >> Sounds reasonable to me.  Are there many / any scenarios where a plain
> >> fsync would be faster than open_sync?
>
> BM> Yes.  If you were doing multiple WAL writes before transaction fsync,
> BM> you would be fsyncing every write, rather than doing two writes and
> BM> fsync'ing them both.  I wonder if larger transactions would find
> BM> open_sync slower?
>
> Consider loading a large database from a backup dump: one big
> transaction during the COPY.  I don't know what implications that has in
> this scenario, though.

COPY only fsyncs on COPY completion, so I am not sure there are
enough fsyncs there to make a difference.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: further testing on IDE drives

From: "scott.marlowe"
On Fri, 10 Oct 2003, Josh Berkus wrote:

> Bruce,
>
> > Yes.  If you were doing multiple WAL writes before transaction fsync,
> > you would be fsyncing every write, rather than doing two writes and
> > fsync'ing them both.  I wonder if larger transactions would find
> > open_sync slower?
>
> Want me to test?  I've got an IDE-based test machine here, and the TPCC
> databases.

OK, I decided to do a quick and dirty test of a big transaction in each
mode my kernel supports.  I did this:

createdb dbname
time pg_dump -O -h otherserver dbname | psql dbname

Then I'd drop the db, edit postgresql.conf, and restart the server.
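
Scripted, the whole procedure is roughly this (a sketch; host, database,
and path names are placeholders):

    for m in open_sync fsync fdatasync; do
        # set wal_sync_method = $m in postgresql.conf first, then:
        pg_ctl restart -w -D $PGDATA
        dropdb dbname 2>/dev/null
        createdb dbname
        echo "=== $m ==="
        time pg_dump -O -h otherserver dbname | psql dbname
    done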

open_sync was WAY faster at this than the other two methods.

open_sync:

1st run:

real    11m27.107s
user    0m26.570s
sys     0m1.150s

2nd run:

real    6m5.712s
user    0m26.700s
sys     0m1.700s

fsync:

1st run:

real    15m8.127s
user    0m26.710s
sys     0m0.990s

2nd run:

real    15m8.396s
user    0m26.990s
sys     0m1.870s

fdatasync:

1st run:

real    15m47.878s
user    0m26.570s
sys     0m1.480s

2nd run:

real    15m9.402s
user    0m27.000s
sys     0m1.660s

I did the first runs in order, then started over, i.e. open_sync run 1,
fsync run 1, fdatasync run 1, open_sync run 2, etc.

The machine I was restoring to was under no other load.  The machine I
was reading from had little or no load, but it is a production server,
so it's possible the load there had a small effect, though probably not
one this big.

The machine this is on is set up so that the data partition is on a
drive with the write cache enabled, but the pg_xlog and pg_clog
directories are on a drive with the write cache disabled.  Same drive
model as in my previous test: Seagate generic 80 gig IDE drives, model
ST380023A.
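
For anyone wanting to reproduce that split, the usual trick is a symlink
(a sketch; paths and devices are placeholders, the postmaster has to be
stopped first, and pg_clog can be moved the same way):

    mv /var/lib/pgsql/data/pg_xlog /mnt/nocache/pg_xlog
    ln -s /mnt/nocache/pg_xlog /var/lib/pgsql/data/pg_xlog

    hdparm -W1 /dev/hdc    # data drive: write cache on
    hdparm -W0 /dev/hdd    # WAL drive: write cache off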


Re: further testing on IDE drives

From: Vivek Khera
>>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes:

BM> COPY only does fsync on COPY completion, so I am not sure there are
BM> enough fsync's there to make a difference.


Perhaps then it is part of the indexing that takes so much time with
the WAL.  When I applied Marc's WAL-disabling patch, it shaved nearly
50 minutes off a 4-hour restore.

I sent to Tom the logs from the restores since he was interested in
figuring out where the time was saved.

Re: further testing on IDE drives

From: Tom Lane
"scott.marlowe" <scott.marlowe@ihs.com> writes:
> open_sync was WAY faster at this than the other two methods.

Do you not have open_datasync?  That's the preferred method if
available.

            regards, tom lane

Re: further testing on IDE drives

From: "scott.marlowe"
On Tue, 14 Oct 2003, Tom Lane wrote:

> "scott.marlowe" <scott.marlowe@ihs.com> writes:
> > open_sync was WAY faster at this than the other two methods.
>
> Do you not have open_datasync?  That's the preferred method if
> available.

Nope, when I try to start postgresql with it set to that, I get this error
message:

FATAL:  invalid value for "wal_sync_method": "open_datasync"

This is on Red Hat 9, but I have the same problem on a Red Hat 7.2 box
as well.


Re: further testing on IDE drives

From: Ang Chin Han
Bruce Momjian wrote:

> Yes.  If you were doing multiple WAL writes before transaction fsync,
> you would be fsyncing every write, rather than doing two writes and
> fsync'ing them both.  I wonder if larger transactions would find
> open_sync slower?

No hard numbers, but I remember testing fsync vs open_sync some time ago
on 7.3.x.

open_sync was blazingly fast for pgbench, but when we switched our
development database over to open_sync, things slowed to a crawl.

This was some months ago, and I might be wrong, so take it with a grain
of salt. It was on Red Hat 8's Linux kernel 2.4.18, I think. YMMV.

Will be testing it again soon, tonight if possible.



Update performance doc

From: Bruce Momjian
I have updated my hardware performance documentation to reflect the
findings during the past few months on the performance list:

    http://candle.pha.pa.us/main/writings/pgsql/hw_performance/index.html

Thanks.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073