Thread: pg_test_fsync performance

pg_test_fsync performance

From
Bruce Momjian
Date:
I have heard complaints that /contrib/pg_test_fsync is too slow.  I
thought it was impossible to speed up pg_test_fsync without reducing its
accuracy.

However, now that I have some consumer-grade SATA 2 drives, I noticed
that the slowness is really in the open_sync test:

    Compare open_sync with different write sizes:
    (This is designed to compare the cost of writing 16kB
    in different write open_sync sizes.)
             1 * 16kB open_sync write          76.421 ops/sec
             2 *  8kB open_sync writes         38.689 ops/sec
             4 *  4kB open_sync writes         19.140 ops/sec
             8 *  2kB open_sync writes          4.938 ops/sec
            16 *  1kB open_sync writes          2.480 ops/sec

The last few lines can take a very long time, so I developed the
attached patch, which scales down the number of tests.  This makes it
more reasonable to run pg_test_fsync.

I would like to apply this for PG 9.2.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: pg_test_fsync performance

From
Robert Haas
Date:
On Mon, Feb 13, 2012 at 7:42 PM, Bruce Momjian <bruce@momjian.us> wrote:
> I have heard complaints that /contrib/pg_test_fsync is too slow.  I
> thought it was impossible to speed up pg_test_fsync without reducing its
> accuracy.
>
> However, now that I some consumer-grade SATA 2 drives, I noticed that
> the slowness is really in the open_sync test:
>
>        Compare open_sync with different write sizes:
>        (This is designed to compare the cost of writing 16kB
>        in different write open_sync sizes.)
>                 1 * 16kB open_sync write          76.421 ops/sec
>                 2 *  8kB open_sync writes         38.689 ops/sec
>                 4 *  4kB open_sync writes         19.140 ops/sec
>                 8 *  2kB open_sync writes          4.938 ops/sec
>                16 *  1kB open_sync writes          2.480 ops/sec
>
> These last few lines can take very long, so I developed the attached
> patch that scales down the number of tests.  This makes it more
> reasonable to run pg_test_fsync.
>
> I would like to apply this for PG 9.2.

On my MacOS X machine, it's fsync_writethrough that's insanely slow:

[rhaas pg_test_fsync]$ ./pg_test_fsync
2000 operations per test
Direct I/O is not supported on this platform.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                    3523.267 ops/sec
        fdatasync                        3360.023 ops/sec
        fsync                            2410.048 ops/sec
        fsync_writethrough                 12.576 ops/sec
        open_sync                        3649.475 ops/sec

Compare file sync methods using two 8kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                    1885.284 ops/sec
        fdatasync                        2544.652 ops/sec
        fsync                            3241.218 ops/sec
        fsync_writethrough              ^C

Instead of or in addition to a fixed number of operations per test, maybe
we should cut off each test after a certain amount of wall-clock time,
like 15 seconds.  It's kind of insane to run one of these tests for 3
minutes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: pg_test_fsync performance

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Instead of or in addition to a fixed number operations per test, maybe
> we should cut off each test after a certain amount of wall-clock time,
> like 15 seconds.

+1, I was about to suggest the same thing.  Running any of these tests
for a fixed number of iterations will result in drastic degradation of
accuracy as soon as the machine's behavior changes noticeably from what
you were expecting.  Run them for a fixed time period instead.  Or maybe
do a few, then check elapsed time and estimate a number of iterations to
use, if you're worried about the cost of doing gettimeofday after each
write.
        regards, tom lane


Re: pg_test_fsync performance

From
Bruce Momjian
Date:
On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Instead of or in addition to a fixed number operations per test, maybe
> > we should cut off each test after a certain amount of wall-clock time,
> > like 15 seconds.
>
> +1, I was about to suggest the same thing.  Running any of these tests
> for a fixed number of iterations will result in drastic degradation of
> accuracy as soon as the machine's behavior changes noticeably from what
> you were expecting.  Run them for a fixed time period instead.  Or maybe
> do a few, then check elapsed time and estimate a number of iterations to
> use, if you're worried about the cost of doing gettimeofday after each
> write.

Good idea, and it worked out very well.  I changed the -o loops
parameter to -s seconds, which calls alarm() with a (default) two-second
timeout; once the alarm fires and the in-progress operation completes,
the test computes a per-operation duration.

The test now runs in 30 seconds and produces similar output to the
longer version.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: pg_test_fsync performance

From
Bruce Momjian
Date:
On Mon, Feb 13, 2012 at 09:54:06PM -0500, Bruce Momjian wrote:
> On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > Instead of or in addition to a fixed number operations per test, maybe
> > > we should cut off each test after a certain amount of wall-clock time,
> > > like 15 seconds.
> > 
> > +1, I was about to suggest the same thing.  Running any of these tests
> > for a fixed number of iterations will result in drastic degradation of
> > accuracy as soon as the machine's behavior changes noticeably from what
> > you were expecting.  Run them for a fixed time period instead.  Or maybe
> > do a few, then check elapsed time and estimate a number of iterations to
> > use, if you're worried about the cost of doing gettimeofday after each
> > write.
> 
> Good idea, and it worked out very well.  I changed the -o loops
> parameter to -s seconds which calls alarm() after (default) 2 seconds,
> and then once the operation completes, computes a duration per
> operation.

Update patch applied, with additional fix for usage message, and use of
macros for start/stop testing.

I like this method much better because not only does it speed up the
test, but it also allows the write test, which completes very quickly,
to run longer and report more accurate numbers.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pg_test_fsync performance

From
Tom Lane
Date:
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
>> +1, I was about to suggest the same thing.  Running any of these tests
>> for a fixed number of iterations will result in drastic degradation of
>> accuracy as soon as the machine's behavior changes noticeably from what
>> you were expecting.  Run them for a fixed time period instead.  Or maybe
>> do a few, then check elapsed time and estimate a number of iterations to
>> use, if you're worried about the cost of doing gettimeofday after each
>> write.

> Good idea, and it worked out very well.  I changed the -o loops
> parameter to -s seconds which calls alarm() after (default) 2 seconds,
> and then once the operation completes, computes a duration per
> operation.

I was kind of wondering how portable alarm() is, and the answer
according to the buildfarm is that it isn't.
        regards, tom lane


Re: pg_test_fsync performance

From
Marko Kreen
Date:
On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
> >> +1, I was about to suggest the same thing.  Running any of these tests
> >> for a fixed number of iterations will result in drastic degradation of
> >> accuracy as soon as the machine's behavior changes noticeably from what
> >> you were expecting.  Run them for a fixed time period instead.  Or maybe
> >> do a few, then check elapsed time and estimate a number of iterations to
> >> use, if you're worried about the cost of doing gettimeofday after each
> >> write.
> 
> > Good idea, and it worked out very well.  I changed the -o loops
> > parameter to -s seconds which calls alarm() after (default) 2 seconds,
> > and then once the operation completes, computes a duration per
> > operation.
> 
> I was kind of wondering how portable alarm() is, and the answer
> according to the buildfarm is that it isn't.

I'm using the following simplistic alarm() implementation for win32:

  https://github.com/markokr/libusual/blob/master/usual/signal.c#L21

It works together with the fake sigaction()/SIGALRM hack below it, which
remembers the function to call.

That is good enough for simple stats printing, and it avoids spreading
win32-specific code around.

-- 
marko



Re: pg_test_fsync performance

From
Bruce Momjian
Date:
On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
> On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
> > >> +1, I was about to suggest the same thing.  Running any of these tests
> > >> for a fixed number of iterations will result in drastic degradation of
> > >> accuracy as soon as the machine's behavior changes noticeably from what
> > >> you were expecting.  Run them for a fixed time period instead.  Or maybe
> > >> do a few, then check elapsed time and estimate a number of iterations to
> > >> use, if you're worried about the cost of doing gettimeofday after each
> > >> write.
> > 
> > > Good idea, and it worked out very well.  I changed the -o loops
> > > parameter to -s seconds which calls alarm() after (default) 2 seconds,
> > > and then once the operation completes, computes a duration per
> > > operation.
> > 
> > I was kind of wondering how portable alarm() is, and the answer
> > according to the buildfarm is that it isn't.
> 
> I'm using following simplistic alarm() implementation for win32:
> 
>   https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
> 
> this works with fake sigaction()/SIGALARM hack below - to remember
> function to call.
> 
> Good enough for simple stats printing, and avoids win32-specific
> code spreading around.

Wow, I wasn't even aware this compiled in Win32;  I thought it was
ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
work.  (Me wonders if the GoGrid Windows images have compilers.)

I see backend/port/win32/timer.c so I might go with a simple "create a
thread, sleep(2), set flag, exit" solution.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pg_test_fsync performance

From
Magnus Hagander
Date:
On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
>> On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
>> > Bruce Momjian <bruce@momjian.us> writes:
>> > > On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
>> > >> +1, I was about to suggest the same thing.  Running any of these tests
>> > >> for a fixed number of iterations will result in drastic degradation of
>> > >> accuracy as soon as the machine's behavior changes noticeably from what
>> > >> you were expecting.  Run them for a fixed time period instead.  Or maybe
>> > >> do a few, then check elapsed time and estimate a number of iterations to
>> > >> use, if you're worried about the cost of doing gettimeofday after each
>> > >> write.
>> >
>> > > Good idea, and it worked out very well.  I changed the -o loops
>> > > parameter to -s seconds which calls alarm() after (default) 2 seconds,
>> > > and then once the operation completes, computes a duration per
>> > > operation.
>> >
>> > I was kind of wondering how portable alarm() is, and the answer
>> > according to the buildfarm is that it isn't.
>>
>> I'm using following simplistic alarm() implementation for win32:
>>
>>   https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
>>
>> this works with fake sigaction()/SIGALARM hack below - to remember
>> function to call.
>>
>> Good enough for simple stats printing, and avoids win32-specific
>> code spreading around.
>
> Wow, I wasn't even aware this compiled in Win32;  I thought it was
> ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
> work.  (Me wonders if the GoGrid Windows images have compilers.)

They don't, since most of the compilers people would ask for don't
allow that kind of redistribution.

Ping me on im if you need one preconfigured, though...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: pg_test_fsync performance

From
Bruce Momjian
Date:
On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:
> On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:
> > On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
> >> On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
> >> > Bruce Momjian <bruce@momjian.us> writes:
> >> > > On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
> >> > >> +1, I was about to suggest the same thing.  Running any of these tests
> >> > >> for a fixed number of iterations will result in drastic degradation of
> >> > >> accuracy as soon as the machine's behavior changes noticeably from what
> >> > >> you were expecting.  Run them for a fixed time period instead.  Or maybe
> >> > >> do a few, then check elapsed time and estimate a number of iterations to
> >> > >> use, if you're worried about the cost of doing gettimeofday after each
> >> > >> write.
> >> >
> >> > > Good idea, and it worked out very well.  I changed the -o loops
> >> > > parameter to -s seconds which calls alarm() after (default) 2 seconds,
> >> > > and then once the operation completes, computes a duration per
> >> > > operation.
> >> >
> >> > I was kind of wondering how portable alarm() is, and the answer
> >> > according to the buildfarm is that it isn't.
> >>
> >> I'm using following simplistic alarm() implementation for win32:
> >>
> >>   https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
> >>
> >> this works with fake sigaction()/SIGALARM hack below - to remember
> >> function to call.
> >>
> >> Good enough for simple stats printing, and avoids win32-specific
> >> code spreading around.
> >
> > Wow, I wasn't even aware this compiled in Win32;  I thought it was
> > ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
> > work.  (Me wonders if the GoGrid Windows images have compilers.)
> 
> They don't, since most of the compilers people would ask for don't
> allow that kind of redistribution.

Shame.

> Ping me on im if you need one preconfigured, though...

How do you do that?  Also, once you create a Windows VM on a public
cloud, how do you connect to it?  SSH?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: pg_test_fsync performance

From
Bruce Momjian
Date:
On Tue, Feb 14, 2012 at 08:23:10PM -0500, Bruce Momjian wrote:
> On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
> > On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
> > > >> +1, I was about to suggest the same thing.  Running any of these tests
> > > >> for a fixed number of iterations will result in drastic degradation of
> > > >> accuracy as soon as the machine's behavior changes noticeably from what
> > > >> you were expecting.  Run them for a fixed time period instead.  Or maybe
> > > >> do a few, then check elapsed time and estimate a number of iterations to
> > > >> use, if you're worried about the cost of doing gettimeofday after each
> > > >> write.
> > >
> > > > Good idea, and it worked out very well.  I changed the -o loops
> > > > parameter to -s seconds which calls alarm() after (default) 2 seconds,
> > > > and then once the operation completes, computes a duration per
> > > > operation.
> > >
> > > I was kind of wondering how portable alarm() is, and the answer
> > > according to the buildfarm is that it isn't.
> >
> > I'm using following simplistic alarm() implementation for win32:
> >
> >   https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
> >
> > this works with fake sigaction()/SIGALARM hack below - to remember
> > function to call.
> >
> > Good enough for simple stats printing, and avoids win32-specific
> > code spreading around.
>
> Wow, I wasn't even aware this compiled in Win32;  I thought it was
> ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
> work.  (Me wonders if the GoGrid Windows images have compilers.)
>
> I see backend/port/win32/timer.c so I might go with a simple "create a
> thread, sleep(2), set flag, exit" solution.

Yeah, two Windows buildfarm machines have now successfully compiled my
patches, so I guess I fixed it;  patch attached.

The fix was surprisingly easy given the use of threads;  scheduling the
timeout in the operating system was just too invasive.

I would like to eventually know if this fix actually produces the right
output.  How would I test that?  Are the buildfarm output binaries
available somewhere?  Should I add this as a 9.2 TODO item?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Attachment

Re: pg_test_fsync performance

From
Magnus Hagander
Date:
On Wed, Feb 15, 2012 at 16:14, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Feb 15, 2012 at 09:54:04AM +0100, Magnus Hagander wrote:
>> On Wed, Feb 15, 2012 at 02:23, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Wed, Feb 15, 2012 at 01:35:05AM +0200, Marko Kreen wrote:
>> >> On Tue, Feb 14, 2012 at 05:59:06PM -0500, Tom Lane wrote:
>> >> > Bruce Momjian <bruce@momjian.us> writes:
>> >> > > On Mon, Feb 13, 2012 at 08:28:03PM -0500, Tom Lane wrote:
>> >> > >> +1, I was about to suggest the same thing.  Running any of these tests
>> >> > >> for a fixed number of iterations will result in drastic degradation of
>> >> > >> accuracy as soon as the machine's behavior changes noticeably from what
>> >> > >> you were expecting.  Run them for a fixed time period instead.  Or maybe
>> >> > >> do a few, then check elapsed time and estimate a number of iterations to
>> >> > >> use, if you're worried about the cost of doing gettimeofday after each
>> >> > >> write.
>> >> >
>> >> > > Good idea, and it worked out very well.  I changed the -o loops
>> >> > > parameter to -s seconds which calls alarm() after (default) 2 seconds,
>> >> > > and then once the operation completes, computes a duration per
>> >> > > operation.
>> >> >
>> >> > I was kind of wondering how portable alarm() is, and the answer
>> >> > according to the buildfarm is that it isn't.
>> >>
>> >> I'm using following simplistic alarm() implementation for win32:
>> >>
>> >>   https://github.com/markokr/libusual/blob/master/usual/signal.c#L21
>> >>
>> >> this works with fake sigaction()/SIGALARM hack below - to remember
>> >> function to call.
>> >>
>> >> Good enough for simple stats printing, and avoids win32-specific
>> >> code spreading around.
>> >
>> > Wow, I wasn't even aware this compiled in Win32;  I thought it was
>> > ifdef'ed out.  Anyway, I am looking at SetTimer as a way of making this
>> > work.  (Me wonders if the GoGrid Windows images have compilers.)
>>
>> They don't, since most of the compilers people would ask for don't
>> allow that kind of redistribution.
>
> Shame.
>
>> Ping me on im if you need one preconfigured, though...
>
> How do you do that?  Also, once you create a Windows VM on a public
> cloud, how do you connect to it?  SSH?

rdesktop.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/