Thread: Re: [pgsql-hackers-win32] win32 performance - fsync question

Re: [pgsql-hackers-win32] win32 performance - fsync question

From: "Magnus Hagander"
> > Magnus prepared a trivial patch which added the O_SYNC flag for
> > windows and mapped it to FILE_FLAG_WRITE_THROUGH in win32_open.c.
>
> Attached is this trivial patch. As Merlin says, it needs some
> more reliability testing. But the numbers are at least reasonable - it
> *seems* like it's doing the right thing (as long as you turn
> off write cache). And it's certainly a significant
> performance increase - it brings the speed almost up to the
> same as linux.
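
For readers unfamiliar with the two sync methods being compared, here is a
minimal sketch of the difference on a POSIX-style API. It is not the patch
itself: the patch maps O_SYNC to FILE_FLAG_WRITE_THROUGH inside win32_open.c,
whereas this only illustrates "write, then fsync" versus "open with O_SYNC".
The function names are hypothetical, and os.O_SYNC is assumed to be available
(it is skipped on platforms that do not define it).

```python
import os
import tempfile


def write_then_fsync(path, data):
    """Buffered write followed by an explicit flush request (the "fsync" method)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)   # may only reach the OS cache at this point
        os.fsync(fd)         # ask the OS to push the file's data to the device
    finally:
        os.close(fd)


def write_with_osync(path, data):
    """Open the file so that each write is synchronous (the "osync" method)."""
    # Assumption: fall back to a plain open where O_SYNC is not defined.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)   # returns only after the OS reports the write done
    finally:
        os.close(fd)


# Small demonstration on throwaway files.
with tempfile.TemporaryDirectory() as demo_dir:
    write_then_fsync(os.path.join(demo_dir, "fsync.dat"), b"commit record")
    write_with_osync(os.path.join(demo_dir, "osync.dat"), b"commit record")
```

Neither call can guarantee durability if the drive's own write cache
acknowledges data it has not yet stored, which is why the tests below are run
with the write cache both enabled and disabled.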

I have now run a bunch of pull-the-plug testing on this patch (literally
pulling the plug, yes, to the point of some of my co-workers thinking
I'm crazy).

My results are:
First, baseline:
* Linux, with fsync (default), write-cache disabled: no data corruption
* Linux, with fsync (default), write-cache enabled: usually no data
corruption, but two runs which had
* Win32, with fsync, write-cache disabled: no data corruption
* Win32, with fsync, write-cache enabled: no data corruption
* Win32, with osync, write cache disabled: no data corruption
* Win32, with osync, write cache enabled: no data corruption. Once I
got:
2005-02-24 12:19:54 LOG:  could not open file "C:/Program
Files/PostgreSQL/8.0/data/pg_xlog/000000010000000000000010" (log file 0,
segment 16): No such file or directory

  but the data in the database was consistent.

Almost all runs showed a line along the lines of:
2005-02-24 11:22:41 LOG:  record with zero length at 0/A450548


In the final test, the BIOS decided the disk was giving up and
reassigned it as 0 MB. It required two extra cold boots, then it was back
up to 20 GB. Still no data loss.


My test was three clients doing lots of inserts and updates, some in
transactions, some bare. In some tests, I kicked in a manual vacuum while
at it. Then I yanked the power cord, rebooted, manually started pg, and
verified that the data in the db came up with the same values the clients
reported as last committed. I also ran vacuum verbose on all tables
after it was back up to see if there were any warnings.
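
The verification step can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the script used for the tests:
it assumes each client writes a monotonically increasing counter under its
own key and logs the last value for which it saw a commit acknowledged. The
function name and the data layout are hypothetical.

```python
def find_lost_commits(acked, recovered):
    """Return the keys whose acknowledged commit is missing after recovery.

    acked:     {key: last counter value the client saw acknowledged}
    recovered: {key: counter value found in the database after restart}
    """
    lost = []
    for key, last_acked in acked.items():
        found = recovered.get(key)
        # A recovered value ahead of the acknowledged one is acceptable:
        # the commit may have completed just before the power went away,
        # without the acknowledgement ever reaching the client.
        if found is None or found < last_acked:
            lost.append(key)
    return sorted(lost)


# Example: client c3 was told that value 3 was committed, but only 2 survived.
print(find_lost_commits({"c1": 10, "c2": 7, "c3": 3},
                        {"c1": 10, "c2": 8, "c3": 2}))
```

A check like this only detects lost commits. Structural damage would still
have to be looked for separately, for example with the vacuum verbose pass
mentioned above.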

Test machine is a 1 GHz Celeron, 256 MB RAM and a Maxtor IDE disk.

It'd of course be good if others could also test, but I'm getting the
feeling that this patch at least doesn't make things worse than before
:-) And it's *a lot* faster.

//Magnus

Re: [pgsql-hackers-win32] win32 performance - fsync question

From: Christopher Kings-Lynne
> In the final test, the BIOS decided the disk was giving up and
> reassigned it as 0 MB. It required two extra cold boots, then it was back
> up to 20 GB. Still no data loss.

I think it would be fun to re-run these tests with MySQL...

Chris

Re: [pgsql-hackers-win32] win32 performance - fsync question

From: Christopher Kings-Lynne
> My results are:
> First, baseline:
> * Linux, with fsync (default), write-cache disabled: no data corruption
> * Linux, with fsync (default), write-cache enabled: usually no data
> corruption, but two runs which had
> * Win32, with fsync, write-cache disabled: no data corruption
> * Win32, with fsync, write-cache enabled: no data corruption
> * Win32, with osync, write cache disabled: no data corruption
> * Win32, with osync, write cache enabled: no data corruption. Once I
> got:
> 2005-02-24 12:19:54 LOG:  could not open file "C:/Program
> Files/PostgreSQL/8.0/data/pg_xlog/000000010000000000000010" (log file 0,
> segment 16): No such file or directory

In case anyone is wondering, you can turn off write caching on FreeBSD,
at the cost of a terrible performance loss...

http://freebsd.active-venture.com/handbook/configtuning-disk.html#AEN8015

Chris


Re: [pgsql-hackers-win32] win32 performance - fsync question

From: Tom Lane
"Magnus Hagander" <mha@sollentuna.net> writes:
> My results are:
> First, baseline:
> * Linux, with fsync (default), write-cache disabled: no data corruption
> * Linux, with fsync (default), write-cache enabled: usually no data
> corruption, but two runs which had

That makes sense.

> * Win32, with fsync, write-cache disabled: no data corruption
> * Win32, with fsync, write-cache enabled: no data corruption
> * Win32, with osync, write cache disabled: no data corruption
> * Win32, with osync, write cache enabled: no data corruption. Once I
> got:
> 2005-02-24 12:19:54 LOG:  could not open file "C:/Program
> Files/PostgreSQL/8.0/data/pg_xlog/000000010000000000000010" (log file 0,
> segment 16): No such file or directory
>   but the data in the database was consistent.

It disturbs me that you couldn't produce data corruption in the cases
where it theoretically should occur.  Seems like this is an indication
that your test was insufficiently severe, or that there is something
going on we don't understand.

            regards, tom lane

Re: [pgsql-hackers-win32] win32 performance - fsync

From: pgsql@mohawksoft.com
> "Magnus Hagander" <mha@sollentuna.net> writes:
>> My results are:
>> First, baseline:
>> * Linux, with fsync (default), write-cache disabled: no data corruption
>> * Linux, with fsync (default), write-cache enabled: usually no data
>> corruption, but two runs which had
>
> That makes sense.
>
>> * Win32, with fsync, write-cache disabled: no data corruption
>> * Win32, with fsync, write-cache enabled: no data corruption
>> * Win32, with osync, write cache disabled: no data corruption
>> * Win32, with osync, write cache enabled: no data corruption. Once I
>> got:
>> 2005-02-24 12:19:54 LOG:  could not open file "C:/Program
>> Files/PostgreSQL/8.0/data/pg_xlog/000000010000000000000010" (log file 0,
>> segment 16): No such file or directory
>>   but the data in the database was consistent.
>
> It disturbs me that you couldn't produce data corruption in the cases
> where it theoretically should occur.  Seems like this is an indication
> that your test was insufficiently severe, or that there is something
> going on we don't understand.
>
I was thinking about that. A few years back, Microsoft had some serious
issues with write caching drives. They were taken to task for losing data
if Windows shut down too fast, especially on drives with a large cache.

MS is big enough and bad enough to get all the info they need from the
various drive makers to know how to handle write cache flushing. Even the
stuff that isn't documented.

If anyone has a very good debugger and/or emulator or even a logic
analyzer, it would be interesting to see if MS sends commands to the
drives after a disk write or a set of disk writes.

Also, I would like to see this test performed on NTFS and FAT32, and see
if you are more likely to lose data on FAT32.

Re: [pgsql-hackers-win32] win32 performance - fsync question

From: Greg Stark
"Magnus Hagander" <mha@sollentuna.net> writes:

> * Linux, with fsync (default), write-cache enabled: usually no data
> corruption, but two runs which had

Are you verifying that all the data that was committed was actually stored? Or
just verifying that the database works properly after rebooting?

I'm a bit surprised that the write-cache led to a corrupt database, and not
merely lost transactions. I had the impression that drives still handled the
writes in the order received.

You may find, if you check this case again, that the "usually no data
corruption" is actually "usually lost transactions but no corruption".

-- 
greg



Re: [pgsql-hackers-win32] win32 performance - fsync question

From: Tom Lane
Greg Stark <gsstark@mit.edu> writes:
> I'm a bit surprised that the write-cache led to a corrupt database, and not
> merely lost transactions. I had the impression that drives still handled the
> writes in the order received.

There'd be little point in having a cache if they did, I should think.
I thought the point of the cache was to allow the disk to schedule I/O
in an order that minimizes seek time (ie, such a disk has got its own
elevator queue or similar).
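
To make the reordering concern concrete, here is a toy sketch of an
elevator-style (LOOK) schedule. It is an assumption for illustration only:
nothing in this thread establishes what any particular drive's firmware
does, and the function name and block numbers are made up.

```python
def elevator_order(head, requests):
    """Order pending writes the way a simple LOOK elevator would service them.

    head:     current head position, as a block number
    requests: list of (label, block) tuples in the order they were received
    Returns the labels in the order the writes would reach the platter.
    """
    # First sweep upward from the head position...
    ahead = sorted((r for r in requests if r[1] >= head), key=lambda r: r[1])
    # ...then reverse direction and pick up the blocks that were behind it.
    behind = sorted((r for r in requests if r[1] < head),
                    key=lambda r: r[1], reverse=True)
    return [label for label, _ in ahead + behind]


# Writes received as w1, w2, w3 are serviced in a different order.
print(elevator_order(50, [("w1", 90), ("w2", 10), ("w3", 60)]))
```

If power fails partway through such a sweep, a later write can be on disk
while an earlier one is not, which is the kind of situation that can produce
corruption rather than merely lost transactions.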

> You may find, if you check this case again, that the "usually no data
> corruption" is actually "usually lost transactions but no corruption".

That's a good point, but it seems difficult to be sure of the last
reportedly-committed transaction in a powerfail situation.  Maybe if
you drive the test from a client on another machine?
        regards, tom lane


Re: [pgsql-hackers-win32] win32 performance - fsync question

From: Greg Stark
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > I'm a bit surprised that the write-cache led to a corrupt database, and not
> > merely lost transactions. I had the impression that drives still handled the
> > writes in the order received.
> 
> There'd be little point in having a cache if they did, I should think.
> I thought the point of the cache was to allow the disk to schedule I/O
> in an order that minimizes seek time (ie, such a disk has got its own
> elevator queue or similar).

If that were the case then SCSI drives that ship with write caching disabled
and using tagged command queuing instead would perform poorly.

I think the main motivation for write caching on IDE drives is that the IDE
protocol forces commands to be issued synchronously. So you can't send a
second command until the first command has completed. Without write caching
that limits the write bandwidth tremendously. Write caching is being used here
as a poor man's TCQ.
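
A back-of-the-envelope sketch of why synchronous commands hurt so much
without a write cache. It assumes a simple model in which each write to the
same area of the disk has to wait roughly one full platter rotation before
the next can be issued. The 7200 RPM figure and the 8 kB write size are
assumptions for illustration, not measurements from this thread.

```python
def max_sync_writes_per_second(rpm):
    """Upper bound on back-to-back synchronous writes: one per rotation."""
    return rpm / 60.0


def max_sync_bandwidth_bytes(rpm, write_size):
    """Resulting write bandwidth in bytes per second for a given write size."""
    return max_sync_writes_per_second(rpm) * write_size


rpm = 7200           # assumed spindle speed
write_size = 8192    # assumed size of each write, in bytes

print(max_sync_writes_per_second(rpm))            # 120.0 writes per second
print(max_sync_bandwidth_bytes(rpm, write_size))  # 983040.0 bytes per second
```

Under those assumptions the drive tops out at under 1 MB per second of
strictly serialized writes, far below what the same disk can stream once
writes can be queued or cached.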

-- 
greg