Re: O_DSYNC broken on MacOS X? - Mailing list pgsql-hackers

From Greg Smith
Subject Re: O_DSYNC broken on MacOS X?
Date
Msg-id 4CAE57C6.40400@2ndquadrant.com
In response to Re: O_DSYNC broken on MacOS X?  ("A.M." <agentm@themactionfaction.com>)
Responses Re: O_DSYNC broken on MacOS X?
List pgsql-hackers
A.M. wrote:
> Perhaps a simpler tool could run a basic fsyncs-per-second test and prompt
> the DBA to check that the numbers are within the realm of possibility.

This is what the test_fsync utility that already ships with the database 
should be useful for.  The way Bruce changed it to report numbers in 
commits/second for 9.0 makes it a lot easier to use for this purpose 
than it used to be.  I think there are still some additional improvements 
that could be made there, but it's a tricky test to run accurately.  The 
current code is probably too detailed in some ways (it delivers a lot of 
output not relevant to this use-case) and not detailed enough in 
others.  Providing a summary that understands things like 
fsync_writethrough on platforms that support it was the first 
refactoring I had in mind.  If the tool came back and said 
"fsync_writethrough works for you, so don't even consider the other 
possibilities if you want reliability, even though they are faster", that 
would be nice, for example.
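
To make the "realm of possibility" check concrete, here is a rough sketch 
in Python.  It is not test_fsync and does not reproduce its output; the 
function names are made up for illustration.  It uses plain os.fsync, 
which is not the same thing as fsync_writethrough, and it assumes a 
single spinning disk with no battery-backed cache, where rewriting the 
same block cannot be synced more than about once per platter rotation.

```python
import os
import tempfile
import time


def fsyncs_per_second(directory, iterations=200, block_size=8192):
    """Rewrite one block and fsync it repeatedly; return syncs per second."""
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)  # always rewrite the same block
            os.write(fd, block)
            os.fsync(fd)                  # ask the OS to push it to disk
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return iterations / max(elapsed, 1e-9)


def rotational_ceiling(rpm):
    """Assumed upper bound: one sync per rotation of an uncached disk."""
    return rpm / 60.0


def looks_plausible(measured_rate, rpm):
    """False means some cache is probably acknowledging writes early."""
    return measured_rate <= rotational_ceiling(rpm)
```

Under that assumption a 7200 RPM drive tops out at 120 syncs per second, 
so a measured rate in the thousands on such a drive suggests a write 
cache is answering before the data is really on the platter.  It says 
nothing about whether that cache is safe, only that the DBA should look.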

> How else can a DBA today ensure that a commit is a commit?
>   

You can't ensure a commit is a commit without running a pull-the-plug 
test.  And I think the best way to do that accurately is to use a "remote 
witness" server that focuses on finding this particular problem, 
rather than using the database as your test program and 
seeing if you happen to hit corruption or not.  The documentation for 
9.0 now suggests running the diskchecker.pl program for this exact 
purpose.  I've seen enough reports of it finding even subtle cache loss 
situations to believe that encouraging heavier use of that would be 
enough to make people much safer than they typically are today.  What we 
probably need to do next is provide people with an exact walkthrough of 
setting up and using the program, showing what a passing result looks 
like, and what a failing one looks like.
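
The remote witness idea itself fits in a few lines.  The Python below is 
only an in-process illustration of the bookkeeping, not how 
diskchecker.pl is implemented; the class and function names are 
hypothetical.  The witness stands in for the second machine, which keeps 
running when the plug is pulled and remembers every block the test 
machine claimed was synced.

```python
class Witness:
    """Stands in for the remote machine that survives the power cut."""

    def __init__(self):
        self.acknowledged = set()

    def record(self, block_id):
        # Called only after the test machine's sync call has returned.
        self.acknowledged.add(block_id)


def lost_after_crash(witness, surviving_blocks):
    """Blocks reported as synced that are missing after the restart."""
    return sorted(witness.acknowledged - set(surviving_blocks))


witness = Witness()
for block_id in (1, 2, 3):
    witness.record(block_id)

# Pretend the plug was pulled and only blocks 1 and 2 are found on disk.
lost = lost_after_crash(witness, [1, 2])
```

An empty list is a pass.  Anything else, here block 3, was acknowledged 
as durable and then lost, which is exactly the failure being hunted.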

-- 
Greg Smith, 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services and Support  www.2ndQuadrant.us



