Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Checkpoint cost, looks like it is WAL/CRC |
Date | |
Msg-id | 1120814272.3940.299.camel@localhost.localdomain |
In response to | Re: Checkpoint cost, looks like it is WAL/CRC (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: Checkpoint cost, looks like it is WAL/CRC |
List | pgsql-hackers |
On Thu, 2005-07-07 at 11:59 -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Tom Lane wrote:
> > >> The point here is that fsync-off is only realistic for development
> > >> or playpen installations. You don't turn it off in a production
> > >> machine, and I can't see that you'd turn off the full-page-write
> > >> option either. So we have not solved anyone's performance problem.
> > >
> > > Yes, this is basically another fsync-like option that isn't for
> > > production usage in most cases. Sad but true.
> >
> > Just to make my position perfectly clear: I don't want to see this
> > option shipped in 8.1. It's reasonable to have it in there for now
> > as an aid to our performance investigations, but I don't see that it
> > has any value for production.
>
> Well, this is the first I am hearing that, and of course your position
> is just one vote.
>
> One idea would be to just tie its behavior directly to fsync and remove
> the option completely (that was the original TODO), or we can adjust it
> so it doesn't have the same risks as fsync, or the same lack of failure
> reporting as fsync.

I second Tom's objection, until we agree on one of the following:

- a conclusive physical test that shows that specific hardware *never*
  causes torn pages
- a national/international standard name/number for everybody to ask
  their manufacturer whether or not they comply with that (I doubt that
  exists...)
- a conclusive check for torn pages that can be added to the recovery
  code to show whether or not they have occurred.

Is there also a potential showstopper in the redo machinery? We work on
the assumption that the post-checkpoint block is available in WAL as a
before image. Redo for all actions merely replays the write action again
onto the block. If we must reapply the write action onto the block, the
redo machinery must check to see whether the write action has already
been successfully applied before it decides to redo. I'm not sure that
the current code does that.

Having raised that objection, ISTM that checking for torn pages can be
accomplished reasonably well using a few rules... These are simple
because we do not update in place for MVCC.

Since inserts and vacuums alter the pd_upper and pd_lower, we should be
able to do a self-consistency check that shows that all items are
correctly placed. If there is non-zero data higher than the pd_higher
pointer, then we know that the first sector is torn. If a pointer
doesn't match with a row version, then the page is torn.

It is possible that the first sector of a page could be undetectably
torn if it was nearly full and the item pointer pointed to the first
sector. However, for every page touched, the last WAL record to touch
that page should have an LSN that matches the database page. In most
cases they would match, proving the page was not torn. If they did not
match we would have no proof either way, so we would be advised to act
as if the page were torn for that situation. Possibly, we could
reinstate the idea of putting the LSN at the beginning and end of every
page, since that would help prove the first sector (only) was not torn.

It is possible that a page could be torn and yet still be consistent,
but this could only occur for a delete. Reapplying the delete, whether
or not it is visible on the page, would overcome that without problem.

It is possible that one or more sectors of empty space in the middle of
a block could be torn, but their contents would still be identical, so
this is irrelevant and can be ignored.

Best Regards, Simon Riggs
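Editorial sketch: the torn-page rules proposed in the message can be illustrated on a toy page model. Everything below is an assumption made for illustration. The header layout (`pd_lsn`, `pd_lower`, `pd_upper` packed as `<QHH`), the 4-byte line pointer, and the names `build_page` and `check_page` are hypothetical and do not match PostgreSQL's real on-disk format or any function in its source. The "non-zero data" rule is read here as "the free space between `pd_lower` and `pd_upper` should be all zeros", which is one possible reading of the post, not a confirmed one.

```python
import struct

PAGE_SIZE = 8192
HEADER = struct.Struct("<QHH")  # hypothetical header: pd_lsn, pd_lower, pd_upper
ITEM = struct.Struct("<HH")     # hypothetical line pointer: tuple offset, tuple length


def build_page(lsn, tuples):
    """Build a toy page: pointers grow up from the header, tuples grow down from the end."""
    page = bytearray(PAGE_SIZE)
    lower, upper = HEADER.size, PAGE_SIZE
    for data in tuples:
        upper -= len(data)
        page[upper:upper + len(data)] = data
        ITEM.pack_into(page, lower, upper, len(data))
        lower += ITEM.size
    HEADER.pack_into(page, 0, lsn, lower, upper)
    return page


def check_page(page, last_wal_lsn):
    """Return 'torn', 'intact', or 'assume_torn' following the rules in the post."""
    lsn, lower, upper = HEADER.unpack_from(page, 0)

    # Rule 1: the header pointers must describe a sane layout.
    if not (HEADER.size <= lower <= upper <= PAGE_SIZE):
        return "torn"
    if (lower - HEADER.size) % ITEM.size != 0:
        return "torn"

    # Rule 2 (assumed reading): free space between the pointers is all zeros.
    if any(page[lower:upper]):
        return "torn"

    # Rule 3: every line pointer must land on a tuple inside the tuple area,
    # and no two tuples may overlap.
    spans = []
    for pos in range(HEADER.size, lower, ITEM.size):
        off, length = ITEM.unpack_from(page, pos)
        if length == 0 or off < upper or off + length > PAGE_SIZE:
            return "torn"
        spans.append((off, off + length))
    spans.sort()
    for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
        if next_start < prev_end:
            return "torn"

    # Rule 4: a matching LSN proves the page is whole; a mismatch proves
    # nothing, so the caller should act as if the page were torn.
    return "intact" if lsn == last_wal_lsn else "assume_torn"
```

A caller doing recovery would compare each touched page against the LSN of the last WAL record for that page and fall back to the full-page image whenever the result is anything other than `"intact"`. The sketch does not cover the delete case or torn sectors inside empty space, both of which the message argues are harmless.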