Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Checkpoint cost, looks like it is WAL/CRC |
Date | |
Msg-id | 1120682924.3940.172.camel@localhost.localdomain Whole thread Raw |
In response to | Re: Checkpoint cost, looks like it is WAL/CRC (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Checkpoint cost, looks like it is WAL/CRC
Re: Checkpoint cost, looks like it is WAL/CRC |
List | pgsql-hackers |
On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: > >> Uh, what exactly did you cut out? I suggested dropping the dumping of > >> full page images, but not removing CRCs altogether ... > > > Attached is the patch I used. > > OK, thanks for the clarification. So it does seem that dumping full > page images is a pretty big hit these days. Yes the performance results are fairly damning. That's a shame, I convinced myself that the CRC32 and block-hole compression was enough. The 50% performance gain isn't the main thing for me. The 10 sec drop in response time immediately after checkpoint is the real issue. Most sites are looking for good response as an imperative, rather than throughput. Overall, IMHO we must do something about this for 8.1. Possibly something more for 8.2 also, but definitely *something* now. > (In defense of the original > idea, I believe it was not such a hit at the time --- but as we continue > to improve performance, things that weren't originally at the top of the > profile become significant.) No defense required. As you say, it was the best idea at the time. > It seems like we have two basic alternatives: > > 1. Offer a GUC to turn off full-page-image dumping, which you'd use only > if you really trust your hardware :-( > > 2. Think of a better defense against partial-page writes. > > I like #2, or would if I could think of a better defense. Ideas anyone? Well, I'm all for #2 if we can think of one that will work. I can't. Option #1 seems like the way forward, but I don't think it is sufficiently safe just to have the option to turn things off. With wal_changed_pages= off *any* crash would possibly require an archive recovery, or a replication rebuild. It's good that we now have PITR, but we do also have other options for availability. Users of replication could well be amongst the first to try out this option. The problem is that you just wouldn't *know* whether the possibly was yes or no. The temptation would be to assume "no" and just continue, which could lead to data loss. And that would lead to a lack of trust in PostgreSQL and eventual reputational loss. Would I do an archive recovery, or would I trust that RAID array had written everything properly? With an irate Web Site Manager saying "you think? it might? maybe? You mean you don't know???" If we pick option #1, it *must* also include a way of deciding whether a crash recovery has succeeded, or not. Other commentators have mentioned the torn_pages option for other DBMS. It seems we also need an indicator. That should be a non-optional feature of 8.1 and then perhaps an optional feature in later releases when we have more experience to say that turning it off is OK in some circumstances. We could implement the torn-pages option, but that seems a lot of work. Another way of implementing a tell-tale would be to append the LSN again as a data page trailer as the last 4 bytes of the page. Thus the LSN would be both the first and last item on the data page. Any partial write would fail to update the LSN trailer and we would be able to see that a page was torn. That's considerably easier than trying to write a torn page tell-tale to each 512 byte sector of a page as SQLServer does. During recovery, if a full page image is not available, we would read the page from the database and check that the first and last LSNs match. If they do, then the page is not torn and recovery can be successful. If they do not match, then we attempt to continue recovery, but issue a warning that torn page has been detected and a full archive recovery is recommended. It is likely that the recovery itself will fail almost immediately following this, since changes will try to be made to a page in the wrong state to receive it, but there's no harm in trying.... Like this specific idea or not, I'm saying that we need a tell-tale: a way of knowing whether we have a torn page, or not. That way we can safely continue to rely upon crash recovery. Tom, I think you're the only person that could or would be trusted to make such a change. Even past the 8.1 freeze, I say we need to do something now on this issue. Best Regards, Simon Riggs
pgsql-hackers by date: