Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Checkpoint cost, looks like it is WAL/CRC |
Date | |
Msg-id | 200507062222.j66MMcm04984@candle.pha.pa.us |
In response to | Re: Checkpoint cost, looks like it is WAL/CRC (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: Checkpoint cost, looks like it is WAL/CRC |
List | pgsql-hackers |
Simon Riggs wrote:
> On Wed, 2005-06-29 at 23:23 -0400, Tom Lane wrote:
> > Josh Berkus <josh@agliodbs.com> writes:
> > >> Uh, what exactly did you cut out? I suggested dropping the dumping of
> > >> full page images, but not removing CRCs altogether ...
> >
> > > Attached is the patch I used.
> >
> > OK, thanks for the clarification. So it does seem that dumping full
> > page images is a pretty big hit these days.
>
> Yes the performance results are fairly damning. That's a shame, I
> convinced myself that the CRC32 and block-hole compression was enough.
>
> The 50% performance gain isn't the main thing for me. The 10 sec drop in
> response time immediately after checkpoint is the real issue. Most sites
> are looking for good response as an imperative, rather than throughput.

Yep.

> No defense required. As you say, it was the best idea at the time.
>
> > It seems like we have two basic alternatives:
> >
> > 1. Offer a GUC to turn off full-page-image dumping, which you'd use only
> > if you really trust your hardware :-(
> >
> > 2. Think of a better defense against partial-page writes.
> >
> > I like #2, or would if I could think of a better defense. Ideas anyone?
>
> Well, I'm all for #2 if we can think of one that will work. I can't.
>
> Option #1 seems like the way forward, but I don't think it is
> sufficiently safe just to have the option to turn things off.

Well, I added #1 yesterday as 'full_page_writes', and it has the same
warnings as fsync (namely, on crash, be prepared to recover or check
your system thoroughly).

As far as #2, my posted proposal was to write the full pages to WAL when
they are written to the file system, and not when they are first
modified in the shared buffers --- the goal being that it will even out
the load, and it will happen in a non-critical path, hopefully by the
background writer or at checkpoint time.

> With wal_changed_pages= off *any* crash would possibly require an
> archive recovery, or a replication rebuild.
> It's good that we now have
> PITR, but we do also have other options for availability. Users of
> replication could well be amongst the first to try out this option.

Seems it is similar to fsync in risk, which is not a new option.

> The problem is that you just wouldn't *know* whether the possibly was
> yes or no. The temptation would be to assume "no" and just continue,
> which could lead to data loss. And that would lead to a lack of trust in
> PostgreSQL and eventual reputational loss. Would I do an archive
> recovery, or would I trust that RAID array had written everything
> properly? With an irate Web Site Manager saying "you think? it might?
> maybe? You mean you don't know???"

That is a serious problem, but the same problem we have in turning off
fsync.

> During recovery, if a full page image is not available, we would read
> the page from the database and check that the first and last LSNs match.
> If they do, then the page is not torn and recovery can be successful. If
> they do not match, then we attempt to continue recovery, but issue a
> warning that torn page has been detected and a full archive recovery is
> recommended. It is likely that the recovery itself will fail almost
> immediately following this, since changes will try to be made to a page
> in the wrong state to receive it, but there's no harm in trying....

I like the idea of checking the page during recovery so we don't have to
check all the pages, just certain pages.

> Like this specific idea or not, I'm saying that we need a tell-tale: a
> way of knowing whether we have a torn page, or not. That way we can
> safely continue to rely upon crash recovery.
>
> Tom, I think you're the only person that could or would be trusted to
> make such a change. Even past the 8.1 freeze, I say we need to do
> something now on this issue.

I think if we document full_page_writes as similar to fsync in risk, we
are OK for 8.1, but if something can be done easily, it sounds good.

Now that we have a GUC we can experiment with the full page write load
and see how it can be improved.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
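
The torn-page check Simon describes above (compare the LSN at the start
of a page with the one at its end) can be sketched roughly as follows.
This is only an illustration in Python, not PostgreSQL code: the
8192-byte page size, the 8-byte little-endian LSN encoding, the trailing
LSN copy, and the names stamp_page, is_torn and simulate_torn_write are
all assumptions made for the example.

```python
import struct

PAGE_SIZE = 8192   # assumed block size for this sketch
LSN_FMT = "<Q"     # hypothetical encoding: one little-endian 64-bit LSN
LSN_LEN = struct.calcsize(LSN_FMT)


def stamp_page(page: bytearray, lsn: int) -> None:
    """Store the same LSN at the first and last bytes of the page (hypothetical layout)."""
    struct.pack_into(LSN_FMT, page, 0, lsn)
    struct.pack_into(LSN_FMT, page, PAGE_SIZE - LSN_LEN, lsn)


def is_torn(page: bytes) -> bool:
    """Return True when the leading and trailing LSN copies disagree."""
    if len(page) != PAGE_SIZE:
        raise ValueError("unexpected page size")
    (first,) = struct.unpack_from(LSN_FMT, page, 0)
    (last,) = struct.unpack_from(LSN_FMT, page, PAGE_SIZE - LSN_LEN)
    return first != last


def simulate_torn_write(old: bytes, new: bytes, bytes_written: int) -> bytes:
    """Model a crash mid-write: only the first bytes_written bytes of new reach disk."""
    return new[:bytes_written] + old[bytes_written:]


if __name__ == "__main__":
    old = bytearray(PAGE_SIZE)
    stamp_page(old, 100)
    new = bytearray(PAGE_SIZE)
    stamp_page(new, 200)

    # Half of the new page reached disk before the crash.
    on_disk = simulate_torn_write(bytes(old), bytes(new), PAGE_SIZE // 2)
    if is_torn(on_disk):
        print("WARNING: torn page detected, full archive recovery recommended")
```

A check of this kind only catches tears that leave the two LSN copies
different. If the first and last sectors both reach disk but a middle
sector does not, the copies still match, so it serves as a tell-tale
rather than a guarantee.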