Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Date
Msg-id 200507071622.j67GMmf19838@candle.pha.pa.us
Whole thread Raw
In response to Re: Checkpoint cost, looks like it is WAL/CRC  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> > Well, I added #1 yesterday as 'full_page_writes', and it has the same
> > warnings as fsync (namely, on crash, be prepared to recovery or check
> > your system thoroughly.
> 
> Yes, which is why I comment now that the GUC alone is not enough.
> 
> There is no way to "check your system thoroughly". If there is a certain
> way of knowing torn pages had *not* occurred, then I would be happy.

Yep, it is a pain, and like fsync.

> > As far as #2, my posted proposal was to write the full pages to WAL when
> > they are written to the file system, and not when they are first
> > modified in the shared buffers --- the goal being that it will even out
> > the load, and it will happen in a non-critical path, hopefully by the
> > background writer or at checkpoint time.
> 
> The page must be written before the changes to the page are written, so
> that they are available sequentially in the log for replay. The log and
> the database are not connected, so we cannot do it that way. If the page
> is written out of sequence from the changes to it, how would recovery
> know where to get the page from?

See my later email --- the full page will be restored later from WAL, so
our changes don't have to be made at that point.

> ISTM there is mileage in your idea of trying to shift the work to
> another time. My thought is "which blocks exactly are the ones being
> changed?". Maybe that would lead to a reduction.
> 
> > > With wal_changed_pages= off *any* crash would possibly require an
> > > archive recovery, or a replication rebuild. It's good that we now have
> > > PITR, but we do also have other options for availability. Users of
> > > replication could well be amongst the first to try out this option. 
> > 
> > Seems it is similar to fsync in risk, which is not a new option.
> 
> Risk is not acceptable. We must have certainty, either way.
> 
> Why have two GUCs? Why not just have one GUC that does both at the same
> time? When would you want one but not the other?
> risk_data_loss_to_gain_performance = true

Yep, one new one might make sense.

> > I think if we document full_page_writes as similar to fsync in risk, we
> > are OK for 8.1, but if something can be done easily, it sounds good.
> 
> Documenting something simply isn't enough. I simply cannot advise
> anybody ever to use the new GUC. If their data was low value, they
> wouldn't even be using PostgreSQL, they'd use a non-transactional DBMS.
> 
> I agree we *must* have the GUC, but we also *must* have a way for crash
> recovery to tell us for certain that it has definitely worked, not just
> maybe worked.

Right.  I am thinking your CRC write to WAL might do that.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Checkpoint cost, looks like it is WAL/CRC
Next
From: Stephen Frost
Date:
Subject: Must be owner to truncate?