Re: Checkpoint cost, looks like it is WAL/CRC - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Checkpoint cost, looks like it is WAL/CRC
Date
Msg-id 1120690332.3940.206.camel@localhost.localdomain
Whole thread Raw
In response to Re: Checkpoint cost, looks like it is WAL/CRC  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: Checkpoint cost, looks like it is WAL/CRC
Re: Checkpoint cost, looks like it is WAL/CRC
Re: Checkpoint cost, looks like it is WAL/CRC
List pgsql-hackers
On Wed, 2005-07-06 at 18:22 -0400, Bruce Momjian wrote:
> Well, I added #1 yesterday as 'full_page_writes', and it has the same
> warnings as fsync (namely, on crash, be prepared to recovery or check
> your system thoroughly.

Yes, which is why I comment now that the GUC alone is not enough.

There is no way to "check your system thoroughly". If there is a certain
way of knowing torn pages had *not* occurred, then I would be happy.

> As far as #2, my posted proposal was to write the full pages to WAL when
> they are written to the file system, and not when they are first
> modified in the shared buffers --- the goal being that it will even out
> the load, and it will happen in a non-critical path, hopefully by the
> background writer or at checkpoint time.

The page must be written before the changes to the page are written, so
that they are available sequentially in the log for replay. The log and
the database are not connected, so we cannot do it that way. If the page
is written out of sequence from the changes to it, how would recovery
know where to get the page from?

ISTM there is mileage in your idea of trying to shift the work to
another time. My thought is "which blocks exactly are the ones being
changed?". Maybe that would lead to a reduction.

> > With wal_changed_pages= off *any* crash would possibly require an
> > archive recovery, or a replication rebuild. It's good that we now have
> > PITR, but we do also have other options for availability. Users of
> > replication could well be amongst the first to try out this option. 
> 
> Seems it is similar to fsync in risk, which is not a new option.

Risk is not acceptable. We must have certainty, either way.

Why have two GUCs? Why not just have one GUC that does both at the same
time? When would you want one but not the other?
risk_data_loss_to_gain_performance = true

> I think if we document full_page_writes as similar to fsync in risk, we
> are OK for 8.1, but if something can be done easily, it sounds good.

Documenting something simply isn't enough. I simply cannot advise
anybody ever to use the new GUC. If their data was low value, they
wouldn't even be using PostgreSQL, they'd use a non-transactional DBMS.

I agree we *must* have the GUC, but we also *must* have a way for crash
recovery to tell us for certain that it has definitely worked, not just
maybe worked.

Best regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Schedule for 8.1 feature freeze
Next
From: "Joshua D. Drake"
Date:
Subject: Re: Checkpoint cost, looks like it is WAL/CRC