Re: checkpoints are duplicated even while the system is idle - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: checkpoints are duplicated even while the system is idle
Msg-id CA+U5nMJyQ+E6Bxfz4_KrQ481ThfaifTAqU96ntXrD_uwimSB0w@mail.gmail.com
In response to checkpoints are duplicated even while the system is idle  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
On Wed, Oct 5, 2011 at 6:19 AM, Fujii Masao <masao.fujii@gmail.com> wrote:

> While the system is idle, we skip duplicate checkpoints for some
> reasons. But when wal_level is set to hot_standby, I found that
> checkpoints are wrongly duplicated even while the system is idle.
> The cause is that an XLOG_RUNNING_XACTS WAL record always
> follows the CHECKPOINT record when wal_level is set to hot_standby.
> So the subsequent checkpoint wrongly concludes that a record (i.e.,
> the XLOG_RUNNING_XACTS record) has been inserted since the start of
> the last checkpoint, treats the system as not idle, and refuses to
> skip. Is this intentional behavior? Or a bug?

I think it is avoidable behaviour, but not a bug.

Thinking some more about this, IMHO it is possible to improve the
situation greatly by going back to the true purpose of checkpoints.
Checkpoints exist to minimise the time taken by crash recovery, and to
serve as starting points for backups and archive recoveries.

The current idea is that if there has been no activity then we skip
the checkpoint. But all it takes is a single WAL record and off we go with
another checkpoint. If there hasn't been much WAL activity, there is
not much point in having another checkpoint record since there is
little if any time to be saved in recovery.

So why not avoid checkpoints until we have written at least one WAL
file's worth of data? That way checkpoint records are always in
different files, so we are safer with regard to primary and secondary
checkpoint records. Would that mean that in some cases dirty data
could stay in shared buffers for days or weeks? No, because the
bgwriter would clean it; but even if it did, so what? Recovery will
still be incredibly quick, which is the whole point.

Testing whether we're in a different segment is easy, and much simpler
than wriggling around trying to fix the problem you mention directly.
Patch attached.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

