Re: Avoiding adjacent checkpoint records - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Avoiding adjacent checkpoint records
Date
Msg-id 15212.1339022797@sss.pgh.pa.us
In response to Re: Avoiding adjacent checkpoint records  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Avoiding adjacent checkpoint records
Re: Avoiding adjacent checkpoint records
List pgsql-hackers
I wrote:
> Actually, it looks like there is an extremely simple way to handle this,
> which is to move the call of LogStandbySnapshot (which generates the WAL
> record in question) to before the checkpoint's REDO pointer is set, but
> after we have decided that we need a checkpoint.

On further contemplation, there is a downside to that idea, which
probably explains why the code was written as it was: if we place the
XLOG_RUNNING_XACTS WAL record emitted during a checkpoint before rather
than after the checkpoint's REDO point, then a hot standby slave
starting up from that checkpoint won't process the XLOG_RUNNING_XACTS
record.  That means its KnownAssignedXids machinery won't be fully
operational until the master starts another checkpoint, which might be
a while.  So this could result in an undesirable delay in hot standby mode
becoming active.
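
To make the ordering issue concrete, here is a toy model, not PostgreSQL
code.  The types and the function standby_sees_running_xacts are
hypothetical stand-ins invented for illustration; the only assumption
taken from the text above is that replay begins at the checkpoint's REDO
pointer, so a record that starts before that point is never processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; these are NOT PostgreSQL's real types. */
typedef uint64_t ToyLsn;

typedef enum { REC_OTHER, REC_RUNNING_XACTS, REC_CHECKPOINT } ToyRecType;

typedef struct
{
    ToyLsn      lsn;    /* start position of the record in the toy WAL */
    ToyRecType  type;
} ToyRecord;

/*
 * Replay is assumed to start at the checkpoint's REDO pointer, so only
 * records with lsn >= redo are processed.  Returns true if a
 * running-xacts record is among the processed ones.
 */
static bool
standby_sees_running_xacts(const ToyRecord *wal, size_t n, ToyLsn redo)
{
    assert(wal != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (wal[i].lsn >= redo && wal[i].type == REC_RUNNING_XACTS)
            return true;
    }
    return false;
}
```

With the record emitted after the REDO point the function returns true;
with the record moved ahead of the REDO point it returns false, which is
the delay described above.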

I am not sure how significant this really is though.  Comments?

If we don't like that, I can think of a few other ways to get there,
but they have their own downsides:

* Instead of trying to detect after-the-fact whether any concurrent
WAL activity happened during the last checkpoint, we could detect it
during the checkpoint and then keep the info in a static variable in
the checkpointer process until next time.  However, I don't see any
bulletproof way to do this without adding at least one or two lines
of code within XLogInsert, which I'm sure Robert will complain about.

* We could expand checkpoint records to contain two different REDO
pointers, one to be used by hot standby slaves and one for normal
crash recovery.  (The LogStandbySnapshot records would appear between
these two points; we'd still be moving them up to the start of the
checkpoint sequence.)  This is a relatively clean solution but would
force pg_upgrade between beta2 and beta3, so that's not so nice.

* Combining the two ideas, we could take the nominal REDO pointer,
run LogStandbySnapshot, make a fresh note of where the insert point
is (real REDO point, which is what we publish in shared memory for
the bufmgr to compare LSNs to), complete the checkpoint, and write
the checkpoint record using the nominal REDO pointer so that that's
where any crash or HS slave starts from.  But save the real REDO
pointer in checkpointer static state, and in the next checkpoint use
that rather than the nominal pointer to decide if anything's happened
that would force a new checkpoint.  I think this dodges both of the
above complaints, but it feels pretty baroque.

Thoughts, other ideas?
        regards, tom lane

