Re: restartpoints stop generating on streaming replication slave - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: restartpoints stop generating on streaming replication slave
Date
Msg-id CAHGQGwEMTn8gCFEr6iBwkSWF3jA_dWmt0FpzcJzEZ-egcBM4aw@mail.gmail.com
Whole thread Raw
In response to restartpoints stop generating on streaming replication slave  (Mathieu Fenniak <mathieu.fenniak@replicon.com>)
Responses Re: restartpoints stop generating on streaming replication slave
List pgsql-hackers
On Wed, Aug 22, 2012 at 5:52 AM, Mathieu Fenniak
<mathieu.fenniak@replicon.com> wrote:
> Hi all,
>
> I've been investigating an issue with our PostgreSQL 9.1.1 (Linux x86-64
> CentOS 5.8) database where restartpoints suddenly stop being generated on
> the slave after working correctly for a week or two.  The symptom of the
> problem is that the pg_xlog directory on the slave doesn't get cleaned up,
> and the log_checkpoints output (eg. restartpoint starting: time) stops
> appearing.
>
> I was able to extract a core dump of the bgwriter process while it was in
> BgWriterNap.  I inspected ckpt_start_time and last_checkpoint_time;
> ckpt_start_time was 1345578533 (... 19:48:53 GMT) and last_checkpoint_time
> was 1345578248 (... 19:44:08 GMT).  Based upon these values, I concluded
> that it's performing checkpoints but missing the "if (ckpt_performed)"
> condition (ie. CreateRestartPoint returns false); it's then setting
> last_checkpoint_time to now - 5 minutes + 15 seconds.
>
> There seems to be two causes of a false retval in CreateRestartPoint; the
> first is if !RecoveryInProgress(), and the second is if "the last checkpoint
> record we've replayed is already our last restartpoint".  The first
> condition doesn't seem likely; does anyone know how we might be hitting the
> second condition?  We have continuous traffic on the master server in the
> range of 1000 txn/s, and the slave seems to be completely up-to-date, so I
> don't understand how we could be hitting this condition.

To check whether you really hit either of the above two conditions, could you
set log_min_messages to DEBUG2 on the standby? If you hit either, you'll
get the log message like "skipping restartpoint......".

Could you execute pg_controldata on both master and standby, and check
whether their "Latest checkpoint location" are the same?

Regards,

-- 
Fujii Masao



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Expressions without type
Next
From: Amit Kapila
Date:
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation