Re: [Skytools-users] WAL Shipping + checkpoint - Mailing list pgsql-general

From Mark Kirkwood
Subject Re: [Skytools-users] WAL Shipping + checkpoint
Msg-id 4A95B499.6030803@catalyst.net.nz
In response to Re: [Skytools-users] WAL Shipping + checkpoint  (Sébastien Lardière <slardiere@hi-media.com>)
Responses Re: [Skytools-users] WAL Shipping + checkpoint
List pgsql-general
Sébastien Lardière wrote:
> On 26/08/2009 04:46, Mark Kirkwood wrote:
>> Sébastien Lardière wrote:
>>> Hi All,
>>>
>>> I have a cluster (Pg 8.3.7) with WAL shipping, and a few hours ago
>>> the master had to be restarted.
>>>
>>> I use walmgr from Skytools, which works very well.
>>>
>>> I have restarted the master before without any problem, but today
>>> the slave doesn't behave as I expect. The "Time of latest
>>> checkpoint" field in the pg_controldata output on the slave keeps
>>> the same value, but the WAL files are processed correctly.
>>>
>>> I tried restarting the slave, but after reprocessing all the WAL
>>> since the "Time of latest checkpoint", it does nothing else; the
>>> latest checkpoint stays at the same value.
>>>
>>> I don't know whether it's important (I think so), and I can't fix it.
>>>
>> It is normal for it to lag behind somewhat on the slave (depending on
>> what your checkpoint_timeout and related settings are).
>>
>> However, I've noticed what you are seeing as well - particularly when
>> there are no actual data changes coming through in the logs - the
>> slave checkpoint time does not change even though there have been
>> checkpoints on the master (I may have a look in the code to see what
>> the story really is... if I have time).
>>
>
> Yes, but the delay between the last checkpoint on the master and on the
> slave is now very high (100,000 seconds), because the last checkpoint
> on the slave was yesterday (assuming pg_controldata is right).
>
> Here is a graph from our munin plugin:
> http://seb.ouvaton.org/tmp/bdd-pg_walmgr-week.png
>
> The blue line represents the average interval between two WAL files
> processed on the slave, and the green line the delay between the last
> checkpoint on the master and on the slave.
>
> Maybe it's not a good indicator, but the green line makes me think
> there is a problem.
>
>
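
The green-line metric described above, the delay between the latest checkpoint on the master and on the slave, can be reproduced from pg_controldata output. A minimal Python sketch, assuming pg_controldata is run under the C locale so the "Time of latest checkpoint" value looks like `Wed Aug 26 10:15:32 2009`; the function names are hypothetical and not part of walmgr or the munin plugin.

```python
import os
import subprocess
from datetime import datetime

FIELD = "Time of latest checkpoint"
# Assumption: C-locale timestamp rendering, e.g. "Wed Aug 26 10:15:32 2009".
C_LOCALE_FORMAT = "%a %b %d %H:%M:%S %Y"


def read_controldata(datadir: str) -> str:
    """Run pg_controldata on a local data directory, forcing the C locale."""
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["pg_controldata", datadir],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout


def latest_checkpoint(controldata_output: str, fmt: str = C_LOCALE_FORMAT) -> datetime:
    """Extract and parse the 'Time of latest checkpoint' field."""
    for line in controldata_output.splitlines():
        # Split on the first colon only; the timestamp itself contains colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() == FIELD:
            return datetime.strptime(value.strip(), fmt)
    raise ValueError(f"{FIELD!r} not found in pg_controldata output")


def checkpoint_lag_seconds(master_output: str, slave_output: str) -> float:
    """Seconds by which the slave's latest checkpoint trails the master's."""
    master = latest_checkpoint(master_output)
    slave = latest_checkpoint(slave_output)
    return (master - slave).total_seconds()
```

On a live setup, read_controldata() would be called on each machine against its own data directory, and the two outputs passed to checkpoint_lag_seconds(). Keeping the parsing separate from running the command makes it easy to check the parser against saved output.
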
Do you have archive_timeout set? If so, then what *could* be happening
is this:

There are actually no "real" data changes being made on your master for
some reason. So every time archive_timeout is reached, a WAL segment
containing no real changes is shipped to your slave and applied, and no
checkpoint times are changed, for the reasons I mentioned above.
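
Under that hypothesis the slave keeps receiving segments on a fixed schedule even though they carry nothing of substance. A rough Python sketch of the arithmetic, assuming the default 16 MB WAL segment size and at most one forced segment switch per archive_timeout interval; the function names are hypothetical. The PostgreSQL documentation notes that segments closed early by a forced switch are still archived at full length, which is why the volume estimate uses the full segment size.

```python
SECONDS_PER_DAY = 86400
# Assumption: default WAL segment size of 16 MB.
DEFAULT_SEGMENT_MB = 16


def max_forced_segments_per_day(archive_timeout_s: int) -> int:
    """Upper bound on segments shipped per day when archive_timeout forces switches."""
    if archive_timeout_s <= 0:
        raise ValueError("archive_timeout must be positive (0 disables it)")
    # Whole archive_timeout intervals that fit in one day.
    return SECONDS_PER_DAY // archive_timeout_s


def max_forced_archive_mb_per_day(archive_timeout_s: int,
                                  segment_mb: int = DEFAULT_SEGMENT_MB) -> int:
    """Upper bound on archive volume per day; forced segments are full size."""
    return max_forced_segments_per_day(archive_timeout_s) * segment_mb


for timeout in (60, 300, 3600):
    print(f"archive_timeout={timeout}s: at most "
          f"{max_forced_segments_per_day(timeout)} segments, "
          f"{max_forced_archive_mb_per_day(timeout)} MB per day")
```

With archive_timeout = 60, for example, this gives at most 1440 segments and 23040 MB of archive per day.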

A way to test this would be to do something that makes real data changes
on the master. A good thing to try would be to:

- create a new database
- create tables and add a reasonable amount of data (e.g. initialize
pgbench at scale 100).

Then see if your checkpoint time gets updated a few minutes or so later.
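
The test above can be scripted. A minimal Python sketch, assuming the standard PostgreSQL client tools createdb and pgbench are on the master's PATH and that connection settings come from the usual environment (PGHOST, PGUSER and so on); the database name `checkpoint_test` and the helper names are hypothetical. The check only compares the raw "Time of latest checkpoint" value from two pg_controldata runs on the slave, so it does not depend on the locale the timestamp is printed in.

```python
import subprocess

FIELD = "Time of latest checkpoint"


def workload_commands(dbname: str = "checkpoint_test", scale: int = 100) -> list[list[str]]:
    """Commands that create a database and load pgbench tables at the given scale."""
    return [
        ["createdb", dbname],
        ["pgbench", "-i", "-s", str(scale), dbname],  # -i initializes, -s sets scale
    ]


def run_workload(dbname: str = "checkpoint_test", scale: int = 100) -> None:
    """Run on the master to put real data changes into the WAL."""
    for cmd in workload_commands(dbname, scale):
        subprocess.run(cmd, check=True)


def checkpoint_field(controldata_output: str) -> str:
    """Return the raw 'Time of latest checkpoint' value, unparsed."""
    for line in controldata_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == FIELD:
            return value.strip()
    raise ValueError(f"{FIELD!r} not found in pg_controldata output")


def checkpoint_moved(before_output: str, after_output: str) -> bool:
    """True if the slave's latest checkpoint time changed between two runs."""
    return checkpoint_field(before_output) != checkpoint_field(after_output)
```

In use, capture pg_controldata output on the slave, call run_workload() on the master, wait a few minutes, capture the slave's output again, and pass both captures to checkpoint_moved(). A True result would be consistent with the explanation that the master simply had no real changes to ship.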

