Thread: WAL Shipping + checkpoint
Hi All,

I have a cluster (PostgreSQL 8.3.7) with WAL shipping, and a few hours ago the master had to restart. I use walmgr from Skytools, which works very well.

I restarted the master without any problem, but since then the slave is not behaving as I expect. The "Time of latest checkpoint" field reported by pg_controldata on the slave keeps the same value, even though the WAL files are being processed correctly.

I tried restarting the slave, but after it has replayed all the WAL since the "Time of latest checkpoint", nothing further happens: the latest checkpoint stays at the same value.

I don't know how important this is (I think it matters), and I can't fix it. Does anybody have an idea?

Thanks,

--
Sébastien Lardière
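For reference, the value Sébastien is watching can be read directly from the standby's control file; a minimal check, with the data directory path as a placeholder:

    pg_controldata /var/lib/postgresql/8.3/main | grep "Time of latest checkpoint"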
Sébastien Lardière wrote:
> The "Time of latest checkpoint" field reported by pg_controldata on the
> slave keeps the same value, even though the WAL files are being processed
> correctly.
> [...]
> I don't know how important this is (I think it matters), and I can't fix it.

It is normal for it to lag behind somewhat on the slave (depending on your checkpoint_timeout and related settings).

However, I've noticed what you are seeing as well - particularly when there are no actual data changes coming through in the logs, the slave's checkpoint time does not change even though there have been checkpoints on the master. (I may have a look at the code to see what the story really is... if I have time.)

Cheers

Mark
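How far the standby's checkpoint time normally lags depends on the master's checkpoint settings, which Mark mentions; a quick way to inspect them on the master (connection options omitted):

    psql -c "SHOW checkpoint_timeout"
    psql -c "SHOW checkpoint_segments"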
On 26/08/2009 04:46, Mark Kirkwood wrote:
> It is normal for it to lag behind somewhat on the slave (depending on your
> checkpoint_timeout and related settings).
>
> However, I've noticed what you are seeing as well - particularly when
> there are no actual data changes coming through in the logs, the slave's
> checkpoint time does not change even though there have been checkpoints on
> the master.

Yes, but the delay between the last checkpoint on the master and the one on the slave is now very high (about 100,000 seconds), because the last checkpoint on the slave was yesterday (assuming pg_controldata is right).

Here is a graph from our munin plugin:
http://seb.ouvaton.org/tmp/bdd-pg_walmgr-week.png

The blue line represents the average interval between two WAL files processed on the slave, and the green line the delay between the last checkpoint on the master and on the slave.

Maybe it is not a good indicator, but the green line makes me think there is a problem.

Thanks,

--
Sébastien Lardière
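The delay shown by the green line can also be computed by hand from the two control files. A rough sketch, assuming the master is reachable over ssh as "master", the data directory paths below, GNU date and an English locale:

    master_ckpt=$(ssh master pg_controldata /var/lib/postgresql/8.3/main | sed -n 's/^Time of latest checkpoint: *//p')
    slave_ckpt=$(pg_controldata /var/lib/postgresql/8.3/main | sed -n 's/^Time of latest checkpoint: *//p')
    # difference in seconds between the two checkpoint timestamps
    echo $(( $(date -d "$master_ckpt" +%s) - $(date -d "$slave_ckpt" +%s) ))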
Sébastien Lardière wrote:
> Yes, but the delay between the last checkpoint on the master and the one
> on the slave is now very high (about 100,000 seconds), because the last
> checkpoint on the slave was yesterday (assuming pg_controldata is right).
>
> Here is a graph from our munin plugin:
> http://seb.ouvaton.org/tmp/bdd-pg_walmgr-week.png
>
> The blue line represents the average interval between two WAL files
> processed on the slave, and the green line the delay between the last
> checkpoint on the master and on the slave.
>
> Maybe it is not a good indicator, but the green line makes me think there
> is a problem.

Do you have archive_timeout set? If so, then what *could* be happening is this: there are actually no "real" data changes being made on your master for some reason, so every time archive_timeout is reached a log full of no changes is shipped to your slave and applied - and no checkpoint times are changed, for the reasons I mentioned above.

A way to test this would be to do something that makes real data changes on the master. A good thing to try would be to:

- create a new database
- create tables and add a reasonable amount of data (e.g. an initialized pgbench database at scale 100; see the sketch below)

Then see if your checkpoint time gets updated a few minutes or so later.
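A rough sketch of that test, run against the master (the database name is a placeholder; pgbench ships in contrib for 8.3):

    createdb pgbench_test
    pgbench -i -s 100 pgbench_test   # initialize pgbench tables at scale 100, roughly 1.5 GB of real data changes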
On 27/08/2009 00:18, Mark Kirkwood wrote:
> Do you have archive_timeout set? If so, then what *could* be happening is
> this: there are actually no "real" data changes being made on your master
> for some reason, so every time archive_timeout is reached a log full of no
> changes is shipped to your slave and applied - and no checkpoint times are
> changed, for the reasons I mentioned above.

Thanks, but we have not set archive_timeout, and we have a lot of real data changes.

That's why I don't understand why checkpoints never happen on the slave.

--
Sébastien Lardière
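archive_timeout is easy to double-check on the master; a reported value of 0 means it is not set:

    psql -c "SHOW archive_timeout"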
>> There are actually no "real" data changes being made on your master for
>> some reason, so every time archive_timeout is reached a log full of no
>> changes is shipped to your slave and applied.
>
> Thanks, but we have not set archive_timeout, and we have a lot of real
> data changes.
>
> That's why I don't understand why checkpoints never happen on the slave.

What about the other values in the pg_controldata output, do they change at all? If they remain constant, maybe the logs are just not being applied. Also, are there any "restored log file xxx from archive" entries in the postgres log?

regards,
Martin
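The other control-file fields Martin mentions can be watched with something like the following (data directory path is a placeholder); comparing successive runs shows whether anything at all is advancing on the standby:

    pg_controldata /var/lib/postgresql/8.3/main | egrep "checkpoint location|REDO location|Minimum recovery ending location"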
On 27/08/2009 18:11, Martin Pihlak wrote:
> What about the other values in the pg_controldata output, do they change
> at all? If they remain constant, maybe the logs are just not being
> applied. Also, are there any "restored log file xxx from archive" entries
> in the postgres log?

No, I don't see any change in pg_controldata, but there are messages in the logfile:

2009-08-28 10:02:51,129 26717 INFO 00000001000003F700000088: Found
2009-08-28 10:02:51,169 26717 INFO {count: 1}
2009-08-28 10:02:51 CEST [18439]: [1862-1] user=,db= LOG:  restored log file "00000001000003F700000088" from archive

postgres@bdd2:/data/postgresql/8.3/main$ grep "restored log file" /var/log/postgresql/postgresql-2009-08-28.log | wc -l
164

And I can see that the cluster is being updated, at least inside the pg_xlog and base directories. So it looks like it is working, but when I restart the slave cluster it processes again all the WAL since the last checkpoint (for nothing, I think). So when we actually need this slave it will take more and more time to come up, and that worries me a bit.

--
Sébastien Lardière
Sébastien Lardière wrote:
> No, I don't see any change in pg_controldata, but there are messages in
> the logfile:
>
> 2009-08-28 10:02:51,129 26717 INFO 00000001000003F700000088: Found
> 2009-08-28 10:02:51,169 26717 INFO {count: 1}
> 2009-08-28 10:02:51 CEST [18439]: [1862-1] user=,db= LOG:  restored log
> file "00000001000003F700000088" from archive

This is weird; it does indeed seem that for some reason the recovery restartpoints are not being created.

Looking quickly at RecoveryRestartPoint() in xlog.c, there are two cases in which it doesn't do a checkpoint. For one thing, it checks whether enough time has elapsed since the last control file modification; if not, the checkpoint is skipped. I'm wondering if it does the correct thing when the clocks of the two machines are too far apart.

The other check is "is it safe to do a checkpoint". This is logged at DEBUG2, so it should be visible if you set the logging level accordingly. Alternatively, you could attach a debugger to the recovery process and see whether the RecoveryRestartPoint() and CheckPointGuts() functions are called at all.

regards,
Martin
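Both suggestions are straightforward to try on the standby. A sketch, with the PID taken from the log excerpt above and all paths assumed:

    # in the standby's postgresql.conf (set before starting the standby), to make DEBUG2 messages visible:
    log_min_messages = debug2

    # or attach gdb to the recovery (startup) process and watch the two functions Martin names:
    gdb -p 18439
    (gdb) break RecoveryRestartPoint
    (gdb) break CheckPointGuts
    (gdb) continue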
On Fri, 2009-08-28 at 17:55 +0300, Martin Pihlak wrote:
> Looking quickly at RecoveryRestartPoint() in xlog.c, there are two cases
> in which it doesn't do a checkpoint. For one thing, it checks whether
> enough time has elapsed since the last control file modification; if not,
> the checkpoint is skipped. I'm wondering if it does the correct thing when
> the clocks of the two machines are too far apart.

Probably not that.

> The other check is "is it safe to do a checkpoint". This is logged at
> DEBUG2, so it should be visible if you set the logging level accordingly.

This seems like the most likely cause. I would guess that one of your GiST indexes is corrupted and is preventing a restartpoint from taking place. That has happened before.

> Alternatively, you could attach a debugger to the recovery process and see
> whether the RecoveryRestartPoint() and CheckPointGuts() functions are
> called at all.

I would say there is no need for that, but you can check the GiST pending actions table.

--
Simon Riggs
www.2ndQuadrant.com
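As a first step it can help to know which GiST indexes exist on the master, so there is a list of suspects to look at. A hedged example query (the database name is a placeholder; this simply enumerates GiST indexes, it is not the pending-actions structure Simon refers to):

    psql -d mydb -c "SELECT c.relname AS index_name, t.relname AS table_name FROM pg_index i JOIN pg_class c ON c.oid = i.indexrelid JOIN pg_class t ON t.oid = i.indrelid JOIN pg_am a ON a.oid = c.relam WHERE a.amname = 'gist'"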
On 28/08/2009 17:13, Simon Riggs wrote:
>> The other check is "is it safe to do a checkpoint". This is logged at
>> DEBUG2, so it should be visible if you set the logging level accordingly.
>
> This seems like the most likely cause. I would guess that one of your GiST
> indexes is corrupted and is preventing a restartpoint from taking place.
> That has happened before.

You're right. On the master, we had an error about a GiST index on an ltree column:

Aug 25 09:07:02 eurovox-bdd postgres[5575]: [18-1] user=eurovox,db=eurovox_db PANIC:  failed to add item to index page in "appels_entrants_arbo_key"

The cluster tried to restart (it failed) and I had to force it to stop. Since that moment, the slave has not made any checkpoint.

Now we know why. Thanks a lot!

But how can I fix it?

--
Sébastien Lardière
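If the table data on the master is intact, the usual way to get rid of a corrupted index is to rebuild it. A hedged sketch using the index and database names from the log above (this repairs the master's copy only; it does not by itself fix the standby):

    psql -d eurovox_db -c "REINDEX INDEX appels_entrants_arbo_key"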
On Fri, 2009-08-28 at 17:54 +0200, Sébastien Lardière wrote:
> Since that moment, the slave has not made any checkpoint.
>
> Now we know why. Thanks a lot!
>
> But how can I fix it?

Current issue: rebuild the standby from a base backup.

Cause: we need to locate the bug in the index AM that causes this.

Symptom: currently index AMs don't detect the situation where an index is corrupt, so they keep trying to recover the index forever, even though attempting to do so permanently prevents restartpoints. I would also like them to have a mechanism for marking an index as corrupt and then auto-rebuilding it after recovery.

--
Simon Riggs
www.2ndQuadrant.com
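In this thread walmgr automates the base backup; for reference, the manual 8.3 equivalent of "rebuild the standby from a base backup" looks roughly like this (host name and paths are placeholders, and a suitable recovery.conf on the standby is assumed):

    # on the master
    psql -c "SELECT pg_start_backup('standby_rebuild')"
    rsync -a --exclude=pg_xlog /var/lib/postgresql/8.3/main/ standby:/var/lib/postgresql/8.3/main/
    psql -c "SELECT pg_stop_backup()"
    # then start the standby in recovery so it resumes replaying shipped WAL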
On 28/08/2009 18:14, Simon Riggs wrote:
> Current issue: rebuild the standby from a base backup.

Thanks, I restored a backup on the slave this morning, and it works!

Thanks a lot,

--
Sébastien Lardière
On 31/08/2009, at 6:16 PM, Sébastien Lardière wrote:
> Thanks, I restored a backup on the slave this morning, and it works!

Could you detail your solution, please, if any? I've seen frozen pg_controldata output on my standby server for ages and attributed it to the ancient version of PostgreSQL (8.0.x) I'm stuck with. There have been no index-related error messages in my log, though.

Thank you!

Yar
On Tue, 2009-09-01 at 13:14 +1000, Yaroslav Tykhiy wrote:
> Could you detail your solution, please, if any? I've seen frozen
> pg_controldata output on my standby server for ages and attributed it to
> the ancient version of PostgreSQL (8.0.x) I'm stuck with.

It won't ever work before 8.2.

--
Simon Riggs
www.2ndQuadrant.com
On 01/09/2009 05:14, Yaroslav Tykhiy wrote:
>> Thanks, I restored a backup on the slave this morning, and it works!
>
> Could you detail your solution, please, if any? I've seen frozen
> pg_controldata output on my standby server for ages and attributed it to
> the ancient version of PostgreSQL (8.0.x) I'm stuck with. There have been
> no index-related error messages in my log, though.

Hi, sorry for my late reply.

We use walmgr.py from Skytools (http://pgfoundry.org/projects/skytools), a great tool which makes backup, restore and WAL shipping easy.

In this case, I used the 'restore' sub-command on the slave, which takes the last nightly backup we made and starts a slave cluster doing WAL shipping, and it works - but we use PostgreSQL 8.3.

--
Sébastien
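The exact walmgr invocation depends on the local configuration files, so the following is only an assumed sketch (config file paths are placeholders; the command names follow the description above):

    # on the master: take a fresh base backup and ship it
    walmgr.py /etc/walmgr/master.ini backup
    # on the slave: restore that backup and resume WAL replay
    walmgr.py /etc/walmgr/slave.ini restore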