Thread: recovery lag question

recovery lag question

From: John Lister
Hi, I've set up a warm standby box with PostgreSQL 8.3.8 and pg_standby.
Everything seems to be OK: the WAL files are being copied across and
processed during recovery as you'd expect, but I have one question. The
recovery seems to be processing the remaining WAL files (about 30) at the
same rate as they are generated, which is approximately one every 6
minutes. This means that it is currently lagging behind the primary by
about 3 hours. I assumed that recovery would process the WAL files as fast
as possible until it caught up and then wait. There appears to be no load
on the secondary server, so I'm guessing it is sitting idle waiting, but I
may be wrong.
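As a rough worked example of where the 3-hour figure comes from, assuming new files keep arriving about every 6 minutes, the lag is roughly the backlog multiplied by the arrival interval:

```latex
\text{lag} \approx N_{\text{backlog}} \times \Delta t_{\text{arrival}}
  = 30 \times 6\,\text{min} = 180\,\text{min} = 3\,\text{h}
```

If recovery only replays one file per arrival interval, N_backlog never shrinks, so the lag stays at about 3 hours instead of closing.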

Thanks

John

Re: recovery lag question

From: Alvaro Herrera
John Lister wrote:
> Hi, I've set up a warm standby box with postgresql 8.3.8 and
> pg_standby. Everything seems to be ok except and the wal files are
> being copied across and being processed during the recovery as you'd
> expect but I have one question. The recovery seems to be processing
> the final wal files (about 30) at the same rate as they are
> generated, which is approximately every 6 minutes. This means that
> it is currently lagging behind the primary by about 3 hours. I
> assumed that the recovery would process the wal files as fast as
> possible until it caught up and then waited, there appears to be no
> load on the secondary server so I'm guessing it is sat waiting but i
> may be wrong.

There is a sleep time between checks for new files ... maybe you are
getting bitten by that?  Other than that, it is supposed to process new
files as soon as they become available.  How are you copying the files
across?  Maybe that process is getting stuck.
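To illustrate the sleep-between-checks behaviour, here is a simplified polling loop of the kind a restore command runs. It is a hypothetical sketch, not pg_standby's actual code: `wait_for_wal_file` and its parameters are made up, and `sleep_seconds` only plays the role of pg_standby's sleep time. It sleeps only while the requested file is missing, so files already sitting in the archive are handed over back to back.

```python
import os
import shutil
import time


def wait_for_wal_file(archive_dir: str, wal_name: str, dest_path: str,
                      sleep_seconds: float = 2.0,
                      max_wait_seconds: float = 60.0) -> bool:
    """Hypothetical restore helper: copy wal_name from archive_dir to dest_path.

    Sleeps only while the file is missing; if it is already present it is
    copied immediately, with no delay before returning.
    """
    source = os.path.join(archive_dir, wal_name)
    deadline = time.monotonic() + max_wait_seconds
    while True:
        if os.path.exists(source):
            shutil.copy(source, dest_path)  # file is ready: hand it over now
            return True
        if time.monotonic() >= deadline:
            return False  # give up; the caller decides what a timeout means
        time.sleep(sleep_seconds)  # file not there yet: wait, then re-check
```

In a loop shaped like this, a sleep of a few seconds can delay a file by at most about that long after it arrives, nowhere near 6 minutes per file. A real restore command also has to make sure a file has been completely copied into the archive before handing it to recovery, which this sketch ignores.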

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: recovery lag question

From: Kevin Grittner
Alvaro Herrera <alvherre@commandprompt.com> wrote:

> There is a sleep time between check for new files

Even after a successful copy?  Why would you want that?

-Kevin

Re: recovery lag question

From: John Lister
> John Lister wrote:
>> Hi, I've set up a warm standby box with postgresql 8.3.8 and
>> pg_standby. Everything seems to be ok except and the wal files are
>> being copied across and being processed during the recovery as you'd
>> expect but I have one question. The recovery seems to be processing
>> the final wal files (about 30) at the same rate as they are
>> generated, which is approximately every 6 minutes. This means that
>> it is currently lagging behind the primary by about 3 hours. I
>> assumed that the recovery would process the wal files as fast as
>> possible until it caught up and then waited, there appears to be no
>> load on the secondary server so I'm guessing it is sat waiting but i
>> may be wrong.
>
> There is a sleep time between check for new files ... maybe you are
> getting bit by that?  Other than that it is supposed to process new
> files as soon as they become available.  How are you copying the files
> across?  Maybe that process is getting stuck.

Hi, thanks for your reply.

The sleep time is set to 2 seconds, which should be quick enough, and the
files are being copied across fine: quite a few are sitting in the local
"archive" directory waiting to be processed.
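One way to put a number on "quite a few" is to count the segment files in the local archive directory that come after the one recovery is currently on (taken, for example, from the standby's server log). This is a minimal sketch under stated assumptions: `segments_after` is a hypothetical helper, it assumes all files are on one timeline, and it relies on WAL segment names being 24 uppercase hex digits, which sort in WAL order.

```python
import os
import re

# WAL segment files are named with 24 hexadecimal digits,
# e.g. 000000010000000000000001; .history and .backup files are skipped.
WAL_SEGMENT = re.compile(r"^[0-9A-F]{24}$")


def segments_after(archive_dir: str, current_segment: str) -> list[str]:
    """Return segment names in archive_dir newer than current_segment, oldest first."""
    names = sorted(n for n in os.listdir(archive_dir) if WAL_SEGMENT.match(n))
    # Fixed-width uppercase hex names compare in WAL order within one timeline.
    return [n for n in names if n > current_segment]
```

Comparing against the current segment, rather than just counting files, avoids counting segments that were already replayed but not yet cleaned out of the archive.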

However, I'm wondering if it is an I/O issue. The disks on the mirror are
lower-spec than the primary's, and it looks like one CPU (the postgres
process, I'm guessing) is spending a lot of time in I/O wait. It may have
been a coincidence that the delay between recovering successive WAL files
was almost exactly the same as the delay between copying them across,
which made it look as if recovery simply followed the arrival rate: files
arriving every second would be processed every second (admittedly 3 hours
later), and files arriving every 10 minutes would be processed with the
same 10-minute spacing.

The lag is now slowly getting bigger, which makes me think I need to add
more disks to the standby to bring it closer to the primary.
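A steady lag followed by a slowly growing one fits a simple backlog model. This is a sketch under strong assumptions: a constant arrival interval and a constant replay time per file, which real recovery won't have; the function names are hypothetical and the 5.0 and 6.5 minute replay times are made-up illustrations. The lag is approximated as backlog times arrival interval, since the waiting files span roughly that much primary activity.

```python
def backlog_after(minutes: float, initial_backlog: float,
                  arrival_interval: float, replay_time: float) -> float:
    """Approximate number of WAL files waiting after `minutes`.

    Continuous approximation: one file arrives every `arrival_interval`
    minutes and recovery needs `replay_time` minutes per file. The backlog
    is clamped at zero once recovery has caught up.
    """
    arrived = minutes / arrival_interval
    replayed = minutes / replay_time
    return max(0.0, initial_backlog + arrived - replayed)


def lag_minutes(backlog: float, arrival_interval: float) -> float:
    # The waiting files cover roughly backlog * arrival_interval minutes of primary activity.
    return backlog * arrival_interval


# Hypothetical figures: 30 files behind, one new file every 6 minutes, over 24 hours.
for replay_time in (5.0, 6.0, 6.5):
    b = backlog_after(24 * 60, 30, 6.0, replay_time)
    print(f"replay {replay_time} min/file: backlog {b:.1f} files, "
          f"lag ~{lag_minutes(b, 6.0) / 60:.1f} h")
```

If replay takes exactly as long as the arrival interval, the backlog and lag hold steady; if replay is even slightly slower, the lag creeps up without bound. That is consistent with an I/O-bound standby, though it does not prove I/O is the cause.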

Thanks for your help; I'll report back on whether new disks solve the problem.

John