Re: testing HS/SR - invalid magic number - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: testing HS/SR - invalid magic number
Msg-id 4BC55F6E.4000708@enterprisedb.com
In response to testing HS/SR - invalid magic number  ("Erik Rijkers" <er@xs4all.nl>)
Responses Re: testing HS/SR - invalid magic number  ("Erik Rijkers" <er@xs4all.nl>)
List pgsql-hackers
Erik Rijkers wrote:
> This replication test that was working well earlier (it ran daily), stopped working
> after reinstall of new instances of cvs HEAD. I think the change must have been today (or at least
> recent).
> ...
> -- logfile standby:
> ...
> 2010-04-14 02:21:11 CEST 5601   start=2010-04-14 02:18:22 CEST FATAL:  could not receive data from
> WAL stream: FATAL:  requested WAL segment 000000010000000000000032 has already been removed
> 
> cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000032':
> No such file or directory
> 2010-04-14 02:21:11 CEST 5598   start=2010-04-14 02:18:22 CEST LOG:  invalid magic number 0000 in
> log file 0, segment 50, offset 13795328
> cp: cannot stat `/var/data1/pg_stuff/dump/hotslave/replication_archive/000000010000000000000032':
> No such file or directory
> 2010-04-14 02:21:11 CEST 5784   start=2010-04-14 02:21:11 CEST LOG:  streaming replication
> successfully connected to primary

This is probably because of this change:

> date: 2010/04/12 09:52:29;  author: heikki;  state: Exp;  lines: +71 -23
> Change the logic to decide when to delete old WAL segments, so that it
> doesn't take into account how far the WAL senders are. This way a hung
> WAL sender doesn't prevent old WAL segments from being recycled/removed
> in the primary, ultimately causing the disk to fill up. Instead add
> standby_keep_segments setting to control how many old WAL segments are
> kept in the primary. This also makes it more reliable to use streaming
> replication without WAL archiving, assuming that you set
> standby_keep_segments high enough.
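Here is a rough model of the removal decision that commit message describes. It is a simplified sketch, not the actual xlog.c code. The function and parameter names are hypothetical, and segments are treated as plain sequential integers, ignoring the log/segment split. `archived` stands for the segments whose archiving has completed, assuming archiving is enabled.

```python
def removable_segments(existing, redo_seg, current_seg, keep_segments, archived):
    """Simplified model of which old WAL segments a checkpoint may remove.

    All names are hypothetical; segments are plain sequential integers.
    existing      -- segment numbers present on the primary
    redo_seg      -- segment holding the checkpoint's redo pointer
    current_seg   -- segment currently being written
    keep_segments -- stands in for the standby_keep_segments setting
    archived      -- segments whose archiving has completed
    """
    # Segments from the redo point onward are needed for crash recovery.
    cutoff = redo_seg
    # Also retain the most recent keep_segments segments for standbys.
    # How far each WAL sender has streamed is deliberately NOT consulted,
    # so a hung WAL sender cannot make the primary's disk fill up.
    cutoff = min(cutoff, current_seg - keep_segments)
    # Never remove a segment that has not been archived yet.
    return [seg for seg in sorted(existing) if seg < cutoff and seg in archived]


# The primary is writing segment 55 with keep_segments = 4, and a lagging
# standby still needs segment 50 (0x32).
print(removable_segments(range(40, 56), redo_seg=54, current_seg=55,
                         keep_segments=4, archived=set(range(40, 56))))
```

In this model segment 50 comes back as removable, so the standby can only get it from the archive. If segment 50 is left out of `archived`, it is kept on the primary.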

If the primary generates WAL so fast that the standby falls more than
standby_keep_segments segments behind, the primary will eventually delete
a WAL segment that hasn't been streamed to the standby yet, hence the
"requested WAL segment 000000010000000000000032 has already been removed"
error. It shouldn't remove a segment before it has been successfully
archived, though, and your logs show that the standby can't find that
file in the archive either. Is archiving set up properly?
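The file name in the "already been removed" error and the "log file 0, segment 50" in the invalid-magic-number message refer to the same segment. A WAL segment file name consists of three 8-digit hexadecimal fields (timeline ID, log file ID, segment number), so the small helper below can check that. The helper name is hypothetical, and the page computation assumes the default 8 kB WAL block size.

```python
WAL_BLOCK_SIZE = 8192  # assumption: default WAL block size (XLOG_BLCKSZ) of 8 kB


def parse_wal_file_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, seg)."""
    if len(name) != 24:
        raise ValueError("expected 24 hex digits, got %r" % name)
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return timeline, log_id, seg


tli, log_id, seg = parse_wal_file_name("000000010000000000000032")
print(tli, log_id, seg)  # prints: 1 0 50

# The reported offset lands exactly on a page boundary, consistent with
# the magic number being read from a WAL page header.
offset = 13795328
print(divmod(offset, WAL_BLOCK_SIZE))  # prints: (1684, 0)
```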

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

