Thread: Wrong SYSID in WAL segment

Wrong SYSID in WAL segment

From

"Latrous, Youssef"

Date:

25 October 2010, 17:00:41

Hi there,

In one of our systems we’ve noticed the following (strange?) behavior. In a Master/Slave configuration, we run wal-mgr to achieve data replication from the master to the slave node. Both nodes run PostgreSQL 8.4.1 (on Solaris 10). The slave starts in archive mode and initiates a restore from the log files. After a few segments, it complains with the following error message:

...

LOG: WAL file is from different system

DETAIL: WAL file SYSID is 5466170076771909117, pg_control SYSID is 5516922116183112703

LOG: redo done at 0/8FFE920

LOG: last completed transaction was at log time 2010-09-17 10:19:49.545025-04

2010-09-17 10:20:15,621 20400 INFO 000000010000000000000008: Found

2010-09-17 10:20:16,776 20400 INFO {count: 1}

LOG: restored log file "000000010000000000000008" from archive

2010-09-17 10:20:17,118 20417 INFO 00000002.history: not found, ignoring

2010-09-17 10:20:17,119 20417 INFO got SystemExit(1), exiting

LOG: selected new timeline ID: 2

2010-09-17 10:20:17,458 20433 INFO 00000001.history: not found, ignoring

2010-09-17 10:20:17,459 20433 INFO got SystemExit(1), exiting

LOG: archive recovery complete

LOG: database system is ready to accept connections

LOG: autovacuum launcher started
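
A minimal sketch of how the mismatch reported above could be detected automatically, assuming the server log is available as plain text lines and that the DETAIL line keeps the exact wording shown in this excerpt. The function name `find_sysid_mismatches` is hypothetical and not part of PostgreSQL or wal-mgr.

```python
import re

# Matches the DETAIL line exactly as it appears in the log excerpt above.
SYSID_DETAIL = re.compile(
    r"WAL file SYSID is (\d+), pg_control SYSID is (\d+)"
)


def find_sysid_mismatches(log_lines):
    """Return (line_number, wal_sysid, control_sysid) for each reported mismatch.

    Hypothetical helper: it only inspects log text and does not read
    WAL segments or pg_control itself.
    """
    mismatches = []
    for lineno, line in enumerate(log_lines, start=1):
        match = SYSID_DETAIL.search(line)
        if match:
            wal_sysid, control_sysid = (int(g) for g in match.groups())
            if wal_sysid != control_sysid:
                mismatches.append((lineno, wal_sysid, control_sysid))
    return mismatches


# Sample lines copied from the log excerpt in this message.
sample = [
    "LOG: WAL file is from different system",
    "DETAIL: WAL file SYSID is 5466170076771909117, "
    "pg_control SYSID is 5516922116183112703",
    "LOG: redo done at 0/8FFE920",
]

for lineno, wal_sysid, control_sysid in find_sysid_mismatches(sample):
    print(f"line {lineno}: WAL SYSID {wal_sysid} != pg_control SYSID {control_sysid}")
```

Running a check like this against the slave's log would at least flag the moment recovery hits a foreign segment, before the node silently leaves recovery.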

I’m trying to understand a few things here:

1) Why one of the segments has a different SYSID (knowing that all WAL segments are originating from the same node)? In other words, under which circumstances could the SYSID differ from one segment to another on the same node? The CRC is OK, which means that none of these segments is corrupted.

2) Once the postmaster encounters this issue, it stops the recovery and switches to master mode, hence breaking Master/Slave mode. What’s the rationale behind changing the database mode underneath the system?

3) How would one know that the replication is progressing correctly?

I’m not sure whether this is the right mailing list. If not, please let me know which one is more appropriate and I’ll post there.

Thank you in advance for your time and help,

Regards,

Youssef

Re: Wrong SYSID in WAL segment

From

Tom Lane

Date:

25 October 2010, 17:14:48

"Latrous, Youssef" <YLatrous@BroadViewNet.com> writes:
> 1) Why one of the segments has a different SYSID (knowing that all WAL
> segments are originating from the same node)?

They aren't.  Somewhere you've got WAL segments slipping in from a
different database cluster.

            regards, tom lane
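
One way to check whether two nodes really belong to the same database cluster is to compare the system identifier that `pg_controldata` prints for each data directory. The sketch below assumes the output of `pg_controldata` has already been captured as text on each node and contains a `Database system identifier:` line; the helper names and the sample text are made up for illustration, reusing the two identifiers from the log in the original message.

```python
import re

# pg_controldata prints one "label: value" pair per line; this picks out
# the system identifier line.
SYSID_LINE = re.compile(
    r"^Database system identifier:\s*(\d+)\s*$", re.MULTILINE
)


def system_identifier(pg_controldata_output):
    """Extract the system identifier from captured pg_controldata output."""
    match = SYSID_LINE.search(pg_controldata_output)
    if match is None:
        raise ValueError("no 'Database system identifier' line found")
    return int(match.group(1))


def same_cluster(first_output, second_output):
    """True if both captured outputs report the same system identifier."""
    return system_identifier(first_output) == system_identifier(second_output)


# Illustrative, abbreviated sample text (not real output from this system).
node_a = "Database system identifier:           5516922116183112703\n"
node_b = "Database system identifier:           5466170076771909117\n"

print(same_cluster(node_a, node_a))
print(same_cluster(node_a, node_b))
```

If the two nodes report different identifiers, the slave's data directory was not built from a base backup of that master, and segments archived by one will be rejected by the other.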

Re: Wrong SYSID in WAL segment

From

"Latrous, Youssef"

Date:

25 October 2010, 19:26:43

Thank you for the reply.

This is an embedded system and is isolated (only the master and slave
nodes are reachable within this setup). Is there any way that either the
slave or the master node could change the pg_control content? The Master node
did not restart at all when this happened. Moreover, if the slave is
rebooted, the problem goes away.

Regards,

Youssef

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, October 25, 2010 4:12 PM
To: Latrous, Youssef
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Wrong SYSID in WAL segment

"Latrous, Youssef" <YLatrous@BroadViewNet.com> writes:
> 1) Why one of the segments has a different SYSID (knowing that all WAL
> segments are originating from the same node)?

They aren't.  Somewhere you've got WAL segments slipping in from a
different database cluster.

            regards, tom lane