Thread: out-of-order XID insertion in KnownAssignedXids

out-of-order XID insertion in KnownAssignedXids

From
Date:

Hi All,


we are running many postgresql master/slave setups. The slaves are initialised from a pg_basebackup from the master  and are sync streaming from the master. When we determine the master has failed, the slave is promoted. Some time after that, the old master is again initialised with a pg_basebackup and starts streaming from the new master.


Our setup seems pretty "stock" and has been running for us for some years (with different versions of postgresql but also different OSes). 


Recently, we have gotten this error a fair amount of times: "out-of-order XID insertion in KnownAssignedXids" when postgresql attempts to start after being initialised with a pg_basebackup from the current master. 


The only reference I can find on that particular error, is from 2012 and the resulting issue is long fixed in our version of postgresql (9.5.3) ... (https://www.postgresql.org/message-id/201205230849.59825.andres@anarazel.de)



Once the issue has occurred, a subsequent re-initialisation (with a completely new pg_basebackup) does not resolve the issue.


I have a setup in the failing state, so I can produce any kind of log mesages / details that would be helpful.



Thank you for your support,

Fredrik



Re: out-of-order XID insertion in KnownAssignedXids

From
Andres Freund
Date:
Hi,

On 2016-10-18 14:57:52 +0200, fredrik@huitfeldt.com wrote:
> we are running many postgresql master/slave setups. The slaves are
> initialised from a pg_basebackup from the master and are sync
> streaming from the master. When we determine the master has failed,
> the slave is promoted. Some time after that, the old master is again
> initialised with a pg_basebackup and starts streaming from the new
> master.

Could you describe in a bit more detail how exactly you're setting up
the standbys? E.g. the exact recovery.conf used, whether you remove any
files during starting a standby. Also how exactly you're promoting
standbys?

> Recently, we have gotten this error a fair amount of times: "out-of-order XID insertion in KnownAssignedXids" when
postgresqlattempts to start after being initialised with a pg_basebackup from the current master. 


Which version are you encountering this on precisely?


> Once the issue has occurred, a subsequent re-initialisation (with a completely new pg_basebackup) does not resolve
theissue. 

How have you recovered from this so far?


> I have a setup in the failing state, so I can produce any kind of log mesages / details that would be helpful.

Could you use pg_xlogdump to dump the WAL file on which replay failed?
And then attach the output in a compressed manner?

Greetings,

Andres Freund


Re: out-of-order XID insertion in KnownAssignedXids

From
Date:
Hi Andres,
thank you very much for your response.

Full details on our postgresql version is:
Name        : postgresql-server
Arch        : x86_64
Version     : 9.2.13
Release     : 1.el7_1
Size        : 3.8 M
Repo        : rhel-7-server-dvd-rpms
Summary     : The programs needed to create and run a PostgreSQL server
URL         : http://www.postgresql.org/
License     : PostgreSQL
Description : PostgreSQL is an advanced Object-Relational database management system (DBMS).
            : The postgresql-server package contains the programs needed to create
            : and run a PostgreSQL server, which will in turn allow you to create
            : and maintain PostgreSQL databases.

We generally promote a slave to master as a reaction to the master being determined dead by our healthchecker. This is not something that happens often "in the wild" but we have a substantial amount of automated and manually operated QA setups where these situations are provoked. In order to promote, we just touch the trigger file. Typically, our monitoring process will determine that the new master is ready, but that there is no slave available to stream to. At this point, the monitoring process turns off sync replication (by settings synchronous_standby_names to 'none') in order for the new master to be workable. At some point later, the old master becomes recycled and joins as a slave. At this point, the monitoring process will turn on sync replication again (by setting synchronous_standby_names to the ips of the master and slave )


In order to set up a slave (and to recycle an old master), we:

- stop postgresql

- copy a full base backup to a staging area, with the command:
sudo -u postgres pg_basebackup --no-password --host=$ip --label=Basebackup-$hostIP --username=root --pgdata=$stagingDir --xlog-method=stream

- wipe the postgresql data dir, with the command:
"find $pgData/* | grep -v server.crt | grep -v server.key | grep -v PG_VERSION | xargs rm -rf".

- move data from the staging dir to postgresql data dir

- remove a file called backup_label, but I am not certain that this file is in fact there (any more).

- remove the trigger file.

- generate the recovery.conf, with this content:
standby_mode = 'on'
primary_conninfo = 'host=" . $masterIP . " port=5432 user=$user password=$password application_name=" . $hostIP . " sslmode=require'
recovery_target_timeline = 'latest'
trigger_file = '/var/run/postgresql_trigger'

- start postgresql

- wait for it to become "streaming" in pg_stat_replication

... it should be noted that while this is going on, there are application servers that are working on the master database (some times relatively hard).


The log messages look something like the following (on the new slave):
Sep 20 14:57:13 ... LOG:  database system was interrupted; last known up at 2016-09-20 14:57:11 IST
...
Sep 20 15:00:27 ... LOG:  entering standby mode
Sep 20 15:00:27 ... LOG:  database system was not properly shut down; automatic recovery in progress
Sep 20 15:00:27 ... LOG:  redo starts at 12/143FA3A8
Sep 20 15:00:27 ... LOG:  12 KnownAssignedXids (num=12 tail=0 head=12) [0]=38572 [1]=38573 [2]=38574 [3]=38575 [4]=38576 [5]=38577 [6]=38578 [7]=38579 [8]=38580 [9]=38581 [10]=38582 [11]=38583
Sep 20 15:00:27 ... CONTEXT:  xlog redo Standby/RUNNING_XACTS: nextXid 38585 latestCompletedXid 38571 oldestRunningXid 38572; 14 xacts: 38573 38575 38579 38578 38574 38581 38580 38576 38577 38572 38582 38584 38583 38583
Sep 20 15:00:27 ... FATAL:  out-of-order XID insertion in KnownAssignedXids
Sep 20 15:00:27 ... CONTEXT:  xlog redo Standby/RUNNING_XACTS: nextXid 38585 latestCompletedXid 38571 oldestRunningXid 38572; 14 xacts: 38573 38575 38579 38578 38574 38581 38580 38576 38577 38572 38582 38584 38583 38583
Sep 20 15:00:27 ... LOG:  startup process (PID 8053) exited with exit code 1
Sep 20 15:00:27 ... LOG:  terminating any other active server processes

As you can see, it spends a few minutes on starting up, where I am assuming that it is streaming changes from the master that occurred between the basebackup and the startup. In this period it gets connection attempts and logs a FATAL message that it is starting up (so I am assuming that it does not process requests, which is fine).

I have just found at least one place where the new master warns in the following manner, basically after the new slave starts starting up, but before it completes it:

Sep 20 14:58:13 ... ERROR:  prepared transaction with identifier "131077_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQpzEwLjE3OS4yMjYuNjk=_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQrgAAAAEAAAAA" is busy
Sep 20 14:58:13 ... ROLLBACK PREPARED '131077_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQpzEwLjE3OS4yMjYuNjk=_AAAAAAAAAAAAAP//fwAAAd3KJH9X370UA1vQrgAAAAEAAAAA'


In order to recover from this situation, we basically wipe the system and start over from a backup. This is not *as* bad as it sounds, but pretty painful nonetheless. It should be noted that we have not spent a significant amount of time determinig the best way to recover, since we have been focused on how to avoid the situation altogether

I am not certain which xlog file it is having difficulties with?

Best regards,
Fredrik

On October 18, 2016 at 8:49:29 pm +02:00, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2016-10-18 14:57:52 +0200, fredrik@huitfeldt.com wrote:
we are running many postgresql master/slave setups. The slaves are
initialised from a pg_basebackup from the master and are sync
streaming from the master. When we determine the master has failed,
the slave is promoted. Some time after that, the old master is again
initialised with a pg_basebackup and starts streaming from the new
master.

Could you describe in a bit more detail how exactly you're setting up
the standbys? E.g. the exact recovery.conf used, whether you remove any
files during starting a standby. Also how exactly you're promoting
standbys?

Recently, we have gotten this error a fair amount of times: "out-of-order XID insertion in KnownAssignedXids" when postgresql attempts to start after being initialised with a pg_basebackup from the current master.


Which version are you encountering this on precisely?


Once the issue has occurred, a subsequent re-initialisation (with a completely new pg_basebackup) does not resolve the issue.

How have you recovered from this so far?


I have a setup in the failing state, so I can produce any kind of log mesages / details that would be helpful.

Could you use pg_xlogdump to dump the WAL file on which replay failed?
And then attach the output in a compressed manner?

Greetings,

Andres Freund

Re: out-of-order XID insertion in KnownAssignedXids

From
Michael Paquier
Date:
On Thu, Oct 20, 2016 at 10:21 PM,  <fredrik@huitfeldt.com> wrote:
> Version     : 9.2.13

You are largely out of date here. The last minor version available in
the 9.2 series is 9.2.18, meaning that you are missing more than one
year worth of bug fixes.

> - remove a file called backup_label, but I am not certain that this file is
> in fact there (any more).

It is never a good idea when you are trying to restore from a backup,
backup_label contains critical information when restoring from a
backup, so you may finish with a corrupted data folder.
--
Michael


Re: out-of-order XID insertion in KnownAssignedXids

From
Andres Freund
Date:
On 2016-10-20 22:37:15 +0900, Michael Paquier wrote:
> On Thu, Oct 20, 2016 at 10:21 PM,  <fredrik@huitfeldt.com> wrote:
> > - remove a file called backup_label, but I am not certain that this file is
> > in fact there (any more).
>
> It is never a good idea when you are trying to restore from a backup,
> backup_label contains critical information when restoring from a
> backup, so you may finish with a corrupted data folder.

And this actually seems like a likely source of these errors.  Removing
a backup label unfortunately causes hard to diagnose errors, because
everything appears to be ok as long as there's no checkpoints while
taking the base backups (or when the control file was copied early
enough). But as soon as a second checkpoint happens before the control
file is copied...

Fredrik, how did you end up removing the label?

Greetings,

Andres Freund


Re: out-of-order XID insertion in KnownAssignedXids

From
Kevin Grittner
Date:
On Thu, Oct 20, 2016 at 8:21 AM,  <fredrik@huitfeldt.com> wrote:

> Version     : 9.2.13

You are missing over a year's worth of bug fixes.

https://www.postgresql.org/support/versioning/

> - remove a file called backup_label

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: out-of-order XID insertion in KnownAssignedXids

From
Date:
Hi All,

thank you all, I sincerely appreciate your feedback.

I have done a fair amount of testing on the solution proposed by you all (not removing backup_label), and it seems to have completely addressed the issue.

This was actually introduced some time back, and I am not completely certain how it crept into our codebase. I think that at least part of the explanation lies in the fact that we are experiencing a fair amount of growth in the database size and use on some of our installations. This could be the reason why extensive testing did not show the issue back then and why we are seeing it now. 

Would it make sense to log a warning in the case of a missing backup_label file, or would it be difficult to identify that situation in the code? I would be happy to dig in and develop a patch?

With regards to the package version; we *are* working with a few "stock" scenarios, where one of them is a fairly old RHEL installation. We also have centos versions that are much more updated. 

Best regards, and thank you all again,
Fredrik

On 20 October 2016 at 22:38:26 +02:00, Andres Freund <andres@anarazel.de> wrote:
On 2016-10-20 22:37:15 +0900, Michael Paquier wrote:
On Thu, Oct 20, 2016 at 10:21 PM, <fredrik@huitfeldt.com> wrote:
- remove a file called backup_label, but I am not certain that this file is
in fact there (any more).

It is never a good idea when you are trying to restore from a backup,
backup_label contains critical information when restoring from a
backup, so you may finish with a corrupted data folder.

And this actually seems like a likely source of these errors. Removing
a backup label unfortunately causes hard to diagnose errors, because
everything appears to be ok as long as there's no checkpoints while
taking the base backups (or when the control file was copied early
enough). But as soon as a second checkpoint happens before the control
file is copied...

Fredrik, how did you end up removing the label?

Greetings,

Andres Freund

Re: out-of-order XID insertion in KnownAssignedXids

From
Kevin Grittner
Date:
On Mon, Oct 24, 2016 at 8:10 AM,  <fredrik@huitfeldt.com> wrote:

> This was actually introduced some time back, and I am not completely certain
> how it crept into our codebase. I think that at least part of the
> explanation lies in the fact that we are experiencing a fair amount of
> growth in the database size and use on some of our installations. This could
> be the reason why extensive testing did not show the issue back then and why
> we are seeing it now.

If there is no checkpoint during the backup, you dodge the
corruption.  Higher update volume increases the frequency of
checkpoints and larger cluster size makes the backup take longer --
either of which would make corruption more likely.

> Would it make sense to log a warning in the case of a missing backup_label
> file, or would it be difficult to identify that situation in the code?

The problem is, without a backup_label file things look exactly
like a crash recovery, which is why it just goes to the last usable
checkpoint; that's the correct behavior for crash recovery.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company