Thread: Monitoring PITR recovery progress

Monitoring PITR recovery progress

From
Ivan Voras
Date:
Hello,

I got a sort of POLA violation moment today - a colleague has restored a PITR archive up to a point in time, and when the developers looked at the data, it looked wrong - as it if wasn't from that particular time. Later, he told me he got an error trying to use pg_dump to extract the desired tables to restore: 

Dumping the contents of table "xxx" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.

This could be the source of data problems the devs are seeing. Offhand, this looks like it's indicating a conflict between the PITR recovery process and pg_dump. But, something's puzzling to me: why did he manage to connect to the db at all?

And, the actual question: how to monitor the WAL replay process? Currently, the recovery.conf file is sitting there, with the database running, but pg processes are idle, and pg_stat_activity doesn't list anything which appears to be related to the recovery process.

Re: Monitoring PITR recovery progress

From
Alan Hodgson
Date:
On Wed, 2019-01-23 at 18:58 +0100, Ivan Voras wrote:

And, the actual question: how to monitor the WAL replay process? Currently, the recovery.conf file is sitting there, with the database running, but pg processes are idle, and pg_stat_activity doesn't list anything which appears to be related to the recovery process.




The server logs each wal segment that gets processed during recovery. And you would definitely see a busy high-I/IO process applying the recovery.

It also logs when the recovery is complete. And I'm pretty sure it renames recovery.conf to recovery.done or something when it's done.