I got a sort of POLA violation moment today - a colleague has restored a PITR archive up to a point in time, and when the developers looked at the data, it looked wrong - as it if wasn't from that particular time. Later, he told me he got an error trying to use pg_dump to extract the desired tables to restore:
Dumping the contents of table "xxx" failed: PQgetResult() failed.
pg_dump: Error message from server: ERROR: canceling statement due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be removed.
This could be the source of data problems the devs are seeing. Offhand, this looks like it's indicating a conflict between the PITR recovery process and pg_dump. But, something's puzzling to me: why did he manage to connect to the db at all?
And, the actual question: how to monitor the WAL replay process? Currently, the recovery.conf file is sitting there, with the database running, but pg processes are idle, and pg_stat_activity doesn't list anything which appears to be related to the recovery process.