Re: patch proposal - Mailing list pgsql-hackers

From	Venkata B Nagothi
Subject	Re: patch proposal
Date	August 16, 2016 05:08:30
Msg-id	CAEyp7J_2KhY5QzJWAWq-VBBiU6_R+rX8n+-pbYiQxyC6JmAaFQ@mail.gmail.com
In response to	Re: patch proposal (David Steele <david@pgmasters.net>)
Responses	Re: patch proposal Re: patch proposal
List	pgsql-hackers

On Tue, Aug 16, 2016 at 2:50 AM, David Steele <david@pgmasters.net> wrote:

On 8/15/16 2:33 AM, Venkata B Nagothi wrote:

> During the recovery process, It would be nice if PostgreSQL generates an
> error by aborting the recovery process (instead of starting-up the
> cluster) if the intended recovery target point is not reached and give
> an option to DBA to resume the recovery process from where it exactly
> stopped.

Thom wrote a patch [1] recently that gives warnings in this case. You
might want to have a look at that first.

That is good to know. Yes, that patch is about generating more meaningful output messages for the recovery process, which makes sense.

> The issue here is, if by any chance, the required WALs are not available
> or if there is any WAL missing or corrupted at the restore_command
> location, then PostgreSQL recovers until the end of the last available
> WAL and starts-up the cluster.

You can use pause_at_recovery_target/recovery_target_action (depending
on your version) to prevent promotion. That would work for your stated
scenario but not for the scenario where replay starts (or the database
reaches consistency) after the recovery target.

Those parameters can be configured to pause recovery, shut down the server, or prevent promotion, but only after the recovery target point has been reached.

To clarify, I am referring to a scenario where the recovery target point is not reached at all (that is, a partial or incomplete recovery) and many WAL files are still waiting to be replayed. In this situation, PostgreSQL completes archive recovery up to the end of the last available WAL file ("00000001000000000000001E" in my case), logs an error saying that the next file ("00000001000000000000001F") was not found, and starts up the cluster.

Note: I am testing on PostgreSQL 9.5.

LOG: restored log file "00000001000000000000001E" from archive
cp: cannot stat ‘/data/pgrestore9531/00000001000000000000001F’: No such file or directory
LOG: redo done at 0/1EFFDBB8
LOG: last completed transaction was at log time 2016-08-15 11:04:26.795902+10
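
As an illustration of how this situation can be spotted from the outside today, here is a minimal sketch that scans the server log and reports whether redo finished without the target having been reached. The function name recovery_outcome is made up for this example, and the "recovery stopping before/after" message fragment is an assumption about what the server logs when it does reach the target, so please verify it against your version's output.

```python
import re

# Assumption: the server logs a line containing "recovery stopping before ..."
# or "recovery stopping after ..." when it reaches the recovery target.
TARGET_REACHED = re.compile(r"recovery stopping (before|after) ")
# Matches lines such as: LOG: redo done at 0/1EFFDBB8
REDO_DONE = re.compile(r"redo done at ([0-9A-F]+/[0-9A-F]+)")


def recovery_outcome(log_lines):
    """Classify a recovery run from its log lines (hypothetical helper).

    Returns ("target_reached", None), ("ended_early", lsn) or ("unknown", None).
    """
    target_reached = False
    for line in log_lines:
        if TARGET_REACHED.search(line):
            target_reached = True
        match = REDO_DONE.search(line)
        if match:
            if target_reached:
                return ("target_reached", None)
            # Redo finished, but no "recovery stopping" line was seen first.
            return ("ended_early", match.group(1))
    return ("target_reached", None) if target_reached else ("unknown", None)


if __name__ == "__main__":
    sample = [
        'LOG: restored log file "00000001000000000000001E" from archive',
        "LOG: redo done at 0/1EFFDBB8",
    ]
    print(recovery_outcome(sample))
```

This only reports the problem after the fact. It does not stop the cluster from starting up, which is the behaviour I am asking about.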

I used the following recovery* parameters in the recovery.conf file and intentionally did not supply all the WAL archives needed for the recovery process to reach the target xid.

recovery_target_xid = xxxx
recovery_target_inclusive = true
recovery_target_action = pause

It would be nice if PostgreSQL paused recovery when it is incomplete (because of missing or corrupt WAL), shut down the cluster, and allowed the DBA to restart replay of the remaining WAL archive files, so that recovery continues from where it previously stopped until the recovery target point is reached.
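
Until the server behaves this way, a gap in the archive can at least be caught before recovery is started. The sketch below lists the WAL segment names missing between two known segments. The helper names are made up for this example, it assumes the default 16 MB segment size (256 segments per log id, as in 9.3 and later) and a single timeline, and it assumes the DBA already knows the first and last segments the recovery should need.

```python
import re

# Assumption: default 16 MB WAL segments, so 0x100 segments per log id.
SEGMENTS_PER_LOG = 0x100
# A WAL segment file name is 24 hex digits: timeline, log id, segment.
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def segment_index(name):
    """Position of a segment within its timeline (hypothetical helper)."""
    return int(name[8:16], 16) * SEGMENTS_PER_LOG + int(name[16:24], 16)


def segment_name(timeline, index):
    """Inverse of segment_index for a given 8-digit timeline prefix."""
    log, seg = divmod(index, SEGMENTS_PER_LOG)
    return "%s%08X%08X" % (timeline, log, seg)


def missing_segments(archived_names, first, last):
    """Return the segment names between first and last (inclusive)
    that are absent from archived_names."""
    if first[:8] != last[:8]:
        raise ValueError("first and last must be on the same timeline")
    timeline = first[:8]
    # Ignore anything that is not a plain segment file (.history, .backup, ...).
    present = {n for n in archived_names if WAL_NAME.match(n)}
    wanted = range(segment_index(first), segment_index(last) + 1)
    return [segment_name(timeline, i) for i in wanted
            if segment_name(timeline, i) not in present]


if __name__ == "__main__":
    archived = [
        "00000001000000000000001D",
        "00000001000000000000001E",
        "000000010000000000000020",
    ]
    print(missing_segments(archived,
                           "00000001000000000000001D",
                           "000000010000000000000020"))
```

In practice archived_names could come from os.listdir() on the restore_command location. The check says nothing about corrupt files, only about missing ones.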

Regards,

Venkata B N

Fujitsu Australia
