Re: [HACKERS] patch proposal - Mailing list pgsql-hackers

From Venkata B Nagothi
Subject Re: [HACKERS] patch proposal
Date
Msg-id CAEyp7J-7q1_VmyGtnjMNgQnn6korSzBhRz6SM5EkBPmcScJS2g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] patch proposal  (David Steele <david@pgmasters.net>)
Responses Re: [HACKERS] patch proposal  (David Steele <david@pgmasters.net>)
List pgsql-hackers

On Tue, Mar 21, 2017 at 8:46 AM, David Steele <david@pgmasters.net> wrote:
Hi Venkata,

On 2/28/17 11:59 PM, Venkata B Nagothi wrote:
On Wed, Mar 1, 2017 at 1:14 AM, Venkata B Nagothi <nag1010@gmail.com
<mailto:nag1010@gmail.com>> wrote:
    On Tue, Jan 31, 2017 at 6:49 AM, David Steele <david@pgmasters.net
    <mailto:david@pgmasters.net>> wrote:

        Do you know when those will be ready?

    Attached are both the patches with tests included.

Thanks for adding the tests.  They bring clarity to the patch.

Thank you again reviewing the patch and your comments !
 
Unfortunately, I don't think the first patch (recoveryStartPoint) will work as currently implemented.  The problem I see is that the new function recoveryStartsHere() depends on pg_control containing a checkpoint right at the end of the backup.  There's no guarantee that this is true, even if pg_control is copied last.  That means a time, lsn, or xid that occurs somewhere in the middle of the backup can be selected without complaint from this code depending on timing.

Yes, that is true.  The latest best position, checkpoint position, xid and timestamp of the restored backup of the backup is shown up in the pg_controldata, which means, that is the position from which the recovery would start. Which in-turn means, WALs start getting replayed from that position towards --> minimum recovery position (which is the end backup, which again means applying WALs generated between start and the end backup) all the way through to  --> recovery target position. The best start position to check with would the position shown up in the pg_control file, which is way much better compared to the current postgresql behaviour. 

The tests pass (or rather fail as expected) because they are written using values before the start of the backup.  It's actually the end of the backup (where the database becomes consistent on recovery) that defines where PITR can start and this distinction definitely matters for very long backups.  A better test would be to start the backup, get a time/lsn/xid after writing some data, and then make sure that time/lsn/xid is flagged as invalid on recovery.

Yes, i agree, the databases restored from the backup would start the recovery and would become consistent once the end-backup is reached. The point here is that, when the backup is restored and recovery proceeds further, there is NO WAY to identify the next consistent position or end-position of the backup. This patch is only implementing a check to ensure recovery starts and proceeds at the right position instead of pre-maturely getting promoted which is the current behaviour. 

The current behaviour is that, if the XID shown-up by the pg_controldata is say 1400 and the specified recovery_target_xid is say 200, then, postgresql just completes the recovery and promotes at the immediate consistent recovery target, which means, the DBA has to all the way start the restoration and recovery process again to promote the database at the intended position.

It is also problematic to assume that transaction IDs commit in order. If checkPoint.latestCompletedXid contains 5 then a recovery to xid 4 may or may not be successful depending on the commit order.  However, it appears in this case the patch would disallow recovery to 4.

If the txid 4 is the recovery target xid, then, the appropriate backup taken previous to txid 4 must be used or as an alternative recovery target like recovery_target_timestamp must be used to proceed to the respective recovery target xid.
 
I think this logic would need to be baked into recoveryStopsAfter() and/or recoveryStopsBefore() in order to work.  It's not clear to me what that would look like, however, or if the xid check is even practical.

The above two functions would determine if the recovery process has to stop at a particular position or not and the patch has nothing to do with it.

To summarise, this patch only determines if the WAL replay has to start at all. In other words, if the specified recovery target falls prior to the restored database position, then, the WAL replay should not start, which prevent the database from getting promoted at an unintended recovery target.

Any thoughts/comments ?

Regards,

Venkata Balaji N
Database Consultant

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Next
From: "Ideriha, Takeshi"
Date:
Subject: Re: [HACKERS] Other formats in pset like markdown, rst, mediawiki