Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation() - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()
Date
Msg-id CABOikdNsH=bNecE7=hYdGcGTTUvGoWUcD_V1cuRGdr5=GOBNNw@mail.gmail.com
Whole thread Raw
In response to Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Changing WAL Header to reduce contention during ReserveXLogInsertLocation()  (Andrew Dunstan <andrew.dunstan@2ndquadrant.com>)
List pgsql-hackers


On Tue, Mar 27, 2018 at 7:28 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:


I had not noticed that in the kerfuffle over the more extreme version,
but I think this is a different route to breaking the same guarantees.
The point of the xl_prev links is, essentially, to allow verification
that we are reading a coherent series of WAL records, ie that record B
which appears to follow record A actually is supposed to follow it.
This throws away that cross-check just as effectively as the other patch
did, leaving us reliant solely on the per-record CRCs.  CRCs are good,
but they do have a false-positive rate.

An example of a case where xl_prev makes one feel a lot more comfortable
is the WAL-segment-switch option.  The fact that the first record of the
next WAL file links back to the XLOG_SWITCH record is evidence that
ignoring multiple megabytes of WAL was indeed the intended thing to do.
An xl_curr field cannot provide any evidence as to whether WAL records
were missed.

TBH I still don't see why this does not provide the same guarantee that the current code provides, but given the concerns expressed by others, I am not gonna pursue beyond a point. But one last time :-)

The current code uses xl_prev to cross-verify the record B, read after record A, indeed follows A and has a valid back-link to A. This deals with problems where B might actually be an old WAL record, carried over from a stale WAL file.

Now if we store xl_curr, we can definitely guarantee that B is ahead of A because B->xl_curr will be greater than A->xl_curr (otherwise we bail out). So that deals with the problem of stale WAL records. In addition, we also know where A ends (we can deduce that even for XLOG_SWITCH records knowing where the next record will start after the switch) and hence we know where B should start. So we read at B and also confirm that B->xl_curr matches B's position. If it does not, we declare end-of-WAL and bail out. So where is the problem?


> 2. Does the new logic in pg_rewind to search backward for a checkpoint
> work reliably, and will it be slow?

If you have to search backwards, this breaks it.  Full stop.

We don't really need to fetch the previous record. We really need to find the last checkpoint prior to a given LSN. That can be done by reading WAL segments forward. It can be a little slow, but hopefully not a whole lot.

A <- B <- C <- CHKPT <- D <- E <- F <- G

So today, if we want to find last checkpoint prio to G, we go through the back-links until we find the first checkpoint record. In the proposed code, we read forward the current WAL segment, remember the last CHKPT record seen and once we see G, we know we have found the prior checkpoint. If the checkpoint does not exist in the current WAL, we read forward the previous WAL and return the last checkpoint record in that WAL and so on. So in the worst case, we might read a WAL segment extra before we find the checkpoint record. That's not ideal but not too bad given that only pg_rewind needs this and that too only once.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Marina Polyakova
Date:
Subject: Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors
Next
From: Aleksander Alekseev
Date:
Subject: Re: Re: Cast jsonb to numeric, int, float, bool