Re: Detecting File Damage & Inconsistencies - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Detecting File Damage & Inconsistencies
Date
Msg-id CAGRY4nzFuco-0od26hoUxKC_N7CzLHJeMX0tUbyqfX1LRqVA=Q@mail.gmail.com
Whole thread Raw
In response to Re: Detecting File Damage & Inconsistencies  (Simon Riggs <simon.riggs@enterprisedb.com>)
Responses Re: Detecting File Damage & Inconsistencies
List pgsql-hackers
On Tue, 22 Jun 2021 at 00:24, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
On Thu, Mar 18, 2021 at 6:20 AM Craig Ringer
<craig.ringer@enterprisedb.com> wrote:
>
> On Mon, 15 Mar 2021 at 21:01, David Steele <david@pgmasters.net> wrote:
>>
>> On 11/18/20 5:23 AM, Simon Riggs wrote:
>> > On Wed, 18 Nov 2020 at 06:42, Craig Ringer
>> > <craig.ringer@enterprisedb.com> wrote:
>> >>
>> >> On Fri, Nov 13, 2020 at 7:24 PM Simon Riggs <simon@2ndquadrant.com> wrote:
>> >>>
>> >>>
>> >>> What I'm proposing is an option to add 16 bytes onto each COMMIT
>> >>> record
>> >>
>> >>
>> >> Would it make sense to write this at the time we write a topxid assignment to WAL instead?
>> >>
>> >> Otherwise it won't be accessible to streaming-mode logical decoding.
>> >
>> > Do you mean extend the xl_xact_assignment record? My understanding is
>> > that is not sent in all cases, so not sure what you mean by "instead".
>>
>> Craig, can you clarify?
>
>
> Right. Or write a separate WAL record when the feature is enabled. But it's probably sufficient to write it as an optional chunk on xl_xact_assignment records. We often defer writing them so we can optimise away xacts that never actually wrote anything, but IIRC we still write one before we write any WAL that references the xid. That'd be fine, since we don't need the info any sooner than that during decoding. I'd have to double check that we write it in all cases and won't get to that too soon, but I'm pretty sure we do...

The commit record is optimized away if no xid is assigned, though is
still present if we didn't write any WAL records.

But if a commit record exists in the WAL stream, we want to know where
it came from.

A later patch will add PITR capability based on this information so
attaching it directly to the commit record is fairly important, IMHO.

Why?

All the proposed info:

* 8-byte session start time (from MyStartTime)
* 2-byte pid (from MyProcPid)
* 4-byte user oid

are available at topxid assignment time. If we defer writing them until commit, we lose the ability to use this information during streaming logical decoding. That's something I believe you've wanted for other functionality in the past, such as logical decoding based audit functionality.

IIRC the restart_lsn horizon already ensures that we can't miss the xl_xact_assignment at the start of a txn. We would ensure that the desired info is available throughout decoding of the txn, including at commit record processing time, by adding it to the toplevel ReorderBufferTxn.

The only advantage I can see to annotating the commit record instead is that we don't have to spend a few bytes per reorder-buffered topxid to track this info between start of decoding for the tx and processing of the commit record. I don't think that's worth caring about.The advantages that having it earlier would give us are much more significant.

A few examples:

* Skip reorder buffering of non-target transactions early, so we can decode the WAL stream to find the target transactions much faster using less memory and I/O;

* Read the database change stream and use the session info to stream info into an intrusion detection system and/or audit engine in real time, using txn streaming to avoid the need to create huge reorder buffers;

* Re-decode the WAL stream to identify a target txn you know was aborted, and commit it instead, so you can recover data from aborted txns from the WAL stream using logical decoding. (Only possible if the catalog_xmin hasn't advanced past that point already though)

So yeah. I think it'd be better to log the info you want at start-of-txn unless there's a compelling reason not so, and I don't see one yet.

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Doc chapter for Hash Indexes
Next
From: Dilip Kumar
Date:
Subject: Re: Toast compression method options