Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing? - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?
Date
Msg-id CAMsr+YHPZetkB0OeBsBu76+Km=Mb4SEbP1=-jJ6fF9MGSYn8mw@mail.gmail.com
In response to Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?  (Andres Freund <andres@anarazel.de>)
Responses Re: Is txid_status() actually safe? / What is 011_crash_recovery.pl testing?
List pgsql-hackers
On Tue, 9 Feb 2021 at 05:52, Andres Freund <andres@anarazel.de> wrote:

> Craig, it kind of looks to me like you assumed it'd be guaranteed that
> the xid at this point would show in-progress?

At the time I wrote that code, I don't think I understood that xid assignment wasn't necessarily durable until either (a) the next checkpoint; or (b) commit of some txn with a greater xid.

IIRC I expected that after crash and recovery the tx would always be treated as aborted, because the xid had been assigned but no corresponding commit was found before end-of-recovery. No explicit abort records are written to WAL for such txns, since we crashed. Instead the server's oldest-in-progress txn threshold is used to determine that they must be aborted rather than in-progress, even though their clog entries aren't set to aborted.

Which was fine as far as it went, but I failed to account for the xid assignment not necessarily being durable when the client calls txid_status().


> I don't think the use of txid_status() described in the docs added in
> the commit is actually ever safe?

I agree. The client can query for its xid with txid_current() but as you note there's no guarantee that the assigned xid is durable.

The client would have to ensure that an xid was assigned, then ensure that the WAL was durably flushed past the point of the xid assignment before relying on the xid.

If we run a txn that performs a small write, calls txid_current(), and sends a commit, and the server crashes before completing that commit, we can't know for sure that the xid we recorded client-side before the crash refers to the same txn whose status we check after crash recovery. Some other txn could have reused the xid after the crash, so long as no txn with a greater xid durably committed before the crash.

That scenario isn't hugely likely, but it's definitely possible on systems that don't do a lot of concurrent txns or do mostly long, heavyweight txns.
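
To make the reuse hazard concrete, here is a toy model in Python. It is not PostgreSQL code: ToyServer, its methods, and the starting xid of 100 are all invented for illustration, and the model assumes only what is described above, namely that the xid counter survives a crash only as far as the greatest durably committed xid (checkpoints are left out).

```python
class ToyServer:
    """Toy model only: xid assignment lives in memory until a commit is flushed."""

    def __init__(self, first_xid=100):
        self.next_xid = first_xid          # in-memory counter, lost on crash
        self.durable_next_xid = first_xid  # what crash recovery would restore
        self.committed = set()             # xids whose commit reached disk

    def assign_xid(self):
        xid = self.next_xid
        self.next_xid += 1
        return xid

    def commit(self, xid):
        # A flushed commit record makes every xid up to this one durable.
        self.committed.add(xid)
        self.durable_next_xid = max(self.durable_next_xid, xid + 1)

    def crash_and_recover(self):
        # Anything not covered by a flushed commit is forgotten.
        self.next_xid = self.durable_next_xid

    def status(self, xid):
        # Only meaningful after recovery: nothing is in progress then.
        if xid in self.committed:
            return "committed"
        return "aborted" if xid < self.next_xid else "in the future"


def lost_commit_scenario(bystander_commits_first):
    server = ToyServer()
    client_xid = server.assign_xid()        # client records this xid
    if bystander_commits_first:
        server.commit(server.assign_xid())  # a greater xid commits durably
    server.crash_and_recover()              # the client's commit never hit disk
    reuser_xid = server.assign_xid()        # unrelated txn after recovery
    server.commit(reuser_xid)
    return client_xid, reuser_xid, server.status(client_xid)


print(lost_commit_scenario(False))  # xid reused: the lost txn looks committed
print(lost_commit_scenario(True))   # xid not reused: the lost txn looks aborted
```

In the first call the client would wrongly conclude that its lost transaction committed, because an unrelated transaction picked up the same xid after recovery.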

The txid_status() function was originally intended to be paired with a way to report topxid assignment to the client automatically, NOTIFY or GUC_REPORT-style. But that would not make this usage safe either, unless we delayed the report until WAL was flushed past the LSN of the xid assignment *or* some other txn with a greater xid committed.

This could be made safe with a variant of txid_current() that forced the xid assignment to be logged immediately if it was not already, and did not return until WAL was flushed past the point of the assignment. If the client did most of the txn's work before requesting a guaranteed-durable xid, it would in practice not end up having to wait for a flush. But we'd have to keep track of when we assigned the xid in every single topxact in order to be able to promise we'd flushed it without having to immediately force a flush. That's pointless overhead all the rest of the time, just in case someone wants to get an xid for later use with txid_status().
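
The bookkeeping that variant would need can be sketched with another toy model. Again this is not PostgreSQL code: ToyWal, ToyTxn, durable_xid() and the integer LSNs are hypothetical names and simplifications chosen for illustration.

```python
class ToyWal:
    """Toy WAL: LSNs are plain integers, flush() makes everything inserted durable."""

    def __init__(self):
        self.insert_lsn = 0
        self.flush_lsn = 0
        self.forced_flushes = 0  # flushes paid for only to make an xid durable

    def append(self):
        self.insert_lsn += 1
        return self.insert_lsn

    def flush(self):
        self.flush_lsn = self.insert_lsn


class ToyTxn:
    def __init__(self, wal, xid):
        self.wal = wal
        self.xid = xid
        # The per-transaction overhead: remember where the assignment was
        # logged, whether or not anyone ever asks for a durable xid.
        self.xid_assign_lsn = wal.append()

    def durable_xid(self):
        # Return only once WAL is flushed past the xid assignment.
        if self.wal.flush_lsn < self.xid_assign_lsn:
            self.wal.flush()
            self.wal.forced_flushes += 1
        return self.xid


wal = ToyWal()

early = ToyTxn(wal, 100)
early.durable_xid()          # asked right away: has to force a flush

late = ToyTxn(wal, 101)
wal.append()                 # the txn does more work...
wal.flush()                  # ...and some other session's commit flushes WAL
late.durable_xid()           # already durable: no extra flush needed

print(wal.forced_flushes)
```

Only the transaction that asked for its xid immediately pays for a flush. The one that asked late gets it for free, but both had to record xid_assign_lsn.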

The simplest option with no overhead on anything that doesn't care about txid_status() is to expose a function to force flush of WAL up to the current insert LSN. Then update the docs to say you have to call it after txid_current(), and before sending your commit. But at that point you might as well use 2PC, since you're paying the same double flush and double round-trip costs. The main point of txid_status() was to avoid the cost of that double-flush.
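
A rough sketch of what that client-side sequence would look like, in Python against a DB-API style cursor. Note that pg_force_wal_flush() does NOT exist in PostgreSQL; it is a made-up placeholder for the function proposed above. RecordingCursor is likewise a stand-in so the sketch runs without a server, and the table name is arbitrary. Only txid_current() is a real function here.

```python
class RecordingCursor:
    """Stand-in for a DB-API cursor; records statements instead of running them."""

    def __init__(self, fake_xid=1234):
        self.statements = []
        self.fake_xid = fake_xid

    def execute(self, sql, params=None):
        self.statements.append(sql)

    def fetchone(self):
        return (self.fake_xid,)


def commit_with_flushed_xid(cur, work_sql):
    """Assumes the connection is in autocommit mode so BEGIN/COMMIT are explicit."""
    cur.execute("BEGIN")
    cur.execute(work_sql)
    cur.execute("SELECT txid_current()")
    xid = cur.fetchone()[0]           # client stores this somewhere safe
    # Hypothetical function: flush WAL up to the current insert LSN.
    # This is the first flush and an extra round trip.
    cur.execute("SELECT pg_force_wal_flush()")
    cur.execute("COMMIT")             # second flush
    return xid


cur = RecordingCursor()
xid = commit_with_flushed_xid(cur, "INSERT INTO t VALUES (1)")
print(xid, cur.statements)
```

The two flushes and the extra round trip are visible in the statement list, which is the same cost profile as PREPARE TRANSACTION followed by COMMIT PREPARED.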

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise
