Re: using an end-of-recovery record in all cases - Mailing list pgsql-hackers

From Robert Haas
Subject Re: using an end-of-recovery record in all cases
Date
Msg-id CA+TgmoZZDL_2E_zuahqpJ-WmkuxmUi8+g7=dLEny=18r-+c-iQ@mail.gmail.com
Whole thread Raw
In response to Re: using an end-of-recovery record in all cases  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: using an end-of-recovery record in all cases  (Nathan Bossart <nathandbossart@gmail.com>)
List pgsql-hackers
On Tue, Apr 19, 2022 at 4:38 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Shouldn't latestCompletedXid be set to MaxTransactionId in this case?  Or
> is this related to the logic in FullTransactionIdRetreat() that avoids
> skipping over the "actual" special transaction IDs?

The problem here is this code:

    /* also initialize latestCompletedXid, to nextXid - 1 */
    LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
    ShmemVariableCache->latestCompletedXid = ShmemVariableCache->nextXid;
    FullTransactionIdRetreat(&ShmemVariableCache->latestCompletedXid);
    LWLockRelease(ProcArrayLock);

If nextXid is 3, then latestCompletedXid gets 2. But in
GetRunningTransactionData:

    Assert(TransactionIdIsNormal(CurrentRunningXacts->latestCompletedXid));

> Your reasoning seems sound to me.

I was talking with Thomas Munro yesterday and he thinks there is a
problem with relfilenode reuse here. In normal running, when a
relation is dropped, we leave behind a 0-length file until the next
checkpoint; this keeps that relfilenode from being used even if the
OID counter wraps around. If we didn't do that, then imagine that
while running with wal_level=minimal, we drop an existing relation,
create a new relation with the same OID, load some data into it, and
crash, all within the same checkpoint cycle, then we will be able to
replay the drop, but we will not be able to restore the relation
contents afterward because at wal_level=minimal they are not logged.
Apparently, we don't create tombstone files during recovery because we
know that there will be a checkpoint at the end.

With the existing use of the end-of-recovery record, we always know
that wal_level>minimal, because we're only using it on standbys. But
with this use that wouldn't be true any more. So I guess we need to
start creating tombstone files even during recovery, or else do
something like what Dilip coded up in
http://postgr.es/m/CAFiTN-u=r8UTCSzu6_pnihYAtwR1=esq5sRegTEZ2tLa92fovA@mail.gmail.com
which I think would be a better solution at least in the long term.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: [RFC] building postgres with meson -v8
Next
From: Tomas Vondra
Date:
Subject: Re: Bad estimate with partial index