Hi,
While running tests with Neon, we discovered an assertion failure that can
occur during re-entrant AbortTransaction() calls.
The issue arises when an error occurs during AbortTransaction() after
ProcArrayEndTransaction() has cleared MyProc->xid. If another error is raised
during cleanup (e.g., in AtEOXact_Inval()), the PostgresMain error handler
invokes AbortCurrentTransaction() again. The second AbortTransaction() call
reads a still-valid s->transactionId (CleanupTransaction() hasn't run yet)
and passes it to ProcArrayEndTransaction(), which then hits:
    Assert(TransactionIdIsValid(proc->xid))
because MyProc->xid was already cleared by the first call.
The attached patch fixes this by checking MyProc->xid validity before calling
RecordTransactionAbort() and only passing a valid latestXid when appropriate.
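As a rough illustration of the sequence described above, here is a toy model in standalone C. It is only a sketch and not the attached patch: ToyProc, ToyXactState, toy_proc_array_end(), toy_abort() and toy_reentrant_abort_ok() are hypothetical stand-ins, the real Assert is replaced by a boolean return value so the model can be run without crashing, and the guard shown is my assumption about the general shape of such a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT the PostgreSQL definitions. */
typedef uint32_t ToyXid;
#define TOY_INVALID_XID ((ToyXid) 0)

typedef struct { ToyXid xid; } ToyProc;                 /* models MyProc */
typedef struct { ToyXid transactionId; } ToyXactState;  /* models s */

/*
 * Models ProcArrayEndTransaction(). Returns false at the point where the
 * real code would trip Assert(TransactionIdIsValid(proc->xid)).
 */
static bool
toy_proc_array_end(ToyProc *proc, ToyXid latestXid)
{
    if (latestXid != TOY_INVALID_XID)
    {
        if (proc->xid == TOY_INVALID_XID)
            return false;               /* the assertion would fail here */
        proc->xid = TOY_INVALID_XID;    /* first pass clears the xid */
    }
    return true;
}

/* Models one pass through the abort path, up to the proc array update. */
static bool
toy_abort(ToyProc *proc, const ToyXactState *s, bool guarded)
{
    /* Still valid on re-entry, because cleanup has not run yet. */
    ToyXid latestXid = s->transactionId;

    /* Assumed guard: do not pass a valid xid if the proc's xid is gone. */
    if (guarded && proc->xid == TOY_INVALID_XID)
        latestXid = TOY_INVALID_XID;

    return toy_proc_array_end(proc, latestXid);
}

/*
 * Runs the abort path twice, as happens when an error raised during
 * cleanup re-enters abort. Returns true if neither pass would assert.
 */
static bool
toy_reentrant_abort_ok(bool guarded)
{
    ToyProc      proc = { .xid = 42 };
    ToyXactState s = { .transactionId = 42 };

    if (!toy_abort(&proc, &s, guarded))
        return false;

    /* Error in later cleanup: s.transactionId is unchanged on re-entry. */
    return toy_abort(&proc, &s, guarded);
}
```

Calling toy_reentrant_abort_ok(false) from a small main() models the unpatched path, where the second pass reaches the failing check; toy_reentrant_abort_ok(true) models the guarded path, where the second pass skips it.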
**Reproduction:**
This can be reproduced reliably using the injection_points extension:
1. Attach the injection point:
   SELECT injection_points_attach('transaction-end-process-inval', 'error');
2. Create invalidation messages:
   CREATE TABLE test(id int);
3. Trigger the abort:
   ROLLBACK;
Without the fix: assertion failure in ProcArrayEndTransaction().
With the fix applied: the server panics with "ERRORDATA_STACK_SIZE exceeded"
instead, because the injected error keeps re-entering error handling. That
shows the assertion is no longer reached.
I've attached the fix and a reproduction script that shows both behaviors.
**Files attached:**
- 0001-xact-Prevent-assertion-failure-in-re-entrant-Abort.patch
- repro_minimal_panic_if_fixed.sh
Thoughts?
Best regards,
Alexey