Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
Date
Msg-id CA+TgmoYSg+1PtN_wpqPD8f8Xfepvkf_5PPewc9DRuYGipBWiLg@mail.gmail.com
In response to Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On Fri, Jul 21, 2017 at 1:31 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Thanks Neha.  It'd be best to post the back trace and if possible
> print oldestXact and ShmemVariableCache->oldestXid from the stack
> frame for TruncateCLOG.
>
> The failing assertion in TruncateCLOG() has a comment that says
> "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
> calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
> *after* it calls TruncateCLOG().  What am I missing here?

This problem was introduced by commit
ea42cc18c35381f639d45628d792e790ff39e271, so this should be added to
the PostgreSQL 10 open items list. That commit intended to introduce a
distinction between (1) the oldest XID that can be safely examined and
(2) the oldest XID that can't yet be safely reused.  These are the
same except when we're in the middle of truncating CLOG: (1) advances
before the truncation, and (2) advances afterwards. That's why
AdvanceOldestClogXid() happens before truncation proper and
SetTransactionIdLimit() happens afterwards, and changing the order
would, I think, be quite wrong.
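
The ordering described above can be sketched with a toy model. This is an illustration only, not the PostgreSQL source: the toy_* names, the struct layout, and the starting values are assumptions made for the example, and XIDs are compared as plain integers with wraparound ignored. It shows that during truncation horizon (1) has already moved while horizon (2) has not, so an assertion inside the truncation step that expects (2) to have advanced cannot hold.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Toy stand-in for the shared state; not the real shared-memory struct. */
typedef struct
{
    TransactionId oldestClogXid;  /* (1) oldest XID that is safe to examine */
    TransactionId oldestXid;      /* (2) horizon that governs XID reuse */
} ToyVariableCache;

static ToyVariableCache toy_cache = {100, 100};
static TransactionId toy_clog_start = 100;

/* Records what horizon (2) looked like while truncation was running. */
static TransactionId toy_oldest_xid_seen_during_truncate = 0;

/* Models the truncation step (TruncateCLOG in the thread). */
static void
toy_truncate_clog(TransactionId cutoff)
{
    /* Holds: lookups below the cutoff were fenced off beforehand. */
    assert(toy_cache.oldestClogXid >= cutoff);

    /* Horizon (2) has NOT moved yet at this point. */
    toy_oldest_xid_seen_during_truncate = toy_cache.oldestXid;

    toy_clog_start = cutoff;
}

/* Models the three steps in the order the message describes. */
static void
toy_vac_truncate_clog(TransactionId frozenXID)
{
    toy_cache.oldestClogXid = frozenXID;  /* like AdvanceOldestClogXid() */
    toy_truncate_clog(frozenXID);         /* truncation proper */
    toy_cache.oldestXid = frozenXID;      /* like SetTransactionIdLimit() */
}
```

Calling toy_vac_truncate_clog(500) from the initial state leaves both horizons at 500 afterwards, but the value recorded during truncation is still the old horizon (2).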

AFAICS, that assertion is simply a holdover from an earlier version of
the patch that escaped review.  There's just no reason to suppose that
it's true.

> What actually prevents ShmemVariableCache->oldestXid from going
> backwards anyway?  Suppose there are two or more autovacuum processes
> that reach vac_truncate_clog() concurrently.  They do a scan of
> pg_database whose tuples they access without locking through a
> pointer-to-volatile because they expect concurrent in-place writers,
> come up with a value for frozenXID, and then arrive at
> SetTransactionIdLimit() in whatever order and clobber
> ShmemVariableCache->oldestXid.  What am I missing here?

Hmm, there could be a bug there, but I don't think it's *this* bug.
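
For reference, the interleaving Thomas describes can be reduced to a toy example. Everything here is hypothetical: the toy_* names and the values are invented for illustration, XIDs are compared as plain integers with wraparound ignored, and the "monotonic" variant is shown only for contrast, not as a proposed or existing PostgreSQL fix.

```c
#include <stdint.h>

typedef uint32_t TransactionId;

/* Toy stand-in for the shared oldestXid value. */
static TransactionId toy_oldest_xid = 100;

/* Unconditional store: whichever worker arrives last wins. */
static void
toy_set_limit_unconditional(TransactionId frozenXID)
{
    toy_oldest_xid = frozenXID;
}

/* Hypothetical guarded store: never lets the value move backwards. */
static void
toy_set_limit_monotonic(TransactionId frozenXID)
{
    if (frozenXID > toy_oldest_xid)
        toy_oldest_xid = frozenXID;
}
```

If one worker computes 500 and another, working from an older scan, computes 400 and arrives second, the unconditional store ends at 400, while the guarded store stays at 500.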

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


