Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
Date
Msg-id 20170721.161729.140149762.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash  (Neha Sharma <neha.sharma@enterprisedb.com>)
Responses Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
List pgsql-hackers
At Fri, 21 Jul 2017 11:39:38 +0530, Neha Sharma <neha.sharma@enterprisedb.com> wrote in
<CANiYTQuZm+hDvuHB14d65SkL2ko98ESR3Jf2kUiX=m1haL=xrg@mail.gmail.com>
> Here is the back trace from the core dump attached.
> 
> (gdb) bt
> #0  0x00007f4a71424495 in raise () from /lib64/libc.so.6
> #1  0x00007f4a71425c75 in abort () from /lib64/libc.so.6
> #2  0x00000000009dc18a in ExceptionalCondition (conditionName=0xa905d0
> "!(TransactionIdPrecedesOrEquals(oldestXact,
> ShmemVariableCache->oldestXid))",
>     errorType=0xa9044f "FailedAssertion", fileName=0xa90448 "clog.c",
> lineNumber=683) at assert.c:54
> #3  0x0000000000524215 in TruncateCLOG (oldestXact=150036635,
> oldestxid_datoid=13164) at clog.c:682

In vac_truncate_clog, TruncateCLOG is called before
SetTransactionIdLimit, which advances
ShmemVariableCache->oldestXid. Given that the assertion in
TruncateCLOG is valid, they should be called in reverse order. I
suppose that CLOG files can be safely truncated after advancing
XID limits.

By the way, the attached patch was made with "git diff --patience".

filterdiff converts it into a somewhat broken form. Specifically,
the result is missing the added part of the diff, as shown in the
second attached patch. I'm not sure whether git (2.9.2),
filterdiff (0.3.3), or I am doing something wrong.


> #4  0x00000000006a6be8 in vac_truncate_clog (frozenXID=150036635,
> minMulti=1, lastSaneFrozenXid=200562449, lastSaneMinMulti=1) at
> vacuum.c:1197
> #5  0x00000000006a6948 in vac_update_datfrozenxid () at vacuum.c:1063
> #6  0x00000000007ce0a2 in do_autovacuum () at autovacuum.c:2625
> #7  0x00000000007cc987 in AutoVacWorkerMain (argc=0, argv=0x0) at
> autovacuum.c:1715
> #8  0x00000000007cc562 in StartAutoVacWorker () at autovacuum.c:1512
> #9  0x00000000007e2acd in StartAutovacuumWorker () at postmaster.c:5414
> #10 0x00000000007e257e in sigusr1_handler (postgres_signal_arg=10) at
> postmaster.c:5111
> #11 <signal handler called>
> #12 0x00007f4a714d3603 in __select_nocancel () from /lib64/libc.so.6
> #13 0x00000000007dde88 in ServerLoop () at postmaster.c:1717
> #14 0x00000000007dd67d in PostmasterMain (argc=3, argv=0x2eb8b00) at
> postmaster.c:1361
> #15 0x000000000071a218 in main (argc=3, argv=0x2eb8b00) at main.c:228
> (gdb) print ShmemVariableCache->oldestXid
> $3 = 548
> 
> 
> Regards,
> Neha Sharma
> 
> On Fri, Jul 21, 2017 at 11:01 AM, Thomas Munro <
> thomas.munro@enterprisedb.com> wrote:
> 
> > On Fri, Jul 21, 2017 at 4:16 PM, Neha Sharma
> > <neha.sharma@enterprisedb.com> wrote:
> > >
> > > Attached is the core dump file received on PG 10beta2 version.
> >
> > Thanks Neha.  It'd be best to post the back trace and if possible
> > print oldestXact and ShmemVariableCache->oldestXid from the stack
> > frame for TruncateCLOG.
> >
> > The failing assertion in TruncateCLOG() has a comment that says
> > "vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
> > calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
> > *after* it calls TruncateCLOG().  What am I missing here?
> >
> > What actually prevents ShmemVariableCache->oldestXid from going
> > backwards anyway?  Suppose there are two or more autovacuum processes
> > that reach vac_truncate_clog() concurrently.  They do a scan of
> > pg_database whose tuples they access without locking through a
> > pointer-to-volatile because they expect concurrent in-place writers,
> > come up with a value for frozenXID, and then arrive at
> > SetTransactionIdLimit() in whatever order and clobber
> > ShmemVariableCache->oldestXid.  What am I missing here?
> >
> > --
> > Thomas Munro
> > http://www.enterprisedb.com
> >

-- 
Kyotaro Horiguchi

Nippon Telegraph and Telephone Corporation, NTT Open Source Software Center
Phone: 03-5860-5115 / Fax: 03-5463-5490
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index faa1812..cd8be92 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -1192,13 +1192,6 @@ vac_truncate_clog(TransactionId frozenXID,
     AdvanceOldestCommitTsXid(frozenXID);
 
     /*
-     * Truncate CLOG, multixact and CommitTs to the oldest computed value.
-     */
-    TruncateCLOG(frozenXID, oldestxid_datoid);
-    TruncateCommitTs(frozenXID);
-    TruncateMultiXact(minMulti, minmulti_datoid);
-
-    /*
      * Update the wrap limit for GetNewTransactionId and creation of new
      * MultiXactIds.  Note: these functions will also signal the postmaster
      * for an(other) autovac cycle if needed.   XXX should we avoid possibly
@@ -1206,6 +1199,14 @@ vac_truncate_clog(TransactionId frozenXID,
      */
     SetTransactionIdLimit(frozenXID, oldestxid_datoid);
     SetMultiXactIdLimit(minMulti, minmulti_datoid, false);
+
+    /*
+     * Truncate CLOG, multixact and CommitTs to the oldest computed value
+     * after advancing xid limits.
+     */
+    TruncateCLOG(frozenXID, oldestxid_datoid);
+    TruncateCommitTs(frozenXID);
+    TruncateMultiXact(minMulti, minmulti_datoid);
 }
*** a/src/backend/commands/vacuum.c
--- b/src/backend/commands/vacuum.c
***************
*** 1192,1204 **** vac_truncate_clog(TransactionId frozenXID,
      AdvanceOldestCommitTsXid(frozenXID);
  
      /*
-      * Truncate CLOG, multixact and CommitTs to the oldest computed value.
-      */
-     TruncateCLOG(frozenXID, oldestxid_datoid);
-     TruncateCommitTs(frozenXID);
-     TruncateMultiXact(minMulti, minmulti_datoid);
- 
-     /*
       * Update the wrap limit for GetNewTransactionId and creation of new
       * MultiXactIds.  Note: these functions will also signal the postmaster
       * for an(other) autovac cycle if needed.   XXX should we avoid possibly
--- 1192,1197 ----
