Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash - Mailing list pgsql-hackers

From Neha Sharma
Subject Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
Date
Msg-id CANiYTQuZm+hDvuHB14d65SkL2ko98ESR3Jf2kUiX=m1haL=xrg@mail.gmail.com
In response to Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: [HACKERS] [TRAP: FailedAssertion] causing server to crash
List pgsql-hackers
Here is the back trace from the attached core dump.

(gdb) bt
#0  0x00007f4a71424495 in raise () from /lib64/libc.so.6
#1  0x00007f4a71425c75 in abort () from /lib64/libc.so.6
#2  0x00000000009dc18a in ExceptionalCondition (conditionName=0xa905d0 "!(TransactionIdPrecedesOrEquals(oldestXact, ShmemVariableCache->oldestXid))",
    errorType=0xa9044f "FailedAssertion", fileName=0xa90448 "clog.c", lineNumber=683) at assert.c:54
#3  0x0000000000524215 in TruncateCLOG (oldestXact=150036635, oldestxid_datoid=13164) at clog.c:682
#4  0x00000000006a6be8 in vac_truncate_clog (frozenXID=150036635, minMulti=1, lastSaneFrozenXid=200562449, lastSaneMinMulti=1) at vacuum.c:1197
#5  0x00000000006a6948 in vac_update_datfrozenxid () at vacuum.c:1063
#6  0x00000000007ce0a2 in do_autovacuum () at autovacuum.c:2625
#7  0x00000000007cc987 in AutoVacWorkerMain (argc=0, argv=0x0) at autovacuum.c:1715
#8  0x00000000007cc562 in StartAutoVacWorker () at autovacuum.c:1512
#9  0x00000000007e2acd in StartAutovacuumWorker () at postmaster.c:5414
#10 0x00000000007e257e in sigusr1_handler (postgres_signal_arg=10) at postmaster.c:5111
#11 <signal handler called>
#12 0x00007f4a714d3603 in __select_nocancel () from /lib64/libc.so.6
#13 0x00000000007dde88 in ServerLoop () at postmaster.c:1717
#14 0x00000000007dd67d in PostmasterMain (argc=3, argv=0x2eb8b00) at postmaster.c:1361
#15 0x000000000071a218 in main (argc=3, argv=0x2eb8b00) at main.c:228
(gdb) print ShmemVariableCache->oldestXid
$3 = 548
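
For anyone checking the numbers by hand, here is a minimal sketch of why the assertion condition is false for these two values. It is modeled on the wraparound-aware XID comparison in transam.c, but the names xid_precedes_or_equals, FIRST_NORMAL_XID and assertion_holds_for_dump are made up for this illustration and it is not the server code itself.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* XIDs below this value are special (invalid, bootstrap, frozen). */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Illustrative stand-in for TransactionIdPrecedesOrEquals(). */
static bool
xid_precedes_or_equals(TransactionId id1, TransactionId id2)
{
    int32_t diff;

    /* Special XIDs are compared as plain unsigned integers. */
    if (id1 < FIRST_NORMAL_XID || id2 < FIRST_NORMAL_XID)
        return id1 <= id2;

    /* Normal XIDs are compared modulo 2^32 via a signed difference. */
    diff = (int32_t) (id1 - id2);
    return diff <= 0;
}

/* Re-evaluates the assertion condition with the values shown above. */
static bool
assertion_holds_for_dump(void)
{
    const TransactionId oldestXact = 150036635;   /* argument in frame #3 */
    const TransactionId shmem_oldestXid = 548;    /* gdb output: $3 = 548 */

    return xid_precedes_or_equals(oldestXact, shmem_oldestXid);
}
```

The signed difference 150036635 - 548 is positive, so assertion_holds_for_dump() returns false, which matches the TRAP in frame #2.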


Regards,
Neha Sharma

On Fri, Jul 21, 2017 at 11:01 AM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Fri, Jul 21, 2017 at 4:16 PM, Neha Sharma
<neha.sharma@enterprisedb.com> wrote:
>
> Attached is the core dump file from PG 10beta2.

Thanks Neha.  It'd be best to post the back trace and if possible
print oldestXact and ShmemVariableCache->oldestXid from the stack
frame for TruncateCLOG.

The failing assertion in TruncateCLOG() has a comment that says
"vac_truncate_clog already advanced oldestXid", but vac_truncate_clog
calls SetTransactionIdLimit() to write ShmemVariableCache->oldestXid
*after* it calls TruncateCLOG().  What am I missing here?
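
To make the ordering question concrete, here is a toy model, assuming the call order is as described above. Only the assertion and the store to oldestXid are modeled; every name in it (shared_oldest_xid, toy_truncate_clog, toy_set_transaction_id_limit and the two drivers) is hypothetical, and the "publish first" variant is shown only for contrast, not as a proposed patch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Toy stand-in for ShmemVariableCache->oldestXid (no locking modeled). */
static TransactionId shared_oldest_xid;

/* Simplified modulo-2^32 "precedes or equals"; special XIDs are ignored. */
static bool
xid_precedes_or_equals(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) <= 0;
}

/* Models only the assertion in TruncateCLOG(); true means it would hold. */
static bool
toy_truncate_clog(TransactionId oldestXact)
{
    return xid_precedes_or_equals(oldestXact, shared_oldest_xid);
}

/* Models only the oldestXid store done by SetTransactionIdLimit(). */
static void
toy_set_transaction_id_limit(TransactionId oldestXact)
{
    shared_oldest_xid = oldestXact;
}

/* Order described in the message: truncate first, publish afterwards. */
static bool
truncate_then_publish(TransactionId start, TransactionId frozenXID)
{
    bool assertion_ok;

    shared_oldest_xid = start;
    assertion_ok = toy_truncate_clog(frozenXID);
    toy_set_transaction_id_limit(frozenXID);
    return assertion_ok;
}

/* Order the code comment seems to assume: publish first, then truncate. */
static bool
publish_then_truncate(TransactionId start, TransactionId frozenXID)
{
    shared_oldest_xid = start;
    toy_set_transaction_id_limit(frozenXID);
    return toy_truncate_clog(frozenXID);
}
```

In this model, truncate_then_publish(548, 150036635) reports a failed assertion because the shared value is still 548 when the check runs, while publish_then_truncate(548, 150036635) passes.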

What actually prevents ShmemVariableCache->oldestXid from going
backwards anyway?  Suppose there are two or more autovacuum processes
that reach vac_truncate_clog() concurrently.  Each of them scans
pg_database, accessing its tuples without locking through a
pointer-to-volatile because it expects concurrent in-place writers,
comes up with a value for frozenXID, and then arrives at
SetTransactionIdLimit() in whatever order and clobbers
ShmemVariableCache->oldestXid.  What am I missing here?
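
The scenario in the previous paragraph can be replayed deterministically in a few lines. This is only a sketch of one possible interleaving, with made-up XID values and hypothetical names (shared_oldest_xid, publish_unconditionally, publish_if_newer, replay); the guarded variant is there to show what "never goes backwards" would mean, not to describe what the server does.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Toy stand-in for ShmemVariableCache->oldestXid (no locking modeled). */
static TransactionId shared_oldest_xid;

/* Simplified modulo-2^32 "strictly precedes"; special XIDs are ignored. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Unconditional store: whichever worker arrives last wins. */
static void
publish_unconditionally(TransactionId xid)
{
    shared_oldest_xid = xid;
}

/* Hypothetical guard: only ever move the shared value forwards. */
static void
publish_if_newer(TransactionId xid)
{
    if (xid_precedes(shared_oldest_xid, xid))
        shared_oldest_xid = xid;
}

/*
 * Replays one interleaving: worker A stores its value first, then worker B
 * stores a value it computed from an older scan.  Returns the final value.
 */
static TransactionId
replay(void (*publish) (TransactionId), TransactionId start,
       TransactionId worker_a_xid, TransactionId worker_b_xid)
{
    shared_oldest_xid = start;
    publish(worker_a_xid);
    publish(worker_b_xid);
    return shared_oldest_xid;
}
```

With a starting value of 500, worker A at 2000 and worker B at 1000, replay(publish_unconditionally, 500, 2000, 1000) ends at 1000, so the shared value has moved backwards; replay(publish_if_newer, 500, 2000, 1000) ends at 2000.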

--
Thomas Munro
http://www.enterprisedb.com
