Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers

From Pengchengliu
Subject Parallel scan with SubTransGetTopmostTransaction assert coredump
Date
Msg-id 000901d742f1$acae3ad0$060ab070$@tju.edu.cn
Whole thread Raw
Responses Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
List pgsql-hackers

Hi Hackers

Last email, format error, missing some information, so I resend this email.  

 With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested subtransaction with parallel scan, I got a subtransaction coredump as below:

 

```

(gdb) bt

#0  0x00001517ce61f7ff in raise () from /lib64/libc.so.6

#1  0x00001517ce609c35 in abort () from /lib64/libc.so.6

#2  0x0000000000aaf93f in ExceptionalCondition (conditionName=0xb4c920 "TransactionIdFollowsOrEquals(xid, TransactionXmin)", errorType=0xb4c796 "FailedAssertion", 

    fileName=0xb4c738 "/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c", lineNumber=156)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/error/assert.c:67

#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:156

#4  0x0000000000b05206 in XidInMVCCSnapshot (xid=196963, snapshot=0x2f8ec58)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293

#5  0x00000000004ff2bc in HeapTupleSatisfiesMVCC (htup=0x7ffc21807120, snapshot=0x2f8ec58, buffer=1946)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1073

#6  0x0000000000500363 in HeapTupleSatisfiesVisibility (tup=0x7ffc21807120, snapshot=0x2f8ec58, buffer=1946)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1695

#7  0x00000000004e423b in heapgetpage (sscan=0x2f8e840, page=1685) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:447

#8  0x00000000004e68bf in heapgettup_pagemode (scan=0x2f8e840, dir=ForwardScanDirection, nkeys=0, key=0x0)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1077

#9  0x00000000004e6eb6 in heap_getnextslot (sscan=0x2f8e840, direction=ForwardScanDirection, slot=0x2fd8a38)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1333

#10 0x000000000075350e in table_scan_getnextslot (sscan=0x2f8e840, direction=ForwardScanDirection, slot=0x2fd8a38)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/access/tableam.h:906

#11 0x00000000007535d6 in SeqNext (node=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:80

#12 0x000000000071af3c in ExecScanFetch (node=0x2fd86a8, accessMtd=0x753542 <SeqNext>, recheckMtd=0x7535e7 <SeqRecheck>)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:133

#13 0x000000000071afdd in ExecScan (node=0x2fd86a8, accessMtd=0x753542 <SeqNext>, recheckMtd=0x7535e7 <SeqRecheck>)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:199

#14 0x0000000000753631 in ExecSeqScan (pstate=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:112

#15 0x00000000007173af in ExecProcNodeFirst (node=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execProcnode.c:450

#16 0x000000000070b7f7 in ExecProcNode (node=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor/executor.h:248

#17 0x000000000070e302 in ExecutePlan (estate=0x2fd7ca0, planstate=0x2fd86a8, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, 

    direction=ForwardScanDirection, dest=0x2f8ddd8, execute_once=true) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:1632

#18 0x000000000070be22 in standard_ExecutorRun (queryDesc=0x2f8e7a8, direction=ForwardScanDirection, count=0, execute_once=true)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:350

#19 0x000000000070bc50 in ExecutorRun (queryDesc=0x2f8e7a8, direction=ForwardScanDirection, count=0, execute_once=true)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:294

#20 0x00000000007131d5 in ParallelQueryMain (seg=0x2ef30b8, toc=0x1517cf85c000)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execParallel.c:1448

#21 0x000000000055f70c in ParallelWorkerMain (main_arg=897455922) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:1470

#22 0x000000000086e255 in StartBackgroundWorker () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/bgworker.c:879

#23 0x000000000088192c in do_start_bgworker (rw=0x2f221c0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5870

#24 0x0000000000881cd8 in maybe_start_bgworkers () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:6095

#25 0x0000000000880d14 in sigusr1_handler (postgres_signal_arg=10) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5255

#26 <signal handler called>

#27 0x00001517ce6dc4bb in select () from /lib64/libc.so.6

#28 0x000000000087c867 in ServerLoop () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1703

#29 0x000000000087c232 in PostmasterMain (argc=3, argv=0x2ef1070) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1412

#30 0x0000000000783418 in main (argc=3, argv=0x2ef1070) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/main/main.c:210

(gdb) f 3

#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:325

325             Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

(gdb) p xid

$1 = 196963

(gdb) f 4

#4  0x0000000000b05206 in XidInMVCCSnapshot (xid=196963, snapshot=0x2f8ec58)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293

2293                            xid = SubTransGetTopmostTransaction(xid);

(gdb) p *snapshot

$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 196962, xmax = 210314, xip = 0x2f8ecc0, xcnt = 82, subxip = 0x0, subxcnt = 0, suboverflowed = true, 

  takenDuringRecovery = false, copied = true, curcid = 14, speculativeToken = 2139062143, active_count = 0, regd_count = 1, ph_node = {first_child = 0x0, 

    next_sibling = 0xf65ca0 <CatalogSnapshotData+64>, prev_or_parent = 0x2f8dbc8}, whenTaken = 0, lsn = 0}

(gdb) p TransactionXmin

$3 = 196992

(gdb) f 3

#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)

    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:325

325             Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

(gdb) p xid

$4 = 196963

(gdb) p TransactionXmin

$5 = 196992

```

 

After simple analyze, I think it should be a bug.

 

1, Main process first get active snapshot(xmin 196962, xmax 210314), and push active snapshot. 

2, Main process call InitializeParallelDSM, get transaction snapshot(xmin 196992, xmax 210320), and send this snapshot with parameters PARALLEL_KEY_TRANSACTION_SNAPSHOT.

3, Main process call ExecParallelInitializeDSM->ExecSeqScanInitializeDSM->table_parallelscan_initialize  SerializeSnapshot active snapshot(xmin 196962, xmax 210314)

4, Main process Lanch Parallel Workers process.

 

Parallel Workers process work flow:

1, ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot(xmin 196992, xmax 210320) and set TransactionXmin to 196992.

2, ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin and set TransactionXmin to 196992.

3, ParallelWorkerMain->ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel restore active snapshot(xmin 196962, xmax 210314) from main process.

4, Got coredump ParallelWorkerMain->ParallelQueryMain->ExecutorRun...ExecScan->heapgetpage->HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction 

 

So the root cause is the Parallel Workers process set the TransactionXmin with later transcation snapshot. When parallel scan, Parallel Workers process use the older active snapshot.

It leads to subtrans assert coredump. I don't know how to fix it. Is there any ideas?

 

Thanks

Pengcheng

 

 

 

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [BUG]"FailedAssertion" reported in lazy_scan_heap() when running logical replication
Next
From: Andres Freund
Date:
Subject: Re: Anti-critical-section assertion failure in mcxt.c reached by walsender