Thread: Parallel scan with SubTransGetTopmostTransaction assert coredump

Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:

Hi Hackers,

My last email had a formatting error and was missing some information, so I am resending it.

 With PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested subtransactions with parallel scan and got an assertion coredump, as below:

 

```

(gdb) bt
#0  0x00001517ce61f7ff in raise () from /lib64/libc.so.6
#1  0x00001517ce609c35 in abort () from /lib64/libc.so.6
#2  0x0000000000aaf93f in ExceptionalCondition (conditionName=0xb4c920 "TransactionIdFollowsOrEquals(xid, TransactionXmin)", errorType=0xb4c796 "FailedAssertion", 
    fileName=0xb4c738 "/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c", lineNumber=156)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/error/assert.c:67
#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:156
#4  0x0000000000b05206 in XidInMVCCSnapshot (xid=196963, snapshot=0x2f8ec58)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293
#5  0x00000000004ff2bc in HeapTupleSatisfiesMVCC (htup=0x7ffc21807120, snapshot=0x2f8ec58, buffer=1946)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1073
#6  0x0000000000500363 in HeapTupleSatisfiesVisibility (tup=0x7ffc21807120, snapshot=0x2f8ec58, buffer=1946)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1695
#7  0x00000000004e423b in heapgetpage (sscan=0x2f8e840, page=1685) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:447
#8  0x00000000004e68bf in heapgettup_pagemode (scan=0x2f8e840, dir=ForwardScanDirection, nkeys=0, key=0x0)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1077
#9  0x00000000004e6eb6 in heap_getnextslot (sscan=0x2f8e840, direction=ForwardScanDirection, slot=0x2fd8a38)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1333
#10 0x000000000075350e in table_scan_getnextslot (sscan=0x2f8e840, direction=ForwardScanDirection, slot=0x2fd8a38)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/access/tableam.h:906
#11 0x00000000007535d6 in SeqNext (node=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:80
#12 0x000000000071af3c in ExecScanFetch (node=0x2fd86a8, accessMtd=0x753542 <SeqNext>, recheckMtd=0x7535e7 <SeqRecheck>)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:133
#13 0x000000000071afdd in ExecScan (node=0x2fd86a8, accessMtd=0x753542 <SeqNext>, recheckMtd=0x7535e7 <SeqRecheck>)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:199
#14 0x0000000000753631 in ExecSeqScan (pstate=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:112
#15 0x00000000007173af in ExecProcNodeFirst (node=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execProcnode.c:450
#16 0x000000000070b7f7 in ExecProcNode (node=0x2fd86a8) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor/executor.h:248
#17 0x000000000070e302 in ExecutePlan (estate=0x2fd7ca0, planstate=0x2fd86a8, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, 
    direction=ForwardScanDirection, dest=0x2f8ddd8, execute_once=true) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:1632
#18 0x000000000070be22 in standard_ExecutorRun (queryDesc=0x2f8e7a8, direction=ForwardScanDirection, count=0, execute_once=true)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:350
#19 0x000000000070bc50 in ExecutorRun (queryDesc=0x2f8e7a8, direction=ForwardScanDirection, count=0, execute_once=true)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:294
#20 0x00000000007131d5 in ParallelQueryMain (seg=0x2ef30b8, toc=0x1517cf85c000)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execParallel.c:1448
#21 0x000000000055f70c in ParallelWorkerMain (main_arg=897455922) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:1470
#22 0x000000000086e255 in StartBackgroundWorker () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/bgworker.c:879
#23 0x000000000088192c in do_start_bgworker (rw=0x2f221c0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5870
#24 0x0000000000881cd8 in maybe_start_bgworkers () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:6095
#25 0x0000000000880d14 in sigusr1_handler (postgres_signal_arg=10) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5255
#26 <signal handler called>
#27 0x00001517ce6dc4bb in select () from /lib64/libc.so.6
#28 0x000000000087c867 in ServerLoop () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1703
#29 0x000000000087c232 in PostmasterMain (argc=3, argv=0x2ef1070) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1412
#30 0x0000000000783418 in main (argc=3, argv=0x2ef1070) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/main/main.c:210
(gdb) f 3
#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:156
156             Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
(gdb) p xid
$1 = 196963
(gdb) f 4
#4  0x0000000000b05206 in XidInMVCCSnapshot (xid=196963, snapshot=0x2f8ec58)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293
2293                            xid = SubTransGetTopmostTransaction(xid);
(gdb) p *snapshot
$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 196962, xmax = 210314, xip = 0x2f8ecc0, xcnt = 82, subxip = 0x0, subxcnt = 0, suboverflowed = true, 
  takenDuringRecovery = false, copied = true, curcid = 14, speculativeToken = 2139062143, active_count = 0, regd_count = 1, ph_node = {first_child = 0x0, 
    next_sibling = 0xf65ca0 <CatalogSnapshotData+64>, prev_or_parent = 0x2f8dbc8}, whenTaken = 0, lsn = 0}
(gdb) p TransactionXmin
$3 = 196992
(gdb) f 3
#3  0x0000000000563111 in SubTransGetTopmostTransaction (xid=196963)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:156
156             Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));
(gdb) p xid
$4 = 196963
(gdb) p TransactionXmin
$5 = 196992

```

 

After some simple analysis, I think this is a bug.

 

1, The main process first gets the active snapshot (xmin 196962, xmax 210314) and pushes it as the active snapshot.
2, The main process calls InitializeParallelDSM, gets the transaction snapshot (xmin 196992, xmax 210320), and sends that snapshot under the key PARALLEL_KEY_TRANSACTION_SNAPSHOT.
3, The main process calls ExecParallelInitializeDSM->ExecSeqScanInitializeDSM->table_parallelscan_initialize, which calls SerializeSnapshot on the active snapshot (xmin 196962, xmax 210314).
4, The main process launches the parallel worker processes.
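For reference, steps 2 and 3 are where the two snapshots diverge. A condensed sketch of the serialization side (paraphrased from InitializeParallelDSM() in PG 13.2 src/backend/access/transam/parallel.c; space estimation, error handling and the other shared-memory keys are omitted, so this is not verbatim source):

```
/* Paraphrased from InitializeParallelDSM(); not verbatim. */
Snapshot    transaction_snapshot = GetTransactionSnapshot();   /* may be newer */
Snapshot    active_snapshot = GetActiveSnapshot();             /* pushed earlier */

/* Serialize both snapshots into the parallel DSM under separate keys. */
tsnapspace = shm_toc_allocate(pcxt->toc, tsnaplen);
SerializeSnapshot(transaction_snapshot, tsnapspace);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, tsnapspace);

asnapspace = shm_toc_allocate(pcxt->toc, asnaplen);
SerializeSnapshot(active_snapshot, asnapspace);
shm_toc_insert(pcxt->toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, asnapspace);
```

(In addition, the seq scan node serializes its own scan snapshot via table_parallelscan_initialize; that copy is the one the workers actually scan with.)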

 

The parallel worker process work flow:
1, ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot (xmin 196992, xmax 210320) sets TransactionXmin to 196992.
2, ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin also sets TransactionXmin to 196992 (see the sketch below).
3, ParallelWorkerMain->ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel restores the active snapshot (xmin 196962, xmax 210314) from the main process.
4, Coredump in ParallelWorkerMain->ParallelQueryMain->ExecutorRun...ExecScan->heapgetpage->HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.
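For reference, the worker-side restore looks roughly like this (condensed and paraphrased from ParallelWorkerMain() in PG 13.2 parallel.c; the exact field name for the leader's PGPROC may differ between versions):

```
/* Paraphrased from ParallelWorkerMain(); not verbatim. */

/* Restore the transaction snapshot.  This also installs the leader's
 * xmin for this worker (SetTransactionSnapshot ->
 * ProcArrayInstallRestoredXmin), so TransactionXmin now comes from the
 * NEWER snapshot. */
tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, false);
RestoreTransactionSnapshot(RestoreSnapshot(tsnapspace),
                           fps->parallel_master_pgproc);

/* Restore the active snapshot -- the OLDER one the scan will use. */
asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
PushActiveSnapshot(RestoreSnapshot(asnapspace));
```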

 

So the root cause is that the parallel worker process sets TransactionXmin from the later transaction snapshot, while the parallel scan itself uses the older active snapshot from the main process.
This leads to the subtrans assert coredump. I don't know how to fix it. Are there any ideas?
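For reference, the assert that fires sits at the top of SubTransGetTopmostTransaction(); a condensed sketch (paraphrased from PG 13.2 src/backend/access/transam/subtrans.c, with the corruption check trimmed):

```
TransactionId
SubTransGetTopmostTransaction(TransactionId xid)
{
    TransactionId parentXid = xid,
                  previousXid = xid;

    /* Can't ask about stuff that might not be around anymore */
    Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

    while (TransactionIdIsValid(parentXid))
    {
        previousXid = parentXid;
        if (TransactionIdPrecedes(parentXid, TransactionXmin))
            break;
        parentXid = SubTransGetParent(parentXid);
        /* (sanity check that the parent precedes the child omitted) */
    }
    return previousXid;
}
```

The assert encodes the assumption that pg_subtrans may already be truncated for anything older than TransactionXmin; the worker's newer TransactionXmin is what breaks that assumption here.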

 

Thanks

Pengcheng


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Andres Freund
Date:
Hi,

On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
> 
> My last email had a formatting error and was missing some information, so I am resending it.
> 
>  With PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested subtransactions with parallel scan and got an assertion coredump, as below:
 


> So the root cause is that the parallel worker process sets TransactionXmin from the later transaction snapshot, while the parallel scan itself uses the older active snapshot from the main process.
> 
> This leads to the subtrans assert coredump. I don't know how to fix it. Are there any ideas?

Do you have steps to reliably reproduce this?

Greetings,

Andres Freund



RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:
Hi Andres,
   Reproduce steps.

1, Modify NUM_SUBTRANS_BUFFERS from 32 to 128 in the file "src/include/access/subtrans.h", line 15.
2, configure with asserts enabled and build it.
3, init a new database cluster.
4, modify postgresql.conf and add the parameters below. Since the coredump comes from a parallel scan, we adjust the parallel settings to make it easier to reproduce.

  max_connections = 2000

  parallel_setup_cost=0
  parallel_tuple_cost=0
  min_parallel_table_scan_size=0
  max_parallel_workers_per_gather=8
  max_parallel_workers = 32

5, start the database cluster.
6, use the init_test.sql script in the attachment to create the tables.
7, use pgbench with the sub_120.sql script in the attachment to test it. Run it a few times and you should get a coredump file.
    pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 120

Thanks
Pengcheng


-----Original Message-----
From: Andres Freund <andres@anarazel.de>
Sent: May 7, 2021 11:55
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: pgsql-hackers@postgresql.org
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi,

On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
>
> My last email had a formatting error and was missing some information, so I am resending it.
>
>  With PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested subtransactions with parallel scan and got an assertion coredump, as below:


> So the root cause is that the parallel worker process sets TransactionXmin from the later transaction snapshot, while the parallel scan itself uses the older active snapshot from the main process.
>
> This leads to the subtrans assert coredump. I don't know how to fix it. Are there any ideas?

Do you have steps to reliably reproduce this?

Greetings,

Andres Freund



Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Tue, May 11, 2021 at 11:28 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> Hi Andres,
>    Reproduce steps.
>
> 1, Modify NUM_SUBTRANS_BUFFERS from 32 to 128 in the file "src/include/access/subtrans.h", line 15.
> 2, configure with asserts enabled and build it.
> 3, init a new database cluster.
> 4, modify postgresql.conf and add the parameters below. Since the coredump comes from a parallel scan, we adjust the parallel settings to make it easier to reproduce.
>
>   max_connections = 2000
>
>   parallel_setup_cost=0
>   parallel_tuple_cost=0
>   min_parallel_table_scan_size=0
>   max_parallel_workers_per_gather=8
>   max_parallel_workers = 32
>
> 5, start the database cluster.
> 6, use the init_test.sql script in the attachment to create the tables.
> 7, use pgbench with the sub_120.sql script in the attachment to test it. Run it a few times and you should get a coredump file.
>     pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 120
>

Hi,

I had a go at reproducing your reported issue, making sure to follow
all your steps.
Unfortunately, your script seemed to run OK with pgbench and no
crash/coredump occurred for me (and yes, I definitely had asserts
enabled).
I tried with both the 13.2 source code
(3fb4c75e857adee3da4386e947ba58a75f3e74b7), running through the script
with pgbench twice to completion, and also did the same using the
latest Postgres source code.

It will be interesting to see if anyone is able to reproduce your issue.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:

Hi Andres,

  Thanks for your reply.

  If you still cannot reproduce it within 2 minutes, could you run pgbench for a longer time, such as 30 or 60 minutes?

 

  This coredump should come from a parallel scan only.

  For a normal scan (without parallel), I think the assert in SubTransGetTopmostTransaction (HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction->Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin))) is correct:

  a single backend gets its transaction snapshot at the same time it sets TransactionXmin and snapshot->xmin.

  In XidInMVCCSnapshot, it first checks whether xid precedes snapshot->xmin; if so, XidInMVCCSnapshot returns false directly.

  So in XidInMVCCSnapshot->SubTransGetTopmostTransaction, xid cannot precede snapshot->xmin.
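The relevant branch looks roughly like this (paraphrased from XidInMVCCSnapshot() in PG 13.2 src/backend/utils/time/snapmgr.c; not verbatim):

```
/* Paraphrased from XidInMVCCSnapshot(); not verbatim. */
if (TransactionIdPrecedes(xid, snapshot->xmin))
    return false;           /* xid finished before the snapshot */
if (TransactionIdFollowsOrEquals(xid, snapshot->xmax))
    return true;            /* xid still in progress for this snapshot */

if (!snapshot->takenDuringRecovery)
{
    if (snapshot->suboverflowed)
    {
        /* Snapshot overflowed, so convert xid to top-level.  The source
         * comments that this is safe because TransactionXmin <=
         * snapshot->xmin -- exactly the invariant that the parallel
         * worker breaks. */
        xid = SubTransGetTopmostTransaction(xid);
        if (TransactionIdPrecedes(xid, snapshot->xmin))
            return false;
    }
    /* ... otherwise search subxip, then search the xip array ... */
}
```

So for a single backend, xid >= snapshot->xmin >= TransactionXmin holds at the call site; in the worker, TransactionXmin can be newer than the scan snapshot's xmin.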

 

  But for a parallel scan it is different. I modified the code and replaced the SubTransGetTopmostTransaction assert with a sleep, as sketched below.

  Then we can check TransactionXmin and the snapshot restored from the DSA.
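A minimal sketch of that tweak, assuming a simple sleep loop in place of the assert (the pg_usleep frame at subtrans.c:164 in the stack below is this sleep; the actual change may differ slightly):

```
/* Hypothetical debugging tweak in SubTransGetTopmostTransaction():
 * instead of Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)),
 * park the worker so gdb can attach and inspect the snapshots. */
if (!TransactionIdFollowsOrEquals(xid, TransactionXmin))
{
    for (;;)
        pg_usleep(1000000L);    /* 1 second at a time, forever */
}
```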

  The stack below shows the state when the assert condition failed.

 

(gdb) bt
#0  0x0000149fb3d254bb in select () from /lib64/libc.so.6
#1  0x0000000000b1d3b3 in pg_usleep (microsec=1000000) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/port/pgsleep.c:56
#2  0x0000000000562a3b in SubTransGetTopmostTransaction (xid=799225) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:164
#3  0x0000000000b04acb in XidInMVCCSnapshot (xid=799225, snapshot=0x2af2d00) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293
#4  0x00000000004ff24c in HeapTupleSatisfiesMVCC (htup=0x7fffc1465f60, snapshot=0x2af2d00, buffer=109832)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1073
#5  0x00000000005002f3 in HeapTupleSatisfiesVisibility (tup=0x7fffc1465f60, snapshot=0x2af2d00, buffer=109832)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1695
#6  0x00000000004e41cb in heapgetpage (sscan=0x2af3118, page=10846) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:447
#7  0x00000000004e684f in heapgettup_pagemode (scan=0x2af3118, dir=ForwardScanDirection, nkeys=0, key=0x0)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1077
#8  0x00000000004e6e46 in heap_getnextslot (sscan=0x2af3118, direction=ForwardScanDirection, slot=0x2affab0)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1333
#9  0x0000000000752e1a in table_scan_getnextslot (sscan=0x2af3118, direction=ForwardScanDirection, slot=0x2affab0)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/access/tableam.h:906
#10 0x0000000000752ee2 in SeqNext (node=0x2aff538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:80
#11 0x000000000071a848 in ExecScanFetch (node=0x2aff538, accessMtd=0x752e4e <SeqNext>, recheckMtd=0x752ef3 <SeqRecheck>)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:133
#12 0x000000000071a8e9 in ExecScan (node=0x2aff538, accessMtd=0x752e4e <SeqNext>, recheckMtd=0x752ef3 <SeqRecheck>)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:199
#13 0x0000000000752f3d in ExecSeqScan (pstate=0x2aff538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:112
#14 0x0000000000725794 in ExecProcNode (node=0x2aff538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor/executor.h:248
#15 0x0000000000725c7f in fetch_input_tuple (aggstate=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeAgg.c:589
#16 0x0000000000728f98 in agg_retrieve_direct (aggstate=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeAgg.c:2463
#17 0x00000000007289f2 in ExecAgg (pstate=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeAgg.c:2183
#18 0x0000000000716cbb in ExecProcNodeFirst (node=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execProcnode.c:450
#19 0x000000000070b103 in ExecProcNode (node=0x2afeff0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor/executor.h:248
#20 0x000000000070dc0e in ExecutePlan (estate=0x2afeb30, planstate=0x2afeff0, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0,
    direction=ForwardScanDirection, dest=0x2ab0578, execute_once=true) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:1632
#21 0x000000000070b72e in standard_ExecutorRun (queryDesc=0x2af2c68, direction=ForwardScanDirection, count=0, execute_once=true)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:350
#22 0x000000000070b55c in ExecutorRun (queryDesc=0x2af2c68, direction=ForwardScanDirection, count=0, execute_once=true)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:294
#23 0x0000000000712ae1 in ParallelQueryMain (seg=0x2a0a0c8, toc=0x149fb4dab000) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execParallel.c:1448
#24 0x000000000055f69c in ParallelWorkerMain (main_arg=1403863538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:1470
#25 0x000000000086db61 in StartBackgroundWorker () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/bgworker.c:879
#26 0x0000000000881238 in do_start_bgworker (rw=0x2a351b0) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5870
#27 0x00000000008815e4 in maybe_start_bgworkers () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:6095
#28 0x0000000000880620 in sigusr1_handler (postgres_signal_arg=10) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5255
#29 <signal handler called>
#30 0x0000149fb3d254bb in select () from /lib64/libc.so.6
#31 0x000000000087c173 in ServerLoop () at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1703
#32 0x000000000087bb3e in PostmasterMain (argc=3, argv=0x2a08080) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:1412
#33 0x0000000000782d24 in main (argc=3, argv=0x2a08080) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/main/main.c:210
(gdb) f 24
#24 0x000000000055f69c in ParallelWorkerMain (main_arg=1403863538) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:1470
1470            entrypt(seg, toc);
(gdb) p *ActiveSnapshot->as_snap   // active snapshot from main process
$18 = {snapshot_type = SNAPSHOT_MVCC, xmin = 799162, xmax = 822061, xip = 0x2ab0190, xcnt = 169, subxip = 0x0, subxcnt = 0, suboverflowed = true, takenDuringRecovery = false,
  copied = true, curcid = 119, speculativeToken = 2139062143, active_count = 1, regd_count = 2, ph_node = {first_child = 0x2af2d40, next_sibling = 0x0, prev_or_parent = 0x0},
  whenTaken = 0, lsn = 0}
(gdb) p *CurrentSnapshot   // transaction snapshot from main process
$19 = {snapshot_type = SNAPSHOT_MVCC, xmin = 799425, xmax = 822293, xip = 0x2ab1c00, xcnt = 172, subxip = 0x149f29302010, subxcnt = 0, suboverflowed = true,
  takenDuringRecovery = false, copied = false, curcid = 119, speculativeToken = 0, active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0,
    prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb) f 4
#4  0x00000000004ff24c in HeapTupleSatisfiesMVCC (htup=0x7fffc1465f60, snapshot=0x2af2d00, buffer=109832)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1073
1073                            XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
(gdb) p *snapshot   // active snapshot from main process, used for the scan
$20 = {snapshot_type = SNAPSHOT_MVCC, xmin = 799162, xmax = 822061, xip = 0x2af2d68, xcnt = 169, subxip = 0x0, subxcnt = 0, suboverflowed = true, takenDuringRecovery = false,
  copied = true, curcid = 119, speculativeToken = 2139062143, active_count = 0, regd_count = 1, ph_node = {first_child = 0x0, next_sibling = 0xf65ca0 <CatalogSnapshotData+64>,
    prev_or_parent = 0x2ab0168}, whenTaken = 0, lsn = 0}
(gdb) p TransactionXmin
$21 = 799425
(gdb) f 3
#3  0x0000000000b04acb in XidInMVCCSnapshot (xid=799225, snapshot=0x2af2d00) at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293
2293                            xid = SubTransGetTopmostTransaction(xid);
(gdb) p xid
$22 = 799225

 

The main process:

  1, The main process gets the transaction snapshot (xmin 799162, xmax 822061), pushes it as the active snapshot, and sets this active snapshot in the QueryDesc in CreateQueryDesc.

  2, The main process collects the active snapshot (xmin 799162, xmax 822061) and gets a newer transaction snapshot (xmin 799425, xmax 822293). It stores the transaction snapshot and the active snapshot under the keys PARALLEL_KEY_TRANSACTION_SNAPSHOT and PARALLEL_KEY_ACTIVE_SNAPSHOT.

  3, The main process, via ExecGather->ExecInitParallelPlan->ExecParallelInitializeDSM->ExecSeqScanInitializeDSM->table_parallelscan_initialize, sends the active snapshot (xmin 799162, xmax 822061) with the plan id to the parallel worker processes.

  4, The main process launches the parallel worker processes.

  The parallel worker process:

  1, Gets a snapshot and sets TransactionXmin itself, in ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.

  2, According to PARALLEL_KEY_TRANSACTION_SNAPSHOT (xmin 799425, xmax 822293) from the main process, sets TransactionXmin to 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.

  3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel gets the active snapshot (xmin 799162, xmax 822061) from the main process and sets it as scan->rs_base.rs_snapshot (see the sketch below).

  4, The parallel scan begins with the active snapshot (xmin 799162, xmax 822061) and TransactionXmin = 799425; when it scans a tuple with xmin 799225, the SubTransGetTopmostTransaction assert fires, in HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.

  So the main process takes an active snapshot (xmin 799162, xmax 822061) that is earlier than the transaction snapshot (xmin 799425, xmax 822293). The parallel worker process sets TransactionXmin from the transaction snapshot (xmin 799425, xmax 822293), but scans tuples with the active snapshot (xmin 799162, xmax 822061).
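For reference, worker step 3 above restores the scan snapshot roughly like this (condensed and paraphrased from table_beginscan_parallel() in PG 13.2 src/backend/access/table/tableam.c; not verbatim):

```
TableScanDesc
table_beginscan_parallel(Relation relation, ParallelTableScanDesc parallel_scan)
{
    Snapshot    snapshot;
    uint32      flags = SO_TYPE_SEQSCAN |
        SO_ALLOW_STRAT | SO_ALLOW_SYNC | SO_ALLOW_PAGEMODE;

    if (!parallel_scan->phs_snapshot_any)
    {
        /* Snapshot was serialized by the leader -- restore it.  This is
         * the older active snapshot, not the newer transaction snapshot
         * that this worker's TransactionXmin was installed from. */
        snapshot = RestoreSnapshot((char *) parallel_scan +
                                   parallel_scan->phs_snapshot_off);
        RegisterSnapshot(snapshot);
        flags |= SO_TEMP_SNAPSHOT;
    }
    else
    {
        /* SnapshotAny passed by the caller; nothing to restore */
        snapshot = SnapshotAny;
    }

    return relation->rd_tableam->scan_begin(relation, snapshot, 0, NULL,
                                            parallel_scan, flags);
}
```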

Thanks

Pengcheng


-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: May 11, 2021 19:08
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

 

On Tue, May 11, 2021 at 11:28 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

> Hi Andres,

>    Reproduce steps.

> 1, Modify NUM_SUBTRANS_BUFFERS from 32 to 128 in the file "src/include/access/subtrans.h", line 15.

> 2, configure with asserts enabled and build it.

> 3, init a new database cluster.

> 4, modify postgresql.conf and add the parameters below. Since the coredump comes from a parallel scan, we adjust the parallel settings to make it easier to reproduce.

>   max_connections = 2000

>   parallel_setup_cost=0

>   parallel_tuple_cost=0

>   min_parallel_table_scan_size=0

>   max_parallel_workers_per_gather=8

>   max_parallel_workers = 32

> 5, start the database cluster.

> 6, use the init_test.sql script in the attachment to create the tables.

> 7, use pgbench with the sub_120.sql script in the attachment to test it. Run it a few times and you should get a coredump file.

>     pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 120

 

Hi,

 

I had a go at reproducing your reported issue, making sure to follow all your steps.

Unfortunately, your script seemed to run OK with pgbench and no crash/coredump occurred for me (and yes, I definitely had asserts enabled).

I tried with both the 13.2 source code

(3fb4c75e857adee3da4386e947ba58a75f3e74b7), running through the script with pgbench twice to completion, and also did the same using the latest Postgres source code.

 

It will be interesting to see if anyone is able to reproduce your issue.

 

Regards,

Greg Nancarrow

Fujitsu Australia

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Thu, May 13, 2021 at 11:25 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>

> Hi Andres,
>  Thanks for your reply.

Er, it's Greg who has replied so far (not Andres).

>
>   If you still cannot reproduce it within 2 minutes, could you run pgbench for a longer time, such as 30 or 60 minutes?
>

Actually, I did run it, multiple times, for more than 60 minutes, but
no assert/crash/coredump occurred in my environment.

>
>   The parallel worker process:
>
>   1, Gets a snapshot and sets TransactionXmin itself, in ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.
>
>   2, According to PARALLEL_KEY_TRANSACTION_SNAPSHOT (xmin 799425, xmax 822293) from the main process, sets TransactionXmin to 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.
>
>   3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel gets the active snapshot (xmin 799162, xmax 822061) from the main process and sets it as scan->rs_base.rs_snapshot.
>
>   4, The parallel scan begins with the active snapshot (xmin 799162, xmax 822061) and TransactionXmin = 799425; when it scans a tuple with xmin 799225, the SubTransGetTopmostTransaction assert fires,
>
>      in HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.
>

I added some logging at a couple of points in the code:
1) In the Worker process code - ParallelWorkerMain() - where it
restores the serialized transaction and active snapshots (i.e. passed
to the Worker from the main process).
2) In the HeapTupleSatisfiesMVCC() function, immediately before it
calls XidInMVCCSnapshot()

After running it for an hour, examination of the log showed that in
ALL cases, the transaction snapshot xmin,xmax was always THE SAME as
the active snapshot xmin,xmax.
(Can you verify that this occurs on your system when things are
working, prior to the coredump?)

This is different to what you are getting in your environment (at
least, different to what you described when the problem occurs).
In your case, you say that the main process gets "the newer
transaction snapshot" - where exactly is this happening in your case?
(or this is what you don't yet know?)
Perhaps very occasionally this somehow happens on your system and
triggers the Assert (and coredump)?  I have not been able to reproduce
that on my system.

Have you reproduced this issue on any other system, using the same
steps as you provided?
I'm wondering if there might be something else in your environment
that may be influencing this problem.


Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:

Hi Greg,

  Thanks for your reply and test.

 

  When the main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin very likely follows the active snapshot xmin.

 

  It is easy to verify with gdb.

  Create the environment as below:

 

  1, Use PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7) and init a cluster database.

  2, Append the following to postgresql.conf:

 

  max_connections = 2000

  parallel_setup_cost=0

  parallel_tuple_cost=0

  min_parallel_table_scan_size=0

  max_parallel_workers_per_gather=8

  max_parallel_workers = 128

  

  3, Start the cluster database. Use the init_test.sql script in the attachment to create some test tables.

  4, Use the sub_120.sql script in the attachment with pgbench to test it.

 

  pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 1800

 

 

 

  Then you can log in to the database and use gdb to verify it.

  1, First run explain to make sure the query is forced to run in parallel:

 

  postgres=# explain (verbose,analyze) select count(*) from contend1;
                                                                     QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 8
         Workers Launched: 0
         ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
               Output: PARTIAL count(*)
               ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
                     Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
Planning Time: 0.123 ms
Execution Time: 1075.588 ms

  postgres=# select pg_backend_pid();
pg_backend_pid
----------------
        2182678

 

  2, Use gdb to attach to the backend process, add a breakpoint at parallel.c:219, and continue.

 

  gdb  -q -p 2182678
  ...
  (gdb) b parallel.c:219
Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
  (gdb) c
Continuing.

 

  3, In the psql client, execute the explain command from step 1 again.

     After we hit the breakpoint in gdb, wait a moment, then step with next.

     Check active_snapshot and transaction_snapshot with gdb: active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.

When I tested this with gdb, sometimes active_snapshot was the same as transaction_snapshot. In that case try it multiple times, and wait a longer time before executing next.

  

   Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219
219             Snapshot        transaction_snapshot = GetTransactionSnapshot();
(gdb) n
220             Snapshot        active_snapshot = GetActiveSnapshot();
(gdb)
223             oldcontext = MemoryContextSwitchTo(TopTransactionContext);
(gdb) p *transaction_snapshot
$1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,
  subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,
  active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb) p *active_snapshot
$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,
  suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,
  regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}
(gdb)

 

Thanks

Pengcheng

 

  

 

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: May 13, 2021 22:15
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

 

On Thu, May 13, 2021 at 11:25 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

 

> Hi Andres,

>  Thanks for your reply.

 

Er, it's Greg who has replied so far (not Andres).

 

>   If you still cannot reproduce it within 2 minutes, could you run pgbench for a longer time, such as 30 or 60 minutes?

 

Actually, I did run it, multiple times, for more than 60 minutes, but no assert/crash/coredump occurred in my environment.

 

>   The parallel worker process:

>   1, Gets a snapshot and sets TransactionXmin itself, in ParallelWorkerMain->BackgroundWorkerInitializeConnectionByOid->GetTransactionSnapshot->GetSnapshotData.

>   2, According to PARALLEL_KEY_TRANSACTION_SNAPSHOT (xmin 799425, xmax 822293) from the main process, sets TransactionXmin to 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.

>   3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel gets the active snapshot (xmin 799162, xmax 822061) from the main process and sets it as scan->rs_base.rs_snapshot.

>   4, The parallel scan begins with the active snapshot (xmin 799162, xmax 822061) and TransactionXmin = 799425; when it scans a tuple with xmin 799225, the SubTransGetTopmostTransaction assert fires,

>      in HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.

 

I added some logging at a couple of points in the code:

1) In the Worker process code - ParallelWorkerMain() - where it restores the serialized transaction and active snapshots (i.e. passed to the Worker from the main process).

2) In the HeapTupleSatisfiesMVCC() function, immediately before it calls XidInMVCCSnapshot()

 

After running it for an hour, examination of the log showed that in ALL cases, the transaction snapshot xmin,xmax was always THE SAME as the active snapshot xmin,xmax.

(Can you verify that this occurs on your system when things are working, prior to the coredump?)

 

This is different to what you are getting in your environment (at least, different to what you described when the problem occurs).

In your case, you say that the main process gets "the newer transaction snapshot" - where exactly is this happening in your case?

(or this is what you don't yet know?)

Perhaps very occasionally this somehow happens on your system and triggers the Assert (and coredump)?  I have not been able to reproduce that on my system.

 

Have you reproduced this issue on any other system, using the same steps as you provided?

I'm wondering if there might be something else in your environment that may be influencing this problem.

 

 

Regards,

Greg Nancarrow

Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Fri, May 14, 2021 at 12:25 PM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> Hi Greg,
>
>   Thanks for you replay and test.
>
>
>
>   When the main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin very likely follows the active snapshot xmin.
>
>
>
>   It is easy to verify with gdb.
>
>   Create the environment as below:
>
>
>
>   1, Use PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7) and init a cluster database.
>
>   2, Append the following to postgresql.conf:
>
>
>
>   max_connections = 2000
>
>   parallel_setup_cost=0
>
>   parallel_tuple_cost=0
>
>   min_parallel_table_scan_size=0
>
>   max_parallel_workers_per_gather=8
>
>   max_parallel_workers = 128
>
>
>
>   3, Start the cluster database. Use the init_test.sql script in the attachment to create some test tables.
>
>   4, Use the sub_120.sql script in the attachment with pgbench to test it.
>
>
>
>   pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 1800
>
>
>
>   Then you can log in to the database and use gdb to verify it.
>
>   1, First run explain to make sure the query is forced to run in parallel:
>
>
>
>   postgres=# explain (verbose,analyze) select count(*) from contend1;
>
>                                                                      QUERY PLAN
> -------------------------------------------------------------------------------------------------------------------------------------------------------
> Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
>
>    Output: count(*)
>
>    ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
>
>          Output: (PARTIAL count(*))
>
>          Workers Planned: 8
>
>          Workers Launched: 0
>
>          ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
>
>                Output: PARTIAL count(*)
>
>                ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
>
>                      Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
>
> Planning Time: 0.123 ms
>
> Execution Time: 1075.588 ms
>
>  postgres=# select pg_backend_pid();
>
> pg_backend_pid
>
> ----------------
>
>         2182678
>
>
>
>
>
>   2, Use gdb to attach to the backend process, add a breakpoint at parallel.c:219, and continue.
>
>
>
>   gdb  -q -p 2182678
>
>   ...
>
>   (gdb) b parallel.c:219
>
> Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
>
>   (gdb) c
>
> Continuing.
>
>
>
>   3, In the psql client, execute the explain command from step 1 again.
>
>      After we hit the breakpoint in gdb, wait a moment, then step with next.
>
>      Check active_snapshot and transaction_snapshot with gdb: active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.
>
> When I tested this with gdb, sometimes active_snapshot was the same as transaction_snapshot. In that case try it multiple times, and wait a longer time before executing next.
>
>
>
>    Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)
>
>     at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219
>
> 219             Snapshot        transaction_snapshot = GetTransactionSnapshot();
>
> (gdb) n
>
> 220             Snapshot        active_snapshot = GetActiveSnapshot();
>
> (gdb)
>
> 223             oldcontext = MemoryContextSwitchTo(TopTransactionContext);
>
> (gdb) p *transaction_snapshot
>
> $1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,
>
>   subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,
>
>   active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
>
> (gdb) p *active_snapshot
>
> $2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,
>
>   suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,
>
>   regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}
>
> (gdb)
>
>

Hi Pengcheng,

I followed all your steps.
However, I perhaps get different behavior in my environment.
99% of the time, the xmin and xmax of the active_snapshot and
transaction_snapshot are the same (regardless of how long I wait at
different points after the breakpoint is hit). I've had one or two
instances where the xmax values differ. I managed to catch just one
case where there were different xmin and xmax values in the snapshots,
but this occurred just prior to the pgbench client completing and
terminating, and when I continued in the debugger, there was no
crash/coredump.

However, I think I've spotted something potentially important to this issue:
For me, almost always "suboverflowed = false" in the snapshots (except
in that one case, right at the end of the pgbench run), yet in your
gdb example "suboverflowed = true" in both of the snapshots (i.e. the
snapshot subxip array has overflowed). I'm guessing that this may be
related to the coredump issue, but I'm not exactly sure how it has
happened, and why it seemingly isn't being handled correctly and
causes that Assert to fire in your case.
Can you try and find out how the snapshot suboverflow is being set in
your case? (since you are getting this readily in your examined
snapshots???) I think there's only several places where it can be set
to "true" (e.g. procarray.c:1641).
Also, does increasing PGPROC_MAX_CACHED_SUBXIDS avoid, or delay, the
problem for you? It's currently defined as 64.
I notice that there's been some changes related to snapshot data
handling and subxid overflow since 13.2, so I'm wondering whether your
coredump issue can be reproduced with the latest code?
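For reference, the overflow marking I mentioned looks roughly like this in the 13.x sources (paraphrased from GetSnapshotData() in procarray.c; line numbers and details may differ from your tree):

```
/* Paraphrased from GetSnapshotData() in PG 13.x; not verbatim.
 * A backend caches at most PGPROC_MAX_CACHED_SUBXIDS (64) subxact XIDs
 * in its PGPROC; once it assigns more, its overflowed flag is set, and
 * every snapshot taken while it runs is marked suboverflowed. */
if (!suboverflowed)
{
    if (pgxact->overflowed)
        suboverflowed = true;   /* snapshot->suboverflowed ends up true */
    else
    {
        int         nxids = pgxact->nxids;

        if (nxids > 0)
        {
            memcpy(snapshot->subxip + subcount,
                   (void *) proc->subxids.xids,
                   nxids * sizeof(TransactionId));
            subcount += nxids;
        }
    }
}
```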

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:

Hi Greg,

 

  When you get different xmins between the active snapshot and the transaction snapshot, maybe there is no coredump,

  as maybe there is no tuple whose xmin lies between ActiveSnapshot->xmin and TransactionSnapshot->xmin that needs to be scanned by the parallel process.

 

  There is no doubt that it is very likely that ActiveSnapshot->xmin precedes TransactionSnapshot->xmin.

 

  For this coredump, we must make sure the scan is parallel and the snapshot is overflowed. If the snapshot is not overflowed, you cannot get the coredump,

  as the coredump comes from the MVCC check during a parallel scan when the snapshot is overflowed.

 

  Did you use pgbench with the sub_120.sql script which I provided in the attachment?

  The default PGPROC_MAX_CACHED_SUBXIDS is 64. In the sub_120.sql script, one transaction uses 120 subtransactions, which is much larger than 64.

  So while taking the snapshot, it must overflow. I really don't know why your snapshot is not overflowed.

  Did you increase PGPROC_MAX_CACHED_SUBXIDS? Please don't change any code; for now we just use the original code in PG 13.2.

  I have checked the code in the master branch; there is no change to this mechanism, so this issue should still exist.

  

  

  Thanks

  Pengcheng

 

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: May 14, 2021 16:47
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

 

On Fri, May 14, 2021 at 12:25 PM Pengchengliu <pengchengliu@tju.edu.cn> wrote:

> Hi Greg,

>   Thanks for your reply and test.

>   When the main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin very likely follows the active snapshot xmin.

>   It is easy to verify with gdb.

>   Create the environment as below:

>   1, Use PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7) and init a cluster database.

>   2, Append the following to postgresql.conf:

>   max_connections = 2000

>   parallel_setup_cost=0

>   parallel_tuple_cost=0

>   min_parallel_table_scan_size=0

>   max_parallel_workers_per_gather=8

>   max_parallel_workers = 128

>   3, Start the cluster database. Use the init_test.sql script in the attachment to create some test tables.

>   4, Use the sub_120.sql script in the attachment with pgbench to test it.

>   pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 1800

>   Then you can log in to the database and use gdb to verify it.

>   1, First run explain to make sure the query is forced to run in parallel:

>   postgres=# explain (verbose,analyze) select count(*) from contend1;

>                                                                      QUERY PLAN

> -------------------------------------------------------------------------------------------------------------------------------------------------------

> Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)

>    Output: count(*)

>    ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)

>          Output: (PARTIAL count(*))

>          Workers Planned: 8

>          Workers Launched: 0

>          ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)

>                Output: PARTIAL count(*)

>                ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)

>                      Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time

> Planning Time: 0.123 ms

> Execution Time: 1075.588 ms

>  postgres=# select pg_backend_pid();

> pg_backend_pid

> ----------------

>         2182678

>   2, Use gdb to attach to the backend process, add a breakpoint at parallel.c:219, and continue.

>   gdb  -q -p 2182678

>   ...

>   (gdb) b parallel.c:219

> Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.

>   (gdb) c

> Continuing.

>   3, In the psql client, execute the explain command from step 1 again.

>      After we hit the breakpoint in gdb, wait a moment, then step with next.

>      Check active_snapshot and transaction_snapshot with gdb: active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160.

> When I tested this with gdb, sometimes active_snapshot was the same as transaction_snapshot. In that case try it multiple times, and wait a longer time before executing next.

>    Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)

>     at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219

> 219             Snapshot        transaction_snapshot = GetTransactionSnapshot();

> (gdb) n

> 220             Snapshot        active_snapshot = GetActiveSnapshot();

> (gdb)

> 223             oldcontext = MemoryContextSwitchTo(TopTransactionContext);

> (gdb) p *transaction_snapshot

> $1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010,

>   subxcnt = 0, suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0,

>   active_count = 0, regd_count = 0, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}

> (gdb) p *active_snapshot

> $2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,

>   suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1,

>   regd_count = 2, ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}

> (gdb)

 

Hi Pengcheng,

 

I followed all your steps.

However, I perhaps get different behavior in my environment.

99% of the time, the xmin and xmax of the active_snapshot and transaction_snapshot are the same (regardless of how long I wait at different points after the breakpoint is hit). I've had one or two instances where the xmax values differ. I managed to catch just one case where there were different xmin and xmax values in the snapshots, but this occurred just prior to the pgbench client completing and terminating, and when I continued in the debugger, there was no crash/coredump.

 

However, I think I've spotted something potentially important to this issue:

For me, almost always "suboverflowed = false" in the snapshots (except in that one case, right at the end of the pgbench run), yet in your gdb example "suboverflowed = true" in both of the snapshots (i.e. the snapshot subxip array has overflowed). I'm guessing that this may be related to the coredump issue, but I'm not exactly sure how it has happened, and why it seemingly isn't being handled correctly and causes that Assert to fire in your case.

Can you try and find out how the snapshot suboverflow is being set in your case? (since you are getting this readily in your examined snapshots???) I think there's only several places where it can be set to "true" (e.g. procarray.c:1641).

Also, does increasing PGPROC_MAX_CACHED_SUBXIDS avoid, or delay, the problem for you? It's currently defined as 64.

I notice that there's been some changes related to snapshot data handling and subxid overflow since 13.2, so I'm wondering whether your coredump issue can be reproduced with the latest code?

 

Regards,

Greg Nancarrow

Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Fri, May 14, 2021 at 8:36 PM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>   Did you use pgbench with the script sub_120.sql which I provide in attachment?

yes

>
>   Did you increase PGPROC_MAX_CACHED_SUBXIDS? Please don't change any code; for now we just use the original code in PG 13.2.
>
>

No, I have made no source code changes at all.
That was my suggestion, for you to try - because if the problem is
avoided by increasing PGPROC_MAX_CACHED_SUBXIDS (to say 128) then it
probably indicates the overflow condition is affecting the xmin/xmax
of the two snapshots such that it invalidates the condition that is
asserted.


I think one problem is that in your settings, you haven't set
"max_worker_processes", yet have set "max_parallel_workers = 128".
I'm finding no more than 8 parallel workers are actually active at any one time.
On top of this, you've got pgbench running with 200 concurrent clients.
So many queries are actually executing parallel plans without using
parallel workers, as the workers can't actually be launched (and this
is probably why I'm finding it hard to reproduce the issue, if the
problem involves snapshot suboverflow and parallel workers).
I find that the following settings improve the parallelism per query
and the whole test runs very much faster:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

and adjust the pgbench command-line:   pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 25 -j 25 -T 1800

Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens?

I also tried:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300

and the pgbench command-line:   pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 140 -j 140 -T 1800

- but I still get no coredump.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:
Hi Greg,
  It is really weird. Could you make sure the snapshot is overflowed in your env? It is very important.
  About snapshot overflow and subtrans, you can refer to this: https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/.
 
  In the sub_120.sql script, one transaction uses 120 subtransactions, so pgxact->overflowed will be set to true,
  and any snapshot taken must be marked overflowed. During the MVCC visibility check it will then call SubTransGetTopmostTransaction,
  so snapshot overflow is a requirement for this coredump.
  
  Even though there is no coredump in your env, we can find some clues from the code.
 
  First, in the main process, the ActiveSnapshot xmin very likely precedes the TransactionSnapshot xmin.
  Second, the parallel worker process sets TransactionXmin from the TransactionSnapshot of the main process, but the table scan uses the ActiveSnapshot from the main process.
  So in the parallel worker process, the Assert TransactionIdFollowsOrEquals(xid, TransactionXmin) in SubTransGetTopmostTransaction does not hold;
  at least this assert is unsuitable for a parallel worker process.
  If anything in my analysis is incorrect, please correct me.
  
  BTW, I tested it on a high-performance server, where it is very easily reproduced. My colleague and I both reproduced it in different environments.
  
  
  Thanks
  Pengcheng

RE: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:
Hi Tom & Robert,
  Could you review this Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) in SubTransGetTopmostTransaction?
  I think this assert is unsuitable for a parallel worker process.

  We discussed it earlier at
  https://www.postgresql-archive.org/Parallel-scan-with-SubTransGetTopmostTransaction-assert-coredump-td6197408.html

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: May 15, 2021 0:44
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 8:36 PM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>   Did you use pgbench with the script sub_120.sql which I provide in attachment?

yes

>
>   Did you increase PGPROC_MAX_CACHED_SUBXIDS? Please don't change any code; for now we just use the original code in PG 13.2.
>

No, I have made no source code changes at all.
That was my suggestion, for you to try - because if the problem is avoided by increasing PGPROC_MAX_CACHED_SUBXIDS (to say 128) then it probably indicates the overflow condition is affecting the xmin/xmax of the two snapshots such that it invalidates the condition that is asserted.


I think one problem is that in your settings, you haven't set "max_worker_processes", yet have set "max_parallel_workers = 128".
I'm finding no more than 8 parallel workers are actually active at any one time.
On top of this, you've got pgbench running with 200 concurrent clients.
So many queries are actually executing parallel plans without using parallel workers, as the workers can't actually be
launched(and this is probably why I'm finding it hard to reproduce the issue, if the problem involves snapshot
suboverflowand parallel workers). 
I find that the following settings improve the parallelism per query and the whole test runs very much faster:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

and adjust the pgbench command-line:   pgbench  -d postgres -p 33550
-n -r -f sub_120.sql   -c 25 -j 25 -T 1800

Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens?

I also tried:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300

and the pgbench command-line:   pgbench  -d postgres -p 33550  -n -r
-f sub_120.sql   -c 140 -j 140 -T 1800

- but I still get no coredump.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Sat, May 15, 2021 at 12:37 PM 刘鹏程 <pengchengliu@tju.edu.cn> wrote:
>
>
>   BTW, I tested it on a high-performance server, where it is very easily reproduced. My colleague and I can both
reproduce it in different environments.
>

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can
certainly see how, for the snapshots used in the parallel worker case,
the Active snapshot used is potentially an earlier snapshot than the
Transaction snapshot. I don't know why it is getting a newer
Transaction snapshot in InitializeParallelDSM(), when it has
previously pushed the return value of GetTransactionSnapshot() as the
Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how
to resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is
correct, but in the parallel-worker case, the snapshots are not being
setup correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia


diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

  /* We might be running in a very short-lived memory context. */



RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:
Hi Greg,
  > I actually think that the Assert in SubTransGetTopmostTransaction() is correct, but in the parallel-worker case,
the snapshots are not being set up correctly.

   I agree with you that the Assert in SubTransGetTopmostTransaction() is correct. The root cause is that TransactionXmin is
not being set up correctly in the parallel worker.

   Actually I am very confused about the ActiveSnapshot and the TransactionSnapshot. I don't know why the main process sends the
ActiveSnapshot and the TransactionSnapshot separately. And what exactly is the difference between them?
   If you know, could you explain it to me? It would be much appreciated.
   Until we understand them exactly, I think we should not change the TransactionSnapshot to the ActiveSnapshot in the main process.
If we do, the main process should send the ActiveSnapshot only.

Thanks
Pengcheng



-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月17日 20:59
To: 刘鹏程 <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Sat, May 15, 2021 at 12:37 PM 刘鹏程 <pengchengliu@tju.edu.cn> wrote:
>
>
>   BTW, I tested it on a high-performance server, where it is very easily reproduced. My colleague and I can both
reproduce it in different environments.
>

Hi Pengcheng,

Although the issue won't reproduce easily in my system, I can certainly see how, for the snapshots used in the parallel
worker case, the Active snapshot used is potentially an earlier snapshot than the Transaction snapshot. I don't know why
it is getting a newer Transaction snapshot in InitializeParallelDSM(), when it has previously pushed the return value of
GetTransactionSnapshot() as the Active snapshot.

So I too hope Tom or Robert can explain what is going on here and how to resolve it (as you requested them to, in your
other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is correct, but in the parallel-worker case, the
snapshots are not being set up correctly.

Can you try the trivial change below and see if it prevents the coredump?

Regards,
Greg Nancarrow
Fujitsu Australia


diff --git a/src/backend/access/transam/parallel.c
b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
  int i;
  FixedParallelState *fps;
  dsm_handle session_dsm_handle = DSM_HANDLE_INVALID;
- Snapshot transaction_snapshot = GetTransactionSnapshot();
+ Snapshot transaction_snapshot = GetActiveSnapshot();
  Snapshot active_snapshot = GetActiveSnapshot();

  /* We might be running in a very short-lived memory context. */



Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:
I've also seen reports of the same Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) failure, with a subsequent crash in a parallel worker, in a PostgreSQL v11-based build, though I was unable to investigate deeper and reproduce the issue. The details above in the thread make me think it is a real, long-standing error that is surely worth fixing.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Tue, May 18, 2021 at 11:27 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> Hi Greg,
>
>    Actually I am very confused about the ActiveSnapshot and the TransactionSnapshot. I don't know why the main process sends the
ActiveSnapshot and the TransactionSnapshot separately.  And what exactly is the difference between them?
 
>    If you know, could you explain it to me? It would be much appreciated.

In the context of a parallel-worker, I am a little confused too, so I
can't explain it either.
It is not really explained in the file
"src/backend/access/transam/README.parallel"; it only mentions the
following as part of the state that needs to be copied to each worker:

 - The transaction snapshot.
 - The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to
plan execution, GetTransactionSnapshot() was called and its return
value was stored in both the QueryDesc and the estate (es_snapshot),
which was then pushed on the ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the
last snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is
not clear to me (because isn't then the ActiveSnapshot a potentially
earlier snapshot? - which it shouldn't be, AFAIK. And also, it's then
different to the non-parallel case).
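
To illustrate that sequence, here is a simplified sketch of the leader-side flow (PG13-era code paths, heavily elided; the comments mark which snapshot is which):

    /* Executor startup (leader): take a snapshot and make it active. */
    PushActiveSnapshot(GetTransactionSnapshot());   /* snapshot S1 */

    /* ... later, while launching parallel workers, InitializeParallelDSM()
     * internally does, in effect: */
    Snapshot transaction_snapshot = GetTransactionSnapshot();  /* a NEW snapshot S2 */
    Snapshot active_snapshot = GetActiveSnapshot();            /* still S1 */

    /* Both S1 and S2 are serialized for the workers; each worker then
     * sets TransactionXmin from S2 but performs the scan using S1. */

Since S2 is taken after S1, its xmin can be later than S1's, which is exactly the mismatch the Assert trips over.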

>    Until we understand them exactly, I think we should not change the TransactionSnapshot to the ActiveSnapshot in the main
process. If we do, the main process should send the ActiveSnapshot only.
 

I think it would be worth you trying my suggested change (if you have
a development environment, which I assume you have). Sure, IF it was
deemed a proper solution, you'd only send the one snapshot, and adjust
accordingly in ParallelWorkerMain(), but we need not worry about that
in order to test it.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"houzj.fnst@fujitsu.com"
Date:
> To: Pengchengliu <pengchengliu@tju.edu.cn>
> Cc: Greg Nancarrow <gregn4422@gmail.com>; Andres Freund <andres@anarazel.de>; PostgreSQL-development
<pgsql-hackers@postgresql.org>
> Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

> I've also seen reports of the same Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) failure, with a subsequent
crash in a parallel worker, in a PostgreSQL v11-based build, though I was unable to investigate deeper and reproduce the
issue. The details above in the thread make me think it is a real, long-standing error that is surely worth fixing.

I followed Liu's reproduction steps and successfully reproduced it in about half an hour of running.
My configure options are: ./configure --enable-cassert --prefix=/home/pgsql

After applying Greg-san's change, the coredump has not happened in two hours (the test is still running).
Note that I have not taken a deep look at the change; I'm just providing some test information in advance.

Best regards,
houzj

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Tue, May 18, 2021 at 9:41 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> > To: Pengchengliu <pengchengliu@tju.edu.cn>
> > Cc: Greg Nancarrow <gregn4422@gmail.com>; Andres Freund <andres@anarazel.de>; PostgreSQL-development
<pgsql-hackers@postgresql.org>
> > Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
>
> > I've also seen reports of the same Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) failure, with a subsequent
crash in a parallel worker, in a PostgreSQL v11-based build, though I was unable to investigate deeper and reproduce the
issue. The details above in the thread make me think it is a real, long-standing error that is surely worth fixing.
>
> I followed Liu's reproduction steps and successfully reproduced it in about half an hour of running.
> My configure options are: ./configure --enable-cassert --prefix=/home/pgsql
>
> After applying Greg-san's change, the coredump has not happened in two hours (the test is still running).
> Note that I have not taken a deep look at the change; I'm just providing some test information in advance.
>

+1
Thanks for doing that.
I'm unsure if that "fix" is the right approach, so please investigate it too.

Regards,
Greg Nancarrow
Fujitsu Australia



RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
"Pengchengliu"
Date:
Hi Greg,
   Thanks a lot for your explanation and your fix.

   I think your fix can resolve the coredump issue, as with it the parallel worker process resets TransactionXmin from the
ActiveSnapshot.
   But it changes the transaction snapshot for all parallel scenarios, and I don't know whether that brings in other issues.
   For testing only, I think it is enough.

   So, can anybody explain the exact difference between the ActiveSnapshot and the TransactionSnapshot in a parallel worker
process?
   Then maybe we can find a better solution and fix it properly.

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow <gregn4422@gmail.com>
Sent: 2021年5月18日 17:15
To: Pengchengliu <pengchengliu@tju.edu.cn>
Cc: Andres Freund <andres@anarazel.de>; PostgreSQL-development <pgsql-hackers@postgresql.org>
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 18, 2021 at 11:27 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> Hi Greg,
>
>    Actually I am very confused about the ActiveSnapshot and the TransactionSnapshot. I don't know why the main process sends the
ActiveSnapshot and the TransactionSnapshot separately.  And what exactly is the difference between them?
>    If you know, could you explain it to me? It would be much appreciated.

In the context of a parallel-worker, I am a little confused too, so I can't explain it either.
It is not really explained in the file
"src/backend/access/transam/README.parallel"; it only mentions the following as part of the state that needs to be
copied to each worker:

 - The transaction snapshot.
 - The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?

When I debugged a typical parallel-SELECT case, I found that prior to plan execution, GetTransactionSnapshot() was
called and its return value was stored in both the QueryDesc and the estate (es_snapshot), which was then pushed on the
ActiveSnapshot stack. So by the time
InitializeParallelDSM() was called, the (top) ActiveSnapshot was the last snapshot returned from
GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is not clear to me (because isn't then the
ActiveSnapshot a potentially earlier snapshot? - which it shouldn't be, AFAIK. And also, it's then different to the
non-parallel case).

>    Until we understand them exactly, I think we should not change the TransactionSnapshot to the ActiveSnapshot in the main
process. If we do, the main process should send the ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have a development environment, which I assume you
have).Sure, IF it was deemed a proper solution, you'd only send the one snapshot, and adjust accordingly in
ParallelWorkerMain(),but we need not worry about that in order to test it. 


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Thu, May 20, 2021 at 11:18 AM Pengchengliu <pengchengliu@tju.edu.cn> wrote:
>
> Hi Greg,
>    Thanks a lot for your explanation and your fix.
>
>    I think your fix can resolve the coredump issue, as with it the parallel worker process resets TransactionXmin from the
ActiveSnapshot.
>    But it changes the transaction snapshot for all parallel scenarios, and I don't know whether that brings in other
issues.
>    For testing only, I think it is enough.
>
>    So, can anybody explain the exact difference between the ActiveSnapshot and the TransactionSnapshot in a parallel
worker process?
 
>    Then maybe we can find a better solution and fix it properly.
>

I am proposing the attached patch to fix this issue (patches for both
PG13.2 and latest PG14 source are provided).

Perhaps this will trigger others who better know the intricacies of
snapshot handling, and know the reasons and history behind why the
transaction_snapshot and active_snapshot are currently passed
separately to parallel workers, to speak up here and discuss the issue
further, and check my fix (and note, I haven't attempted to modify
README.parallel in the patch until I get further clarification on this
issue).
Perhaps someone can explain the purpose of calling
GetTransactionSnapshot() AGAIN in InitializeParallelDSM(), and how this
is consistent with the current ActiveSnapshot?
AFAICS, that doesn't seem correct, and that's why the patch removes it.

I've rebuilt Postgres with the patch applied and run the regression
tests, with and without "force_parallel_mode=regress", and all tests
pass.
So if the fix is wrong, no tests currently exist that detect issues with it.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Thu, May 20, 2021 at 4:08 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>

Keep cfbot happy, use the PG14 patch as latest.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Michael Paquier
Date:
On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> Keep cfbot happy, use the PG14 patch as latest.

This stuff is usually very tricky.  Do we have a way to reliably
reproduce the report discussed here?
--
Michael

Attachment

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Mon, May 24, 2021 at 2:50 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.

Agreed. That's why I was looking for experts in this snapshot-handling
code, to look closer at this issue, check my proposed fix, come up
with a better solution etc.

>Do we have a way to reliably
> reproduce the report discussed here?

I couldn't reproduce it in my environment (though I could understand
what was going wrong, based on the description provided).
houzj (houzj.fnst@fujitsu.com) was able to reproduce it in his
environment and kindly provided me with the following information:
(He said that he followed most of the steps described by the original
problem reporter, Pengcheng, but perhaps steps 2 and 7 are a little
different from his steps. See the emails higher in the thread for the
two scripts "init_test.sql" and "sub_120.sql")

===

1, Adjust NUM_SUBTRANS_BUFFERS from 32 to 128 in the file
"src/include/access/subtrans.h" (line 15).
2, configure with enable assert and build it.( ./configure
--enable-cassert --prefix=/home/pgsql)
3, init a new database cluster.
4, modify postgresql.conf and add the parameters below. As the
coredump comes from a parallel scan, we adjust the parallel settings to make it
easy to reproduce.

  max_connections = 2000

  parallel_setup_cost=0
  parallel_tuple_cost=0
  min_parallel_table_scan_size=0
  max_parallel_workers_per_gather=8
  max_parallel_workers = 32

5, start the database cluster.
6, use the script init_test.sql  in attachment to create tables.
7, use pgbench with the script sub_120.sql in the attachment to test it. Try
it a few times; you should get the coredump file.
    pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 12000
   (If you cannot reproduce it, try running two pgbench instances in
parallel at the same time.)

In my environment (CentOS 8.2, 128GB RAM, 40 processors, SAS disk,
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz),
sometimes I can reproduce it in about 5 minutes, but sometimes it needs
about half an hour.

Best regards,
houzj

===

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:


On Mon, 24 May 2021 at 09:22, Greg Nancarrow <gregn4422@gmail.com> wrote:
On Mon, May 24, 2021 at 2:50 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.

Agreed. That's why I was looking for experts in this snapshot-handling
code, to look closer at this issue, check my proposed fix, come up
with a better solution etc.

>Do we have a way to reliably
> reproduce the report discussed here?

Using a recipe similar to what has been described above in the thread, I reliably reproduced the bug in many Postgres versions (v11, v13, etc.):
1. make && make install
2. make check
3. Run SubTransGetTopmostTransaction-rep.sh in the Postgres source code dir.

The test fails with coredumps in around 10 minutes. With the fix applied, it has never failed yet.
(Though the transaction-snapshot kitchen is indeed tricky, and I am not 100% sure the fix does the right thing, safe in all circumstances.)

Attachment

Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Mon, May 24, 2021 at 11:56 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> Using a recipe similar to what has been described above in the thread, I reliably reproduced the bug in many Postgres
versions (v11, v13, etc.):
 
> 1. make && make install
> 2. make check
> 3. Run SubTransGetTopmostTransaction-rep.sh in the Postgres source code dir.
>

Thanks, using the latest PG14 code
(713a431c781fbfe1a22fae4991836077f0f4c513), with no modifications, and
building with "--enable-cassert", your script reproduced the coredump
on my system in about 10 minutes, on the 1st run.

#0  0x00007f1542d1137f in raise () from /lib64/libc.so.6
#1  0x00007f1542cfbdb5 in abort () from /lib64/libc.so.6
#2  0x00000000009380cb in ExceptionalCondition ()
#3  0x00000000005aadbc in SubTransGetTopmostTransaction ()
#4  0x00000000009741ef in XidInMVCCSnapshot ()
#5  0x0000000000563326 in HeapTupleSatisfiesVisibility ()
#6  0x0000000000559d50 in heapgetpage ()
#7  0x000000000055a0e9 in heapgettup_pagemode ()
#8  0x000000000055b98e in heap_getnextslot ()
#9  0x00000000006df713 in SeqNext ()
#10 0x00000000006b725f in ExecScan ()
#11 0x00000000006bef41 in fetch_input_tuple ()
#12 0x00000000006c1f6f in ExecAgg ()
#13 0x00000000006ade83 in standard_ExecutorRun ()
#14 0x00000000006b21ed in ParallelQueryMain ()
#15 0x00000000005a8c11 in ParallelWorkerMain ()
#16 0x0000000000795f3c in StartBackgroundWorker ()
#17 0x00000000007a1bc1 in maybe_start_bgworkers ()
#18 0x00000000007a25c5 in sigusr1_handler ()
#19 <signal handler called>
#20 0x00007f1542dce25b in select () from /lib64/libc.so.6
#21 0x00000000007a2cf6 in ServerLoop ()
#22 0x00000000007a4659 in PostmasterMain ()
#23 0x00000000005042ed in main ()

(Using a debug-build, it seems much harder to reproduce on my system)

With my patch applied, I've had no crashes, from many runs.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Maxim Orlov
Date:
Just a note here. After examining the core dump I noticed something.

In the XidInMVCCSnapshot call, snapshot->suboverflowed is set to true
although subxip == NULL and subxcnt == 0. As far as I understand,
snapshot->suboverflowed is set to true in the GetRunningTransactionData
call.

And then I decided to put an elog around the assignment of
CurrentRunningXacts->subxcnt.
diff --git a/src/backend/storage/ipc/procarray.c 
b/src/backend/storage/ipc/procarray.c
index 42a89fc5dc9..3d2db02f580 100644
--- a/src/backend/storage/ipc/procarray.c
+++ b/src/backend/storage/ipc/procarray.c
@@ -2781,6 +2781,9 @@ GetRunningTransactionData(void)
         * increases if slots do.
         */

+       if (suboverflowed)
+               elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true");
+
        CurrentRunningXacts->xcnt = count - subcount;
        CurrentRunningXacts->subxcnt = subcount;
        CurrentRunningXacts->subxid_overflow = suboverflowed;

... and got a bunch of messages; i.e., subxid_overflow is set to true
very often.

I've increased the value of PGPROC_MAX_CACHED_SUBXIDS. Once it is
more than 120, there are no more messages and no more assertion
failures.

---
Best regards,
Maxim Orlov.
Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Ranier Vilela
Date:

>Just a note here. After examining the core dump I noticed something.

>In the XidInMVCCSnapshot call, snapshot->suboverflowed is set to true
>although subxip == NULL and subxcnt == 0. As far as I understand,
>snapshot->suboverflowed is set to true in the GetRunningTransactionData
>call.

>And then I decided to put an elog around the assignment of
>CurrentRunningXacts->subxcnt.
>diff --git a/src/backend/storage/ipc/procarray.c
>b/src/backend/storage/ipc/procarray.c
>index 42a89fc5dc9..3d2db02f580 100644
>--- a/src/backend/storage/ipc/procarray.c
>+++ b/src/backend/storage/ipc/procarray.c
>@@ -2781,6 +2781,9 @@ GetRunningTransactionData(void)
>         * increases if slots do.
>         */

>+       if (suboverflowed)
>+               elog(WARNING, " >>> CurrentRunningXacts->subxid_overflow is true");
>+
>        CurrentRunningXacts->xcnt = count - subcount;
>        CurrentRunningXacts->subxcnt = subcount;
>        CurrentRunningXacts->subxid_overflow = suboverflowed;

>... and got a bunch of messages; i.e., subxid_overflow is set to true
>very often.

>I've increased the value of PGPROC_MAX_CACHED_SUBXIDS. Once it is
>more than 120, there are no more messages and no more assertion
>failures.

Please avoid using decimal-based values; 128 is a multiple of 64.

See:

https://github.com/trevstanhope/scratch/blob/master/C/docs/O%27Reilly%20-%20Practical%20C%20Programming%203rd%20Edition.pdf

15.6.1, "The Power of Powers of 2"

regards,

Ranier Vilela

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:

Please, avoid using decimal based values.

128 is multiple of 64.

It's true that 128 is better to use than 120, but the main problem is not the value; it is the fact that with the cache size above 120 we never reach the suboverflowed state (CurrentRunningXacts->subxid_overflow = true). This avoids the problem, but it doesn't seem the right way to fix the issue. Instead, it's better to handle the suboverflowed state, which is legitimate in itself, without crashing on the Assert. So the discussion of a "better" value doesn't seem related to the problem; it is for demonstration only.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Ranier Vilela
Date:
On Fri, 4 Jun 2021 at 12:07, Pavel Borisov <pashkin.elfe@gmail.com> wrote:

Please avoid using decimal-based values; 128 is a multiple of 64.

It's true that 128 is better to use than 120, but the main problem is not the value; it is the fact that with the cache size above 120 we never reach the suboverflowed state. This avoids the problem, but it doesn't seem the right way to fix the issue.
That seems like a solution to me too.

Instead, it's better to handle the suboverflowed state, which is legitimate in itself, without crashing on the Assert.
Of course it would be great to find the root of the problem.
 
So the discussion of a "better" value doesn't seem related to the problem; it is for demonstration only.
IMHO, you could propose a patch documenting this whole situation and proposing this workaround.
I've been studying commits, and on several occasions problems have been fixed like this.
But what matters is documenting the problem.

best regards,
Ranier Vilela

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Maxim Orlov
Date:
The analysis at the beginning of the discussion seems to be right, but
the v2 fix looks too invasive to me.

Personally, I'd prefer not to remove a snapshot even if the transaction is
read-only. I propose considering "xid < TransactionXmin" as a legitimate case
and simply promoting xid to TransactionXmin.

It's annoying that this old bug is still not fixed. What do you think?
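
For illustration, a minimal sketch of that idea as I understand it (a hypothetical clamp at the top of SubTransGetTopmostTransaction(); not the exact attached patch):

    /* Treat an xid older than TransactionXmin as a legitimate case and
     * clamp it, instead of asserting. */
    if (TransactionIdPrecedes(xid, TransactionXmin))
        xid = TransactionXmin;
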
---
Best regards,
Maxim Orlov.
Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Wed, Jun 23, 2021 at 1:06 AM Maxim Orlov <m.orlov@postgrespro.ru> wrote:
>
> The analysis in the beginning of the discussion seems to be right, but
> the fix v2 looks too invasive for me.
>
> Personally, I'd like not to remove snapshot even if transaction is
> read-only. I propose to consider "xid < TransactionXmin" as a legit case
> and just promote xid to TransactionXmin.
>
> It's annoying this old bug still not fixed. What do you think?


This v3 patch doesn't look right to me at all.
It's not addressing the fundamental problem, it just seems to be
working around it, by fiddling a xid value to avoid an assertion
failure.

I really can't see how the v2 patch "removes a snapshot" in the
read-only transaction case, and is "invasive".
Have you found a case that the v2 patch breaks?

The v2 patch does follow the analysis in the beginning of the
discussion, which identified that in setting up parallel workers, a
"later transaction snapshot" was taken than the one actually used in
the statement execution, and that's what eventually leads to the
observed Assertion failure.
The original problem reporter stated: "So the root cause is the
Parallel Workers process set the TransactionXmin with later
transcation snapshot".
Also, as far as I can see, it makes no sense to pass parallel workers
both an active snapshot and a (later) transaction snapshot. In the
leader, prior to execution and running parallel workers, a transaction
snapshot is obtained and pushed as the active snapshot (note: only ONE
snapshot at this point). It makes little sense to me to then obtain
ANOTHER transaction snapshot when setting up parallel workers, and
pass that (only) to the workers along with the original (earlier)
active snapshot. (If there is a reason for passing the TWO snapshots
to parallel workers, original code authors and/or snapshot experts
should speak up now, and code comments should be added accordingly to
explain the purpose and how it is MEANT to work)
This is why the v2 patch updates the code to just pass one snapshot
(the active snapshot) to the parallel workers, which gets restored in
each worker as the transaction snapshot and set as the active snapshot
(so is then consistent with the snapshot setup in the parallel
leader).
The v3 patch doesn't address any of this at all. It is just checking
if the xid is < the TransactionXMin in the snapshot, and if so,
setting the xid to be TransactionXMin, and thus avoiding the Assertion
failure. But that TransactionXMin was from the "later transaction
snapshot", which was only obtained for the workers and doesn't
necessarily match the earlier active snapshot taken, that is
associated with the actual statement execution. As far as I am
concerned, that "later transaction snapshot", taken for use by the
workers, doesn't seem right, so that's why the v2 patch removes the
use of it and only uses the active snapshot, for consistency with the
rest of the code. I think that if there was something fundamentally
wrong with the v2 patch's removal of that "later transaction snapshot"
used by the parallel workers, then surely there would be a test
failure somewhere - but that is not the case.
I think the v2 patch should be restored as the proposed solution here.

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:
Have you found a case that the v2 patch breaks?

The v2 patch does follow the analysis in the beginning of the
discussion, which identified that in setting up parallel workers, a
"later transaction snapshot" was taken than the one actually used in
the statement execution, and that's what eventually leads to the
observed Assertion failure.
The original problem reporter stated: "So the root cause is the
Parallel Workers process set the TransactionXmin with later
transcation snapshot".
Also, as far as I can see, it makes no sense to pass parallel workers
both an active snapshot and a (later) transaction snapshot. In the
leader, prior to execution and running parallel workers, a transaction
snapshot is obtained and pushed as the active snapshot (note: only ONE
snapshot at this point). It makes little sense to me to then obtain
ANOTHER transaction snapshot when setting up parallel workers, and
pass that (only) to the workers along with the original (earlier)
active snapshot. (If there is a reason for passing the TWO snapshots
to parallel workers, original code authors and/or snapshot experts
should speak up now, and code comments should be added accordingly to
explain the purpose and how it is MEANT to work)
This is why the v2 patch updates the code to just pass one snapshot
(the active snapshot) to the parallel workers, which gets restored in
each worker as the transaction snapshot and set as the active snapshot
(so is then consistent with the snapshot setup in the parallel
leader).

I fully agree that _probably_ two snapshots are not needed for read-only transactions. But I think we need to be very careful with everything related to transaction integrity, as there are different isolation levels etc. The mere fact that I cannot break something does not seem like a solid enough argument.

Robert, could you give us some help and insight into why there is a need for separate active and transaction snapshots in the parallel infrastructure patch, as per your commit 924bcf4f16d?

Until then, a workaround like the v3 patch seems safer to me (though I am not a big specialist in this).

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:
By the way, in the initial discussion on the parallel infrastructure, https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de, I saw a proposal to set the worker's PGXACT->xmin
to the minimum of the top transaction id and the snapshots' xmins.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Tomas Vondra
Date:
Hi,

I took a quick look on this - I'm no expert in the details of snapshots,
so take my comments with a grain of salt.

AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
Greg is right that the v3 patch does not seem like the right (or correct)
solution, for a couple of reasons:


1) It fixes the symptoms, not the cause. If we set TransactionXmin to a
bogus value, this only fixes it locally in SubTransGetTopmostTransaction
but I'd be surprised if other places did not have the same issue.


2) The xid/TransactionXmin comparison in the v3 fix:

  xid = xid > TransactionXmin ? xid : TransactionXmin;

seems broken, because it compares the XIDs directly, but that's not
going to work after generating enough transactions.


3) But even if this uses TransactionIdFollowsOrEquals, it seems very
strange because the "xid" comes from

  XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))

i.e. it's the raw xmin from the tuple, so why should we be setting it to
TransactionXmin? That seems pretty strange, TBH.


So yeah, I think this is due to confusion with two snapshots and failing
to consider both of them when calculating TransactionXmin.

But I think removing one of the snapshots (as the v2 does it) is rather
strange too. I very much doubt having both the transaction and active
snapshots in the parallel worker is not intentional, and Pavel may very
well be right this breaks isolation levels that use the xact snapshot
(i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.

So I think we need to keep both snapshots, but fix TransactionXmin to
consider both of them - I suppose ParallelWorkerMain could simply look
at the two snapshots, and use the minimum. Which is what [1] (already
linked by Pavel) talks about, although that's related to concerns about
one of the processes dying, which is not what's happening here.
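
A hypothetical sketch of that, somewhere in the worker's snapshot-restore path (the names follow recent branches, e.g. PG13 spells the field parallel_master_pgproc, and this is not a tested patch):

    /* Use the older of the two restored xmins, so that neither the
     * active snapshot nor the transaction snapshot can be invalidated. */
    TransactionId xmin;

    xmin = TransactionIdPrecedes(asnapshot->xmin, tsnapshot->xmin)
        ? asnapshot->xmin : tsnapshot->xmin;

    ProcArrayInstallRestoredXmin(xmin, fps->parallel_leader_pgproc);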


I'm wondering what consequences this may have on production systems,
though. We've only seen this failing because of the assert, so what
happens when the build has asserts disabled?

Looking at SubTransGetTopmostTransaction, it seems the while loop ends
immediately (because it's pretty much the opposite of the assert), so we
just return the "xid" as topmost XID. But will that produce incorrect
query results, or what?



regards


[1]
https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Maxim Orlov
Date:
On 2021-07-09 20:36, Tomas Vondra wrote:
> Hi,
> 
> I took a quick look on this - I'm no expert in the details of 
> snapshots,
> so take my comments with a grain of salt.
> 
> AFAICS both Greg Nancarrow and Pavel Borisov are kinda right. I think
> Greg is right that the v3 patch does not seem like the right (or correct)
> solution, for a couple of reasons:
> 
> 
> 1) It fixes the symptoms, not the cause. If we set TransactionXmin to a
> bogus value, this only fixes it locally in 
> SubTransGetTopmostTransaction
> but I'd be surprised if other places did not have the same issue.
> 
> 
> 2) The xid/TransactionXmin comparison in the v3 fix:
> 
>   xid = xid > TransactionXmin ? xid : TransactionXmin;
> 
> seems broken, because it compares the XIDs directly, but that's not
> going to work after generating enough transactions.
> 
> 
> 3) But even if this uses TransactionIdFollowsOrEquals, it seems very
> strange because the "xid" comes from
> 
>   XidInMVCCSnapshot(HeapTupleHeaderGetRawXmin(tuple), snapshot))
> 
> i.e. it's the raw xmin from the tuple, so why should we be setting it 
> to
> TransactionXmin? That seems pretty strange, TBH.
> 
> 
> So yeah, I think this is due to confusion with two snapshots and 
> failing
> to consider both of them when calculating TransactionXmin.
> 
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
> 
> So I think we need to keep both snapshots, but fix TransactionXmin to
> consider both of them - I suppose ParallelWorkerMain could simply look
> at the two snapshots, and use the minimum. Which is what [1] (already
> linked by Pavel) talks about, although that's related to concerns about
> one of the processes dying, which is not what's happening here.
> 
> 
> I'm wondering what consequences this may have on production systems,
> though. We've only seen this failing because of the assert, so what
> happens when the build has asserts disabled?
> 
> Looking at SubTransGetTopmostTransaction, it seems the while loop ends
> immediately (because it's pretty much the opposite of the assert), so 
> we
> just return the "xid" as topmost XID. But will that produce incorrect
> query results, or what?
> 
> 
> 
> regards
> 
> 
> [1]
> https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de

PFA v4 patch based on the ideas above.

In principle I see little difference from v3, but I agree it is more
correct.

I did test this patch on Linux and macOS using the testing algorithm above and
got no errors. On the master branch before the patch I still see this error.

---
Best regards,
Maxim Orlov.
Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:
> So yeah, I think this is due to confusion with two snapshots and
> failing
> to consider both of them when calculating TransactionXmin.
>
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
>
> So I think we need to keep both snapshots, but fix TransactionXmin to
> consider both of them - I suppose ParallelWorkerMain could simply look
> at the two snapshots, and use the minimum. Which is what [1] (already
> linked by Pavel) talks about, although that's related to concerns about
> one of the processes dying, which is not what's happening here.
>
>
> I'm wondering what consequences this may have on production systems,
> though. We've only seen this failing because of the assert, so what
> happens when the build has asserts disabled? 
> Looking at SubTransGetTopmostTransaction, it seems the while loop ends
> immediately (because it's pretty much the opposite of the assert), so
> we
> just return the "xid" as topmost XID. But will that produce incorrect
> query results, or what?
I haven't seen anything incorrect on production systems with asserts turned off, but I should note that this assertion does not reproduce immediately, so it is not that easy to catch possible logical inconsistencies in parallel scan results. I've noticed that the subxids cache is used in most cases, so even this assertion failure is rare in parallel scans. Maybe that is why we have lived with this bug unnoticed for so long.

https://www.postgresql.org/message-id/20150208002027.GH9201%40alap3.anarazel.de

PFA v4 patch based on the ideas above.

In principle I see little difference from v3, but I agree it is more
correct.

I did test this patch on Linux and macOS using the testing algorithm above and
got no errors. On the master branch before the patch I still see this error.

I've tested the patch and the error doesn't appear anymore.

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Sat, Jul 10, 2021 at 3:36 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> But I think removing one of the snapshots (as the v2 does it) is rather
> strange too. I very much doubt having both the transaction and active
> snapshots in the parallel worker is not intentional, and Pavel may very
> well be right this breaks isolation levels that use the xact snapshot
> (i.e. REPEATABLE READ and SERIALIZABLE). I haven't checked, though.
>

Unfortunately there is currently no test, code-comment, README or
developer-discussion that definitively determines which approach (v2
vs  v3/v4) is a valid fix for this issue.
We don't know if having both the transaction and active snapshots in a
parallel worker is intentional or not, and if so, why so?
(certainly in the non-parallel case of the same statement execution,
there is only one snapshot in question here - the obtained transaction
snapshot is pushed as the active snapshot, as it is done in 95% of
cases in the code)
It seems that only the original code authors know how the snapshot
handling in parallel-workers is MEANT to work, and they have yet to
speak up about it here.
At this point, we can only all agree that there is a problem to be fixed here.

My concern with the v3/v4 patch approach is that because the
parallel workers use a later snapshot than what is actually used in the
execution context for the statement in the parallel leader, it is
possible for the parallel leader and parallel workers to have
different transaction visibility, and surely this cannot be correct.
For example, suppose a transaction that deletes a row completes in
the window between these two snapshots.
Couldn't the row be visible to the parallel workers but not to the
parallel leader?
My guess is that currently there are not enough
concurrent-transactions tests to expose such a problem, and the window
here is fairly small.

So we can fiddle xmin values to avoid the immediate Assert issue here,
but it's not addressing potential xmax-related issues.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Wed, Jul 14, 2021 at 10:34 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Unfortunately there is currently no test, code-comment, README or
> developer-discussion that definitively determines which approach (v2
> vs  v3/v4) is a valid fix for this issue.
> We don't know if having both the transaction and active snapshots in a
> parallel worker is intentional or not, and if so, why so?
> (certainly in the non-parallel case of the same statement execution,
> there is only one snapshot in question here - the obtained transaction
> snapshot is pushed as the active snapshot, as it is done in 95% of
> cases in the code)
> It seems that only the original code authors know how the snapshot
> handling in parallel-workers is MEANT to work, and they have yet to
> speak up about it here.
> At this point, we can only all agree that there is a problem to be fixed here.

Hi.

Thanks to Thomas Munro for drawing my attention to this thread. I
wasn't intentionally ignoring it, but there's a lot of email in the
world and only so much time.

When I wrote this code originally, the idea that I had in mind was
simply this: whatever state we have in the leader ought to be
reproduced in each worker. So if there's an active snapshot in the
leader, we ought to make that active in all the workers, and if
there's a transaction snapshot in the leader, we ought to make that
the transaction snapshot in all of the workers.

But I see now that my thinking was fuzzy, and I'm going to blame that
on the name GetTransactionSnapshot() being slightly misleading. If
IsolationUsesXactSnapshot() is true, then there's really such a thing
as a transaction snapshot and reproducing that in the worker is a
sensible thing to do. But when !IsolationUsesXactSnapshot(),
GetTransactionSnapshot() doesn't just "get the transaction snapshot",
because there isn't any such thing. It takes a whole new snapshot, on
the theory that you wouldn't be calling this function unless you had
finished up with the snapshot you got the last time you called this
function. And in the case of initiating parallel query, that is the
wrong thing.
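
Roughly, paraphrasing GetTransactionSnapshot() in snapmgr.c (details elided; not an exact copy):

    Snapshot
    GetTransactionSnapshot(void)
    {
        if (FirstSnapshotSet)
        {
            /* At REPEATABLE READ or above, keep returning the same snapshot. */
            if (IsolationUsesXactSnapshot())
                return CurrentSnapshot;

            /* Otherwise take a brand-new snapshot every time; this is why
             * the call in InitializeParallelDSM() gets a newer snapshot. */
            CurrentSnapshot = GetSnapshotData(&CurrentSnapshotData);
            return CurrentSnapshot;
        }

        /* First call in the transaction: establish the snapshot. */
        /* ... */
    }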

I think that, at least in the case where IsolationUsesXactSnapshot()
is true, we need to make sure that calling GetTransactionSnapshot() in
a worker produces the same result that it would have produced in the
leader. Say one of the workers calls an sql or plpgsql function and
that function runs a bunch of SQL statements. It seems to me that
there's probably a way for this to result in calls inside the worker
to GetTransactionSnapshot(), and if that doesn't return the same
snapshot as in the leader, then we've broken MVCC.

What about when IsolationUsesXactSnapshot() is false? Perhaps it's OK
to just skip this altogether in that case. Certainly what we're doing
can't be right, because copying a snapshot that wouldn't have been
taken without parallel query can't ever be the right thing to do.
Perhaps we need to copy something else instead. I'm not really sure.

So I think v2 is probably on the right track, but wrong when the
transaction isolation level is REPEATABLE READ or SERIALIZABLE, and v3
and v4 just seem like unprincipled hacks that try to avoid the
assertion failure by lying about whether there's a problem.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Thu, Jul 22, 2021 at 2:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Thanks to Thomas Munro for drawing my attention to this thread. I
> wasn't intentionally ignoring it, but there's a lot of email in the
> world and only so much time.
>
> When I wrote this code originally, the idea that I had in mind was
> simply this: whatever state we have in the leader ought to be
> reproduced in each worker. So if there's an active snapshot in the
> leader, we ought to make that active in all the workers, and if
> there's a transaction snapshot in the leader, we ought to make that
> the transaction snapshot in all of the workers.
>
> But I see now that my thinking was fuzzy, and I'm going to blame that
> on the name GetTransactionSnapshot() being slightly misleading. If
> IsolationUsesXactSnapshot() is true, then there's really such a thing
> as a transaction snapshot and reproducing that in the worker is a
> sensible thing to do. But when !IsolationUsesXactSnapshot(),
> GetTransactionSnapshot() doesn't just "get the transaction snapshot",
> because there isn't any such thing. It takes a whole new snapshot, on
> the theory that you wouldn't be calling this function unless you had
> finished up with the snapshot you got the last time you called this
> function. And in the case of initiating parallel query, that is the
> wrong thing.
>
> I think that, at least in the case where IsolationUsesXactSnapshot()
> is true, we need to make sure that calling GetTransactionSnapshot() in
> a worker produces the same result that it would have produced in the
> leader. Say one of the workers calls an sql or plpgsql function and
> that function runs a bunch of SQL statements. It seems to me that
> there's probably a way for this to result in calls inside the worker
> to GetTransactionSnapshot(), and if that doesn't return the same
> snapshot as in the leader, then we've broken MVCC.
>
> What about when IsolationUsesXactSnapshot() is false? Perhaps it's OK
> to just skip this altogether in that case. Certainly what we're doing
> can't be right, because copying a snapshot that wouldn't have been
> taken without parallel query can't ever be the right thing to do.
> Perhaps we need to copy something else instead. I'm not really sure.
>
> So I think v2 is probably on the right track, but wrong when the
> transaction isolation level is REPEATABLE READ or SERIALIZABLE, and v3
> and v4 just seem like unprincipled hacks that try to avoid the
> assertion failure by lying about whether there's a problem.
>

Many thanks for taking time to respond to this (and thanks to Thomas Munro too).
It's much appreciated, as this is a complex area.
For the time being, I'll reinstate the v2 patch (say as v5) as a
partial solution, and then work on addressing the REPEATABLE READ and
SERIALIZABLE transaction isolation levels, which you point out are not
handled correctly by the patch.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:
пт, 23 июл. 2021 г. в 10:00, Greg Nancarrow <gregn4422@gmail.com>:
On Thu, Jul 22, 2021 at 2:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Thanks to Thomas Munro for drawing my attention to this thread. I
> wasn't intentionally ignoring it, but there's a lot of email in the
> world and only so much time.
>
> When I wrote this code originally, the idea that I had in mind was
> simply this: whatever state we have in the leader ought to be
> reproduced in each worker. So if there's an active snapshot in the
> leader, we ought to make that active in all the workers, and if
> there's a transaction snapshot in the leader, we ought to make that
> the transaction snapshot in all of the workers.
>
> But I see now that my thinking was fuzzy, and I'm going to blame that
> on the name GetTransactionSnapshot() being slightly misleading. If
> IsolationUsesXactSnapshot() is true, then there's really such a thing
> as a transaction snapshot and reproducing that in the worker is a
> sensible thing to do. But when !IsolationUsesXactSnapshot(),
> GetTransactionSnapshot() doesn't just "get the transaction snapshot",
> because there isn't any such thing. It takes a whole new snapshot, on
> the theory that you wouldn't be calling this function unless you had
> finished up with the snapshot you got the last time you called this
> function. And in the case of initiating parallel query, that is the
> wrong thing.
>
> I think that, at least in the case where IsolationUsesXactSnapshot()
> is true, we need to make sure that calling GetTransactionSnapshot() in
> a worker produces the same result that it would have produced in the
> leader. Say one of the workers calls an sql or plpgsql function and
> that function runs a bunch of SQL statements. It seems to me that
> there's probably a way for this to result in calls inside the worker
> to GetTransactionSnapshot(), and if that doesn't return the same
> snapshot as in the leader, then we've broken MVCC.
>
> What about when IsolationUsesXactSnapshot() is false? Perhaps it's OK
> to just skip this altogether in that case. Certainly what we're doing
> can't be right, because copying a snapshot that wouldn't have been
> taken without parallel query can't ever be the right thing to do.
> Perhaps we need to copy something else instead. I'm not really sure.
>
> So I think v2 is probably on the right track, but wrong when the
> transaction isolation level is REPEATABLE READ or SERIALIZABLE, and v3
> and v4 just seem like unprincipled hacks that try to avoid the
> assertion failure by lying about whether there's a problem.
>

Many thanks for taking time to respond to this (and thanks to Thomas Munro too).
It's much appreciated, as this is a complex area.
For the time being, I'll reinstate the v2 patch (say as v5) as a
partial solution, and then work on addressing the REPEATABLE READ and
SERIALIZABLE transaction isolation levels, which you point out are not
handled correctly by the patch.
I've looked at the v5 patch. It is completely identical to the v2 patch, which I've already tested using the workflow I described in the comments above. Before the patch I get the errors quite soon; the patch corrects them. Furthermore, I've tested the same patch under REPEATABLE READ and SERIALIZABLE and detected no flaws. So now that we've got Robert's explanation, it seems to me that the v2 (aka v5) patch can be committed (leaving possible REPEATABLE READ and SERIALIZABLE improvements for the future). I'm not really sure it is still possible in the 07/21 CF, but I'd change the status of the patch to ready-for-committer. Also, I'd like the bugfix to be backported to previous PG versions.

Further consideration on the patch and on the backporting is very much welcome!

--
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Tue, Aug 3, 2021 at 9:59 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I've looked at the v5 patch. It is completely identical to the v2 patch, which I've already tested using the workflow I
described in the comments above. Before the patch I get the errors quite soon; the patch corrects them. Furthermore,
I've tested the same patch under REPEATABLE READ and SERIALIZABLE and detected no flaws. So now that we've got
Robert's explanation, it seems to me that the v2 (aka v5) patch can be committed (leaving possible REPEATABLE READ and
SERIALIZABLE improvements for the future). I'm not really sure it is still possible in the 07/21 CF, but I'd change the status
of the patch to ready-for-committer. Also, I'd like the bugfix to be backported to previous PG versions.

I agree that the fix should be back-ported, but I'm not keen to commit
anything unless it works for all isolation levels.

The idea I sort of had floating around in my mind is a little
different than what Greg has implemented. I was thinking that we could
just skip SerializeSnapshot and the corresponding shm_toc_allocate()
if !IsolationUsesXactSnapshot(). Then on the restore side we could
just call shm_toc_lookup() with noError = true and skip
RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.
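
A hypothetical sketch of that idea (the key and function names are real, but this is untested, and PG13 spells the PGPROC field parallel_master_pgproc):

    /* Leader side, in InitializeParallelDSM(): only ship the transaction
     * snapshot when the isolation level actually has one. */
    if (IsolationUsesXactSnapshot())
    {
        char *tsnapspace = shm_toc_allocate(pcxt->toc, tsnaplen);

        SerializeSnapshot(transaction_snapshot, tsnapspace);
        shm_toc_insert(pcxt->toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, tsnapspace);
    }

    /* Worker side, in ParallelWorkerMain(): noError = true, so a missing
     * entry just returns NULL and we skip the restore. */
    tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, true);
    if (tsnapspace != NULL)
        RestoreTransactionSnapshot(RestoreSnapshot(tsnapspace),
                                   fps->parallel_leader_pgproc);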

I don't know why Greg's patch is changing anything related to the
active snapshot (as opposed to the transaction snapshot). Maybe
there's a reason why we need that change, but I don't know what it is.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Wed, Aug 4, 2021 at 3:21 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
>The idea I sort of had floating around in my mind is a little
>different than what Greg has implemented. I was thinking that we could
>just skip SerializeSnapshot and the corresponding shm_toc_allocate()
>if !IsolationUsesXactSnapshot(). Then on the restore side we could
>just call shm_toc_lookup() with noError = true and skip
>RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.

I've tried to follow your description and have attached a patch to
hopefully match it, but it doesn't pass "make check-world".
Perhaps I messed something up (apologies if so), or additional changes
are needed to match what you had in mind or correct additional issues
you didn't foresee?

t/001_pgbench_with_server.pl .. 10/?
#   Failed test 'pgbench scale 1 initialization status (got 1 vs expected 0)'
#   at t/001_pgbench_with_server.pl line 108.
...
# creating primary keys...
# pgbench: fatal: query failed: ERROR:  cannot take query snapshot
during a parallel operation
# CONTEXT:  parallel worker
# pgbench: query was: alter table pgbench_accounts add primary key (aid)


>I don't know why Greg's patch is changing anything related to the
>active snapshot (as opposed to the transaction snapshot). Maybe
>there's a reason why we need that change, but I don't know what it is.

I don't think my v2/v5 patch is changing anything related to the
active snapshot (is it?).
It's restoring the serialized active snapshot, installing it as the
transaction snapshot and setting it as the active snapshot.
The patch removes the additionally-acquired transaction snapshot in
InitializeParallelDSM (which seems to be a later snapshot than the one
actually being used in the execution state for the statement, back in
the leader, if I recall correctly). Basically, in the parallel
workers, I've tried to match the snapshot setup to that used in the
leader.
If the v2/v5 patch doesn't work correctly for some isolation level,
I've yet to find it (but can't absolutely rule out there's some case
not accounted for).
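
For reference, the worker-side setup in v2/v5 amounts to roughly this
(a simplified sketch, not the actual patch hunk):

    /* worker: restore the leader's serialized active snapshot and use
     * it as both the transaction snapshot and the active snapshot,
     * mirroring the PushActiveSnapshot(GetTransactionSnapshot()) setup
     * done in the leader at PortalStart() time */
    asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
    asnapshot = RestoreSnapshot(asnapspace);
    RestoreTransactionSnapshot(asnapshot, fps->parallel_leader_pgproc);
    PushActiveSnapshot(asnapshot);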

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:


Wed, 4 Aug 2021 at 07:41, Greg Nancarrow <gregn4422@gmail.com>:
> On Wed, Aug 4, 2021 at 3:21 AM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > The idea I sort of had floating around in my mind is a little
> > different than what Greg has implemented. I was thinking that we could
> > just skip SerializeSnapshot and the corresponding shm_toc_allocate()
> > if !IsolationUsesXactSnapshot(). Then on the restore side we could
> > just call shm_toc_lookup() with noError = true and skip
> > RestoreTransactionSnapshot/RestoreSnapshot if it returns NULL.
>
> I've tried to follow your description and have attached a patch to
> hopefully match it, but it doesn't pass "make check-world".
> Perhaps I messed something up (apologies if so), or additional changes
> are needed to match what you had in mind or correct additional issues
> you didn't foresee?
>
> t/001_pgbench_with_server.pl .. 10/?
> #   Failed test 'pgbench scale 1 initialization status (got 1 vs expected 0)'
> #   at t/001_pgbench_with_server.pl line 108.
> ...
> # creating primary keys...
> # pgbench: fatal: query failed: ERROR:  cannot take query snapshot
> during a parallel operation
> # CONTEXT:  parallel worker
> # pgbench: query was: alter table pgbench_accounts add primary key (aid)

Greg, thanks for the fast response! I suppose that a check for IsolationUsesXactSnapshot() is also useful in GetTransactionSnapshot() for correct processing of the case of a NULL transaction snapshot.
This fixes the mentioned check-world test failure.
PFA v7 patch.
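
If it helps review, I think the shape of the change is along these
lines (only a sketch of the idea; the real hunk is in the attached
patch):

    /* snapmgr.c, GetTransactionSnapshot(), first-call path: only
     * refuse to build a fresh snapshot inside a parallel operation
     * when the isolation level implies a transaction snapshot should
     * already have been restored from the leader */
    if (IsInParallelMode() && IsolationUsesXactSnapshot())
        elog(ERROR, "cannot take query snapshot during a parallel operation");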
Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> Greg, thanks for the fast response! I suppose that a check for IsolationUsesXactSnapshot() is also useful in
> GetTransactionSnapshot() for correct processing of the case of a NULL transaction snapshot.
> This fixes the mentioned check-world test failure.
> PFA v7 patch.

Ah, thanks for that (I didn't debug that failure).
But is the coredump issue reproducible now? (using v7 and your test script)


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Wed, Aug 4, 2021 at 8:17 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>
> Ah, thanks for that (I didn't debug that failure).
> But is the coredump issue reproducible now? (using v7 and your test script)
>

Er, with the v7 patch, the problem still occurs (that Assert still
fires during a run of the SubTransGetTopmostTransaction-rep.sh
script).
Stack trace from the coredump is below:

#0  0x00007f06e5e7d37f in raise () from /lib64/libc.so.6
#1  0x00007f06e5e67db5 in abort () from /lib64/libc.so.6
#2  0x0000000000b06547 in ExceptionalCondition (
    conditionName=0xba8d78 "TransactionIdFollowsOrEquals(xid,
TransactionXmin)", errorType=0xba8d0b "FailedAssertion",
    fileName=0xba8d00 "subtrans.c", lineNumber=156) at assert.c:69
#3  0x0000000000576f0f in SubTransGetTopmostTransaction (xid=3676) at
subtrans.c:156
#4  0x0000000000b5f55a in XidInMVCCSnapshot (xid=3676,
snapshot=0x2e44560) at snapmgr.c:2293
#5  0x000000000050e014 in HeapTupleSatisfiesMVCC (htup=0x7ffdc6dee6f0,
snapshot=0x2e44560, buffer=15559)
    at heapam_visibility.c:1070
#6  0x000000000050f148 in HeapTupleSatisfiesVisibility
(tup=0x7ffdc6dee6f0, snapshot=0x2e44560, buffer=15559)
    at heapam_visibility.c:1771
#7  0x00000000004f1d26 in heapgetpage (sscan=0x2e9a9a0, page=10310) at
heapam.c:466
#8  0x00000000004f45cf in heapgettup_pagemode (scan=0x2e9a9a0,
dir=ForwardScanDirection, nkeys=0, key=0x0)
    at heapam.c:1118
#9  0x00000000004f4c91 in heap_getnextslot (sscan=0x2e9a9a0,
direction=ForwardScanDirection, slot=0x2e8f9a0)
    at heapam.c:1396
#10 0x000000000077abb1 in table_scan_getnextslot (sscan=0x2e9a9a0,
direction=ForwardScanDirection, slot=0x2e8f9a0)
    at ../../../src/include/access/tableam.h:1044
#11 0x000000000077ac79 in SeqNext (node=0x2e8f428) at nodeSeqscan.c:80
#12 0x000000000073d997 in ExecScanFetch (node=0x2e8f428,
accessMtd=0x77abe5 <SeqNext>, recheckMtd=0x77ac8a <SeqRecheck>)
    at execScan.c:133


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:
Wed, 4 Aug 2021 at 14:18, Greg Nancarrow <gregn4422@gmail.com>:
> On Wed, Aug 4, 2021 at 7:55 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> >
> > Greg, thanks for the fast response! I suppose that a check for IsolationUsesXactSnapshot() is also useful in GetTransactionSnapshot() for correct processing of the case of a NULL transaction snapshot.
> > This fixes the mentioned check-world test failure.
> > PFA v7 patch.
>
> Ah, thanks for that (I didn't debug that failure).
> But is the coredump issue reproducible now? (using v7 and your test script)

Now I've run my test script (attached above in the thread) on v6 and v7, and quite soon got crashes with the Assert and a backtrace identical to the original one. So this approach may be useful for further development, but as it stands it is not enough to fix the original crash.

The same script run on the v2/v5 patch completed without a crash at every isolation level I've tested, i.e. READ COMMITTED, REPEATABLE READ and SERIALIZABLE. If I remember correctly, none of us could demonstrate any errors with REPEATABLE READ and SERIALIZABLE on v2/v5. That fact was the basis of my proposal to commit v2/v5, i.e. to fix the obvious bug and let further improvements (if any) be done later.

At the SERIALIZABLE level with v2/v5 I get an error that I didn't see before the patch (but no crash):
pgbench: error: client 6 script 0 aborted in command 594 query 0: ERROR:  could not serialize access due to read/write dependencies among transactions
DETAIL:  Reason code: Canceled on identification as a pivot, during conflict out checking.


Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Pavel Borisov
Date:
> At the SERIALIZABLE level with v2/v5 I get an error that I didn't see before the patch (but no crash):
> pgbench: error: client 6 script 0 aborted in command 594 query 0: ERROR:  could not serialize access due to read/write dependencies among transactions
> DETAIL:  Reason code: Canceled on identification as a pivot, during conflict out checking.

I should correct myself: the mentioned error under SERIALIZABLE is also present before the patch.

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Tue, Aug 3, 2021 at 11:41 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> I've tried to follow your description and have attached a patch to
> hopefully match it, but it doesn't pass "make check-world".
> Perhaps I messed something up (apologies if so), or additional changes
> are needed to match what you had in mind or correct additional issues
> you didn't foresee?

This is the sort of thing I was thinking about, but I don't understand
why it doesn't fix the reported problem. Apparently I haven't
correctly understood what the issue is.

> >I don't know why Greg's patch is changing anything related to the
> >active snapshot (as opposed to the transaction snapshot). Maybe
> >there's a reason why we need that change, but I don't know what it is.
>
> I don't think my v2/v5 patch is changing anything related to the
> active snapshot (is it?).
> It's restoring the serialized active snapshot, installing it as the
> transaction snapshot and setting it as the active snapshot.

Why do you think it's right to install the serialized *active*
snapshot as the *transaction* snapshot? I've been operating on the
presumption that we wanted the worker to install the leader's
transaction snapshot as its transaction snapshot and the leader's
active snapshot as its active snapshot,
because in my mind the active snapshot and the transaction snapshot
are two separate things. However, there's clearly something wrong with
that line of reasoning, because apparently your proposal fixes the
problem and mine doesn't.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Wed, Aug 4, 2021 at 11:43 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Why do you think it's right to install the serialized *active*
> snapshot as the *transaction* snapshot? I've been operating on the
> presumption that we wanted the worker to install the leader's
> transaction snapshot as its transaction snapshot and the leader's
> active snapshot as its active snapshot,
> because in my mind the active snapshot and the transaction snapshot
> are two separate things. However, there's clearly something wrong with
> that line of reasoning, because apparently your proposal fixes the
> problem and mine doesn't.
>

In setting up the snapshot for the execution state used in command
execution, GetTransactionSnapshot() is called and (possibly a copy of)
the returned snapshot is pushed as the ActiveSnapshot.
The way I see it is that there is essentially only one snapshot here,
the last-acquired TransactionSnapshot, which the ActiveSnapshot points
to (or a copy of it). Rightly or wrongly, my v2/v5 patch is basically
changing the code to do the same snapshot setup in each of the
parallel workers.

So why (current Postgres code, no patches applied) in setting up for
parallel-worker execution (in InitializeParallelDSM) does the Postgres
code then acquire ANOTHER TransactionSnapshot (by calling
GetTransactionSnapshot(), which could return CurrentSnapshot or a new
snapshot) and serialize that, as well as serializing what the
ActiveSnapshot points to, and then restore those in the workers as two
separate snapshots? Is it a mistake? Or if intentional and correct,
how so?

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Thu, Aug 5, 2021 at 12:03 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
>

As the current v7 patch doesn't fix the coredump issue, and the cfbot
is now failing (one of the regression tests fails), I'm reinstating my
v2/v5 patch (as v8) as the current best solution to this issue.
So far I haven't found a test case (e.g. isolation level) that breaks
the patch. Pavel did encounter an error (but no crash) when testing
the patch with SERIALIZABLE, but he found that this error occurred
anyway without the patch.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Wed, Aug 4, 2021 at 10:03 PM Greg Nancarrow <gregn4422@gmail.com> wrote:
> In setting up the snapshot for the execution state used in command
> execution, GetTransactionSnapshot() is called and (possibly a copy of)
> the returned snapshot is pushed as the ActiveSnapshot.

I mean, there are LOTS of PushActiveSnapshot() calls in the code. A
lot of those specifically say
PushActiveSnapshot(GetTransactionSnapshot()) but others are pushing
snapshots obtained from various other places. I really don't think it
can possibly be correct in general to assume that the snapshot on top
of the active snapshot stack is the same as the transaction snapshot.

> So why (current Postgres code, no patches applied) in setting up for
> parallel-worker execution (in InitializeParallelDSM) does the Postgres
> code then acquire ANOTHER TransactionSnapshot (by calling
> GetTransactionSnapshot(), which could return CurrentSnapshot or a new
> snapshot) and serialize that, as well as serializing what the
> ActiveSnapshot points to, and then restore those in the workers as two
> separate snapshots? Is it a mistake? Or if intentional and correct,
> how so?

Well, I already agreed that in cases where GetTransactionSnapshot()
will acquire an altogether new snapshot, we shouldn't call it, but
beyond that I don't see why you think this is wrong. I mean, suppose
we only call GetTransactionSnapshot() for the parallel worker when
IsolationUsesXactSnapshot(). In that case, CurrentSnapshot is a
durable, transaction-lifetime piece of backend-local state that can
affect the results of future calls to GetTransactionSnapshot(), and
therefore seems to need to be replicated to workers. Separately,
regardless of IsolationUsesXactSnapshot(), the active snapshot is
accessible via calls to GetActiveSnapshot() and therefore should also
be replicated to workers. Now, I don't know of any necessary reason
why those two things need to be the same, because again, there are
PushActiveSnapshot() calls all over the place, and they're not all
pushing the transaction snapshot. So therefore it makes sense to me
that we need to replicate those two things separately - the active
snapshot in the leader becomes the active snapshot in the workers and
the transaction snapshot in the leader becomes the transaction
snapshot in the worker.

Now there is evidently something wrong with this line of reasoning
because the code is buggy and my proposed fix doesn't work, but I
don't know what is wrong with it. You seem to think that it's crazy
that we try to replicate the active snapshot to the active snapshot
and the transaction snapshot to the transaction snapshot, but that
did, and still does, seem really sane to me. The only part that now
seems clearly wrong to me is that the !IsolationUsesXactSnapshot() case
takes an *extra* snapshot, but since fixing that didn't fix the bug,
there's evidently more to the problem than that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Tue, Aug 10, 2021 at 12:35 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Now there is evidently something wrong with this line of reasoning
> because the code is buggy and my proposed fix doesn't work, but I
> don't know what is wrong with it. You seem to think that it's crazy
> that we try to replicate the active snapshot to the active snapshot
> and the transaction snapshot to the transaction snapshot, but that
> did, and still does, seem really sane to me. The only part that now
> seems clearly wrong to me is that the !IsolationUsesXactSnapshot() case
> takes an *extra* snapshot, but since fixing that didn't fix the bug,
> there's evidently more to the problem than that.
>

I traced through snapshot processing during a parallel SELECT, up to
the point of the existing GetTransactionSnapshot() and
GetActiveSnapshot() calls in InitializeParallelDSM().
I'm seeing the following sequence of calls, numbered below:


PortalStart():

   case PORTAL_ONE_SELECT:

(1)    PushActiveSnapshot(GetTransactionSnapshot());

...

       queryDesc = CreateQueryDesc(linitial_node(PlannedStmt, portal->stmts),
                       portal->sourceText,
(2)                   GetActiveSnapshot(),
                       InvalidSnapshot,
                       None_Receiver,
                       params,
                       portal->queryEnv,
                       0);

...

(3)    PopActiveSnapshot();


PortalRunSelect():

(4)    PushActiveSnapshot(queryDesc->snapshot);
        ExecutorRun(queryDesc, direction, (uint64) count,
                                 portal->run_once);

        InitializeParallelDSM():

(5)        Snapshot   transaction_snapshot = GetTransactionSnapshot();
(6)        Snapshot   active_snapshot = GetActiveSnapshot();


         nprocessed = queryDesc->estate->es_processed;
(7)     PopActiveSnapshot();



The snapshot used in execution of the query is clearly the
ActiveSnapshot at the time of creating the QueryDesc [at (2)] which is
a copy of the TransactionSnapshot originally acquired [at (1)].

In InitializeParallelDSM() it acquires both the TransactionSnapshot
[at (5)] and the ActiveSnapshot [at (6)], to be serialized in the DSM
for the workers (each of which will deserialize and restore these).
But the problem I see is that the GetTransactionSnapshot() call [at
(5)] may acquire a new snapshot (i.e. a later snapshot than the
ActiveSnapshot used in the execution of the query), for example, if a
concurrent transaction has completed since GetTransactionSnapshot()
was last called [in (1)].
In this case, GetTransactionSnapshot() calls GetSnapshotDataReuse()
and it returns false, causing a new snapshot to be built by
GetTransactionSnapshot().

    curXactCompletionCount = ShmemVariableCache->xactCompletionCount;
    if (curXactCompletionCount != snapshot->snapXactCompletionCount)
        return false;

When this TransactionSnapshot is restored in a worker process, it
accordingly sets TransactionXmin, and if we look back at the coredump
stacktrace and the Assert condition that failed in the worker, we see
that the xid was expected to be >= TransactionXmin, but the Assert
fired because the xid was < TransactionXmin.

    Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin));

This is explained by the TransactionSnapshot being a later snapshot in
this case.
So this is why it seems to be wrong to call GetTransactionSnapshot()
in InitializeParallelDSM() and use a separate, potentially later,
snapshot than that used in the execution state for the query.
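
To make that concrete, with illustrative XIDs (numbers made up for the
example):

    1. Leader, PortalStart() [at (1)]: GetTransactionSnapshot() builds
       a snapshot with xmin = 100; a copy of it becomes the
       ActiveSnapshot and the query's execution snapshot.
    2. A concurrent transaction commits, bumping xactCompletionCount.
    3. Leader, InitializeParallelDSM() [at (5)]:
       GetTransactionSnapshot() cannot reuse the cached snapshot,
       builds a new one with xmin = 105, and that one is serialized
       for the workers.
    4. Worker: restoring the serialized transaction snapshot sets
       TransactionXmin = 105, but the scan still uses the restored
       ActiveSnapshot (xmin = 100), so XidInMVCCSnapshot() can hand
       xid = 102 to SubTransGetTopmostTransaction(), where
       Assert(TransactionIdFollowsOrEquals(102, 105)) fails.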

Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Wed, Aug 11, 2021 at 8:32 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> This is explained by the TransactionSnapshot being a later snapshot in
> this case.
> So this is why it seems to be wrong to call GetTransactionSnapshot()
> in InitializeParallelDSM() and use a separate, potentially later,
> snapshot than that used in the execution state for the query.

Thanks for the research. I agree with your logic here, but:

1. Then why doesn't the approach I proposed fix it?

2. Consider the case where the toplevel query is something like SELECT
complexfunc() FROM generate_series(1,10) g -- in a case like this, I
think complexfunc() can cause snapshots to be taken internally. For
example suppose we end up inside exec_eval_simple_expr, or
SPI_cursor_open_internal, in either case with read_only = false. Here
we're going to again call GetTransactionSnapshot() and then execute a
query which may use parallelism.


-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Thu, Aug 12, 2021 at 5:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> 1. Then why doesn't the approach I proposed fix it?
>

I think that with your approach, the worker misses the initialization
normally done by SetTransactionSnapshot() (which is called by
RestoreTransactionSnapshot(), which your approach skips for the SQL
script that reproduces the problem, because
IsolationUsesXactSnapshot() returns false for XACT_READ_COMMITTED).
There are comments in SetTransactionSnapshot() explaining the tricky
parts of this initialization: testing that the source transaction is
still running, dealing with a race condition, and setting up
TransactionXmin.
Also, there's an "if (IsolationUsesXactSnapshot()) ..." block within
that function, doing some required setup for transaction-snapshot
mode, so it doesn't seem like a good idea to skip
RestoreTransactionSnapshot() when !IsolationUsesXactSnapshot(); the
function is clearly catering for both cases, i.e. when the isolation
level does and doesn't use a transaction snapshot. So I think
SetTransactionSnapshot() always needs to be called.
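
For reference, SetTransactionSnapshot() (snapmgr.c) does, in order
(paraphrasing from memory, details elided):

    1. Assert(!FirstSnapshotSet), then build a fresh snapshot via
       GetSnapshotData(&CurrentSnapshotData);
    2. overwrite the fresh snapshot's xmin/xmax/xip fields with the
       source snapshot's values;
    3. verify the source transaction is still running and install its
       xmin as our advertised xmin -- this is the part that sets up
       TransactionXmin, regardless of isolation level;
    4. if (IsolationUsesXactSnapshot()): register the snapshot for the
       life of the transaction (and, for SERIALIZABLE, hand it to the
       predicate-locking machinery);
    5. set FirstSnapshotSet = true.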

With your proposed approach, what I'm seeing is that the worker calls
GetTransactionSnapshot() at some point, which then builds a new
snapshot, and results in increasing TransactionXmin (probably because
another concurrent transaction has since completed). This snapshot is
thus later than the snapshot used in the execution state of the query
being executed. This causes the Assert in
SubTransGetTopmostTransaction() to fire because the xid doesn't follow
or equal the TransactionXmin value.

> 2. Consider the case where the toplevel query is something like SELECT
> complexfunc() FROM generate_series(1,10) g -- in a case like this, I
> think complexfunc() can cause snapshots to be taken internally. For
> example suppose we end up inside exec_eval_simple_expr, or
> SPI_cursor_open_internal, in either case with read_only = false. Here
> we're going to again call GetTransactionSnapshot() and then execute a
> query which may use parallelism.
>
>

A query always uses the ActiveSnapshot at the time the QueryDesc is
created, so as long as you don't (as the current code does) obtain a
potentially later snapshot and try to restore that in the worker as
the TransactionSnapshot (or let the worker create a new snapshot,
which may have a greater xmin than the ActiveSnapshot, because no
TransactionSnapshot was restored), then I think it should be OK.


Regards,
Greg Nancarrow
Fujitsu Australia



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Fri, Aug 13, 2021 at 2:52 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> With your proposed approach, what I'm seeing is that the worker calls
> GetTransactionSnapshot() at some point, which then builds a new
> snapshot, and results in increasing TransactionXmin (probably because
> another concurrent transaction has since completed). This snapshot is
> thus later than the snapshot used in the execution state of the query
> being executed. This causes the Assert in
> SubTransGetTopmostTransaction() to fire because the xid doesn't follow
> or equal the TransactionXmin value.

Ah ha! Thank you. So I think what I was missing here is that even
though the transaction snapshot is not a well-defined concept when
!IsolationUsesXactSnapshot(), we still need TransactionXmin to be set
to a value that's earlier than any XID we might inquire about. So the
proposal to install the leader's active snapshot as the worker's
transaction snapshot is really just a way of making that happen. Now
that I understand better, that seems OK to me when
!IsolationUsesXactSnapshot(), but otherwise I think we need to
serialize and restore the actual transaction snapshot. Do you agree?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Wed, Aug 18, 2021 at 5:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Ah ha! Thank you. So I think what I was missing here is that even
> though the transaction snapshot is not a well-defined concept when
> !IsolationUsesXactSnapshot(), we still need TransactionXmin to be set
> to a value that's earlier than any XID we might inquire about. So the
> proposal to install the leader's active snapshot as the worker's
> transaction snapshot is really just a way of making that happen. Now
> that I understand better, that seems OK to me when
> !IsolationUsesXactSnapshot(), but otherwise I think we need to
> serialize and restore the actual transaction snapshot. Do you agree?
>

Yes, I think I agree on that.
I've updated the patch to restore the actual transaction snapshot in
the IsolationUsesXactSnapshot() case; otherwise the active snapshot is
installed as the transaction snapshot.
I've tested the patch for the different transaction isolation levels,
and the reported coredump (from assertion failure) is not occurring.
(In the "serializable" case there are "could not serialize access due
to read/write dependencies among transactions" errors, as Pavel has
previously reported, but these occur without the patch and it appears
to be an unrelated issue)


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Wed, Aug 18, 2021 at 9:28 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> Yes, I think I agree on that.
> I've updated the patch to restore the actual transaction snapshot in
> the IsolationUsesXactSnapshot() case, otherwise the active snapshot is
> installed as the transaction snapshot.
> I've tested the patch for the different transaction isolation levels,
> and the reported coredump (from assertion failure) is not occurring.
> (In the "serializable" case there are "could not serialize access due
> to read/write dependencies among transactions" errors, as Pavel has
> previously reported, but these occur without the patch and it appears
> to be an unrelated issue)

I think this looks pretty good. I am not sure I see any reason to
introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we
just use RestoreTransactionSnapshot() and then call
PushActiveSnapshot() from parallel.c? That seems preferable to me from
the standpoint of avoiding multiplication of APIs.

I also think that the comments should explain why we are doing this,
rather than just that we are doing this. So instead of this:

+    /*
+     * If the transaction snapshot was serialized, restore it and restore the
+     * active snapshot too. Otherwise, the active snapshot is also installed as
+     * the transaction snapshot.
+     */

...perhaps something like:

If the transaction isolation level is READ COMMITTED or SERIALIZABLE,
the leader has serialized the transaction snapshot and we must restore
it. At lower isolation levels, there is no transaction-lifetime
snapshot, but we need TransactionXmin to get set to a value which is
less than or equal to the xmin of every snapshot that will be used by
this worker. The easiest way to accomplish that is to install the
active snapshot as the transaction snapshot. Code running in this
parallel worker might take new snapshots via GetTransactionSnapshot()
or GetLatestSnapshot(), but it shouldn't have any way of acquiring a
snapshot older than the active snapshot.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Tue, Aug 24, 2021 at 5:00 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
>
> I think this looks pretty good. I am not sure I see any reason to
> introduce a new function RestoreTxnSnapshotAndSetAsActive. Couldn't we
> just use RestoreTransactionSnapshot() and then call
> PushActiveSnapshot() from parallel.c? That seems preferable to me from
> the standpoint of avoiding multiplication of APIs.
>

I initially thought this too, but RestoreTransactionSnapshot() sets up
the resultant transaction snapshot in "CurrentSnapshot", which is
static to snapmgr.c (like the other pointers to valid snapshots) and I
didn't really want to mess with the visibility of that, to allow a
call to PushActiveSnapshot(CurrentSnapshot) in parallel.c. Also, I
wasn't sure whether it was safe to call GetTransactionSnapshot() here
without the risk of unwanted side-effects. But looking at it again, I
think it is probably OK, so I used it in my revised patch (attached)
and removed the new function RestoreTxnSnapshotAndSetAsActive(). Do
you agree that it is OK to call GetTransactionSnapshot() here?

> I also think that the comments should explain why we are doing this,
> rather than just that we are doing this. So instead of this:
>
> +    /*
> +     * If the transaction snapshot was serialized, restore it and restore the
> +     * active snapshot too. Otherwise, the active snapshot is also installed as
> +     * the transaction snapshot.
> +     */
>
> ...perhaps something like:
>
> If the transaction isolation level is READ COMMITTED or SERIALIZABLE,
> the leader has serialized the transaction snapshot and we must restore
> it. At lower isolation levels, there is no transaction-lifetime
> snapshot, but we need TransactionXmin to get set to a value which is
> less than or equal to the xmin of every snapshot that will be used by
> this worker. The easiest way to accomplish that is to install the
> active snapshot as the transaction snapshot. Code running in this
> parallel worker might take new snapshots via GetTransactionSnapshot()
> or GetLatestSnapshot(), but it shouldn't have any way of acquiring a
> snapshot older than the active snapshot.
>

I agree, that is a better comment and detailed description, but didn't
you mean "If the transaction isolation level is REPEATABLE READ or
SERIALIZABLE ..."?

since we have:

#define XACT_READ_UNCOMMITTED   0
#define XACT_READ_COMMITTED     1
#define XACT_REPEATABLE_READ    2
#define XACT_SERIALIZABLE       3

#define IsolationUsesXactSnapshot() (XactIsoLevel >= XACT_REPEATABLE_READ)


Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Tue, Aug 24, 2021 at 12:20 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> I initially thought this too, but RestoreTransactionSnapshot() sets up
> the resultant transaction snapshot in "CurrentSnapshot", which is
> static to snapmgr.c (like the other pointers to valid snapshots) and I
> didn't really want to mess with the visibility of that, to allow a
> call to PushActiveSnapshot(CurrentSnapshot) in parallel.c. Also, I
> wasn't sure if it was safe to call GetTransactionSnapshot() here
> without the risk of unwanted side-effects - but, looking at it again,
> I think it is probably OK, so I did use it in my revised patch
> (attached) and removed
> that new function RestoreTxnSnapshotAndSetAsActive(). Do you agree
> that it is OK to call GetTransactionSnapshot() here?

I guess I was thinking more of rejiggering things so that we save the
results of each RestoreSnapshot() call in a local variable, e.g.
asnapshot and tsnapshot. And then I think we could just
RestoreTransactionSnapshot() on whichever one we want, and then
PushActiveSnapshot(asnapshot) either way. I think it would be worth
trying to move the PushActiveSnapshot() call out of the if statement
instead of having it in two places, written differently but doing the same
thing.
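
Concretely, something like this in ParallelWorkerMain() (a sketch
only, not compiled):

    char       *asnapspace;
    char       *tsnapspace;
    Snapshot    asnapshot;
    Snapshot    tsnapshot;

    asnapspace = shm_toc_lookup(toc, PARALLEL_KEY_ACTIVE_SNAPSHOT, false);
    tsnapspace = shm_toc_lookup(toc, PARALLEL_KEY_TRANSACTION_SNAPSHOT, true);
    asnapshot = RestoreSnapshot(asnapspace);
    tsnapshot = tsnapspace ? RestoreSnapshot(tsnapspace) : asnapshot;

    /* install whichever snapshot serves as the transaction snapshot,
     * then push the active snapshot exactly once, outside any branch */
    RestoreTransactionSnapshot(tsnapshot, fps->parallel_leader_pgproc);
    PushActiveSnapshot(asnapshot);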

> I agree, that is a better comment and detailed description, but didn't
> you mean "If the transaction isolation level is REPEATABLE READ or
> SERIALIZABLE ..."?

I sure did!

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Greg Nancarrow
Date:
On Wed, Aug 25, 2021 at 1:37 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> I guess I was thinking more of rejiggering things so that we save the
> results of each RestoreSnapshot() call in a local variable, e.g.
> asnapshot and tsnapshot. And then I think we could just
> RestoreTransactionSnapshot() on whichever one we want, and then
> PushActiveSnapshot(asnapshot) either way. I think it would be worth
> trying to move the PushActiveSnapshot() call out of the if statement
> instead it in two places, written differently but doing the same
> thing.
>

I've attached an updated patch, hopefully more along the lines that
you were thinking of.

Regards,
Greg Nancarrow
Fujitsu Australia

Attachment

Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

From
Robert Haas
Date:
On Wed, Aug 25, 2021 at 5:36 AM Greg Nancarrow <gregn4422@gmail.com> wrote:
> I've attached an updated patch, hopefully more along the lines that
> you were thinking of.

LGTM. Committed and back-patched to v10 and up. In theory the same bug
exists in 9.6, but you'd have to have third-party code using the
parallel context infrastructure in order to hit it. If the patch had
back-patched cleanly I would have done so just in case, but
shm_toc_lookup() lacks the bool noError option in that version.

-- 
Robert Haas
EDB: http://www.enterprisedb.com