Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers

From Greg Nancarrow
Subject Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Date
Msg-id CAJcOf-cNLhA7iaUYAQqZ44tz3oHJoPxGRm1+tNE27iJXTXObzQ@mail.gmail.com
Whole thread Raw
In response to Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump  (Pavel Borisov <pashkin.elfe@gmail.com>)
List pgsql-hackers
On Mon, May 24, 2021 at 2:50 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 24, 2021 at 12:04:37PM +1000, Greg Nancarrow wrote:
> > Keep cfbot happy, use the PG14 patch as latest.
>
> This stuff is usually very tricky.

Agreed. That's why I was looking for experts in this snapshot-handling
code, to look closer at this issue, check my proposed fix, come up
with a better solution etc.

>Do we have a way to reliably
> reproduce the report discussed here?

I couldn't reproduce it in my environment (though I could understand
what was going wrong, based on the description provided).
houzj (houzj.fnst@fujitsu.com) was able to reproduce it in his
environment and kindly provided to me the following information:
(He said that he followed most of the steps described by the original
problem reporter, Pengcheng, but perhaps steps 2 and 7 are a little
different from his steps. See the emails higher in the thread for the
two scripts "init_test.sql" and "sub_120.sql")

===

1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file
"src/include/access/subtrans.h" line number 15.
2, configure with enable assert and build it.( ./configure
--enable-cassert --prefix=/home/pgsql)
3, init a new database cluster.
4, modify  postgres.conf  and add some parameters as below. As the
coredump from parallel scan, so we adjust parallel setting, make it
easy to reproduce.

  max_connections = 2000

  parallel_setup_cost=0
  parallel_tuple_cost=0
  min_parallel_table_scan_size=0
  max_parallel_workers_per_gather=8
  max_parallel_workers = 32

5, start the database cluster.
6, use the script init_test.sql  in attachment to create tables.
7, use pgbench with script sub_120.sql in attachment to test it. Try
it sometimes, you should get the coredump file.
    pgbench  -d postgres -p 33550  -n -r -f sub_120.sql   -c 200 -j 200 -T 12000
   (If cannot reproduce it, maybe you can try run two parallel pgbench
xx at the same time)

In my environment(CentOS 8.2, 128G RAM, 40 processors, disk SAS
Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz),
sometimes I can reproduce in about 5 minutes , but sometimes it needs
about half an hour.

Best regards,
houzj

===

Regards,
Greg Nancarrow
Fujitsu Australia



pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Parallel INSERT SELECT take 2
Next
From: Amit Kapila
Date:
Subject: Re: [PATCH] Add `truncate` option to subscription commands