RE: Forget close an open relation in ReorderBufferProcessTXN() - Mailing list pgsql-hackers

From osumi.takamichi@fujitsu.com
Subject RE: Forget close an open relation in ReorderBufferProcessTXN()
Msg-id OSBPR01MB48885887A66B794919941132ED269@OSBPR01MB4888.jpnprd01.prod.outlook.com
In response to Re: Forget close an open relation in ReorderBufferProcessTXN()  (Amit Langote <amitlangote09@gmail.com>)
Responses Re: Forget close an open relation in ReorderBufferProcessTXN()
List pgsql-hackers
On Saturday, May 22, 2021 11:58 AM Amit Langote <amitlangote09@gmail.com> wrote:
> On Sat, May 22, 2021 at 11:00 AM osumi.takamichi@fujitsu.com
> <osumi.takamichi@fujitsu.com> wrote:
> > I've checked the core file of v3's failure core and printed the entry
> > to get more confidence. Sorry for inappropriate measure to verify the
> > solution.
> >
> > $1 = {relid = 16388, schema_sent = false, streamed_txns = 0x0,
> >   replicate_valid = false, pubactions = {pubinsert = false, pubupdate = false,
> >   pubdelete = false, pubtruncate = false}, publish_as_relid = 16388,
> >   map = 0x7f7f7f7f7f7f7f7f}
> >
> > Yes, the process tried to free garbage.
> > Now, we are convinced that we have addressed the problem. That's it !
> 
> Thanks for confirming that.

Langote-san, I need to report another issue.

When I ran make check-world with v6 applied, I got another failure.
It happens about once in every 20 runs of make check-world with v6.

The test run ended with the following stderr output:

NOTICE:  database "regression" does not exist, skipping
make[2]: *** [check] Error 1
make[1]: *** [check-isolation-recurse] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [check-world-src/test-recurse] Error 2
make: *** Waiting for unfinished jobs....

I also got ./src/test/isolation/output_iso/regression.diffs and regression.out.
regression.out showed the following:

test detach-partition-concurrently-1 ... ok          705 ms
test detach-partition-concurrently-2 ... ok          260 ms
test detach-partition-concurrently-3 ... FAILED      618 ms
test detach-partition-concurrently-4 ... ok         1384 ms

The diffs file showed the following:

diff -U3 /home/k5user/new_disk/repro_fail_v6/src/test/isolation/expected/detach-partition-concurrently-3.out /home/k5user/new_disk/repro_fail_v6/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out
--- /home/k5user/new_disk/repro_fail_v6/src/test/isolation/expected/detach-partition-concurrently-3.out	2021-05-24 01:22:22.381488295 +0000
+++ /home/k5user/new_disk/repro_fail_v6/src/test/isolation/output_iso/results/detach-partition-concurrently-3.out	2021-05-24 02:47:08.292488295 +0000
@@ -190,7 +190,7 @@

 t
 step s2detach: <... completed>
-error in steps s1cancel s2detach: ERROR:  canceling statement due to user request
+ERROR:  canceling statement due to user request
 step s2detach2: ALTER TABLE d3_listp DETACH PARTITION d3_listp2 CONCURRENTLY;
 ERROR:  partition "d3_listp1" already pending detach in partitioned table "public.d3_listp"
 step s1c: COMMIT;

I'm not sure yet whether this is related to the patch or whether the failure already exists on OSS HEAD.

FYI, the steps I took were:
1 - clone PG (I used f5024d8)
2 - git am the 2 patches for HEAD
    * HEAD-v6-0001-pgoutput-fix-memory-management-of-RelationSyncEnt.patch
    * HEAD-v6-0002-pgoutput-don-t-send-leaf-partition-schema-when-pu.patch
3 - configure with --enable-cassert --enable-debug --enable-tap-tests --with-icu CFLAGS=-O0 --prefix=/where/you/wanna/put/PG
4 - make -j2 2> make.log # no stderr output
5 - make check-world -j8 2> make_check_world.log
    (after this, I ran make check-world repeatedly in a tight loop and got the error)
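For anyone who wants to try reproducing this, the tight loop in step 5 could be approximated with a small shell helper like the following. This is only a sketch, not the script used for this report. The function name repeat_until_fail and the iteration limit are my own assumptions. The helper reruns a command until it fails, then reports which iteration failed.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PostgreSQL tree): run "$@" up to
# $1 times and stop at the first failure, printing which iteration failed.
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}

# Example (assumes a configured and built source tree in the current
# directory; 50 iterations is an arbitrary choice):
# repeat_until_fail 50 sh -c 'make check-world -j8 2> make_check_world.log'
```

Because the loop stops at the first failing run, make_check_world.log from that run is kept rather than being overwritten by a later successful run.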


Best Regards,
    Takamichi Osumi

