Thread: Notify downstream to discard the streamed transaction which was aborted due to crash.
Notify downstream to discard the streamed transaction which was aborted due to crash.
From
"houzj.fnst@fujitsu.com"
Date:
Hi, When developing another feature, I find an existing bug which was reported from Dilip[1]. Currently, it's possible that we only send a streaming block without sending a end of stream message(stream abort) when decoding and streaming a transaction that was aborted due to crash because we might not WAL log a XLOG_XACT_ABORT for such a crashed transaction. This will cause the subscriber(with streaming=on) create a stream file but won't delete it until the apply worker restart. BUG repro(borrowed from Dilip): --- 1. start 2 servers(config: logical_decoding_work_mem=64kB) ./pg_ctl -D data/ -c -l pub_logs start ./pg_ctl -D data1/ -c -l sub_logs start 2. Publisher: create table t(a int PRIMARY KEY ,b text); create publication test_pub for table t with(PUBLISH='insert,delete,update,truncate'); alter table t replica identity FULL ; 3. Subscription Server: create table t(a int,b text); create subscription test_sub CONNECTION 'host=localhost port=10000 dbname=postgres' PUBLICATION test_pub WITH ( slot_name = test_slot_sub1,streaming=on); 4. Publication Server: begin ; insert into t values (generate_series(1,50000), 'zzzzzzzzz'); -- (while executing this restart publisher in 2-3 secs) Restart the publication server, while the transaction is still in an uncommitted state. ./pg_ctl -D data/ -c -l pub_logs restart -mi --- After restarting the publisher, we can see the subscriber receive a streaming block and create a stream file(/base/pgsql_tmp/xxx.fileset). To fix it, One idea is to send a stream abort message when we are cleaning up crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is a tiny patch which changes the same. I have confirmed that the bug is fixed and all regression tests pass. I didn't add a testcase because we need to make sure the crash happens before all the WAL logged transactions data are decoded which doesn't seem easy to write a stable test for this. Thoughts ? [1] https://www.postgresql.org/message-id/CAFiTN-sTYk%3Dh75Jn1a7ee%2B5hOcdQFjKGDvF_0NWQQXmoBv4A%2BA%40mail.gmail.com Best regards, Hou zj
Attachment
Re: Notify downstream to discard the streamed transaction which was aborted due to crash.
From
Amit Kapila
Date:
On Fri, Jan 6, 2023 at 9:25 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > > To fix it, One idea is to send a stream abort message when we are cleaning up > crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is > a tiny patch which changes the same. I have confirmed that the bug is fixed and > all regression tests pass. I didn't add a testcase because we need to make sure > the crash happens before all the WAL logged transactions data are decoded which > doesn't seem easy to write a stable test for this. > Your fix looks good to me. Have you tried this in PG-14 where it was introduced? -- With Regards, Amit Kapila.
RE: Notify downstream to discard the streamed transaction which was aborted due to crash.
From
"houzj.fnst@fujitsu.com"
Date:
On Friday, January 6, 2023 1:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Jan 6, 2023 at 9:25 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> > wrote: > > > > > > To fix it, One idea is to send a stream abort message when we are > > cleaning up crashed transaction on publisher(e.g. in > > ReorderBufferAbortOld()). And here is a tiny patch which changes the > > same. I have confirmed that the bug is fixed and all regression tests > > pass. I didn't add a testcase because we need to make sure the crash > > happens before all the WAL logged transactions data are decoded which > doesn't seem easy to write a stable test for this. > > > > Your fix looks good to me. Have you tried this in PG-14 where it was > introduced? Yes, I have confirmed that PG-14 has the same problem and can be fixed after applying the patch. Best regards, Hou zj
Re: Notify downstream to discard the streamed transaction which was aborted due to crash.
From
Dilip Kumar
Date:
On Fri, Jan 6, 2023 at 9:25 AM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > Hi, > > When developing another feature, I find an existing bug which was reported from Dilip[1]. > > Currently, it's possible that we only send a streaming block without sending a > end of stream message(stream abort) when decoding and streaming a transaction > that was aborted due to crash because we might not WAL log a XLOG_XACT_ABORT > for such a crashed transaction. This will cause the subscriber(with > streaming=on) create a stream file but won't delete it until the apply > worker restart. > > BUG repro(borrowed from Dilip): > --- > 1. start 2 servers(config: logical_decoding_work_mem=64kB) > ./pg_ctl -D data/ -c -l pub_logs start > ./pg_ctl -D data1/ -c -l sub_logs start > > 2. Publisher: > create table t(a int PRIMARY KEY ,b text); > create publication test_pub for table t > with(PUBLISH='insert,delete,update,truncate'); > alter table t replica identity FULL ; > > 3. Subscription Server: > create table t(a int,b text); > create subscription test_sub CONNECTION 'host=localhost port=10000 > dbname=postgres' PUBLICATION test_pub WITH ( slot_name = > test_slot_sub1,streaming=on); > > 4. Publication Server: > begin ; > insert into t values (generate_series(1,50000), 'zzzzzzzzz'); -- (while executing this restart publisher in 2-3 secs) > > Restart the publication server, while the transaction is still in an > uncommitted state. > ./pg_ctl -D data/ -c -l pub_logs restart -mi > --- > > After restarting the publisher, we can see the subscriber receive a streaming > block and create a stream file(/base/pgsql_tmp/xxx.fileset). > > To fix it, One idea is to send a stream abort message when we are cleaning up > crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is > a tiny patch which changes the same. I have confirmed that the bug is fixed and > all regression tests pass. I didn't add a testcase because we need to make sure > the crash happens before all the WAL logged transactions data are decoded which > doesn't seem easy to write a stable test for this. > > Thoughts ? Fix looks good to me. Thanks for working on this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Notify downstream to discard the streamed transaction which was aborted due to crash.
From
Amit Kapila
Date:
On Fri, Jan 6, 2023 at 11:18 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > To fix it, One idea is to send a stream abort message when we are cleaning up > > crashed transaction on publisher(e.g. in ReorderBufferAbortOld()). And here is > > a tiny patch which changes the same. I have confirmed that the bug is fixed and > > all regression tests pass. I didn't add a testcase because we need to make sure > > the crash happens before all the WAL logged transactions data are decoded which > > doesn't seem easy to write a stable test for this. > > > > Thoughts ? > > Fix looks good to me. Thanks for working on this. > Pushed! -- With Regards, Amit Kapila.