Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions |
Date | |
Msg-id | CAFiTN-uV1WOZW9LJaw5kBZXUyUNgN1utUOiKyptKeYrOjca4BQ@mail.gmail.com |
In response to | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Amit Kapila <amit.kapila16@gmail.com>) |
List | pgsql-hackers |
On Mon, Jul 6, 2020 at 3:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 6, 2020 at 11:44 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Jul 6, 2020 at 11:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > 10. I have got the below failure once. I have not investigated this
> > > > > in detail as the patch is still under progress. See, if you have any
> > > > > idea?
> > > > > # Failed test 'check extra columns contain local defaults'
> > > > > # at t/013_stream_subxact_ddl_abort.pl line 81.
> > > > > # got: '2|0'
> > > > > # expected: '1000|500'
> > > > > # Looks like you failed 1 test of 2.
> > > > > make[2]: *** [check] Error 1
> > > > > make[1]: *** [check-subscription-recurse] Error 2
> > > > > make[1]: *** Waiting for unfinished jobs....
> > > > > make: *** [check-world-src/test-recurse] Error 2
> > > >
> > > > Even I got the failure once and after that, it did not reproduce. I
> > > > have executed it multiple time but it did not reproduce again. Are
> > > > you able to reproduce it consistently?
> > > >
> > > No, I am also not able to reproduce it consistently but I think this
> > > can fail if a subscriber sends the replay_location before actually
> > > replaying the changes. First, I thought that extra send_feedback we
> > > have in apply_handle_stream_commit might have caused this but I guess
> > > that can't happen because we need the commit time location for that
> > > and we are storing the same at the end of apply_handle_stream_commit
> > > after applying all messages. I am not sure what is going on here. I
> > > think we somehow need to reproduce this or some variant of this test
> > > consistently to find the root cause.
> >
> > And I think it appeared first time for me, so maybe either induced
> > from past few versions so some changes in the last few versions might
> > have exposed it. I have noticed that almost 50% of the time I am able
> > to reproduce after the clean build so I can trace back from which
> > version it started appearing that way it will be easy to narrow down.
> >
> One more comment
> ReorderBufferLargestTopTXN
> {
> ..
> dlist_foreach(iter, &rb->toplevel_by_lsn)
> {
> ReorderBufferTXN *txn;
> + Size size = 0;
> + Size largest_size = 0;
>
> txn = dlist_container(ReorderBufferTXN, node, iter.cur);
>
> - /* if the current transaction is larger, remember it */
> - if ((!largest) || (txn->size > largest->size))
> + /*
> + * If this transaction have some incomplete changes then only consider
> + * the size upto last complete lsn.
> + */
> + if (rbtxn_has_incomplete_tuple(txn))
> + size = txn->complete_size;
> + else
> + size = txn->total_size;
> +
> + /* If the current transaction is larger then remember it. */
> + if ((largest != NULL || size > largest_size) && size > 0)
>
> Here largest_size is a local variable inside the loop which is
> initialized to 0 in each iteration and that will lead to picking each
> next txn as largest. This seems wrong to me.

You are right, will fix.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
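For readers following the review comment above, here is a minimal sketch (not the actual follow-up patch) of how the loop could look once `largest` and `largest_size` are declared outside `dlist_foreach`, so the running maximum survives across iterations instead of being reset to 0 each time. The `rbtxn_has_incomplete_tuple` macro and the `complete_size`/`total_size` fields come from the patch under review and are assumed here as-is; with `largest_size` starting at 0, the single `size > largest_size` test also covers the `size > 0` requirement.

```c
/* Illustrative sketch only, in the style of reorderbuffer.c. */
static ReorderBufferTXN *
ReorderBufferLargestTopTXN(ReorderBuffer *rb)
{
	dlist_iter	iter;
	ReorderBufferTXN *largest = NULL;
	Size		largest_size = 0;	/* hoisted out of the loop */

	dlist_foreach(iter, &rb->toplevel_by_lsn)
	{
		ReorderBufferTXN *txn;
		Size		size;

		txn = dlist_container(ReorderBufferTXN, node, iter.cur);

		/*
		 * If this transaction has incomplete changes, only consider the
		 * size up to the last complete lsn (fields from the patch).
		 */
		if (rbtxn_has_incomplete_tuple(txn))
			size = txn->complete_size;
		else
			size = txn->total_size;

		/* Remember the transaction if it is the largest seen so far. */
		if (size > largest_size)
		{
			largest = txn;
			largest_size = size;
		}
	}

	return largest;
}
```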