Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers
From | Dilip Kumar
Subject | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date |
Msg-id | CAFiTN-sFPUKbT2EhPro1WVCzgR+MrcKZ3qeEMuJrmvPubWYCNg@mail.gmail.com
In response to | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions (Dilip Kumar <dilipbalaut@gmail.com>)
Responses | Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
List | pgsql-hackers
On Tue, Dec 1, 2020 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch <noah@leadboat.com> wrote:
> > >
> > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> > > > Thanks, I have pushed the last patch. Let's wait for a day or so to
> > > > see the buildfarm reports
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14
> > > failed the new 015_stream.pl test with the subscriber looping like this:
> > >
> > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication apply worker for subscription "tap_sub" has started
> > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply worker for subscription "tap_sub" has started
> > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker "logical replication worker" (PID 13959252) exited with exit code 1
> > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > ...
> > >
> > > What happened there?
> >
> > What is going on here is that the expected streaming file is missing.
> > Normally, the first time we send a stream of changes (some percentage
> > of the transaction's changes) we create the streaming file, in each
> > subsequent stream we keep appending to that file the changes we
> > receive from the publisher, and on commit we read the file back and
> > apply all the changes.
> >
> > The above kind of error can happen for the following reasons: (a) the
> > first time we sent the stream we created the file, and it got removed
> > before the second stream reached the subscriber; (b) from the
> > publisher side, we never sent the indication that it is the first
> > stream, so the subscriber directly tries to open the file, thinking it
> > is already there.
> >
> > Now, the publisher and subscriber logs don't directly indicate either
> > of the above problems, but I have some observations.
> >
> > The subscriber log indicates that before the apply worker exits due to
> > an error, the new apply worker gets started. We delete the
> > streaming-related temporary files on proc_exit, so one possibility
> > could have been that the new apply worker created the streaming
> > file which the old apply worker then removed, but that is not possible
> > because we always create these temp files with the procid in the
> > path.
>
> Yeah, and I have tried to test along this line: basically, after the
> streaming had started I set binary=on. Then, using gdb, I made the old
> worker wait before it deleted the temp file; meanwhile the new worker
> started, and it worked properly, as expected.
>
> > The other thing I observed in the code is that we can mark the
> > transaction as streamed (via ReorderBufferTruncateTxn) if we try to
> > stream a transaction that has no changes the first time we try to
> > stream it. This would lead to symptom (b), because the second time,
> > when there are more changes, we would stream them as if it were not
> > the first time. However, this shouldn't happen because we never pick
> > a transaction to stream that has no changes. I can try to fix the
> > code here such that we don't mark the transaction as streamed unless
> > we have streamed at least one change, but I don't see how that is
> > related to this particular test failure.
>
> Yeah, this can be improved, but as you mentioned, we never select an
> empty transaction for streaming, so this case should not occur. I will
> perform some testing/review around this and report back.

I have executed "make check" in a loop with only this file.
I have repeated it 5000 times but saw no failure. I am wondering whether we should try executing it in a loop on the machine where it failed?

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com