Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Msg-id: CAA4eK1+z07HvtULZXxkcvvBios8ZeKkRMuDoUZvAHvVcvSjAew@mail.gmail.com
In response to: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
List: pgsql-hackers

On Wed, Dec 2, 2020 at 1:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Dec 1, 2020 at 11:38 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch <noah@leadboat.com> wrote:
> > > >
> > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to
> > > > > see the buildfarm reports
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14
> > > > failed the new 015_stream.pl test with the subscriber looping like this:
> > > >
> > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG:  logical replication apply worker for subscription "tap_sub" has started
> > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR:  could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG:  logical replication apply worker for subscription "tap_sub" has started
> > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG:  background worker "logical replication worker" (PID 13959252) exited with exit code 1
> > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR:  could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > > ...
> > > >
> > > > What happened there?
> > > >
> > >
> > > What is going on here is that the expected streaming file is missing.
> > > Normally, the first time we send a stream of changes (some portion of
> > > the transaction's changes), the streaming file is created on the
> > > subscriber. In subsequent streams we just keep writing the changes we
> > > receive from the publisher to that file, and on commit we read the
> > > file and apply all the changes.
> > >
> > > This kind of error can happen for one of the following reasons: (a)
> > > the file was created when the first stream was sent, but it got
> > > removed before the second stream reached the subscriber; (b) the
> > > publisher never sent the indication that this is the first stream,
> > > so the subscriber directly tried to open the file, thinking it was
> > > already there.
> > >
>
> I have executed "make check" in a loop with only this file.  I
> repeated it 5000 times without a failure.  Should we try running it
> in a loop on the same machine where it failed once?
>

Yes, that might help. Noah, would it be possible for you to try that
out and, if it fails, get the stack trace of the subscriber?
If we are able to reproduce it, then we can add elogs in the functions
SharedFileSetInit, BufFileCreateShared, BufFileOpenShared, and
SharedFileSetDeleteAll to print the paths and see if we are sometimes
unintentionally removing some files. I have checked the code and there
doesn't appear to be any such problem, but I might be missing
something.
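
To make the two failure cases quoted above concrete, here is a minimal
standalone sketch in C. It is only a model, not the PostgreSQL code: the
names trace_op, stream_file_create, stream_file_append,
stream_file_delete, and handle_stream are made up for illustration, and
plain stdio calls stand in for the BufFile/SharedFileSet routines. Every
operation logs the path it touches, which is roughly the kind of tracing
the proposed elogs would provide.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model only: none of these are PostgreSQL functions. */

/* Log one file operation with its path and outcome (err == 0 is success). */
void trace_op(const char *op, const char *path, int err)
{
    fprintf(stderr, "%s \"%s\": %s\n", op, path, err ? strerror(err) : "ok");
}

/* First stream of a transaction: create (or truncate) the changes file. */
int stream_file_create(const char *path)
{
    FILE *f;
    int   err;

    assert(path != NULL);
    errno = 0;
    f = fopen(path, "wb");
    err = f ? 0 : (errno ? errno : EIO);
    trace_op("create", path, err);
    if (f)
        fclose(f);
    return err;
}

/* Any stream: append one change. The file must already exist. */
int stream_file_append(const char *path, const char *change)
{
    FILE *f;
    int   err = 0;

    assert(path != NULL && change != NULL);
    errno = 0;
    f = fopen(path, "rb+");     /* "rb+" fails if the file is missing */
    if (f == NULL)
        err = errno ? errno : EIO;
    else
    {
        if (fseek(f, 0, SEEK_END) != 0 || fputs(change, f) == EOF)
            err = errno ? errno : EIO;
        if (fclose(f) != 0 && err == 0)
            err = errno ? errno : EIO;
    }
    trace_op("append", path, err);
    return err;
}

/* Commit or cleanup: remove the changes file. */
int stream_file_delete(const char *path)
{
    int err;

    assert(path != NULL);
    errno = 0;
    err = (remove(path) == 0) ? 0 : (errno ? errno : EIO);
    trace_op("delete", path, err);
    return err;
}

/* Subscriber-side handling of one stream segment. Only a segment flagged
 * as the first one creates the file; all others expect it to exist. */
int handle_stream(const char *path, int first_segment, const char *change)
{
    if (first_segment)
    {
        int err = stream_file_create(path);

        if (err != 0)
            return err;
    }
    return stream_file_append(path, change);
}
```

In this model, calling handle_stream() with first_segment = 0 before any
first segment has arrived corresponds to case (b). Creating the file
with a first segment, calling stream_file_delete(), and then sending
another non-first segment corresponds to case (a). In both cases the
append fails with ENOENT, and the trace shows which path was created,
opened, or removed and in what order.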

-- 
With Regards,
Amit Kapila.


