Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1+0y3fwU2HqDWZRqJYniBsDgsSCJf37TNt=HcPDV0_xsA@mail.gmail.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, Oct 7, 2022 at 8:38 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Oct 6, 2022 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Sep 30, 2022 at 1:56 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Here are my review comments for the v35-0001 patch:
> > >
> > > ======
> > >
> > > 3. GENERAL
> > >
> > > (this comment was written after I wrote all the other ones below so
> > > there might be some unintended overlaps...)
> > >
> > > I found the mixed use of the same member names having different
> > > meanings to be quite confusing.
> > >
> > > e.g.1
> > > PGOutputData 'streaming' is now a single char internal representation
> > > the subscription parameter streaming mode ('f','t','p')
> > > - bool streaming;
> > > + char streaming;
> > >
> > > e.g.2
> > > WalRcvStreamOptions 'streaming' is a C string version of the
> > > subscription streaming mode ("on", "parallel")
> > > - bool streaming; /* Streaming of large transactions */
> > > + char    *streaming; /* Streaming of large transactions */
> > >
> > > e.g.3
> > > SubOpts 'streaming' is again like the first example - a single char
> > > for the mode.
> > > - bool streaming;
> > > + char streaming;
> > >
> > >
> > > IMO everything would become much simpler if you did:
> > >
> > > 3a.
> > > Rename "char streaming;" -> "char streaming_mode;"
> > >
> > > 3b.
> > > Re-designed the "char *streaming;" code to also use the single char
> > > notation, then also call that member 'streaming_mode'. Then everything
> > > will be consistent.
> > >
> >
> > Won't this impact the previous version publisher which already uses
> > on/off? We may need to maintain multiple values which would be
> > confusing.
> >
>
> I only meant that the *internal* struct member names mentioned could
> change - not anything exposed as user-visible parameter names or
> column names etc. Or were you referring to it as causing unnecessary
> troubles for back-patching? Anyway, the main point of this review
> comment was #3b.
>

My response was for 3b only.

> Unless I am mistaken, there is no reason why that one
> cannot be changed to use 'char' instead of 'char *', for consistency
> across all the same named members.
>

I feel this will bring more complexity to the code if you have to keep
it working with old-version publishers.

> > >
> > > 9. - parallel_apply_can_start
> > >
> > > +/*
> > > + * Returns true, if it is allowed to start a parallel apply worker, false,
> > > + * otherwise.
> > > + */
> > > +static bool
> > > +parallel_apply_can_start(TransactionId xid)
> > >
> > > (The commas are strange)
> > >
> > > SUGGESTION
> > > Returns true if it is OK to start a parallel apply worker, false otherwise.
> > >
> >
> > +1 for this.
> > >
> > > 28. - logicalrep_worker_detach
> > >
> > > + /* Stop the parallel apply workers. */
> > > + if (am_leader_apply_worker())
> > > + {
> > >
> > > Should that comment rather say like below?
> > >
> > > /* If this is the leader apply worker then stop all of its parallel
> > > apply workers. */
> > >
> >
> > I think this would be just saying what is apparent from the code, so
> > not sure if it is an improvement.
> >
> > >
> > > 38. - apply_handle_commit_prepared
> > >
> > > + *
> > > + * Note that we don't need to wait here if the transaction was prepared in a
> > > + * parallel apply worker. Because we have already waited for the prepare to
> > > + * finish in apply_handle_stream_prepare() which will ensure all the operations
> > > + * in that transaction have happened in the subscriber and no concurrent
> > > + * transaction can create deadlock or transaction dependency issues.
> > >   */
> > >  static void
> > >  apply_handle_commit_prepared(StringInfo s)
> > >
> > > "worker. Because" -> "worker because"
> > >
> >
> > I think this will make this line too long. Can we think of breaking it
> > in some way?
>
> OK, how about below:
>
> Note that we don't need to wait here if the transaction was prepared
> in a parallel apply worker. In that case, we have already waited for
> the prepare to finish in apply_handle_stream_prepare() which will
> ensure all the operations in that transaction have happened in the
> subscriber, so no concurrent transaction can cause deadlock or
> transaction dependency issues.
>

Yeah, this looks better.

> >
> > > 58.
> > >
> > > --- fail - streaming must be boolean
> > > +-- fail - streaming must be boolean or 'parallel'
> > >  CREATE SUBSCRIPTION regress_testsub CONNECTION
> > > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
> > > false, streaming = foo);
> > >
> > > I think there are tests already for explicitly create/set the
> > > subscription parameter streaming = on/off/parallel
> > >
> > > But what about when there is no value explicitly specified? Shouldn't
> > > there also be tests like below to check that *implied* boolean true
> > > still works for this enum?
> > >
> > > CREATE SUBSCRIPTION ... WITH (streaming)
> > > ALTER SUBSCRIPTION ... SET (streaming)
> > >
> >
> > I think before adding new tests for this, please check if we have any
> > similar tests for other boolean options.
>
> IMO this one is a bit different because it's not really a boolean
> option anymore - it's a kind of a hybrid boolean/enum. That's why I
> thought this ought to be tested regardless if there are existing tests
> for the (normal) boolean options.
>

I am not really sure if adding such tests are valuable but if Hou-San
and you feel it is good to have it then I am fine with it.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Support logical replication of DDLs
Next
From: Pavel Stehule
Date:
Subject: Re: proposal: possibility to read dumped table's name from file