Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | CAHut+Pscac+ipFSFx89ACmacjPe4Dn=qVq8T0V=nQkv38QgnBw@mail.gmail.com Whole thread Raw |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Perform streaming logical transactions by background workers and parallel apply
|
List | pgsql-hackers |
On Thu, Oct 6, 2022 at 10:38 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Sep 30, 2022 at 1:56 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > Here are my review comments for the v35-0001 patch: > > > > ====== > > > > 3. GENERAL > > > > (this comment was written after I wrote all the other ones below so > > there might be some unintended overlaps...) > > > > I found the mixed use of the same member names having different > > meanings to be quite confusing. > > > > e.g.1 > > PGOutputData 'streaming' is now a single char internal representation > > the subscription parameter streaming mode ('f','t','p') > > - bool streaming; > > + char streaming; > > > > e.g.2 > > WalRcvStreamOptions 'streaming' is a C string version of the > > subscription streaming mode ("on", "parallel") > > - bool streaming; /* Streaming of large transactions */ > > + char *streaming; /* Streaming of large transactions */ > > > > e.g.3 > > SubOpts 'streaming' is again like the first example - a single char > > for the mode. > > - bool streaming; > > + char streaming; > > > > > > IMO everything would become much simpler if you did: > > > > 3a. > > Rename "char streaming;" -> "char streaming_mode;" > > > > 3b. > > Re-designed the "char *streaming;" code to also use the single char > > notation, then also call that member 'streaming_mode'. Then everything > > will be consistent. > > > > Won't this impact the previous version publisher which already uses > on/off? We may need to maintain multiple values which would be > confusing. > I only meant that the *internal* struct member names mentioned could change - not anything exposed as user-visible parameter names or column names etc. Or were you referring to it as causing unnecessary troubles for back-patching? Anyway, the main point of this review comment was #3b. Unless I am mistaken, there is no reason why that one cannot be changed to use 'char' instead of 'char *', for consistency across all the same named members. > > > > 9. - parallel_apply_can_start > > > > +/* > > + * Returns true, if it is allowed to start a parallel apply worker, false, > > + * otherwise. > > + */ > > +static bool > > +parallel_apply_can_start(TransactionId xid) > > > > (The commas are strange) > > > > SUGGESTION > > Returns true if it is OK to start a parallel apply worker, false otherwise. > > > > +1 for this. > > > > 28. - logicalrep_worker_detach > > > > + /* Stop the parallel apply workers. */ > > + if (am_leader_apply_worker()) > > + { > > > > Should that comment rather say like below? > > > > /* If this is the leader apply worker then stop all of its parallel > > apply workers. */ > > > > I think this would be just saying what is apparent from the code, so > not sure if it is an improvement. > > > > > 38. - apply_handle_commit_prepared > > > > + * > > + * Note that we don't need to wait here if the transaction was prepared in a > > + * parallel apply worker. Because we have already waited for the prepare to > > + * finish in apply_handle_stream_prepare() which will ensure all the operations > > + * in that transaction have happened in the subscriber and no concurrent > > + * transaction can create deadlock or transaction dependency issues. > > */ > > static void > > apply_handle_commit_prepared(StringInfo s) > > > > "worker. Because" -> "worker because" > > > > I think this will make this line too long. Can we think of breaking it > in some way? OK, how about below: Note that we don't need to wait here if the transaction was prepared in a parallel apply worker. In that case, we have already waited for the prepare to finish in apply_handle_stream_prepare() which will ensure all the operations in that transaction have happened in the subscriber, so no concurrent transaction can cause deadlock or transaction dependency issues. > > > > > 43. > > > > /* > > - * Initialize the worker's stream_fileset if we haven't yet. This will be > > - * used for the entire duration of the worker so create it in a permanent > > - * context. We create this on the very first streaming message from any > > - * transaction and then use it for this and other streaming transactions. > > - * Now, we could create a fileset at the start of the worker as well but > > - * then we won't be sure that it will ever be used. > > + * For the first stream start, check if there is any free parallel apply > > + * worker we can use to process this transaction. > > */ > > - if (MyLogicalRepWorker->stream_fileset == NULL) > > + if (first_segment) > > + parallel_apply_start_worker(stream_xid); > > > > This comment update seems misleading. The > > parallel_apply_start_worker() isn't just checking if there is a free > > worker. All that free worker logic stuff is *inside* the > > parallel_apply_start_worker() function, so maybe no need to mention > > about it here at the caller. > > > > It will be good to have some comments here instead of completely removing it. > > > > > 39. - apply_handle_stream_abort > > > > + /* We receive abort information only when we can apply in parallel. */ > > + if (MyLogicalRepWorker->parallel_apply) > > + read_abort_info = true; > > > > 44a. > > SUGGESTION > > We receive abort information only when the publisher can support parallel apply. > > > > The existing comment seems better to me in this case. > > > > > 55. - LogicalRepWorker > > > > + /* Indicates whether apply can be performed parallelly. */ > > + bool parallel_apply; > > + > > > > 55a. > > "parallelly" - ?? is there a better way to phrase this? IMO that is an > > uncommon word. > > > > How about ".. can be performed in parallel."? > > > ~ > > > > 55b. > > IMO this member name should be named slightly different to give a > > better feel for what it really means. > > > > Maybe something like one of: > > "parallel_apply_ok" > > "parallel_apply_enabled" > > "use_parallel_apply" > > etc? > > > > The extra word doesn't seem to be useful here. > > > 58. > > > > --- fail - streaming must be boolean > > +-- fail - streaming must be boolean or 'parallel' > > CREATE SUBSCRIPTION regress_testsub CONNECTION > > 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect = > > false, streaming = foo); > > > > I think there are tests already for explicitly create/set the > > subscription parameter streaming = on/off/parallel > > > > But what about when there is no value explicitly specified? Shouldn't > > there also be tests like below to check that *implied* boolean true > > still works for this enum? > > > > CREATE SUBSCRIPTION ... WITH (streaming) > > ALTER SUBSCRIPTION ... SET (streaming) > > > > I think before adding new tests for this, please check if we have any > similar tests for other boolean options. IMO this one is a bit different because it's not really a boolean option anymore - it's a kind of a hybrid boolean/enum. That's why I thought this ought to be tested regardless if there are existing tests for the (normal) boolean options. ------ Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: