Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1KcsGAGb8n4CkXnGtRP1_JLLt0QZTFV2UGLMHE3ZB_fMA@mail.gmail.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, Sep 30, 2022 at 1:56 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for the v35-0001 patch:
>
> ======
>
> 3. GENERAL
>
> (this comment was written after I wrote all the other ones below so
> there might be some unintended overlaps...)
>
> I found the mixed use of the same member names having different
> meanings to be quite confusing.
>
> e.g.1
> PGOutputData 'streaming' is now a single char internal representation
> the subscription parameter streaming mode ('f','t','p')
> - bool streaming;
> + char streaming;
>
> e.g.2
> WalRcvStreamOptions 'streaming' is a C string version of the
> subscription streaming mode ("on", "parallel")
> - bool streaming; /* Streaming of large transactions */
> + char    *streaming; /* Streaming of large transactions */
>
> e.g.3
> SubOpts 'streaming' is again like the first example - a single char
> for the mode.
> - bool streaming;
> + char streaming;
>
>
> IMO everything would become much simpler if you did:
>
> 3a.
> Rename "char streaming;" -> "char streaming_mode;"
>
> 3b.
> Re-designed the "char *streaming;" code to also use the single char
> notation, then also call that member 'streaming_mode'. Then everything
> will be consistent.
>

Won't this impact the previous version publisher which already uses
on/off? We may need to maintain multiple values which would be
confusing.

>
> 9. - parallel_apply_can_start
>
> +/*
> + * Returns true, if it is allowed to start a parallel apply worker, false,
> + * otherwise.
> + */
> +static bool
> +parallel_apply_can_start(TransactionId xid)
>
> (The commas are strange)
>
> SUGGESTION
> Returns true if it is OK to start a parallel apply worker, false otherwise.
>

+1 for this.
>
> 28. - logicalrep_worker_detach
>
> + /* Stop the parallel apply workers. */
> + if (am_leader_apply_worker())
> + {
>
> Should that comment rather say like below?
>
> /* If this is the leader apply worker then stop all of its parallel
> apply workers. */
>

I think this would be just saying what is apparent from the code, so
not sure if it is an improvement.

>
> 38. - apply_handle_commit_prepared
>
> + *
> + * Note that we don't need to wait here if the transaction was prepared in a
> + * parallel apply worker. Because we have already waited for the prepare to
> + * finish in apply_handle_stream_prepare() which will ensure all the operations
> + * in that transaction have happened in the subscriber and no concurrent
> + * transaction can create deadlock or transaction dependency issues.
>   */
>  static void
>  apply_handle_commit_prepared(StringInfo s)
>
> "worker. Because" -> "worker because"
>

I think this will make this line too long. Can we think of breaking it
in some way?

>
> 43.
>
>   /*
> - * Initialize the worker's stream_fileset if we haven't yet. This will be
> - * used for the entire duration of the worker so create it in a permanent
> - * context. We create this on the very first streaming message from any
> - * transaction and then use it for this and other streaming transactions.
> - * Now, we could create a fileset at the start of the worker as well but
> - * then we won't be sure that it will ever be used.
> + * For the first stream start, check if there is any free parallel apply
> + * worker we can use to process this transaction.
>   */
> - if (MyLogicalRepWorker->stream_fileset == NULL)
> + if (first_segment)
> + parallel_apply_start_worker(stream_xid);
>
> This comment update seems misleading. The
> parallel_apply_start_worker() isn't just checking if there is a free
> worker. All that free worker logic stuff is *inside* the
> parallel_apply_start_worker() function, so maybe no need to mention
> about it here at the caller.
>

It will be good to have some comments here instead of completely removing it.

>
> 39. - apply_handle_stream_abort
>
> + /* We receive abort information only when we can apply in parallel. */
> + if (MyLogicalRepWorker->parallel_apply)
> + read_abort_info = true;
>
> 44a.
> SUGGESTION
> We receive abort information only when the publisher can support parallel apply.
>

The existing comment seems better to me in this case.

>
> 55. - LogicalRepWorker
>
> + /* Indicates whether apply can be performed parallelly. */
> + bool parallel_apply;
> +
>
> 55a.
> "parallelly" - ?? is there a better way to phrase this? IMO that is an
> uncommon word.
>

How about ".. can be performed in parallel."?

> ~
>
> 55b.
> IMO this member name should be named slightly different to give a
> better feel for what it really means.
>
> Maybe something like one of:
> "parallel_apply_ok"
> "parallel_apply_enabled"
> "use_parallel_apply"
> etc?
>

The extra word doesn't seem to be useful here.

> 58.
>
> --- fail - streaming must be boolean
> +-- fail - streaming must be boolean or 'parallel'
>  CREATE SUBSCRIPTION regress_testsub CONNECTION
> 'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
> false, streaming = foo);
>
> I think there are tests already for explicitly create/set the
> subscription parameter streaming = on/off/parallel
>
> But what about when there is no value explicitly specified? Shouldn't
> there also be tests like below to check that *implied* boolean true
> still works for this enum?
>
> CREATE SUBSCRIPTION ... WITH (streaming)
> ALTER SUBSCRIPTION ... SET (streaming)
>

I think before adding new tests for this, please check if we have any
similar tests for other boolean options.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: use has_privs_of_role() for pg_hba.conf
Next
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply