Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAA4eK1+S5CX=OzD01bgoEuxQqF45zyYV-dJGh3K2f4j6DyDiOA@mail.gmail.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Tue, Dec 6, 2022 at 5:27 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are my review comments for patch v55-0002
>
...
>
> 3. pa_spooled_messages
>
> Previously I suggested this function name should be changed but that
> was rejected (see [1] #6a)
>
> > 6a.
> > IMO a better name for this function would be
> > pa_apply_spooled_messages();
> Not sure about this.
>
> ~
>
> FYI the reason for the previous suggestion is because there is no verb
> in the current function name, so the reader is left thinking
> pa_spooled_messages "what"?
>
> It means the caller has to have extra comments like:
> /* Check if changes have been serialized to a file. */
> pa_spooled_messages();
>
> OTOH, if the function was called something better -- e.g.
> pa_check_for_spooled_messages() or similar -- then it would be
> self-explanatory.
>

I think pa_check_for_spooled_messages() could be misleading because we
do apply the changes in that function, so probably a comment as
suggested by you is a better option.

> ~
>
> 4.
>
>  /*
> + * Replay the spooled messages in the parallel apply worker if the leader apply
> + * worker has finished serializing changes to the file.
> + */
> +static void
> +pa_spooled_messages(void)
>
> I'm not 100% sure of the logic, so IMO maybe the comment should say a
> bit more about how this works:
>
> Specifically, let's say there was some timeout and the LA needed to
> write the spool file, then let's say the PA timed out and found itself
> inside this function. Now, let's say the LA is still busy writing the
> file -- so what happens next?
>
> Does this function simply return, then the main PA loop waits again,
> then the times out again, then PA finds itself back inside this
> function again... and that keeps happening over and over until
> eventually the spool file is found FS_READY? Some explanatory comments
> might help.
>

No, PA will simply wait for LA to finish. See the code handling for
FS_BUSY state. We might want to slightly improve part of the current
comment to: "If the leader apply worker is busy serializing the
partial changes then acquire the stream lock now and wait for the
leader worker to finish serializing the changes".

>
> 16. apply_spooled_messages
>
> + stream_fd = BufFileOpenFileSet(stream_fileset, path, O_RDONLY, false);
>
> Something still seems a bit odd about this to me (previously also
> mentioned in review [1] #29) but I cannot quite put my finger on it...
>
> AFAIK the 'stream_fd' is the global the LA is using to remember the
> single stream spool file; It corresponds to the LogicalRepWorker's
> 'stream_fileset'. So using that same global on the PA side somehow
> seemed strange to me. The fileset at PA comes from a different place
> (MyParallelShared->fileset).
>

I think 'stream_fd' is specific to apply module which can be used by
apply, tablesync, or parallel worker. Unfortunately, now, the code in
worker.c is a mix of worker and apply module. At some point, we should
separate apply logic to a separate file.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Vik Fearing
Date:
Subject: Re: ANY_VALUE aggregate
Next
From: Vik Fearing
Date:
Subject: Re: ANY_VALUE aggregate