RE: Parallel Apply - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Parallel Apply
Date
Msg-id TYRPR01MB14195CF528AD5AE1450A9B824942A2@TYRPR01MB14195.jpnprd01.prod.outlook.com
Whole thread
In response to Re: Parallel Apply  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Parallel Apply
List pgsql-hackers
On Thursday, April 23, 2026 2:32 PM shveta malik <shveta.malik@gmail.com> wrote:
> 
> On Thu, Apr 23, 2026 at 7:31 AM Peter Smith <smithpb2250@gmail.com>
> wrote:
> >
> > On Wed, Apr 22, 2026 at 7:23 PM shveta malik <shveta.malik@gmail.com>
> wrote:
> > >
> > ...
> > > Regarding 0001, I did not understand the need of having 2 separate
> messages:
> > >
> > > +#define PARALLEL_APPLY_INTERNAL_MESSAGE 'i'
> > > + LOGICAL_REP_MSG_INTERNAL_MESSAGE = 'i',
> > >
> > > And the need of sending both together in 0003:
> > >
> > > +send_internal_dependencies(ParallelApplyWorkerInfo *winfo, List
> > > *depends_on_xids)
> > > +{
> > > + pq_sendbyte(&dependencies, PARALLEL_APPLY_INTERNAL_MESSAGE);
> > > + pq_sendbyte(&dependencies,
> LOGICAL_REP_MSG_INTERNAL_MESSAGE);
> > >
> > >
> > > Also, it is confusing that above 2 are 'i' and
> > > WORKER_INTERNAL_MSG_RELATION is also 'i'. Code has become very
> tricky
> > > to understand now.
> > >
> > > Reviewing everything, I feel having 'i' outside of LogicalRepMsgType
> > > was better. I think it will eb better to retain
> > > PARALLEL_APPLY_INTERNAL_MESSAGE and getting rid of
> > > LOGICAL_REP_MSG_INTERNAL_MESSAGE. And when any worker
> intercepts
> > > PARALLEL_APPLY_INTERNAL_MESSAGE, it need not dispatch
> > > (apply_dispatch), instead it can handle it using
> > > apply_handle_internal_message()
> > >
> > > Goign above way:
> > > --Messaged received from pub can be handled using apply_dispatch.
> > > --Messages generated from leader to be handled separately/internally
> > > using apply_handle_internal_message().
> > >
> > > That way we have clear-cut boundary between the two types and less
> confusion.
> >
> > Hi Shveta,
> >
> > IIUC these need to be separate because they are used in 2 completly
> > different ways:
> >
> > 1. In LogicalParallelApplyLoop the code need to identify as different
> > from PqReplMsg_WALData
> > 2. In apply_dispach() the message is delegated elsewhere according to
> > the type LogicalRepMsgType
> >
> > PSA a pictue I made for my understanding of the current v15-0001
> > design. It might help to visualize the message format more easily.
> >
> > While your suggestion looks good for LogicalParallelApplyLoop, I think
> > the real problem is going to be in the apply_spooled_mesages() which
> > wants call the apply_dispatch() directly. That won't be possible if
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE is removed. And, you cannot call
> > directly to apply_handle_internal_message() withint knowing it is a
> > PARALLEL_APPLY_INTERNAL_MESSAGE  message, but that means first read
> it
> > pq_getmsgbyte(s). Then, you also need some hacky way to "unread" that
> > byte in case it was not the PARALLEL_APPLY_INTERNAL_MESSAGE byte, but
> > something different.  AFAIK that was exactly what the previous
> > v14-0001 code was doing with the is_worker_internal_message()
> > function. I also think v15-0001 is a bit confusing, but v14-0001 was
> > even more so.
> >
> > If there was some new function like `pq_peekmsgbyte(s)` which could
> > simply "peek" the message byte value without advancing the cursor.
> > Then, I apply_spooled_mesages() can just peek to find
> > PARALLEL_APPLY_INTERNAL_MESSAGE and your suggested simplification
> > could work. But it would *still* be complicated by the fact that you
> > would have to ensure that PARALLEL_APPLY_INTERNAL_MESSAGE could
> not
> > clash with any of the LogicalRepMsgType! In the end, just keeping the
> > LOGICAL_REP_MSG_INTERNAL_MESSAGE like v14 does may be the best
> way to
> > ensure that uniqueness...
> 
> Okay. I see your point. Thanks for explaning.
> 
> Another approach could be the one shown in the attached patch. In this
> approach:
> 
> a) We avoid pre-reading the message and then rewinding the cursor,
> unlike the approach used in apply_spooled_messages() in v14.
> b) We keep a single LOGICAL_REP_MSG_INTERNAL_MESSAGE for internal
> messages; a separate PARALLEL_APPLY_INTERNAL_MESSAGE wrapper is not
> required.
> c) The caller decides whether to let apply_dispatch read the next
> message or to act on an already pre-read message. This makes the
> design more flexible if we need to handle additional pre-read internal
> messages in the future, without introducing new wrapper message
> formats.
> d) The logic for dispatching actions on all message types remains
> encapsulated within apply_dispatch.

I think the first thing we need to decide is the message format sent to the
parallel worker versus the format used for spooled messages.

Option 1 (Current approach):
  Message to parallel worker:
    PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
    LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
    WorkerInternalMsgType + data
  Spooled message:
    LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
    WorkerInternalMsgType + data

Option 2 (Alternative):
  Message to parallel worker:
    LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
    WorkerInternalMsgType + data
  Spooled message:
    LOGICAL_REP_MSG_INTERNAL_MESSAGE (1 byte) +
    WorkerInternalMsgType + data

Option 3 (Alternative):
  Message to parallel worker:
    PARALLEL_APPLY_INTERNAL_MESSAGE (1 byte) +
    WorkerInternalMsgType + data
  Spooled message:
      WorkerInternalMsgType + data

In Option 1, the extra PARALLEL_APPLY_INTERNAL_MESSAGE byte allows the parallel
worker to distinguish internal messages from logical replication messages
(which begin with PqReplMsg_WALData). Here, LOGICAL_REP_MSG_INTERNAL_MESSAGE
serves purely as an apply action.

Option 2 also works. The only minor issue is that LOGICAL_REP_MSG_INTERNAL_MESSAGE
serves two purposes: (1) distinguishing from PqReplMsg_WALData in the parallel
worker, and (2) acting as an apply action in apply_spooled_messages(). I don't
think this is a big issue, so I'm not strongly opposed to it.

Option 3 is what the V12 patch implements. It is the simplest approach,
although it requires adding WorkerInternalMsgType values directly into
LogicalRepMsgType, which has been commented previously.

----

The second question is how to implement it.

- Option 1: Used in the latest patch (we can improve it to use distinct byte values for
  PARALLEL_APPLY_INTERNAL_MESSAGE and LOGICAL_REP_MSG_INTERNAL_MESSAGE for clarity).

- Option 2

If we want to reuse LOGICAL_REP_MSG_INTERNAL_MESSAGE for both purposes, we could
directly call apply_handle_internal_message in the parallel worker like this (We
might need to set apply_error_callback_arg.command for this calling manually, so
that the errcontext can work):

    if (c == PqReplMsg_WALData)
    {
        ...
        apply_dispatch(&s);
    }
    else if (c == LOGICAL_REP_MSG_INTERNAL_MESSAGE)
    {
        /* Handle the internal message. */
        apply_handle_internal_message(&s);
    }

Shveta's patch does something similar but adds an extra parameter to
apply_dispatch to control whether the function reads the first byte or uses a
passed-in byte. I'm not sure if changing the interface is worth it, as it seems
to complicate apply_dispatch() unnecessarily.

- Option 3: Used in the older V12 patch.

At the code level, I personally prefer Option 3, but I understand the reluctance
to add internal values to LogicalRepMsgType. So, I'm not sure any of the
proposed alternatives are clearly better.

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: EXCEPT TABLE - Case inconsistency for describe \d and \dRp+
Next
From: Dilip Kumar
Date:
Subject: Re: Parallel Apply