Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAD21AoCRDWMebeDNeMNgf4LpTA23si_1xU0T_F2Rh+BD9HU6=Q@mail.gmail.com
In response to RE: Perform streaming logical transactions by background workers and parallel apply  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Thu, Oct 6, 2022 at 9:04 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Masahiko Sawada <sawada.mshk@gmail.com>
> > Sent: Thursday, October 6, 2022 4:07 PM
> > To: Hou, Zhijie/侯 志杰 <houzj.fnst@fujitsu.com>
> > Cc: Amit Kapila <amit.kapila16@gmail.com>; Wang, Wei/王 威
> > <wangw.fnst@fujitsu.com>; Peter Smith <smithpb2250@gmail.com>; Dilip
> > Kumar <dilipbalaut@gmail.com>; Shi, Yu/侍 雨 <shiy.fnst@fujitsu.com>;
> > PostgreSQL Hackers <pgsql-hackers@lists.postgresql.org>
> > Subject: Re: Perform streaming logical transactions by background workers and
> > parallel apply
> >
> > On Tue, Sep 27, 2022 at 9:26 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > On Saturday, September 24, 2022 7:40 PM Amit Kapila
> > <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Sep 22, 2022 at 3:41 PM Amit Kapila <amit.kapila16@gmail.com>
> > > > wrote:
> > > > >
> > > > > On Thu, Sep 22, 2022 at 8:59 AM wangw.fnst@fujitsu.com
> > > > > <wangw.fnst@fujitsu.com> wrote:
> > > > > >
> > > > >
> > > > > Few comments on v33-0001
> > > > > =======================
> > > > >
> > > >
> > > > Some more comments on v33-0001
> > > > =============================
> > > > 1.
> > > > + /* Information from the corresponding LogicalRepWorker slot. */
> > > > + uint16 logicalrep_worker_generation;
> > > > +
> > > > + int logicalrep_worker_slot_no;
> > > > +} ParallelApplyWorkerShared;
> > > >
> > > > Both these variables are read/changed by leader/parallel workers without
> > > > using any lock (mutex). It seems currently there is no problem because of
> > the
> > > > way the patch is using in_parallel_apply_xact but I think it won't be a good
> > idea
> > > > to rely on it. I suggest using mutex to operate on these variables and also
> > check
> > > > if the slot_no is in a valid range after reading it in parallel_apply_free_worker,
> > > > otherwise error out using elog.
> > >
> > > Changed.
> > >
> > > > 2.
> > > >  static void
> > > >  apply_handle_stream_stop(StringInfo s)
> > > >  {
> > > > - if (!in_streamed_transaction)
> > > > + ParallelApplyWorkerInfo *winfo = NULL; TransApplyAction apply_action;
> > > > +
> > > > + if (!am_parallel_apply_worker() &&
> > > > + (!in_streamed_transaction && !stream_apply_worker))
> > > >   ereport(ERROR,
> > > >   (errcode(ERRCODE_PROTOCOL_VIOLATION),
> > > >   errmsg_internal("STREAM STOP message without STREAM START")));
> > > >
> > > > This check won't be able to detect missing stream start messages for parallel
> > > > apply workers apart from the first pair of start/stop. I thought of adding
> > > > in_remote_transaction check along with
> > > > am_parallel_apply_worker() to detect the same but that also won't work
> > > > because the parallel worker doesn't reset it at the stop message.
> > > > Another possibility is to introduce yet another variable for this but that
> > doesn't
> > > > seem worth it. I would like to keep this check simple.
> > > > Can you think of any better way?
> > >
> > > I feel we can reuse the in_streamed_transaction in parallel apply worker to
> > > simplify the check there. I tried to set this flag in parallel apply worker
> > > when stream starts and reset it when stream stop so that we can directly check
> > > this flag for duplicate stream start message and other related things.
> > >
> > > > 3. I think we can skip sending start/stop messages from the leader to the
> > > > parallel worker because unlike apply worker it will process only one
> > > > transaction-at-a-time. However, it is not clear whether that is worth the
> > effort
> > > > because it is sent after logical_decoding_work_mem changes. For now, I have
> > > > added a comment for this in the attached patch but let me if I am missing
> > > > something or if I am wrong.
> > >
> > > I think the suggested comments look good.
> > >
> > > > 4.
> > > > postgres=# select pid, leader_pid, application_name, backend_type from
> > > > pg_stat_activity;
> > > >   pid  | leader_pid | application_name |         backend_type
> > > > -------+------------+------------------+------------------------------
> > > >  27624 |            |                  | logical replication launcher
> > > >  17336 |            | psql             | client backend
> > > >  26312 |            |                  | logical replication worker
> > > >  26376 |            | psql             | client backend
> > > >  14004 |            |                  | logical replication worker
> > > >
> > > > Here, the second worker entry is for the parallel worker. Isn't it better if we
> > > > distinguish this by keeping type as a logical replication parallel worker? I
> > think
> > > > for this you need to change bgw_type in logicalrep_worker_launch().
> > >
> > > Changed.
> > >
> > > > 5. Can we name parallel_apply_subxact_info_add() as
> > > > parallel_apply_start_subtrans()?
> > > >
> > > > Apart from the above, I have added/edited a few comments and made a few
> > > > other cosmetic changes in the attached.
> > >
> >
> > While looking at v35 patch, I realized that there are some cases where
> > the logical replication gets stuck depending on partitioned table
> > structure. For instance, there are following tables, publication, and
> > subscription:
> >
> > * On publisher
> > create table p (c int) partition by list (c);
> > create table c1 partition of p for values in (1);
> > create table c2 (c int);
> > create publication test_pub for table p, c1, c2 with
> > (publish_via_partition_root = 'true');
> >
> > * On subscriber
> > create table p (c int) partition by list (c);
> > create table c1 partition of p for values in (2);
> > create table c2 partition of p for values in (1);
> > create subscription test_sub connection 'port=5551 dbname=postgres'
> > publication test_pub with (streaming = 'parallel', copy_data =
> > 'false');
> >
> > Note that while both the publisher and the subscriber have tables
> > with the same names, the partition structure is different, so rows
> > go to a different table on the subscriber (e.g., the row c=1 goes
> > to table c2 on the subscriber). If two concurrent transactions are
> > executed as follows, the apply worker (i.e., the leader apply
> > worker) waits for a lock on c2 held by its parallel apply worker:
> >
> > * TX-1
> > BEGIN;
> > INSERT INTO p SELECT 1 FROM generate_series(1, 10000); --- changes are
> > streamed
> >
> >     * TX-2
> >     BEGIN;
> >     TRUNCATE c2; --- wait for a lock on c2
> >
> > * TX-1
> > INSERT INTO p SELECT 1 FROM generate_series(1, 10000);
> > COMMIT;
> >
> > This might not be a common case in practice, but it could mean
> > that there is a restriction on how partitioned tables should be
> > structured on the publisher and the subscriber when using
> > streaming = 'parallel'. When this happens, since logical
> > replication cannot move forward, users need to disable
> > parallel-apply mode or increase logical_decoding_work_mem. We
> > could describe this limitation in the docs, but it would be hard
> > for users to detect a problematic table structure.
>
> Thanks for testing this!
>
> I think the root cause of this kind of deadlock problem is the table
> structure difference between publisher and subscriber (similar to the
> unique difference reported earlier[1]). So, I think we'd better
> disallow this case. For example, to avoid the reported problem, we
> could only support parallel apply if pubviaroot is false on the
> publisher and the replicated tables' types (relkind) are the same
> between publisher and subscriber.
>
> Although it might restrict some use cases, I think it only restricts
> the cases where the partitioned table's structure is different
> between publisher and subscriber. Users can still use parallel apply
> when the table structure is the same between publisher and
> subscriber, which seems acceptable to me. And we can also document
> that the feature is expected to be used when the tables' structures
> are the same. Thoughts?
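As a hedged illustration of how a DBA could spot the affected configuration today (this query is purely a diagnostic sketch using the standard system catalogs, not part of the proposed patch): running the following on the subscriber lists each subscribed table with its local relkind, so tables reported as 'p' (partitioned) are the ones where a structure mismatch with the publisher could matter under streaming = 'parallel':

```
-- Hypothetical diagnostic, not part of the patch: list subscribed
-- tables with their local relkind ('r' = plain table, 'p' =
-- partitioned table) so mismatches with the publisher can be checked.
SELECT s.subname,
       sr.srrelid::regclass AS local_table,
       c.relkind
FROM pg_subscription_rel sr
JOIN pg_subscription s ON s.oid = sr.srsubid
JOIN pg_class c ON c.oid = sr.srrelid
ORDER BY s.subname, local_table;
```

The publisher side would still have to be checked separately (e.g., against pg_publication_tables), since relkind equality between the two servers is the condition being proposed.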

I'm concerned that it could be a big restriction for users. Having
different partitioned table structures on the publisher and the
subscriber is a quite common use case.

From the feature's perspective, the root cause seems to be the fact
that the apply worker does both receiving and applying changes. Since
it cannot receive subsequent messages while waiting for a lock on a
table, the parallel apply worker also cannot move forward. If we had a
dedicated receiver process, it could off-load messages to the worker
while another process is waiting for a lock. So I think that
separating the receiver and the apply worker could be a building block
for parallel apply.

Regards,

--
Masahiko Sawada
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


