RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From kuroda.hayato@fujitsu.com
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id TYAPR01MB58660B4732E7F80B322174A3F5629@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Perform streaming logical transactions by background workers and parallel apply  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
Dear Wang,

Thanks for updating patch sets! Followings are comments about v20-0001.

1. config.sgml

```
       <para>
        Specifies maximum number of logical replication workers. This includes
        both apply workers and table synchronization workers.
       </para>
```

I think you can add a description in the above paragraph, like
" This includes apply main workers, apply background workers, and table synchronization workers."

2. logical-replication.sgml

2.a Configuration Settings

```
   <varname>max_logical_replication_workers</varname> must be set to at least
   the number of subscriptions, again plus some reserve for the table
   synchronization.
```

I think you can add a description in the above paragraph, like
"... the number of subscriptions, plus some reserve for the table synchronization
 and the streaming transaction."

2.b Monitoring

```
  <para>
   Normally, there is a single apply process running for an enabled
   subscription.  A disabled subscription or a crashed subscription will have
   zero rows in this view.  If the initial data synchronization of any
   table is in progress, there will be additional workers for the tables
   being synchronized.
  </para>
```

I think you can add a sentence in the above paragraph, like 
"... synchronized. Moreover, if the streaming transaction is applied parallelly,
there will be additional workers"

3. launcher.c

```
+       /* Sanity check : we don't support table sync in subworker. */
```

I think "Sanity check :" should be "Sanity check:", per other files.

4. worker.c

4.a handle_streamed_transaction()

```
-       /* not in streaming mode */
-       if (!in_streamed_transaction)
+       /* Not in streaming mode */
+       if (!(in_streamed_transaction || am_apply_bgworker()))
```

I think the comment should also mention about apply background worker case.

4.b handle_streamed_transaction()

```
-       Assert(stream_fd != NULL);
```

I think this assersion seems reasonable in case of stream='on'.
Could you revive it and move to later part in the function, like after subxact_info_add(current_xid)?

4.c apply_handle_prepare_internal()

```
         * BeginTransactionBlock is necessary to balance the EndTransactionBlock
         * called within the PrepareTransactionBlock below.
         */
-       BeginTransactionBlock();
+       if (!IsTransactionBlock())
+               BeginTransactionBlock();
+
```

I think the comment should be "We must be in transaction block to balance...".

4.d apply_handle_stream_prepare()

```
- *
- * Logic is in two parts:
- * 1. Replay all the spooled operations
- * 2. Mark the transaction as prepared
  */
 static void
 apply_handle_stream_prepare(StringInfo s)
```

I think these comments are useful when stream='on',
so it should be moved to later part.

5. applybgworker.c

5.a apply_bgworker_setup()

```
+       elog(DEBUG1, "setting up apply worker #%u", list_length(ApplyBgworkersList) + 1); 
```

"apply worker" should be "apply background worker".

5.b LogicalApplyBgwLoop()

```
+                               elog(DEBUG1, "[Apply BGW #%u] ended processing streaming chunk,"
+                                        "waiting on shm_mq_receive", shared->worker_id);
```

A blank is needed after comma. I checked serverlog, and the message outputed like:

```
[Apply BGW #1] ended processing streaming chunk,waiting on shm_mq_receive
```

6.

When I started up the apply background worker and did `SELECT * from pg_stat_subscription`, I got following lines:

```
postgres=# select * from pg_stat_subscription;
 subid | subname |  pid  | relid | received_lsn |      last_msg_send_time       |     last_msg_receipt_time     |
latest_end_lsn|        latest_end
 
_time        

-------+---------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+------------------
-------------
 16400 | sub     | 22383 |       |              | -infinity                     | -infinity                     |
        | -infinity
 
 16400 | sub     | 22312 |       | 0/6734740    | 2022-08-09 07:40:19.367676+00 | 2022-08-09 07:40:19.375455+00 |
0/6734740     | 2022-08-09 07:40:
 
19.367676+00
(2 rows)
```


6.a

It seems that the upper line represents the apply background worker, but I think last_msg_send_time and
last_msg_receipt_timeshould be null.
 
Is it like initialization mistake?

```
$ ps aux | grep 22383
... postgres: logical replication apply background worker for subscription 16400
```

6.b

Currently, the documentation doesn't clarify the method to determine the type of logical replication workers.
Could you add descriptions about it?
I think adding a column "subworker" is an alternative approach.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH] Expose port->authn_id to extensions and triggers