Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | CAHut+PsUCEbu5dgHarMAPZu5rCs62racVXs=CCbkt2q-eXMRYA@mail.gmail.com |
In response to | RE: Perform streaming logical transactions by background workers and parallel apply ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
Responses | Re: Perform streaming logical transactions by background workers and parallel apply |
List | pgsql-hackers |
Hi,

I have done some testing for this patch. This post describes my tests so far and the results observed.

Background - Testing multiple PA workers:
---------------------------------------

The "parallel apply" feature allocates the PA workers (if it can) upon receiving the STREAM_START replication protocol message. This means that if there are replication messages for overlapping streaming transactions, you should see multiple PA workers processing them (assuming the PA pool size is configured appropriately).

But AFAIK the only way to cause replication protocol messages to arrive and be applied in a particular order is by manual testing (e.g. use 2x psql sessions and manually arrange for there to be overlapping transactions for the published table).

I have tried to make this kind of (regression) testing easier -- in order to test many overlapping combinations in a repeatable and semi-automated way, I have posted a small enhancement to the isolationtester spec grammar [1]. Using this, we can now just press a button to test lots of different streaming transaction combinations and then observe the parallel apply message dispatching in action...
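To make the dispatching rule above concrete, here is a toy Python model. It is a simplified sketch under stated assumptions, not the patch's code: it assumes the leader apply (LA) worker picks a free PA worker on the first STREAM_START of a transaction, handles the transaction itself when the pool (sized by max_parallel_apply_workers_per_subscription) is exhausted, and returns the worker to the pool on STREAM_COMMIT/STREAM_ABORT. STREAM_STOP messages are omitted, and the names ToyLeaderApply, run and OVERLAP are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical toy model, NOT the patch's implementation: a leader apply (LA)
# worker hands each streamed transaction to a parallel apply (PA) worker from
# a bounded pool, and frees that worker when the transaction ends.


@dataclass
class ToyLeaderApply:
    pool_size: int  # stands in for max_parallel_apply_workers_per_subscription
    free: list = field(init=False, default_factory=list)
    assigned: dict = field(init=False, default_factory=dict)  # xid -> "PA<n>" or "LA"
    log: list = field(init=False, default_factory=list)
    peak_busy: int = field(init=False, default=0)

    def __post_init__(self):
        self.free = [f"PA{i}" for i in range(self.pool_size)]

    def stream_start(self, xid):
        """First STREAM_START of xid picks a worker; later chunks reuse it."""
        if xid not in self.assigned:
            if self.free:
                self.assigned[xid] = self.free.pop(0)
            else:
                # Pool exhausted: the leader handles this transaction itself.
                self.assigned[xid] = "LA"
                self.log.append(f"out of parallel apply workers (xid {xid})")
            busy = self.pool_size - len(self.free)
            self.peak_busy = max(self.peak_busy, busy)
        return self.assigned[xid]

    def stream_end(self, xid):
        """STREAM_COMMIT or STREAM_ABORT: return the PA worker to the pool."""
        worker = self.assigned.pop(xid)
        if worker != "LA":
            self.free.append(worker)
        return worker


def run(pool_size, events):
    la = ToyLeaderApply(pool_size)
    for kind, xid in events:
        if kind == "STREAM_START":
            la.stream_start(xid)
        else:  # "STREAM_COMMIT" or "STREAM_ABORT"
            la.stream_end(xid)
    return la


# Assumed message order for two overlapping transactions when every change
# is streamed as soon as it is decoded (e.g. ps1_begin ps1_ins ps2_begin
# ps2_ins ps1_commit ps2_commit).
OVERLAP = [("STREAM_START", 1), ("STREAM_START", 2),
           ("STREAM_COMMIT", 1), ("STREAM_COMMIT", 2)]

if __name__ == "__main__":
    print(run(99, OVERLAP).peak_busy)  # 2
    print(run(1, OVERLAP).log)         # ['out of parallel apply workers (xid 2)']
    print(run(0, OVERLAP).peak_busy)   # 0
```

With a large pool both overlapping transactions get their own PA worker; with a pool of 1 the second one falls back to the leader; with a pool of 0 every transaction does. This is only qualitatively in line with the "out of parallel apply workers" messages in results 4b and 5b; the real patch has more states (e.g. what the leader does with the changes it cannot hand off) that the toy ignores.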
Test message combinations (from specs/pub-sub.spec):
----------------------------------------------------

# single tx
permutation ps1_begin ps1_ins ps1_commit ps1_sel ps2_sel sub_sleep sub_sel
permutation ps2_begin ps2_ins ps2_commit ps1_sel ps2_sel sub_sleep sub_sel

# rollback
permutation ps1_begin ps1_ins ps1_rollback ps1_sel sub_sleep sub_sel

# overlapping tx rollback and commit
permutation ps1_begin ps1_ins ps2_begin ps2_ins ps1_rollback ps2_commit sub_sleep sub_sel
permutation ps1_begin ps1_ins ps2_begin ps2_ins ps1_commit ps2_rollback sub_sleep sub_sel

# overlapping tx commits
permutation ps1_begin ps1_ins ps2_begin ps2_ins ps2_commit ps1_commit sub_sleep sub_sel
permutation ps1_begin ps1_ins ps2_begin ps2_ins ps1_commit ps2_commit sub_sleep sub_sel
permutation ps1_begin ps2_begin ps1_ins ps2_ins ps2_commit ps1_commit sub_sleep sub_sel
permutation ps1_begin ps2_begin ps1_ins ps2_ins ps1_commit ps2_commit sub_sleep sub_sel
permutation ps1_begin ps2_begin ps2_ins ps1_ins ps2_commit ps1_commit sub_sleep sub_sel
permutation ps1_begin ps2_begin ps2_ins ps1_ins ps1_commit ps2_commit sub_sleep sub_sel

Test setup:
-----------

1. Set up publisher and subscriber servers
1a. Publisher server is configured to use the new GUC 'force_stream_mode = true' [2]. This means even single-row inserts cause replication STREAM_START messages, which will trigger the PA workers.
1b. Subscriber server is configured to use the new GUC 'max_parallel_apply_workers_per_subscription'. Set this value to change how many PA workers can be allocated.
2. isolation/specs/pub-test.spec (defines the publisher sessions being tested)

How verified:
-------------

1. Running the isolationtester pub-sub.spec test gives the expected table results (so data was replicated OK)
   - any new permutations can be added as required.
   - more overlapping sessions (e.g. 3 or 4...) can be added as required.
2. Changing the publisher GUC 'force_stream_mode' to be true/false
   - we can see whether PA workers are being used or not -- (ps -eaf | grep 'logical replication')
3. Changing the subscriber GUC 'max_parallel_apply_workers_per_subscription'
   - set it to a high or low value so we can see the PA worker (pool) being used or filling to capacity
4. I have also patched some temporary logging into the code for both "LA" and "PA" workers
   - now the subscriber logfile leaves a trail of evidence about which worker did what (for apply_dispatch and for locking calls)

Observed Results:
-----------------

1. From the user's POV everything is normal - data gets replicated as expected regardless of GUC settings (force_streaming / max_parallel_apply_workers_per_subscription).

[postgres@CentOS7-x64 isolation]$ make check-pub-sub
...
============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============
running on port 61696 with PID 11822
============== creating database "isolation_regression" ==============
CREATE DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
ALTER DATABASE
============== running regression test queries ==============
test pub-sub ... ok 33424 ms
============== shutting down postmaster ==============
============== removing temporary instance ==============

=====================
 All 1 tests passed.
=====================

2. Confirmation that multiple PA workers were used (force_streaming=true / max_parallel_apply_workers_per_subscription=99)

[postgres@CentOS7-x64 isolation]$ ps -eaf | grep 'logical replication'
postgres 5298 5293 0 Dec19 ? 00:00:00 postgres: logical replication launcher
postgres 5306 5301 0 Dec19 ? 00:00:00 postgres: logical replication launcher
postgres 17301 5301 0 10:31 ? 00:00:00 postgres: logical replication parallel apply worker for subscription 16387
postgres 17524 5301 0 10:31 ? 00:00:00 postgres: logical replication parallel apply worker for subscription 16387
postgres 21134 5301 0 08:08 ? 00:00:01 postgres: logical replication apply worker for subscription 16387
postgres 22377 13260 0 10:34 pts/0 00:00:00 grep --color=auto logical replication

3. Confirmation that no PA workers were used when not streaming (force_streaming=false / max_parallel_apply_workers_per_subscription=99)

[postgres@CentOS7-x64 isolation]$ ps -eaf | grep 'logical replication'
postgres 26857 26846 0 10:37 ? 00:00:00 postgres: logical replication launcher
postgres 26875 26864 0 10:37 ? 00:00:00 postgres: logical replication launcher
postgres 26889 26864 0 10:37 ? 00:00:00 postgres: logical replication apply worker for subscription 16387
postgres 29901 13260 0 10:39 pts/0 00:00:00 grep --color=auto logical replication

4. Confirmation that only one PA worker gets used when the pool is limited (force_streaming=true / max_parallel_apply_workers_per_subscription=1)

4a. (processes)
[postgres@CentOS7-x64 isolation]$ ps -eaf | grep 'logical replication'
postgres 2484 13260 0 10:42 pts/0 00:00:00 grep --color=auto logical replication
postgres 32500 32495 0 10:40 ? 00:00:00 postgres: logical replication launcher
postgres 32508 32503 0 10:40 ? 00:00:00 postgres: logical replication launcher
postgres 32514 32503 0 10:41 ? 00:00:00 postgres: logical replication apply worker for subscription 16387

4b. (logs)
2022-12-20 10:41:43.551 AEDT [32514] LOG: out of parallel apply workers
2022-12-20 10:41:43.551 AEDT [32514] HINT: You might need to increase max_parallel_apply_workers_per_subscription.
2022-12-20 10:41:43.551 AEDT [32514] CONTEXT: processing remote data for replication origin "pg_16387" during message type "STREAM START" in transaction 756

5. Confirmation that no PA workers get used when there are none available (force_streaming=true / max_parallel_apply_workers_per_subscription=0)

5a. (processes)
[postgres@CentOS7-x64 isolation]$ ps -eaf | grep 'logical replication'
postgres 10026 10021 0 10:47 ? 00:00:00 postgres: logical replication launcher
postgres 10034 10029 0 10:47 ? 00:00:00 postgres: logical replication launcher
postgres 10041 10029 0 10:47 ? 00:00:00 postgres: logical replication apply worker for subscription 16387
postgres 13068 13260 0 10:48 pts/0 00:00:00 grep --color=auto logical replication

5b. (logs)
2022-12-20 10:47:50.216 AEDT [10041] LOG: out of parallel apply workers
2022-12-20 10:47:50.216 AEDT [10041] HINT: You might need to increase max_parallel_apply_workers_per_subscription.
..
Also, there are no "PA" log messages present.

Summary
-------

In summary, everything I have tested so far appeared to be working properly. In other words, for overlapping streamed transactions of different kinds, and regardless of whether zero/some/all of those transactions are getting processed by a PA worker, the resulting replicated data looked consistently OK.

PSA some files
- test_init.sh - sample test script for setting up the publisher/subscriber required by the spec test
- spec/pub-sub.spec - spec combinations for causing overlapping streaming transactions
- pub-sub.out - output from a successful isolationtester (make check-pub-sub) run
- SUB.log - subscriber logs augmented with my "LA" and "PA" extra logging for showing locking/dispatching. (I can also post my logging patch if anyone is interested in using it to see output like that in SUB.log.)

NOTE - all testing described in this post was using v58-0001 only. However, the point of implementing these as a .spec test was to be able to repeat these same regression tests on newer versions with minimal manual steps required. Later I plan to fetch/apply the most recent patch version and repeat these same tests.
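The ps and log checks in observed results 2 to 5 were done by eye. As a sketch of how they might be made repeatable (my assumption; test_init.sh and the spec do not do this), a small Python helper can classify the process titles from `ps -eaf | grep 'logical replication'` and count the "out of parallel apply workers" LOG lines in the subscriber log. The names count_workers, count_out_of_workers, PS_SAMPLE and LOG_SAMPLE are hypothetical; the helper relies only on the process titles and log text shown above.

```python
import re
from collections import Counter

# Hypothetical helper. The worker kind must directly follow
# "postgres: logical replication ", so "parallel apply worker" and the
# leader's "apply worker" title cannot be confused.
TITLE_RE = re.compile(
    r"postgres: logical replication "
    r"(launcher|parallel apply worker|apply worker)"
)


def count_workers(ps_output):
    """Count launcher / leader apply / parallel apply processes."""
    counts = Counter()
    for line in ps_output.splitlines():
        m = TITLE_RE.search(line)
        if m:  # the grep process's own line has no "postgres: " title
            counts[m.group(1)] += 1
    return counts


def count_out_of_workers(log_text):
    """Count LOG lines saying the PA worker pool was exhausted."""
    return sum(1 for line in log_text.splitlines()
               if "LOG:" in line and "out of parallel apply workers" in line)


# Sample input copied from observed result 2.
PS_SAMPLE = """\
postgres 5298 5293 0 Dec19 ? 00:00:00 postgres: logical replication launcher
postgres 5306 5301 0 Dec19 ? 00:00:00 postgres: logical replication launcher
postgres 17301 5301 0 10:31 ? 00:00:00 postgres: logical replication parallel apply worker for subscription 16387
postgres 17524 5301 0 10:31 ? 00:00:00 postgres: logical replication parallel apply worker for subscription 16387
postgres 21134 5301 0 08:08 ? 00:00:01 postgres: logical replication apply worker for subscription 16387
postgres 22377 13260 0 10:34 pts/0 00:00:00 grep --color=auto logical replication
"""

# Sample input copied from observed result 4b.
LOG_SAMPLE = """\
2022-12-20 10:41:43.551 AEDT [32514] LOG: out of parallel apply workers
2022-12-20 10:41:43.551 AEDT [32514] HINT: You might need to increase max_parallel_apply_workers_per_subscription.
2022-12-20 10:41:43.551 AEDT [32514] CONTEXT: processing remote data for replication origin "pg_16387" during message type "STREAM START" in transaction 756
"""

if __name__ == "__main__":
    print(count_workers(PS_SAMPLE)["parallel apply worker"])  # 2
    print(count_out_of_workers(LOG_SAMPLE))                   # 1
```

A ps snapshot only shows the PA workers alive at that instant (result 4a, for example, shows none), so the log count is probably the more reliable signal of whether the pool filled up.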
------
[1] My isolationtester conninfo enhancement v2 - https://www.postgresql.org/message-id/CAHut%2BPv_1Mev0709uj_OjyNCzfBjENE3RD9%3Dd9RZYfcqUKfG%3DA%40mail.gmail.com
[2] Shi-san's GUC 'force_streaming_mode' - https://www.postgresql.org/message-id/flat/OSZPR01MB63104E7449DBE41932DB19F1FD1B9%40OSZPR01MB6310.jpnprd01.prod.outlook.com

Kind Regards,
Peter Smith.
Fujitsu Australia