RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | wangw.fnst@fujitsu.com |
---|---|
Subject | RE: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | OS3PR01MB6275739E73E8BEC5D13FB6739E6B9@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Peter Smith <smithpb2250@gmail.com>) |
Responses |
RE: Perform streaming logical transactions by background workers and parallel apply
|
List | pgsql-hackers |
On Fri, August 12, 2022 12:46 PM Peter Smith <smithpb2250@gmail.com> wrote: > Here are some review comments for v20-0003: > > (Sorry - the reviews are time consuming, so I am lagging slightly > behind the latest posted version) Thanks for your comments. > 1. <General> > > 1a. > There are a few comment modifications in this patch (e.g. changing > FROM "in an apply background worker" TO "using an apply background > worker"). e.g. I noticed lots of these in worker.c but they might be > in other files too. > > Although these are good changes, these are just tweaks to new comments > introduced by patch 0001, so IMO such changes belong in that patch, > not in this one. > > 1b. > Actually, there are still some comments says "by an apply background > worker///" and some saying "using an apply background worker..." and > some saying "in the apply background worker...". Maybe they are all > OK, but it will be better if all such can be searched and made to have > consistent wording Improved. > 2. Commit message > > 2a. > > Without these restrictions, the following scenario may occur: > The apply background worker lock a row when processing a streaming > transaction, > after that the main apply worker tries to lock the same row when processing > another transaction. At this time, the main apply worker waits for the > streaming transaction to complete and the lock to be released, it won't send > subsequent data of the streaming transaction to the apply background worker; > the apply background worker waits to receive the rest of streaming transaction > and can't finish this transaction. Then the main apply worker will wait > indefinitely. > > "background worker lock a row" -> "background worker locks a row" > > "Then the main apply worker will wait indefinitely." -> really, you > already said the main apply worker is waiting, so I think this > sentence only needs to say: "Now a deadlock has occurred, so both > workers will wait indefinitely." > > 2b. > > Text fragments are all common between: > > i. This commit message > ii. Text in pgdocs CREATE SUBSCRIPTION > iii. Function comment for 'logicalrep_rel_mark_parallel_apply' in relation.c > > After addressing other review comments please make sure all those 3 > parts are worded same. Improved. > 3. doc/src/sgml/ref/create_subscription.sgml > > + There are two requirements for using <literal>parallel</literal> > + mode: 1) the unique column in the table on the subscriber-side should > + also be the unique column on the publisher-side; 2) there cannot be > + any non-immutable functions used by the subscriber-side replicated > + table. > > 3a. > I am not sure – is "requirements" the correct word here, or maybe it > should be "prerequisites". > > 3b. > Is it correct to say "should also be", or should that say "must also be"? Improved. > 4. src/backend/replication/logical/applybgworker.c - > apply_bgworker_relation_check > > + /* > + * Skip check if not using apply background workers. > + * > + * If any worker is handling the streaming transaction, this check needs to > + * be performed not only in the apply background worker, but also in the > + * main apply worker. This is because without these restrictions, main > + * apply worker may block apply background worker, which will cause > + * infinite waits. > + */ > + if (!am_apply_bgworker() && > + (list_length(ApplyBgworkersFreeList) == list_length(ApplyBgworkersList))) > + return; > > I struggled a bit to reconcile the comment with the condition. Is the > !am_apply_bgworker() part of this even needed – isn't the > list_length() check enough? We need to check this for apply bgworker. (Both lists are "NIL" in apply bgworker.) > 5. > > + /* We are in error mode and should give user correct error. */ > > I still [1, #3.4a] don't see the value in saying "should give correct > error" (e.g. what's the alternative?). > > Maybe instead of that comment it can just say: > rel->parallel_apply = PARALLEL_APPLY_UNSAFE; I changed if-statement to report the error: If 'parallel_apply' isn't 'PARALLEL_APPLY_SAFE', then report the error. > 6. src/backend/replication/logical/proto.c - RelationGetUniqueKeyBitmap > > + /* Add referenced attributes to idindexattrs */ > + for (i = 0; i < indexRel->rd_index->indnatts; i++) > + { > + int attrnum = indexRel->rd_index->indkey.values[i]; > + > + /* > + * We don't include non-key columns into idindexattrs > + * bitmaps. See RelationGetIndexAttrBitmap. > + */ > + if (attrnum != 0) > + { > + if (i < indexRel->rd_index->indnkeyatts && > + !bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, attunique)) > + attunique = bms_add_member(attunique, > + attrnum - FirstLowInvalidHeapAttributeNumber); > + } > + } > > There are 2x comments in that code that are referring to > 'idindexattrs' but I think it is a cut/paste problem because that > variable name does not even exist in this copied function. Fixed the comments. > 7. src/backend/replication/logical/relation.c - > logicalrep_rel_mark_parallel_apply > > + /* Initialize the flag. */ > + entry->parallel_apply = PARALLEL_APPLY_SAFE; > > I have unsuccessfully repeated the same review comment several times > [1 #3.8] suggesting that this flag should not be initialized to SAFE. > IMO the state should remain as UNKNOWN until you are either sure it is > SAFE, or sure it is UNSAFE. Anyway, I'll give up on this point now; > let's see what other people think. Okay, I will follow the relevant comments later. > 8. src/include/replication/logicalrelation.h > > +/* > + * States to determine if changes on one relation can be applied using an > + * apply background worker. > + */ > +typedef enum ParallelApplySafety > +{ > + PARALLEL_APPLY_UNKNOWN = 0, /* unknown */ > + PARALLEL_APPLY_SAFE, /* Can apply changes using an apply background > + worker */ > + PARALLEL_APPLY_UNSAFE /* Can not apply changes using an apply > + background worker */ > +} ParallelApplySafety; > + > > I think the values are self-explanatory so the comments for every > value add nothing here, particularly since the enum itself has a > comment saying the same thing. I'm not sure if you accidentally missed > my previous comment [1, #3.12b] about this, or just did not agree with > it. Changed. > 9. .../subscription/t/015_stream.pl > > +# "streaming = parallel" does not support non-immutable functions, so change > +# the function in the defult expression of column "c". > +$node_subscriber->safe_psql( > + 'postgres', qq{ > +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT > to_timestamp(1284352323); > +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel, binary = off); > +}); > > 9a. > typo "defult" > > 9b. > The problem with to_timestamp(1284352323) is that it looks like it > must be some special value, but in fact AFAIK you don't care at all > what value timestamp this is. I think it would be better here to just > use to_timestamp(0) or to_timestamp(999) or similar so the number is > obviously not something of importance. > > ====== > > 10. .../subscription/t/016_stream.pl > > +# "streaming = parallel" does not support non-immutable functions, so change > +# the function in the defult expression of column "c". > +$node_subscriber->safe_psql( > + 'postgres', qq{ > +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT > to_timestamp(1284352323); > +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel); > +}); > > 10a. Ditto 9a. > 10b. Ditto 9b. > > ====== > > 11. .../subscription/t/022_twophase_cascade.pl > > +# "streaming = parallel" does not support non-immutable functions, so change > +# the function in the defult expression of column "c". > +$node_B->safe_psql( > + 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT > to_timestamp(1284352323);"); > +$node_C->safe_psql( > + 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT > to_timestamp(1284352323);"); > + > > 11a. Ditto 9a. > 11b. Ditto 9b. > > ====== > > 12. .../subscription/t/023_twophase_stream.pl > > +# "streaming = parallel" does not support non-immutable functions, so change > +# the function in the defult expression of column "c". > +$node_subscriber->safe_psql( > + 'postgres', qq{ > +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT > to_timestamp(1284352323); > +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel); > +}); > > 12a. Ditto 9a. > 12b. Ditto 9b. Improved. > 13. .../subscription/t/032_streaming_apply.pl > > +# Drop default value on the subscriber, now it works. > +$node_subscriber->safe_psql('postgres', > + "ALTER TABLE test_tab1 ALTER COLUMN b DROP DEFAULT"); > > Maybe for these tests like this it would be better to test if it works > OK using an immutable DEFAULT function instead of just completely > removing the bad function to make it work. > > I think maybe the same was done for TRIGGER tests. There was a test > for a trigger with a bad function, and then the trigger was removed. > What about including a test for the trigger with a good function? Improved. Attach the new patches. Regards, Wang wei
Attachment
- v22-0001-Perform-streaming-logical-transactions-by-backgr.patch
- v22-0002-Test-streaming-parallel-option-in-tap-test.patch
- v22-0003-Add-some-checks-before-using-apply-background-wo.patch
- v22-0004-Retry-to-apply-streaming-xact-only-in-apply-work.patch
- v22-0005-Add-a-main_worker_pid-to-pg_stat_subscription.patch
pgsql-hackers by date: