RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id OS3PR01MB6275739E73E8BEC5D13FB6739E6B9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Fri, August 12, 2022 12:46 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Here are some review comments for v20-0003:
> 
> (Sorry - the reviews are time consuming, so I am lagging slightly
> behind the latest posted version)

Thanks for your comments.

> 1. <General>
> 
> 1a.
> There are a few comment modifications in this patch (e.g. changing
> FROM "in an apply background worker" TO "using an apply background
> worker"). e.g. I noticed lots of these in worker.c but they might be
> in other files too.
> 
> Although these are good changes, these are just tweaks to new comments
> introduced by patch 0001, so IMO such changes belong in that patch,
> not in this one.
> 
> 1b.
> Actually, there are still some comments says "by an apply background
> worker///" and some saying "using an apply background worker..." and
> some saying "in the apply background worker...". Maybe they are all
> OK, but it will be better if all such can be searched and made to have
> consistent wording

Improved.

> 2. Commit message
> 
> 2a.
> 
> Without these restrictions, the following scenario may occur:
> The apply background worker lock a row when processing a streaming
> transaction,
> after that the main apply worker tries to lock the same row when processing
> another transaction. At this time, the main apply worker waits for the
> streaming transaction to complete and the lock to be released, it won't send
> subsequent data of the streaming transaction to the apply background worker;
> the apply background worker waits to receive the rest of streaming transaction
> and can't finish this transaction. Then the main apply worker will wait
> indefinitely.
> 
> "background worker lock a row" -> "background worker locks a row"
> 
> "Then the main apply worker will wait indefinitely." -> really, you
> already said the main apply worker is waiting, so I think this
> sentence only needs to say: "Now a deadlock has occurred, so both
> workers will wait indefinitely."
> 
> 2b.
> 
> Text fragments are all common between:
> 
> i.   This commit message
> ii.  Text in pgdocs CREATE SUBSCRIPTION
> iii. Function comment for 'logicalrep_rel_mark_parallel_apply' in relation.c
> 
> After addressing other review comments please make sure all those 3
> parts are worded same.

Improved.

> 3. doc/src/sgml/ref/create_subscription.sgml
> 
> +          There are two requirements for using <literal>parallel</literal>
> +          mode: 1) the unique column in the table on the subscriber-side should
> +          also be the unique column on the publisher-side; 2) there cannot be
> +          any non-immutable functions used by the subscriber-side replicated
> +          table.
> 
> 3a.
> I am not sure – is "requirements" the correct word here, or maybe it
> should be "prerequisites".
> 
> 3b.
> Is it correct to say "should also be", or should that say "must also be"?

Improved.

> 4. src/backend/replication/logical/applybgworker.c -
> apply_bgworker_relation_check
> 
> + /*
> + * Skip check if not using apply background workers.
> + *
> + * If any worker is handling the streaming transaction, this check needs to
> + * be performed not only in the apply background worker, but also in the
> + * main apply worker. This is because without these restrictions, main
> + * apply worker may block apply background worker, which will cause
> + * infinite waits.
> + */
> + if (!am_apply_bgworker() &&
> + (list_length(ApplyBgworkersFreeList) == list_length(ApplyBgworkersList)))
> + return;
> 
> I struggled a bit to reconcile the comment with the condition. Is the
> !am_apply_bgworker() part of this even needed – isn't the
> list_length() check enough?

We need to check this for apply bgworker. (Both lists are "NIL" in apply
bgworker.)

> 5.
> 
> + /* We are in error mode and should give user correct error. */
> 
> I still [1, #3.4a] don't see the value in saying "should give correct
> error" (e.g. what's the alternative?).
> 
> Maybe instead of that comment it can just say:
> rel->parallel_apply = PARALLEL_APPLY_UNSAFE;

I changed if-statement to report the error:
If 'parallel_apply' isn't 'PARALLEL_APPLY_SAFE', then report the error.

> 6. src/backend/replication/logical/proto.c - RelationGetUniqueKeyBitmap
> 
> + /* Add referenced attributes to idindexattrs */
> + for (i = 0; i < indexRel->rd_index->indnatts; i++)
> + {
> + int attrnum = indexRel->rd_index->indkey.values[i];
> +
> + /*
> + * We don't include non-key columns into idindexattrs
> + * bitmaps. See RelationGetIndexAttrBitmap.
> + */
> + if (attrnum != 0)
> + {
> + if (i < indexRel->rd_index->indnkeyatts &&
> + !bms_is_member(attrnum - FirstLowInvalidHeapAttributeNumber, attunique))
> + attunique = bms_add_member(attunique,
> +    attrnum - FirstLowInvalidHeapAttributeNumber);
> + }
> + }
> 
> There are 2x comments in that code that are referring to
> 'idindexattrs' but I think it is a cut/paste problem because that
> variable name does not even exist in this copied function.

Fixed the comments.

> 7. src/backend/replication/logical/relation.c -
> logicalrep_rel_mark_parallel_apply
> 
> + /* Initialize the flag. */
> + entry->parallel_apply = PARALLEL_APPLY_SAFE;
> 
> I have unsuccessfully repeated the same review comment several times
> [1 #3.8] suggesting that this flag should not be initialized to SAFE.
> IMO the state should remain as UNKNOWN until you are either sure it is
> SAFE, or sure it is UNSAFE. Anyway, I'll give up on this point now;
> let's see what other people think.

Okay, I will follow the relevant comments later.

> 8. src/include/replication/logicalrelation.h
> 
> +/*
> + * States to determine if changes on one relation can be applied using an
> + * apply background worker.
> + */
> +typedef enum ParallelApplySafety
> +{
> + PARALLEL_APPLY_UNKNOWN = 0, /* unknown  */
> + PARALLEL_APPLY_SAFE, /* Can apply changes using an apply background
> +    worker */
> + PARALLEL_APPLY_UNSAFE /* Can not apply changes using an apply
> +    background worker */
> +} ParallelApplySafety;
> +
> 
> I think the values are self-explanatory so the comments for every
> value add nothing here, particularly since the enum itself has a
> comment saying the same thing. I'm not sure if you accidentally missed
> my previous comment [1, #3.12b] about this, or just did not agree with
> it.

Changed.

> 9. .../subscription/t/015_stream.pl
> 
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_subscriber->safe_psql(
> + 'postgres', qq{
> +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);
> +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel, binary = off);
> +});
> 
> 9a.
> typo "defult"
> 
> 9b.
> The problem with to_timestamp(1284352323) is that it looks like it
> must be some special value, but in fact AFAIK you don't care at all
> what value timestamp this is. I think it would be better here to just
> use to_timestamp(0) or to_timestamp(999) or similar so the number is
> obviously not something of importance.
> 
> ======
> 
> 10. .../subscription/t/016_stream.pl
> 
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_subscriber->safe_psql(
> + 'postgres', qq{
> +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);
> +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel);
> +});
> 
> 10a. Ditto 9a.
> 10b. Ditto 9b.
> 
> ======
> 
> 11. .../subscription/t/022_twophase_cascade.pl
> 
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_B->safe_psql(
> + 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);");
> +$node_C->safe_psql(
> + 'postgres', "ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);");
> +
> 
> 11a. Ditto 9a.
> 11b. Ditto 9b.
> 
> ======
> 
> 12. .../subscription/t/023_twophase_stream.pl
> 
> +# "streaming = parallel" does not support non-immutable functions, so change
> +# the function in the defult expression of column "c".
> +$node_subscriber->safe_psql(
> + 'postgres', qq{
> +ALTER TABLE test_tab ALTER COLUMN c SET DEFAULT
> to_timestamp(1284352323);
> +ALTER SUBSCRIPTION tap_sub SET(streaming = parallel);
> +});
> 
> 12a. Ditto 9a.
> 12b. Ditto 9b.

Improved.

> 13. .../subscription/t/032_streaming_apply.pl
> 
> +# Drop default value on the subscriber, now it works.
> +$node_subscriber->safe_psql('postgres',
> + "ALTER TABLE test_tab1 ALTER COLUMN b DROP DEFAULT");
> 
> Maybe for these tests like this it would be better to test if it works
> OK using an immutable DEFAULT function instead of just completely
> removing the bad function to make it work.
> 
> I think maybe the same was done for TRIGGER tests. There was a test
> for a trigger with a bad function, and then the trigger was removed.
> What about including a test for the trigger with a good function?

Improved.

Attach the new patches.

Regards,
Wang wei

Attachment

pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Expose Parallelism counters planned/execute in pg_stat_statements
Next
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply