Thread: Re: Make default subscription streaming option as Parallel
On Mon, Oct 7, 2024 at 11:05 AM vignesh C <vignesh21@gmail.com> wrote: > > The tests demonstrate a significant performance improvement when using > the parallel streaming option, insert shows 40-48 %improvement, delete > shows 34-39 %improvement, update shows 26-30 %improvement. In the case > of rollback the improvement is between 12-44%, the improvement > slightly reduces with larger amounts of data being rolled back in this > case. If there's a significant amount of data to roll back, the > performance of streaming in parallel may be comparable to or slightly > lower in some instances. However, this is acceptable since commit > operations are generally more frequent than rollback operations. > > One key point to consider is that the lock on transaction objects will > be held for a longer duration when using streaming in parallel. This > occurs because the parallel apply worker initiates the transaction as > soon as streaming begins, maintaining the lock until the transaction > is fully completed. As a result, for long-running transactions, this > extended lock can hinder concurrent access that requires a lock. > The longer-running transactions will anyway have a risk of deadlocks or longer waits if there are concurrent operations on the subscribers. However, with parallel apply, there is a risk of deadlock among the leader and parallel workers when the schema in publisher and subscriber is different. Say the subscriber has a unique constraint that the publisher doesn't have. See the comments in this regard atop applyparallelworker.c in the "Locking Considerations" section. Having said that, the apply workers will detect deadlock in such cases and will retry to apply the errored-out transaction. So, there is a self-healing in-built mechanism and in such cases, we anyway have a risk of UNIQUE_KEY conflict ERRORs which in most cases would require manual intervention. > Since there is a significant percentage improvement, we should make > the default subscription streaming option parallel. > This makes sense to me. I think it would be better to add a Note or Warning in the docs for the risk of deadlock when the schema of publisher and subscriber is not the same even though such cases should be less. -- With Regards, Amit Kapila.
Dear Amit, Vignesh, > 1. Please ensure that none of the existing tests that use > subscriptions with large changes will be impacted due to this change. I found at least 022_twophase_cascade.pl should be fixed. The file has a part which tests non-streaming case: ``` # ----------------------- # 2PC NON-STREAMING TESTS # ----------------------- ... $node_B->safe_psql( 'postgres', " CREATE SUBSCRIPTION tap_sub_B CONNECTION '$node_A_connstr application_name=$appname_B' PUBLICATION tap_pub_A WITH (two_phase = on)"); ... ``` I know the streaming actually does not happen because few tuples will be inserted later, but creating as streaming=parallel is bit misleading. I checked other files as well but I couldn't find what we should fix. > 2. The pg_createsubscriber utility uses CREATE SUBSCRIPTION statement > and after this change, it will enable parallel mode by default which I > think is a good idea as users won't need to do that manually after > running the tool. Do you see any problem with this? I also think it is okay. IIUC, there were no specific reasons to create subscriptions with streaming=off, it was chosen because it was a default. I cannot find strong reasons to keep current behavior. Best regards, Hayato Kuroda FUJITSU LIMITED
On Tue, Oct 22, 2024 at 2:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 21, 2024 at 8:40 PM vignesh C <vignesh21@gmail.com> wrote: > > > > Attached v3 version patch has a caution added for the same. > > > > Thanks, the patch looks good to me and I am planning to commit this > early next week unless there are objections or any major problems. I > have slightly updated the docs and commit message. Few more points to > consider: Another point is that streaming = 'on' will be used by default if a v18 subscriber connects to a v15 or older publisher. I think it would not be a problem but worth considering the side effects. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
On Wed, Oct 23, 2024 at 5:58 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Vignesh, here are some review comments for the patch v5-0001 (docs only). > > ====== > Commit message > > 1. > The commit message only refers to this as the "streaming option", but > an option of what? Somewhere we should mention this is an option of > CREATE SUBSCRIPTION. > I think we can change the first line to: "Previously the default value of streaming option for a subscription was 'off'...." > > ====== > doc/src/sgml/ref/create_subscription.sgml > > 3. > <para> > Specifies whether to enable streaming of in-progress transactions > - for this subscription. The default value is <literal>off</literal>, > - meaning all transactions are fully decoded on the publisher and only > - then sent to the subscriber as a whole. > + for this subscription. The default value is > <literal>parallel</literal>, > + meaning incoming changes are directly applied via one of the parallel > + apply workers, if available. If no parallel apply worker is free to > + handle streaming transactions then the changes are written to > + temporary files and applied after the transaction is committed. Note > + that if an error happens in a parallel apply worker, the finish LSN > + of the remote transaction might not be reported in the server log. > </para> > > The other enum values have separate paragraphs: > - "If set to 'on'" and > - "If set to 'off'" > > I felt the 'parallel' value description should have this same style -- > e.g. a separate paragraph saying: > - "If set to 'parallel' (the default value)...". > > IMO, doing this makes the 3 available enum values much clearer. > The currently proposed way is better as it maintains the description flow. With your proposal, there is some repetition and it is not making things significantly better. This is a matter of opinion, so I leave it to others to see if they have any opinions. -- With Regards, Amit Kapila.
On Wed, Oct 23, 2024 at 1:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Tue, Oct 22, 2024 at 2:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Oct 21, 2024 at 8:40 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > Attached v3 version patch has a caution added for the same. > > > > > > > Thanks, the patch looks good to me and I am planning to commit this > > early next week unless there are objections or any major problems. I > > have slightly updated the docs and commit message. Few more points to > > consider: > > Another point is that streaming = 'on' will be used by default if a > v18 subscriber connects to a v15 or older publisher. > Right, but older publishers shouldn't be less than 14. > > I think it would > not be a problem but worth considering the side effects. > I couldn't think of any side effects but if you or someone else sees any problems then we can discuss those. -- With Regards, Amit Kapila.
On Tue, Oct 22, 2024 at 9:26 PM vignesh C <vignesh21@gmail.com> wrote: > > The attached v5 version has the change to create subscriptions in > streaming off mode. I also did not find any other TAP test which > required further changes. > Pushed. -- With Regards, Amit Kapila.