On Mon, Oct 7, 2024 at 11:05 AM vignesh C <vignesh21@gmail.com> wrote:
>
> The tests demonstrate a significant performance improvement when using
> the parallel streaming option: insert shows a 40-48% improvement,
> delete shows 34-39%, and update shows 26-30%. For rollback the
> improvement is between 12% and 44%, shrinking slightly as the amount
> of data being rolled back grows. If there's a significant amount of
> data to roll back, the performance of streaming in parallel may be
> comparable to or slightly lower than that of non-parallel streaming in
> some instances. However, this is acceptable since commit operations
> are generally more frequent than rollback operations.
>
> One key point to consider is that the lock on transaction objects will
> be held for a longer duration when using streaming in parallel. This
> occurs because the parallel apply worker initiates the transaction as
> soon as streaming begins, maintaining the lock until the transaction
> is fully completed. As a result, for long-running transactions, this
> extended lock can hinder concurrent access that requires a lock.
>
Long-running transactions anyway carry a risk of deadlocks or longer
waits if there are concurrent operations on the subscriber. However,
with parallel apply, there is an additional risk of deadlock between
the leader and the parallel apply workers when the schemas on the
publisher and subscriber differ, say, when the subscriber has a unique
constraint that the publisher doesn't have. See the comments in this
regard atop applyparallelworker.c in the "Locking Considerations"
section. Having said that, the apply workers will detect the deadlock
in such cases and retry applying the errored-out transaction, so there
is a built-in self-healing mechanism. Also, in such cases we anyway
have a risk of UNIQUE_KEY conflict ERRORs, which in most cases would
require manual intervention.
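
To make the schema-mismatch case concrete, here is a minimal sketch of
the kind of setup I mean. The table, publication, and subscription
names and the connection string are hypothetical, and the exact
interleaving described in the comments is my reading of the "Locking
Considerations" comments rather than a tested reproduction.

```sql
-- On the publisher: no unique constraint on "val".
CREATE TABLE t (id int, val int);
CREATE PUBLICATION pub_t FOR TABLE t;

-- On the subscriber: same table, plus a unique constraint that the
-- publisher does not have.
CREATE TABLE t (id int, val int UNIQUE);
CREATE SUBSCRIPTION sub_t
    CONNECTION 'host=publisher dbname=postgres'  -- hypothetical conninfo
    PUBLICATION pub_t
    WITH (streaming = parallel);

-- Possible interleaving (assumes TX-1 is large enough to be streamed,
-- i.e. it exceeds logical_decoding_work_mem on the publisher):
--   TX-1 (large, still open on the publisher) inserts val = 1 and is
--        applied by a parallel apply worker as it streams in.
--   TX-2 (small) inserts val = 1 and commits; the leader apply worker
--        applies it and blocks on the subscriber-only unique index,
--        waiting for TX-1 to finish.
--   The parallel apply worker, in turn, waits for the leader to send
--   the rest of TX-1, so the two workers wait on each other.
```

Even without the deadlock, two rows with val = 1 that are both committed
on the publisher cannot both be applied on this subscriber, which is the
conflict case that needs manual intervention.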
> Since there is a significant percentage improvement, we should make
> the default subscription streaming option parallel.
>
This makes sense to me. I think it would be better to add a Note or
Warning in the docs about the risk of deadlock when the schemas of the
publisher and subscriber are not the same, even though such cases
should be rare.
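
Whatever the default ends up being, the streaming mode can still be set
explicitly per subscription, which the doc note could point to. A small
sketch, assuming an existing subscription with the hypothetical name
sub_t:

```sql
-- Apply streamed in-progress transactions using parallel apply workers.
ALTER SUBSCRIPTION sub_t SET (streaming = parallel);

-- Alternatively, keep streaming but have the apply worker spool the
-- changes to files and apply them only once the commit arrives.
ALTER SUBSCRIPTION sub_t SET (streaming = on);

-- Inspect the current streaming setting of each subscription.
SELECT subname, substream FROM pg_subscription;
```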
--
With Regards,
Amit Kapila.