On Wed, Mar 26, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Here's a rebased version of the patch series.
>
Thanks for the patches.
While testing the GUC "max_conflict_retention_duration", I noticed a
behavior that seems to bypass its intended purpose.
On the publisher, if a txn is stuck in the COMMIT phase for a long
time, the apply worker on the subscriber keeps looping in
wait_for_publisher_status() until that concurrent txn completes its
commit. As a result, the apply worker can't advance its
oldest_nonremovable_xid and keeps waiting for the publisher's txn to
finish.
In such a case, even if the wait time exceeds the configured
max_conflict_retention_duration, conflict retention doesn't stop for
the apply worker. The conflict info retention is stopped only once
the publisher's txn is committed and the apply worker moves to
wait_for_local_flush().
Doesn't this defeat the purpose of max_conflict_retention_duration?
The apply worker has exceeded the max wait time but still retains the
conflict info.
I think we should consider applying the same max time limit check
inside wait_for_publisher_status() as well.
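
To make the suggestion concrete, here is a rough standalone sketch of
the idea. It is not the patch's code: RetentionState,
retention_duration_exceeded() and wait_for_publisher_status_sketch()
are made-up names, time is modelled as plain milliseconds, and I am
assuming that a limit of 0 means "no limit". The only point is that
the limit is checked on every poll of the publisher status, not only
after the publisher's txn has committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not the patch's real definitions. */
typedef int64_t TimestampMs;

typedef struct RetentionState
{
    TimestampMs wait_start;     /* when the worker began waiting */
    bool        stop_retention; /* set once the limit is exceeded */
} RetentionState;

/* True if the wait has outlived the limit (assumed: 0 disables it). */
static bool
retention_duration_exceeded(const RetentionState *st, TimestampMs now,
                            int max_duration_ms)
{
    if (max_duration_ms <= 0)
        return false;
    return (now - st->wait_start) > max_duration_ms;
}

/*
 * Model of the publisher-status wait. Each iteration stands for one
 * status poll; commit_done_at is the time at which the stuck publisher
 * txn finally commits. Returns the time at which the loop exits.
 */
static TimestampMs
wait_for_publisher_status_sketch(RetentionState *st,
                                 TimestampMs commit_done_at,
                                 int max_duration_ms,
                                 int poll_interval_ms)
{
    TimestampMs now = st->wait_start;

    assert(poll_interval_ms > 0);

    while (now < commit_done_at)
    {
        /* The proposed check: give up on retention while still waiting. */
        if (retention_duration_exceeded(st, now, max_duration_ms))
        {
            st->stop_retention = true;
            break;
        }
        now += poll_interval_ms;    /* simulate waiting for the next poll */
    }
    return now;
}
```

With a check like this, a worker whose limit is 3s and whose publisher
txn needs 10s to commit would leave the loop on the first poll after
the 3s mark instead of waiting for the full 10s.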
--
Thanks,
Nisha