On Wed, Jan 18, 2023 at 1:34 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-01-17 06:23:45 +0530, Amit Kapila wrote:
>
> > There is an analysis of the test
> > failure in the email [2] which explains the race condition that leads
> > to the test failure. Thinking again about the failure, I feel we can
> > instead change the failing test (t/004_sync.pl) to either ensure that
> > both walsenders (corresponding to the sync worker and the apply worker)
> > exit after dropping the subscription and before checking the
> > remaining slots on the publisher, or wait for the slot count to become
> > zero in the test.
>
> How about waiting for the table to start to be synced (and thus the slot to be
> created) before issuing the drop subscription?
>
In this test [1], the initial sync fails due to a unique constraint
violation, so checking that the sync has started is a bit tricky. We
could probably check sync_error_count in pg_stat_subscription_stats to
confirm that the sync has started failing, which should indicate that
it has started, but I am not sure this would be completely safe. The
other possible ways are (a) after creating the subscription, wait for
two slots to be created on the publisher, and then, after dropping the
subscription, wait for the slot count to become zero on the publisher;
or (b) after dropping the subscription, wait for the slot count to
become zero.
I think either (a) or (b) would work.
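To illustrate why (b) avoids the race, here is a minimal standalone Perl
sketch of the wait-for-condition pattern. The helper poll_until and the
simulated slot counts (2, 1, 0) are hypothetical and only stand in for
querying pg_replication_slots on the publisher; in the real test, I
would expect something like $node_publisher->poll_query_until('postgres',
'SELECT count(*) = 0 FROM pg_replication_slots') to play that role. A
one-shot check right after DROP SUBSCRIPTION can still see slots held by
walsenders that have not exited yet, while polling tolerates that delay.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::HiRes qw(sleep time);

# Hypothetical helper: re-evaluate $cond every $interval seconds until it
# returns true or $timeout seconds pass. Returns 1 on success, 0 on timeout.
sub poll_until {
    my ($cond, $timeout, $interval) = @_;
    my $deadline = time() + $timeout;
    while (1) {
        return 1 if $cond->();
        return 0 if time() >= $deadline;
        sleep($interval);
    }
}

# Simulated publisher (assumption for illustration only): right after
# DROP SUBSCRIPTION, the sync and apply walsenders have not exited yet,
# so successive polls observe 2, then 1, then 0 remaining slots.
my @observed = (2, 1, 0);
my $count_slots = sub { return @observed > 1 ? shift @observed : $observed[0] };

# A single immediate check would see 2 slots and fail; polling waits
# until the walsenders exit and their slots are gone.
my $ok = poll_until(sub { $count_slots->() == 0 }, 5, 0.01);
print $ok ? "slots dropped\n" : "timed out waiting for slots\n";
```

For (a), the same pattern would first wait for the count to reach two
after CREATE SUBSCRIPTION, and then wait for zero after DROP SUBSCRIPTION.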
[1]
# Table tap_rep already has the same records on both publisher and subscriber
# at this time. Recreate the subscription which will do the initial copy of
# the table again and fails due to unique constraint violation.
$node_subscriber->safe_psql('postgres',
	"CREATE SUBSCRIPTION tap_sub CONNECTION '$publisher_connstr' PUBLICATION tap_pub"
);
...
...
--
With Regards,
Amit Kapila.