Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
Date
Msg-id CAA4eK1Ka5W2u-8-X-w8PrwzUf3NUC1Bdvf3DY3Y6LU6V4jaZ1w@mail.gmail.com
Whole thread Raw
In response to Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: undetected deadlock in ALTER SUBSCRIPTION ... REFRESH PUBLICATION
List pgsql-hackers
On Mon, Dec 4, 2023 at 5:30 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 12/4/23 12:37, Amit Kapila wrote:
> > On Sat, Dec 2, 2023 at 9:52 PM Shlok Kyal <shlok.kyal.oss@gmail.com> wrote:
> >>
> >>> thread. I think you can compare the timing of regression tests in
> >>> subscription, with and without the patch to show there is no
> >>> regression. And probably some tests with a large number of tables for
> >>> sync with very little data.
> >>
> >> I have tested the regression test timings for subscription with and
> >> without patch. I also did the timing test for sync of subscription
> >> with the publisher for 100 and 1000 tables respectively.
> >> I have attached the test script and results of the timing test are as follows:
> >>
> >> Time taken for test to run in Linux VM
> >> Summary                                |  Subscription Test (sec)
> >> |    100 tables in pub and Sub (sec)    |  1000 tables in pub and Sub
> >> (sec)
> >> Without patch Release           |  95.564
> >>  |     7.877                                             |   58.919
> >> With patch Release                |  96.513
> >>    |     6.533                                             |   45.807
> >>
> >> Time Taken for test to run in another Linux VM
> >> Summary                                |  Subscription Test (sec)
> >> |    100 tables in pub and Sub (sec)    |  1000 tables in pub and Sub
> >> (sec)
> >> Without patch Release           |  109.8145
> >> |    6.4675                                           |    83.001
> >> With patch Release                |  113.162
> >>    |    7.947                                              |   87.113
> >>
> >
> > So, on some machines, it may increase the test timing although not too
> > much. I think the reason is probably doing the work in multiple
> > transactions for multiple relations. I am wondering that instead of
> > committing and starting a new transaction before
> > wait_for_relation_state_change(), what if we do it inside that
> > function just before we decide to wait? It is quite possible that in
> > many cases we don't need any wait at all.
> >
>
> I'm not sure what you mean by "do it". What should the function do?
>

I mean to commit the open transaction at the below place in
wait_for_relation_state_change()

wait_for_relation_state_change()
{
...
-- commit the xact
WaitLatch();
...
}

Then start after the wait is over. This is just to test whether it
improves the difference in regression test timing.

> As for the test results, I very much doubt the differences are not
> caused simply by random timing variations, or something like that. And I
> don't understand what "Performance Machine Linux" is, considering those
> timings are slower than the other two machines.
>
> Also, even if it was a bit slower, does it really matter? I mean, the
> current code is wrong, can lead to infinite duration if it happens to
> hit the deadlock. And it's a one-time action, I don't think it's a very
> sensitive in terms of performance.
>

Yeah, I see that point but trying to evaluate if we can avoid an
increase in regression test timing at least for the HEAD. The tests
are done in release mode, so, I suspect there could be a slightly
bigger gap in debug mode (we can check that once though) which may hit
developers running regressions quite often in their development
environments. Now, if we there is no easy way to avoid the increase in
regression test timing, we have to still fix the problem, so we have
to take the hit of some increase in time.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Matthias van de Meent
Date:
Subject: Re: Avoid detoast overhead when possible
Next
From: Dean Rasheed
Date:
Subject: Adding OLD/NEW support to RETURNING