Thread: [PATCH] Fix alter subscription concurrency errors

[PATCH] Fix alter subscription concurrency errors

From
Jelte Fennema
Date:
Without this patch concurrent ALTER/DROP SUBSCRIPTION statements for
the same subscription could result in one of these statements returning the
following error:

ERROR:  XX000: tuple concurrently updated

This patch fixes that by re-fetching the tuple after acquiring the lock on the
subscription. The included isolation test fails most of its permutations
without this patch, with the error shown above.

The loop to re-fetch the tuple is heavily based on the code from dbcommands.c
Attachment

Re: [PATCH] Fix alter subscription concurrency errors

From
Amit Kapila
Date:
On Thu, Aug 25, 2022 at 8:17 PM Jelte Fennema
<Jelte.Fennema@microsoft.com> wrote:
>
> Without this patch concurrent ALTER/DROP SUBSCRIPTION statements for
> the same subscription could result in one of these statements returning the
> following error:
>
> ERROR:  XX000: tuple concurrently updated
>
> This patch fixes that by re-fetching the tuple after acquiring the lock on the
> subscription. The included isolation test fails most of its permutations
> without this patch, with the error shown above.
>

Won't the same thing can happen for similar publication commands? Why
is this unique to the subscription and not other Alter/Drop commands?

-- 
With Regards,
Amit Kapila.



Re: [PATCH] Fix alter subscription concurrency errors

From
Jelte Fennema
Date:
> Won't the same thing can happen for similar publication commands? Why
> is this unique to the subscription and not other Alter/Drop commands?

I indeed don't think this problem is unique to subscriptions, but it seems
better to at least have this problem in a few places less (not making perfect
the enemy of good).

If someone has a more generic way of solving this for other commands too,
then that sounds great, but if not then slowly chipping away at these cases
seems better than keeping the status quo.

Attached is a new patch where ALTER SUBSCRIPTION ... OWNER TO ... can
now also be executed concurrently with the other subscription commands.
Attachment

Re: [PATCH] Fix alter subscription concurrency errors

From
Asif Rehman
Date:
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

The patch applies with few "Hunk succeeded, offset -3 lines" warnings. Tested against master '7d5852ca'.

+               if (!HeapTupleIsValid(tup))
+               {
+                       if (!missing_ok)
+                               ereport(ERROR,
+                                               (errcode(ERRCODE_UNDEFINED_OBJECT),
+                                                errmsg("subscription \"%s\" does not exist",
+                                                               subname)));
+                       else
+                               ereport(NOTICE,
+                                               (errmsg("subscription \"%s\" does not exist, skipping",
+                                                               subname)));
+
+                       return InvalidOid;
+               }
+

I think 'tup' should be released before returning, or break out of loop instead to release it.

The new status of this patch is: Waiting on Author

Re: [PATCH] Fix alter subscription concurrency errors

From
Alvaro Herrera
Date:
On 2022-Aug-26, Jelte Fennema wrote:

> I indeed don't think this problem is unique to subscriptions, but it seems 
> better to at least have this problem in a few places less (not making perfect
> the enemy of good).
> 
> If someone has a more generic way of solving this for other commands too, 
> then that sounds great, but if not then slowly chipping away at these cases 
> seems better than keeping the status quo.
> 
> Attached is a new patch where ALTER SUBSCRIPTION ... OWNER TO ... can 
> now also be executed concurrently with the other subscription commands.

Would it work to use get_object_address() instead?  That would save
having to write a lookup-and-lock function with a retry loop for each
object type.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/



Re: [PATCH] Fix alter subscription concurrency errors

From
Michael Paquier
Date:
On Fri, Sep 09, 2022 at 06:37:07PM +0200, Alvaro Herrera wrote:
> Would it work to use get_object_address() instead?  That would save
> having to write a lookup-and-lock function with a retry loop for each
> object type.

Jeite, this thread is waiting for your input.  This is a bug fix, so I
have moved this patch to the next CF for now to keep track of it.
--
Michael

Attachment

Re: [PATCH] Fix alter subscription concurrency errors

From
vignesh C
Date:
On Wed, 12 Oct 2022 at 10:48, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Sep 09, 2022 at 06:37:07PM +0200, Alvaro Herrera wrote:
> > Would it work to use get_object_address() instead?  That would save
> > having to write a lookup-and-lock function with a retry loop for each
> > object type.
>
> Jeite, this thread is waiting for your input.  This is a bug fix, so I
> have moved this patch to the next CF for now to keep track of it.

Jeite, please post an updated version with the fixes. As CommitFest
2023-01 is currently underway, this would be an excellent time to
update the patch.

Regards,
Vignesh



Re: [PATCH] Fix alter subscription concurrency errors

From
"Gregory Stark (as CFM)"
Date:
On Mon, 16 Jan 2023 at 09:47, vignesh C <vignesh21@gmail.com> wrote:
>
> Jeite, please post an updated version with the fixes. As CommitFest
> 2023-01 is currently underway, this would be an excellent time to
> update the patch.

Hm. This patch is still waiting on updates. But it's a bug fix and it
would be good to get this in. Is someone else interested in finishing
this if Jeite isn't available?

-- 
Gregory Stark
As Commitfest Manager



Re: [PATCH] Fix alter subscription concurrency errors

From
Tom Lane
Date:
"Gregory Stark (as CFM)" <stark.cfm@gmail.com> writes:
> Hm. This patch is still waiting on updates. But it's a bug fix and it
> would be good to get this in. Is someone else interested in finishing
> this if Jeite isn't available?

I think the patch as-submitted is pretty uninteresting, mainly because the
design of adding bespoke lock code for subscription objects doesn't scale.
I'm not excited about doing this just for subscriptions without at least a
clear plan for working on other object types.

Alvaro suggested switching it to use get_object_address() instead, which'd
be better; but before going in that direction we might have to do more
work on get_object_address's error reporting (note the para in its header
comments saying it's pretty weak on that).

            regards, tom lane



Re: [PATCH] Fix alter subscription concurrency errors

From
Jelte Fennema
Date:
> "Gregory Stark (as CFM)" <stark.cfm@gmail.com> writes:
> > Hm. This patch is still waiting on updates. But it's a bug fix and it
> > would be good to get this in. Is someone else interested in finishing
> > this if Jeite isn't available?
>
> I think the patch as-submitted is pretty uninteresting, mainly because the
> design of adding bespoke lock code for subscription objects doesn't scale.
> I'm not excited about doing this just for subscriptions without at least a
> clear plan for working on other object types.
>
> Alvaro suggested switching it to use get_object_address() instead, which'd
> be better; but before going in that direction we might have to do more
> work on get_object_address's error reporting (note the para in its header
> comments saying it's pretty weak on that).

Sorry for not responding earlier in this thread. I'll be honest in
saying this was a small annoyance to me, so I ignored theresonses more
than I should have. It caused some test flakiness in the Citus test
suite, and it seemed that fixing the underlying issue in Postgres was
most appropriate. I addressed this in Citus its test suite by
disabling the relevant test (which was an edge case test anyway). So
my immidiate problem was fixed, and I stopped caring about this patch
very much. Definitely not enough to address this for all other DDLs
with the same issue.

All in all I'm having a hard time feeling motivated to work on a patch
that I don't care much about. Especially since I have two other
patches open for a few commit fests that I actually care about, but
those patches have received (imho) very little input. Which makes it
hard to justify to myself to spend time on this patch, given the
knowledge that if I would spend time on it, it might take away the
precious reviewer time from the patches I do care about.