Thread: [PATCH] Fix alter subscription concurrency errors
Without this patch concurrent ALTER/DROP SUBSCRIPTION statements for the same subscription could result in one of these statements returning the following error: ERROR: XX000: tuple concurrently updated This patch fixes that by re-fetching the tuple after acquiring the lock on the subscription. The included isolation test fails most of its permutations without this patch, with the error shown above. The loop to re-fetch the tuple is heavily based on the code from dbcommands.c
Attachment
On Thu, Aug 25, 2022 at 8:17 PM Jelte Fennema <Jelte.Fennema@microsoft.com> wrote: > > Without this patch concurrent ALTER/DROP SUBSCRIPTION statements for > the same subscription could result in one of these statements returning the > following error: > > ERROR: XX000: tuple concurrently updated > > This patch fixes that by re-fetching the tuple after acquiring the lock on the > subscription. The included isolation test fails most of its permutations > without this patch, with the error shown above. > Won't the same thing can happen for similar publication commands? Why is this unique to the subscription and not other Alter/Drop commands? -- With Regards, Amit Kapila.
> Won't the same thing can happen for similar publication commands? Why > is this unique to the subscription and not other Alter/Drop commands? I indeed don't think this problem is unique to subscriptions, but it seems better to at least have this problem in a few places less (not making perfect the enemy of good). If someone has a more generic way of solving this for other commands too, then that sounds great, but if not then slowly chipping away at these cases seems better than keeping the status quo. Attached is a new patch where ALTER SUBSCRIPTION ... OWNER TO ... can now also be executed concurrently with the other subscription commands.
Attachment
The following review has been posted through the commitfest application: make installcheck-world: tested, passed Implements feature: tested, passed Spec compliant: not tested Documentation: not tested The patch applies with few "Hunk succeeded, offset -3 lines" warnings. Tested against master '7d5852ca'. + if (!HeapTupleIsValid(tup)) + { + if (!missing_ok) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_OBJECT), + errmsg("subscription \"%s\" does not exist", + subname))); + else + ereport(NOTICE, + (errmsg("subscription \"%s\" does not exist, skipping", + subname))); + + return InvalidOid; + } + I think 'tup' should be released before returning, or break out of loop instead to release it. The new status of this patch is: Waiting on Author
On 2022-Aug-26, Jelte Fennema wrote: > I indeed don't think this problem is unique to subscriptions, but it seems > better to at least have this problem in a few places less (not making perfect > the enemy of good). > > If someone has a more generic way of solving this for other commands too, > then that sounds great, but if not then slowly chipping away at these cases > seems better than keeping the status quo. > > Attached is a new patch where ALTER SUBSCRIPTION ... OWNER TO ... can > now also be executed concurrently with the other subscription commands. Would it work to use get_object_address() instead? That would save having to write a lookup-and-lock function with a retry loop for each object type. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Fri, Sep 09, 2022 at 06:37:07PM +0200, Alvaro Herrera wrote: > Would it work to use get_object_address() instead? That would save > having to write a lookup-and-lock function with a retry loop for each > object type. Jeite, this thread is waiting for your input. This is a bug fix, so I have moved this patch to the next CF for now to keep track of it. -- Michael
Attachment
On Wed, 12 Oct 2022 at 10:48, Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Sep 09, 2022 at 06:37:07PM +0200, Alvaro Herrera wrote: > > Would it work to use get_object_address() instead? That would save > > having to write a lookup-and-lock function with a retry loop for each > > object type. > > Jeite, this thread is waiting for your input. This is a bug fix, so I > have moved this patch to the next CF for now to keep track of it. Jeite, please post an updated version with the fixes. As CommitFest 2023-01 is currently underway, this would be an excellent time to update the patch. Regards, Vignesh
On Mon, 16 Jan 2023 at 09:47, vignesh C <vignesh21@gmail.com> wrote: > > Jeite, please post an updated version with the fixes. As CommitFest > 2023-01 is currently underway, this would be an excellent time to > update the patch. Hm. This patch is still waiting on updates. But it's a bug fix and it would be good to get this in. Is someone else interested in finishing this if Jeite isn't available? -- Gregory Stark As Commitfest Manager
"Gregory Stark (as CFM)" <stark.cfm@gmail.com> writes: > Hm. This patch is still waiting on updates. But it's a bug fix and it > would be good to get this in. Is someone else interested in finishing > this if Jeite isn't available? I think the patch as-submitted is pretty uninteresting, mainly because the design of adding bespoke lock code for subscription objects doesn't scale. I'm not excited about doing this just for subscriptions without at least a clear plan for working on other object types. Alvaro suggested switching it to use get_object_address() instead, which'd be better; but before going in that direction we might have to do more work on get_object_address's error reporting (note the para in its header comments saying it's pretty weak on that). regards, tom lane
> "Gregory Stark (as CFM)" <stark.cfm@gmail.com> writes: > > Hm. This patch is still waiting on updates. But it's a bug fix and it > > would be good to get this in. Is someone else interested in finishing > > this if Jeite isn't available? > > I think the patch as-submitted is pretty uninteresting, mainly because the > design of adding bespoke lock code for subscription objects doesn't scale. > I'm not excited about doing this just for subscriptions without at least a > clear plan for working on other object types. > > Alvaro suggested switching it to use get_object_address() instead, which'd > be better; but before going in that direction we might have to do more > work on get_object_address's error reporting (note the para in its header > comments saying it's pretty weak on that). Sorry for not responding earlier in this thread. I'll be honest in saying this was a small annoyance to me, so I ignored theresonses more than I should have. It caused some test flakiness in the Citus test suite, and it seemed that fixing the underlying issue in Postgres was most appropriate. I addressed this in Citus its test suite by disabling the relevant test (which was an edge case test anyway). So my immidiate problem was fixed, and I stopped caring about this patch very much. Definitely not enough to address this for all other DDLs with the same issue. All in all I'm having a hard time feeling motivated to work on a patch that I don't care much about. Especially since I have two other patches open for a few commit fests that I actually care about, but those patches have received (imho) very little input. Which makes it hard to justify to myself to spend time on this patch, given the knowledge that if I would spend time on it, it might take away the precious reviewer time from the patches I do care about.