Thread: Prevent internal error at concurrent CREATE OR REPLACE FUNCTION
Hi,

I found that when multiple sessions concurrently execute CREATE OR REPLACE FUNCTION for the same function, the error "tuple concurrently updated" is raised. This is an internal error output by elog, and the message is not user-friendly.

I've attached a patch that prevents this internal error by acquiring an exclusive lock before the command and re-reading the tuple after the lock is acquired. Also, if the function has been removed while waiting for the lock, a new entry is created.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
Attachment
On Mon, 31 Mar 2025 20:00:57 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote:

> Hi,
>
> I found that when multiple sessions concurrently execute CREATE OR REPLACE
> FUNCTION for the same function, the error "tuple concurrently updated" is
> raised. This is an internal error output by elog, and the message is not
> user-friendly.
>
> I've attached a patch that prevents this internal error by acquiring an
> exclusive lock before the command and re-reading the tuple after the lock
> is acquired. Also, if the function has been removed while waiting for the
> lock, a new entry is created.

I also found that the same error is raised when concurrent ALTER FUNCTION commands are executed. I've added a patch to fix this in a similar way.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Hi,

On Thu, 22 May 2025 10:25:58 +0800 jian he <jian.universality@gmail.com> wrote:

Thank you for looking into it.

> + /* Lock the function so nobody else can do anything with it. */
> + LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
> +
> + /*
> +  * It is possible that by the time we acquire the lock on function,
> +  * concurrent DDL has removed it. We can test this by checking the
> +  * existence of function. We get the tuple again to avoid the risk
> +  * of function definition getting changed.
> +  */
> + oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
> +                              PointerGetDatum(procedureName),
> +                              PointerGetDatum(parameterTypes),
> +                              ObjectIdGetDatum(procNamespace));
>
> we already called LockDatabaseObject, concurrent DDL can
> not do DROP FUNCTION or ALTER FUNCTION.
> so no need to call SearchSysCacheCopy3 again?

The function may be dropped *before* we call LockDatabaseObject. SearchSysCacheCopy3 is called to check this. Please see AlterPublication() for a similar code example.

> @@ -553,11 +575,13 @@ ProcedureCreate(const char *procedureName,
>   replaces[Anum_pg_proc_proowner - 1] = false;
>   replaces[Anum_pg_proc_proacl - 1] = false;
>
> +
> +
>   /* Okay, do it... */
>
> no need to add these two new lines.

I'll remove the lines. Thanks.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
On Fri, 23 May 2025 10:37:42 +0800 jian he <jian.universality@gmail.com> wrote:

> On Thu, May 22, 2025 at 10:25 AM jian he <jian.universality@gmail.com> wrote:
>
> hi.
> earlier, i didn't check patch 0002.
>
> i think in AlterFunction add
>
>     /* Lock the function so nobody else can do anything with it. */
>     LockDatabaseObject(ProcedureRelationId, funcOid, 0, AccessExclusiveLock);
>
> right after
>
>     funcOid = LookupFuncWithArgs(stmt->objtype, stmt->func, false);
>
> should be fine.

Thank you. That makes sense, because we can avoid the redundant calls of SearchSysCacheCopy1 and HeapTupleIsValid. I've attached an updated patch.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
Attachment
On Tue, May 27, 2025 at 1:35 AM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> > + /* Lock the function so nobody else can do anything with it. */
> > + LockDatabaseObject(ProcedureRelationId, oldproc->oid, 0, AccessExclusiveLock);
> > +
> > + /*
> > +  * It is possible that by the time we acquire the lock on function,
> > +  * concurrent DDL has removed it. We can test this by checking the
> > +  * existence of function. We get the tuple again to avoid the risk
> > +  * of function definition getting changed.
> > +  */
> > + oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
> > +                              PointerGetDatum(procedureName),
> > +                              PointerGetDatum(parameterTypes),
> > +                              ObjectIdGetDatum(procNamespace));
> >
> > we already called LockDatabaseObject, concurrent DDL can
> > not do DROP FUNCTION or ALTER FUNCTION.
> > so no need to call SearchSysCacheCopy3 again?
>
> The function may be dropped *before* we call LockDatabaseObject.
> SearchSysCacheCopy3 is called to check this.
> Please see AlterPublication() for a similar code example.

I am wondering, can we do it the following way for v2-0001?

    /* Check for pre-existing definition */
    oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                 PointerGetDatum(procedureName),
                                 PointerGetDatum(parameterTypes),
                                 ObjectIdGetDatum(procNamespace));

    if (HeapTupleIsValid(oldtup))
    {
        /* There is one; okay to replace it? */
        Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);

        if (!replace)
            ereport(ERROR,
                    (errcode(ERRCODE_DUPLICATE_FUNCTION),
                     errmsg("function \"%s\" already exists with same argument types",
                            procedureName)));

        /*
         * It is possible that by the time we acquire the lock on function,
         * concurrent DDL has removed it. We can test this by checking the
         * existence of function. We get the tuple again to avoid the risk
         * of function definition getting changed.
         */
        if (!ConditionalLockDatabaseObject(ProcedureRelationId, oldproc->oid,
                                           0, AccessExclusiveLock))
            oldtup = SearchSysCacheCopy3(PROCNAMEARGSNSP,
                                         PointerGetDatum(procedureName),
                                         PointerGetDatum(parameterTypes),
                                         ObjectIdGetDatum(procNamespace));
    }
On Tue, 3 Jun 2025 17:39:50 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Tue, 27 May 2025 09:00:01 +0300
> Alexander Lakhin <exclusion@gmail.com> wrote:
>
> I know there are other scenarios where the same error is raised, and I agree
> that it would be better to consider a more global solution instead of
> addressing each of them. However, I am not sure that improving the error
> message for each case doesn't make sense.

To address the remaining cases where DDL commands fail with the internal error "ERROR: tuple concurrently updated" due to insufficient locking, I would like to propose improving the error reporting to produce a more appropriate, user-facing error message. This should make it easier for users to understand the cause of the failure.

Patch 0003 improves the error message shown when concurrent updates to a system catalog tuple occur, producing output like:

    ERROR: operation failed due to a concurrent command
    DETAIL: Another command modified the same object in a concurrent session.

Patches 0001 and 0002 are unchanged from v2, except for updated commit messages. I believe these patches are still useful, as they allow the operation to complete successfully after waiting, or to behave appropriately when the target function is dropped by another session during the wait.

Best regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
Attachment
On Thu, 5 Jun 2025 16:26:08 +0900 Yugo Nagata <nagata@sraoss.co.jp> wrote:

> On Tue, 3 Jun 2025 17:39:50 +0900
> Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> > On Tue, 27 May 2025 09:00:01 +0300
> > Alexander Lakhin <exclusion@gmail.com> wrote:
> >
> > I know there are other scenarios where the same error is raised, and I
> > agree that it would be better to consider a more global solution instead
> > of addressing each of them. However, I am not sure that improving the
> > error message for each case doesn't make sense.
>
> To address the remaining cases where DDL commands fail with the internal
> error "ERROR: tuple concurrently updated" due to insufficient locking,
> I would like to propose improving the error reporting to produce a more
> appropriate, user-facing error message. This should make it easier for
> users to understand the cause of the failure.
>
> Patch 0003 improves the error message shown when concurrent updates to a
> system catalog tuple occur, producing output like:
>
>     ERROR: operation failed due to a concurrent command
>     DETAIL: Another command modified the same object in a concurrent session.
>
> Patches 0001 and 0002 are unchanged from v2, except for updated commit
> messages. I believe these patches are still useful, as they allow the
> operation to complete successfully after waiting, or to behave appropriately
> when the target function is dropped by another session during the wait.

I found that the error "tuple concurrently updated" was expected in the results of the injection_points test, so I've fixed that test so that the new message is expected instead. I've attached updated patches.

Best regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
Attachment
On Fri, 27 Jun 2025 18:53:02 +0700 Daniil Davydov <3danissimo@gmail.com> wrote:

> Hi,
>
> On Thu, Jun 5, 2025 at 5:21 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
> >
> > I've attached updated patches.
>
> I have some comments on the v4-0001 patch:

Thank you for your comments!

> 1)
> heap_freetuple should be called for every tuple that we get from
> SearchSysCacheCopy3.
> But if the tuple is valid after the first SearchSysCacheCopy3, we
> overwrite the old pointer (by the second SearchSysCacheCopy3 call) and
> forget to free it.
> I suggest adding a heap_freetuple call before the second
> SearchSysCacheCopy3 call.

Good catch. Fixed.

> 2)
> + Form_pg_proc oldproc = (Form_pg_proc) GETSTRUCT(oldtup);
> + Datum proargnames;
> + bool isnull;
> + const char *dropcmd;
>
> Strange alignment. I guess you should keep the same alignment as in
> the deleted declarations.

Fixed. I've attached patches including these fixes.

> 3)
> This patch fixes postgres behavior if I first create a function and
> then try to CREATE OR REPLACE it in concurrent transactions.
> But if the function doesn't exist and I try to call CREATE OR REPLACE
> in concurrent transactions, I will get an error.
> I wrote about it in this thread [1] and Tom Lane said that this
> behavior is kinda expected.
> Just in case, I decided to mention it here anyway - perhaps you will
> have other thoughts on this matter.
>
> [1] https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com

I agree with Tom Lane that the behavior is expected, although it would be better if the error message were more user-friendly. We could improve it by checking the unique constraint before calling index_insert in CatalogIndexInsert.

Regards,
Yugo Nagata

-- 
Yugo Nagata <nagata@sraoss.co.jp>
Attachment
Hi,

On Mon, Jun 30, 2025 at 3:47 PM Yugo Nagata <nagata@sraoss.co.jp> wrote:
>
> On Fri, 27 Jun 2025 18:53:02 +0700
> Daniil Davydov <3danissimo@gmail.com> wrote:
>
> > This patch fixes postgres behavior if I first create a function and
> > then try to CREATE OR REPLACE it in concurrent transactions.
> > But if the function doesn't exist and I try to call CREATE OR REPLACE
> > in concurrent transactions, I will get an error.
> > I wrote about it in this thread [1] and Tom Lane said that this
> > behavior is kinda expected.
> > Just in case, I decided to mention it here anyway - perhaps you will
> > have other thoughts on this matter.
> >
> > [1] https://www.postgresql.org/message-id/flat/CAJDiXghv2JF5zbLyyybokWKM%2B-GYsTG%2Bhw7xseLNgJOJwf0%2B8w%40mail.gmail.com
>
> I agree with Tom Lane that the behavior is expected, although it would be
> better if the error message were more user-friendly. We could improve it
> by checking the unique constraint before calling index_insert in
> CatalogIndexInsert.

As far as I understand, unique constraint checking is specific to each index access method. Thus, to implement the proposed idea, you would have to create a separate callback for the check_unique function. That doesn't seem like a very neat solution, but there aren't many other options left.

I would suggest intercepting the error (via PG_CATCH), and if it has the ERRCODE_UNIQUE_VIOLATION code, changing the error message (more precisely, throwing another error with the desired message). If we catch an error during the CatalogTupleInsert call, we can be sure that the problem is concurrent execution, because before the insertion we checked that such a tuple does not exist.

What do you think? And in general, are you going to fix this behavior within this thread?

P.S. Thank you for updating the patch.

-- 
Best regards,
Daniil Davydov