Thread: pg_replication_origin_drop API potential race condition
Hi Hackers. As discovered elsewhere [ak0125] there is a potential race condition in the pg_replication_origin_drop API The current code of pg_replication_origin_drop looks like: ==== roident = replorigin_by_name(name, false); Assert(OidIsValid(roident)); replorigin_drop(roident, true); ==== Users cannot deliberately drop a non-existent origin (replorigin_by_name passes missing_ok = false) but there is still a small window where concurrent processes may be able to call replorigin_drop for the same valid roident. Locking within replorigin_drop guards against concurrent drops so the 1st execution will succeed, but then the 2nd execution would give internal cache error: elog(ERROR, "cache lookup failed for replication origin with oid %u", roident); Some ideas to fix this include: 1. Do nothing except write a comment about this in the code. The internal ERROR is not ideal for a user API there is no great harm done. 2. Change the behavior of replorigin_drop to be like replorigin_drop_IF_EXISTS, so the 2nd execution of this race would silently do nothing when it finds the roident is already gone. 3. Same as 2, but make the NOP behavior more explicit by introducing a new "missing_ok" parameter for replorigin_drop. Thoughts? ---- [ak0125] https://www.postgresql.org/message-id/CAA4eK1%2ByeLwBCkTvTdPM-hSk1fr6jT8KJc362CN8zrGztq_JqQ%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia
On Wed, Jan 27, 2021 at 4:58 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Hi Hackers. > > As discovered elsewhere [ak0125] there is a potential race condition > in the pg_replication_origin_drop API > > The current code of pg_replication_origin_drop looks like: > ==== > roident = replorigin_by_name(name, false); > Assert(OidIsValid(roident)); > > replorigin_drop(roident, true); > ==== > > Users cannot deliberately drop a non-existent origin > (replorigin_by_name passes missing_ok = false) but there is still a > small window where concurrent processes may be able to call > replorigin_drop for the same valid roident. > > Locking within replorigin_drop guards against concurrent drops so the > 1st execution will succeed, but then the 2nd execution would give > internal cache error: elog(ERROR, "cache lookup failed for replication > origin with oid %u", roident); > > Some ideas to fix this include: > 1. Do nothing except write a comment about this in the code. The > internal ERROR is not ideal for a user API there is no great harm > done. > 2. Change the behavior of replorigin_drop to be like > replorigin_drop_IF_EXISTS, so the 2nd execution of this race would > silently do nothing when it finds the roident is already gone. > 3. Same as 2, but make the NOP behavior more explicit by introducing a > new "missing_ok" parameter for replorigin_drop. > How about if we call replorigin_by_name() inside replorigin_drop after acquiring the lock? Wouldn't that close this race condition? We are doing something similar for pg_replication_origin_advance(). -- With Regards, Amit Kapila.
On Wed, Feb 3, 2021 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > How about if we call replorigin_by_name() inside replorigin_drop after > acquiring the lock? Wouldn't that close this race condition? We are > doing something similar for pg_replication_origin_advance(). > Yes, that seems ok. I wonder if it is better to isolate that locked portion (replyorigin_by_name + replorigin_drop) so that in addition to being called from pg_replication_origin_drop, we can call it internally from PG code to safely drop the origins. ---- Kind Regards, Peter Smith. Fujitsu Australia
On Thu, Feb 4, 2021 at 9:57 AM Peter Smith <smithpb2250@gmail.com> wrote: > > On Wed, Feb 3, 2021 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > How about if we call replorigin_by_name() inside replorigin_drop after > > acquiring the lock? Wouldn't that close this race condition? We are > > doing something similar for pg_replication_origin_advance(). > > > > Yes, that seems ok. > > I wonder if it is better to isolate that locked portion > (replyorigin_by_name + replorigin_drop) so that in addition to being > called from pg_replication_origin_drop, we can call it internally from > PG code to safely drop the origins. > Yeah, I think that would be really good. -- With Regards, Amit Kapila.
On Thu, Feb 4, 2021 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 4, 2021 at 9:57 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > On Wed, Feb 3, 2021 at 11:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > How about if we call replorigin_by_name() inside replorigin_drop after > > > acquiring the lock? Wouldn't that close this race condition? We are > > > doing something similar for pg_replication_origin_advance(). > > > > > > > Yes, that seems ok. > > > > I wonder if it is better to isolate that locked portion > > (replyorigin_by_name + replorigin_drop) so that in addition to being > > called from pg_replication_origin_drop, we can call it internally from > > PG code to safely drop the origins. > > > > Yeah, I think that would be really good. PSA a patch which I think implements what we are talking about. ---- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Thu, Feb 4, 2021 at 1:31 PM Peter Smith <smithpb2250@gmail.com> wrote: > > PSA a patch which I think implements what we are talking about. > This doesn't seem correct to me. Have you tested that the patch resolves the problem reported originally? Because the lockmode (RowExclusiveLock) you have used in the patch will allow multiple callers to acquire at the same time. The other thing I don't like about this is that first, it acquires lock in the function replorigin_drop_by_name and then again we acquire the same lock in a different mode in replorigin_drop. What I was imagining was to have a code same as replorigin_drop with the first parameter as the name instead of id and additionally, it will check the existence of origin by replorigin_by_name after acquiring the lock. So you can move all the common code from replorigin_drop (starting from restart till end leaving table_close) to a separate function say replorigin_drop_guts and then call it from both replorigin_drop and replorigin_drop_by_name. Now, I have also thought to directly change replorigin_drop but this is an exposed API so let's keep it as it is because some extensions might be using it. We can anyway later drop it if required. -- With Regards, Amit Kapila.
On Thu, Feb 4, 2021 at 9:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Feb 4, 2021 at 1:31 PM Peter Smith <smithpb2250@gmail.com> wrote: > > > > PSA a patch which I think implements what we are talking about. > > > > This doesn't seem correct to me. Have you tested that the patch > resolves the problem reported originally? Because the lockmode > (RowExclusiveLock) you have used in the patch will allow multiple > callers to acquire at the same time. The other thing I don't like > about this is that first, it acquires lock in the function > replorigin_drop_by_name and then again we acquire the same lock in a > different mode in replorigin_drop. > > What I was imagining was to have a code same as replorigin_drop with > the first parameter as the name instead of id and additionally, it > will check the existence of origin by replorigin_by_name after > acquiring the lock. So you can move all the common code from > replorigin_drop (starting from restart till end leaving table_close) > to a separate function say replorigin_drop_guts and then call it from > both replorigin_drop and replorigin_drop_by_name. > > Now, I have also thought to directly change replorigin_drop but this > is an exposed API so let's keep it as it is because some extensions > might be using it. We can anyway later drop it if required. > PSA patch updated per above suggestions. ---- Kind Regards, Peter Smith. Fujitsu Australia
Attachment
On Fri, Feb 5, 2021 at 9:46 AM Peter Smith <smithpb2250@gmail.com> wrote: > > PSA patch updated per above suggestions. > Thanks, I have tested your patch and before the patch, I was getting errors like "tuple concurrently deleted" or "cache lookup failed for replication origin with oid 1" and after the patch, I am getting "replication origin "origin-1" does not exist" which is clearly better and user-friendly. Before Patch postgres=# select pg_replication_origin_drop('origin-1'); ERROR: tuple concurrently deleted postgres=# select pg_replication_origin_drop('origin-1'); ERROR: cache lookup failed for replication origin with oid 1 After Patch postgres=# select pg_replication_origin_drop('origin-1'); ERROR: replication origin "origin-1" does not exist I wonder why you haven't changed the usage of the existing replorigin_drop in the code? I have changed the same, added few comments, ran pgindent, and updated the commit message in the attached. I am not completely whether we should retire replorigin_drop or just keep it for backward compatibility? What do you think? Anybody else has any opinion? For others, the purpose of this patch is to "make pg_replication_origin_drop safe against concurrent drops.". Currently, we get the origin id from the name and then drop the origin by taking ExclusiveLock on ReplicationOriginRelationId. So, two concurrent sessions can get the id from the name at the same time, and then when they try to drop the origin, one of the sessions will get either "tuple concurrently deleted" or "cache lookup failed for replication origin ..". To prevent this race condition we do the entire operation under lock. This obviates the need for replorigin_drop() API but we have kept it for backward compatibility. -- With Regards, Amit Kapila.
Attachment
On Fri, Feb 5, 2021 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Feb 5, 2021 at 9:46 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > PSA patch updated per above suggestions. > > > > Thanks, I have tested your patch and before the patch, I was getting > errors like "tuple concurrently deleted" or "cache lookup failed for > replication origin with oid 1" and after the patch, I am getting > "replication origin "origin-1" does not exist" which is clearly better > and user-friendly. > > Before Patch > postgres=# select pg_replication_origin_drop('origin-1'); > ERROR: tuple concurrently deleted > postgres=# select pg_replication_origin_drop('origin-1'); > ERROR: cache lookup failed for replication origin with oid 1 > > After Patch > postgres=# select pg_replication_origin_drop('origin-1'); > ERROR: replication origin "origin-1" does not exist > > I wonder why you haven't changed the usage of the existing > replorigin_drop in the code? I have changed the same, added few > comments, ran pgindent, and updated the commit message in the > attached. You are right. The goal of this patch was to fix pg_replication_origin_drop, but while focussed on fixing that, I forgot the same call pattern was also in the DropSubscription. > > I am not completely whether we should retire replorigin_drop or just > keep it for backward compatibility? What do you think? Anybody else > has any opinion? It is still good code, but just not being used atm. I don't know what is the PG convention for dead code - to remove it immedaitely at first sight, or to leave it lying around if it still might have future usefulness? Personally, I would leave it, if only because it seems a less radical change from the current HEAD code to keep the existing function signature. ------ Kind Regards, Peter Smith. Fujitsu Australia
On Fri, Feb 5, 2021 at 1:50 PM Peter Smith <smithpb2250@gmail.com> wrote: > > On Fri, Feb 5, 2021 at 6:02 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Feb 5, 2021 at 9:46 AM Peter Smith <smithpb2250@gmail.com> wrote: > > > > > > PSA patch updated per above suggestions. > > > > > > > Thanks, I have tested your patch and before the patch, I was getting > > errors like "tuple concurrently deleted" or "cache lookup failed for > > replication origin with oid 1" and after the patch, I am getting > > "replication origin "origin-1" does not exist" which is clearly better > > and user-friendly. > > > > Before Patch > > postgres=# select pg_replication_origin_drop('origin-1'); > > ERROR: tuple concurrently deleted > > postgres=# select pg_replication_origin_drop('origin-1'); > > ERROR: cache lookup failed for replication origin with oid 1 > > > > After Patch > > postgres=# select pg_replication_origin_drop('origin-1'); > > ERROR: replication origin "origin-1" does not exist > > > > I wonder why you haven't changed the usage of the existing > > replorigin_drop in the code? I have changed the same, added few > > comments, ran pgindent, and updated the commit message in the > > attached. > > You are right. > > The goal of this patch was to fix pg_replication_origin_drop, but > while focussed on fixing that, I forgot the same call pattern was also > in the DropSubscription. > > > > > I am not completely whether we should retire replorigin_drop or just > > keep it for backward compatibility? What do you think? Anybody else > > has any opinion? > > It is still good code, but just not being used atm. > > I don't know what is the PG convention for dead code - to remove it > immedaitely at first sight, or to leave it lying around if it still > might have future usefulness? > I am mostly worried about the extensions outside pg-core. For example, on a quick search, it seems there seem to be a few such usages in pglogical [1][2]. Then, I see a similar usage pattern (search by name and then drop) in one of the pglogical [3]. [1] - https://github.com/2ndQuadrant/pglogical/issues/160 [2] - https://github.com/2ndQuadrant/pglogical/issues/124 [3] - https://github.com/2ndQuadrant/pglogical/blob/REL2_x_STABLE/pglogical_functions.c -- With Regards, Amit Kapila.
On Fri, Feb 5, 2021, at 4:01 AM, Amit Kapila wrote:
I am not completely whether we should retire replorigin_drop or justkeep it for backward compatibility? What do you think? Anybody elsehas any opinion?
We could certainly keep some code for backward compatibility, however, we have
to consider if it is (a) an exposed API and/or (b) a critical path. We break
several extensions every release due to Postgres extensibility. For (a), it is
not an exposed function, I mean, we are not changing
`pg_replication_origin_drop`. Hence, there is no need to keep it. In (b), we
could risk slowing down some critical paths that we decide to keep the old
function and create a new one that contains additional features. It is not the
case for this function. It is rare that an extension does not have a few #ifdef
if it supports multiple Postgres versions. IMO we should keep as little code as
possible into the core in favor of maintainability.
- replorigin_drop(roident, true);
+ replorigin_drop_by_name(name, false /* missing_ok */ , true /* nowait */ );
A modern IDE would certainly show you the function definition that allows you
to check what each parameter value is without having to go back and forth. I
saw a few occurrences of this pattern in the source code and IMO it could be
used when it is not obvious what that value means. Booleans are easier to
figure out, however, sometimes integer and text are not.
On Fri, Feb 5, 2021 at 6:45 PM Euler Taveira <euler@eulerto.com> wrote: > > On Fri, Feb 5, 2021, at 4:01 AM, Amit Kapila wrote: > > I am not completely whether we should retire replorigin_drop or just > keep it for backward compatibility? What do you think? Anybody else > has any opinion? > > We could certainly keep some code for backward compatibility, however, we have > to consider if it is (a) an exposed API and/or (b) a critical path. We break > several extensions every release due to Postgres extensibility. For (a), it is > not an exposed function, I mean, we are not changing > `pg_replication_origin_drop`. Hence, there is no need to keep it. In (b), we > could risk slowing down some critical paths that we decide to keep the old > function and create a new one that contains additional features. It is not the > case for this function. It is rare that an extension does not have a few #ifdef > if it supports multiple Postgres versions. IMO we should keep as little code as > possible into the core in favor of maintainability. > Yeah, that makes. I was a bit worried about pglogical but I think they can easily update it if required, so removed as per your suggestion. Petr, any opinion on this matter? I am planning to push this early next week (by Tuesday) unless you or someone else think it is not a good idea. > - replorigin_drop(roident, true); > + replorigin_drop_by_name(name, false /* missing_ok */ , true /* nowait */ ); > > A modern IDE would certainly show you the function definition that allows you > to check what each parameter value is without having to go back and forth. I > saw a few occurrences of this pattern in the source code and IMO it could be > used when it is not obvious what that value means. Booleans are easier to > figure out, however, sometimes integer and text are not. > Fair enough, removed in the attached patch. -- With Regards, Amit Kapila.
Attachment
On 06/02/2021 07:29, Amit Kapila wrote: > On Fri, Feb 5, 2021 at 6:45 PM Euler Taveira <euler@eulerto.com> wrote: >> On Fri, Feb 5, 2021, at 4:01 AM, Amit Kapila wrote: >> >> I am not completely whether we should retire replorigin_drop or just >> keep it for backward compatibility? What do you think? Anybody else >> has any opinion? >> >> We could certainly keep some code for backward compatibility, however, we have >> to consider if it is (a) an exposed API and/or (b) a critical path. We break >> several extensions every release due to Postgres extensibility. For (a), it is >> not an exposed function, I mean, we are not changing >> `pg_replication_origin_drop`. Hence, there is no need to keep it. In (b), we >> could risk slowing down some critical paths that we decide to keep the old >> function and create a new one that contains additional features. It is not the >> case for this function. It is rare that an extension does not have a few #ifdef >> if it supports multiple Postgres versions. IMO we should keep as little code as >> possible into the core in favor of maintainability. >> > Yeah, that makes. I was a bit worried about pglogical but I think they > can easily update it if required, so removed as per your suggestion. > Petr, any opinion on this matter? I am planning to push this early > next week (by Tuesday) unless you or someone else think it is not a > good idea. > >> - replorigin_drop(roident, true); >> + replorigin_drop_by_name(name, false /* missing_ok */ , true /* nowait */ ); >> >> A modern IDE would certainly show you the function definition that allows you >> to check what each parameter value is without having to go back and forth. I >> saw a few occurrences of this pattern in the source code and IMO it could be >> used when it is not obvious what that value means. Booleans are easier to >> figure out, however, sometimes integer and text are not. >> > Fair enough, removed in the attached patch. To be fair the logical replication framework is full of these comments so it's pretty natural to add them to new code as well, but I agree with Euler that it's unnecessary with any reasonable development tooling. The patch as posted looks good to me, as an extension author I normally have origin cached by id, so the api change means I have to do name lookup now, but given this is just for drop, it does not really matter. -- Petr
On Sat, Feb 6, 2021 at 3:26 PM Petr Jelinek <pjmodos@pjmodos.net> wrote: > > On 06/02/2021 07:29, Amit Kapila wrote: > > On Fri, Feb 5, 2021 at 6:45 PM Euler Taveira <euler@eulerto.com> wrote: > >> - replorigin_drop(roident, true); > >> + replorigin_drop_by_name(name, false /* missing_ok */ , true /* nowait */ ); > >> > >> A modern IDE would certainly show you the function definition that allows you > >> to check what each parameter value is without having to go back and forth. I > >> saw a few occurrences of this pattern in the source code and IMO it could be > >> used when it is not obvious what that value means. Booleans are easier to > >> figure out, however, sometimes integer and text are not. > >> > > Fair enough, removed in the attached patch. > > > To be fair the logical replication framework is full of these comments > so it's pretty natural to add them to new code as well, but I agree with > Euler that it's unnecessary with any reasonable development tooling. > > The patch as posted looks good to me, > Thanks, but today again testing this API, I observed that we can still get "tuple concurrently deleted" because we are releasing the lock on ReplicationOriginRelationId at the end of API replorigin_drop_by_name. So there is no guarantee that invalidation reaches other backend doing the same operation. I think we need to keep the lock till the end of xact as we do in other drop operations (see DropTableSpace, dropdb). -- With Regards, Amit Kapila.
On Sat, Feb 6, 2021 at 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Feb 6, 2021 at 3:26 PM Petr Jelinek <pjmodos@pjmodos.net> wrote: > > > > On 06/02/2021 07:29, Amit Kapila wrote: > > > On Fri, Feb 5, 2021 at 6:45 PM Euler Taveira <euler@eulerto.com> wrote: > > >> - replorigin_drop(roident, true); > > >> + replorigin_drop_by_name(name, false /* missing_ok */ , true /* nowait */ ); > > >> > > >> A modern IDE would certainly show you the function definition that allows you > > >> to check what each parameter value is without having to go back and forth. I > > >> saw a few occurrences of this pattern in the source code and IMO it could be > > >> used when it is not obvious what that value means. Booleans are easier to > > >> figure out, however, sometimes integer and text are not. > > >> > > > Fair enough, removed in the attached patch. > > > > > > To be fair the logical replication framework is full of these comments > > so it's pretty natural to add them to new code as well, but I agree with > > Euler that it's unnecessary with any reasonable development tooling. > > > > The patch as posted looks good to me, > > > > Thanks, but today again testing this API, I observed that we can still > get "tuple concurrently deleted" because we are releasing the lock on > ReplicationOriginRelationId at the end of API replorigin_drop_by_name. > So there is no guarantee that invalidation reaches other backend doing > the same operation. I think we need to keep the lock till the end of > xact as we do in other drop operations (see DropTableSpace, dropdb). > Fixed the problem as mentioned above in the attached. -- With Regards, Amit Kapila.
Attachment
On Mon, Feb 8, 2021, at 3:23 AM, Amit Kapila wrote:
Fixed the problem as mentioned above in the attached.
This new version looks good to me.
--
Euler Taveira
> +void > +replorigin_drop_by_name(char *name, bool missing_ok, bool nowait) > +{ > + RepOriginId roident; > + Relation rel; > + > + Assert(IsTransactionState()); > + > + /* > + * To interlock against concurrent drops, we hold ExclusiveLock on > + * pg_replication_origin throughout this function. > + */ This comment is now wrong though; should s/throughout.*/till xact commit/ to reflect the new reality. I do wonder if this is going to be painful in some way, since the lock is now going to be much longer-lived. My impression is that it's okay, since dropping an origin is not a very frequent occurrence. It is going to block pg_replication_origin_advance() with *any* origin, which acquires RowExclusiveLock on the same relation. If this is a problem, then we could use LockSharedObject() in both places (and make it last till end of xact for the case of deletion), instead of holding this catalog-level lock till end of transaction. -- Álvaro Herrera Valdivia, Chile "On the other flipper, one wrong move and we're Fatal Exceptions" (T.U.X.: Term Unit X - http://www.thelinuxreview.com/TUX/)
On Mon, Feb 8, 2021 at 11:30 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > +void > > +replorigin_drop_by_name(char *name, bool missing_ok, bool nowait) > > +{ > > + RepOriginId roident; > > + Relation rel; > > + > > + Assert(IsTransactionState()); > > + > > + /* > > + * To interlock against concurrent drops, we hold ExclusiveLock on > > + * pg_replication_origin throughout this function. > > + */ > > This comment is now wrong though; should s/throughout.*/till xact commit/ > to reflect the new reality. > Right, I'll fix in the next version. > I do wonder if this is going to be painful in some way, since the lock > is now going to be much longer-lived. My impression is that it's okay, > since dropping an origin is not a very frequent occurrence. It is going > to block pg_replication_origin_advance() with *any* origin, which > acquires RowExclusiveLock on the same relation. If this is a problem, > then we could use LockSharedObject() in both places (and make it last > till end of xact for the case of deletion), instead of holding this > catalog-level lock till end of transaction. > IIUC, you are suggesting to use lock for the particular origin instead of locking the corresponding catalog table in functions pg_replication_origin_advance and replorigin_drop_by_name. If so, I don't see any problem with the same but please note that we do take catalog-level lock in replorigin_create() which would have earlier prevented create and drop to run concurrently. Having said that, I don't see any problem with it because I think till the drop is committed, the create will see the corresponding row as visible and we won't generate the wrong origin_id. What do you think? -- With Regards, Amit Kapila.
On Tue, Feb 9, 2021 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Feb 8, 2021 at 11:30 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > > +void > > > +replorigin_drop_by_name(char *name, bool missing_ok, bool nowait) > > > +{ > > > + RepOriginId roident; > > > + Relation rel; > > > + > > > + Assert(IsTransactionState()); > > > + > > > + /* > > > + * To interlock against concurrent drops, we hold ExclusiveLock on > > > + * pg_replication_origin throughout this function. > > > + */ > > > > This comment is now wrong though; should s/throughout.*/till xact commit/ > > to reflect the new reality. > > > > Right, I'll fix in the next version. > Fixed in the attached. > > I do wonder if this is going to be painful in some way, since the lock > > is now going to be much longer-lived. My impression is that it's okay, > > since dropping an origin is not a very frequent occurrence. It is going > > to block pg_replication_origin_advance() with *any* origin, which > > acquires RowExclusiveLock on the same relation. If this is a problem, > > then we could use LockSharedObject() in both places (and make it last > > till end of xact for the case of deletion), instead of holding this > > catalog-level lock till end of transaction. > > > > IIUC, you are suggesting to use lock for the particular origin instead > of locking the corresponding catalog table in functions > pg_replication_origin_advance and replorigin_drop_by_name. If so, I > don't see any problem with the same > I think it won't be that straightforward as we don't have origin_id. So what we instead need to do is first to acquire a lock on ReplicationOriginRelationId, get the origin_id, lock the specific origin and then re-check if the origin still exists. I feel some similar changes might be required in pg_replication_origin_advance. Now, we can do this optimization if we want but I am not sure if origin_drop would be a frequent enough operation that we add such an optimization. For now, I have added a note in the comments so that if we find any such use case we can implement such optimization in the future. What do you think? -- With Regards, Amit Kapila.
Attachment
On 2021-Feb-09, Amit Kapila wrote: > > IIUC, you are suggesting to use lock for the particular origin instead > > of locking the corresponding catalog table in functions > > pg_replication_origin_advance and replorigin_drop_by_name. Right. > I think it won't be that straightforward as we don't have origin_id. > So what we instead need to do is first to acquire a lock on > ReplicationOriginRelationId, get the origin_id, lock the specific > origin and then re-check if the origin still exists. I feel some > similar changes might be required in pg_replication_origin_advance. Hmm, ok. > Now, we can do this optimization if we want but I am not sure if > origin_drop would be a frequent enough operation that we add such an > optimization. For now, I have added a note in the comments so that if > we find any such use case we can implement such optimization in the > future. What do you think? By all means let's get the bug fixed. Then, in another patch, we can optimize further, if there really is a problem. -- Álvaro Herrera 39°49'30"S 73°17'W "Siempre hay que alimentar a los dioses, aunque la tierra esté seca" (Orual)
On Tue, Feb 9, 2021 at 4:16 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2021-Feb-09, Amit Kapila wrote: > > > Now, we can do this optimization if we want but I am not sure if > > origin_drop would be a frequent enough operation that we add such an > > optimization. For now, I have added a note in the comments so that if > > we find any such use case we can implement such optimization in the > > future. What do you think? > > By all means let's get the bug fixed. > I am planning to push this in HEAD only as there is no user reported problem and this is actually more about giving correct information to the user rather than some misleading message. Do you see any need to back-patch this change? -- With Regards, Amit Kapila.
On 2021-Feb-09, Amit Kapila wrote: > On Tue, Feb 9, 2021 at 4:16 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > By all means let's get the bug fixed. > > I am planning to push this in HEAD only as there is no user reported > problem and this is actually more about giving correct information to > the user rather than some misleading message. Do you see any need to > back-patch this change? master-only sounds OK. -- Álvaro Herrera Valdivia, Chile "Cuando mañana llegue pelearemos segun lo que mañana exija" (Mowgli)
On Tue, Feb 9, 2021 at 5:53 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2021-Feb-09, Amit Kapila wrote: > > > On Tue, Feb 9, 2021 at 4:16 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > By all means let's get the bug fixed. > > > > I am planning to push this in HEAD only as there is no user reported > > problem and this is actually more about giving correct information to > > the user rather than some misleading message. Do you see any need to > > back-patch this change? > > master-only sounds OK. > Pushed! -- With Regards, Amit Kapila.