Re: [Patch] add new parameter to pg_replication_origin_session_setup - Mailing list pgsql-hackers
| From | shveta malik |
|---|---|
| Subject | Re: [Patch] add new parameter to pg_replication_origin_session_setup |
| Date | |
| Msg-id | CAJpy0uB19aKEgVgh8gwzj87NUyDgOf01boa-6xJZK+nhb=3W4g@mail.gmail.com Whole thread Raw |
| In response to | RE: [Patch] add new parameter to pg_replication_origin_session_setup ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
| List | pgsql-hackers |
On Fri, Jan 9, 2026 at 4:58 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear Amit, Shveta,
>
> > > >
> > > > Thanks Hou-San and Kuroda-San.
> > > >
> > > > What should be the expected behavior when Session1 resets the origin
> > > > (changing acquired_pid from its own PID to 0), while Session2 is
> > > > already connected to the origin and Session3 also attempts to reuse
> > > > the same origin?
> > > >
> > > > Currently it asserts:
> > > >
> > > > Session1:
> > > > select pg_replication_origin_create('origin');
> > > > SELECT pg_replication_origin_session_setup('origin');
> > > >
> > > > Session2:
> > > > SELECT pg_replication_origin_session_setup('origin',48028);
> > > >
> > > > Session1:
> > > > SELECT pg_replication_origin_session_reset();
> > > >
> > > > Session3:
> > > > SELECT pg_replication_origin_session_setup('origin');
> > > > This asserts at:
> > > > TRAP: failed Assert("session_replication_state->refcount == 0"), File:
> > > > "origin.c", Line: 1231, PID: 48037
> > > >
>
> FYI, this happened because v1 assumed refcount was 0 if acquired_by was 0.
> But your proposed scenario met it.
>
> > > I checked the behavior on HEAD. Session3 is able to set up the origin
> > > and sets its own PID in acquired_pid. But it is unclear to me which
> > > PID should be recorded in acquired_pid - Session2’s PID, since it set
> > > up the origin earlier, or Session3’s PID. Or does this even make any
> > > difference?
>
> To clarify, I think the behavior on HEAD is not correct. The backend should
> acquire the active origin if it expressly specifies the PID. Otherwise, users
> may acquire unintentionally and advance it.
I agree.
>
> > Can we address these problems by prohibiting leader worker to reset
> > when pa workers are still associated with the origin? The way for
> > leader to know if pa workers are associated with origin is by checking
> > following condition: acquired_by == MyProcpid AND refcount > 1.
>
> I think it's okay. IIUC, the idea is to avoid that active origin has invalid
> acquired_by attribute. The replication origin was extended to support parallel
> apply of logical replication, and it is reasonable to force the same approach.
> Attached 0001 implemented that.
>
> One concern with the implementation is that acquired_by can be zero if the process
> exits without releasing the origin; this can happen if the first acquired process
> exits while the second is still using it.
> Regarding our logical replication, it won't be problematic because the leader
> worker ensures all parallel workers finish before it exits.
>
> To address the issue, I propose that another process should not be able to
> acquire the origin if the acquired_by of the active origin is 0. The problem
> should be resolved within the SQL interface, since it occurs there.
>
+1.
Please find a few comments:
1)
+ /*
+ * The replication origin cannot be reset if the replication origin is
+ * firstly acquired by this backend and other processes are actively using
+ * now. This can cause acquired_by to be zero and active replication origin
+ * might be dropped.
+ */
+ if (session_replication_state->acquired_by == MyProcPid &&
+ session_replication_state->refcount > 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg_plural("another process is acquiring the replication origin",
+ "other processes are acquiring the replication origin",
+
Since user is not aware of internal acquired_by logic, the error might
not make much sense to him as to why one session is able to reset
while another is not. Shall we make it:
ERROR: cannot reset replication origin "origin_name" while it is
still shared by other processes
DETAIL: the current session is the first process for this replication
origin, and other processes are sharing it.
HINT: ensure this replication origin is reset in all other processes first.
2)
When the first session leaves, while the second session is still using
origin, the third session is able to drop the origin which is not
right.
I think replorigin_state_clear() needs a change.
'if (state->acquired_by != 0)' check should be replaced by 'if
(state->refcount > 0)'
Patch 001 had correct changes in replorigin_state_clear(), IMO we
still need those
3)
When first session leaves, while second session is still using origin,
now correctly third session is not able to join it. It gives error:
postgres=# SELECT pg_replication_origin_session_setup('origin');
ERROR: replication origin with ID 1 is already active for another process
Error is not very informative provided the fact that now sharing is
allowed. Shall it be:
ERROR: replication origin "origin_name" cannot be acquired while it
is still in use
DETAIL: the process that first acquired this origin exited without
releasing it.
HINT: wait until all processes sharing this origin have released it.
Or
HINT: ensure this replication origin is reset in all other processes first.
4)
+ /* Number of backend that is currently using this origin. */
Number of backend that is --> Count of backends that are
thanks
Shveta
pgsql-hackers by date: