Re: Refactor recovery conflict signaling a little - Mailing list pgsql-hackers

From Chao Li
Subject Re: Refactor recovery conflict signaling a little
Date
Msg-id EA619211-93E3-40BF-8EC0-AED29B87AC33@gmail.com
Whole thread Raw
In response to Refactor recovery conflict signaling a little  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers

> On Jan 23, 2026, at 07:20, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> I had a look at recovery conflict signaling and a few things caught my eye. No functional changes, but some cleanups
andreadability improvements: 
>
> Patch 0001: Remove useless errdetail_abort()
> --------------------------------------------
>
> The function is supposed to add DETAIL to errors when you are in an aborted transaction, if the transaction was
abortedby a recovery conflict, like this: 
>
> ERROR:  current transaction is aborted, commands ignored until end of transaction block"
> DETAIL:  Abort reason: recovery conflict
>
> But I don't see how to reach that. If a transaction is aborted by recovery conflict, you get a different error like
this:
>
> ERROR:  canceling statement due to conflict with recovery
> DETAIL:  User was holding a relation lock for too long.
>
> The transaction abort clears the 'recoveryConflictPending' flag, so even if that happens in a transaction block, you
don'tget that "DETAIL: Abort reason: recovery conflict" in the subsequent errors. 
>
> errdetail_abort() was introduced in commit a8ce974cdd. I suppose it was needed back then, but the signal handling has
changeda lot since. Looking at that commit now, though, I don't really understand how it was reachable even back then.
(Exceptwith a race with an unrelated transaction abort, see commit message) 
>
> Has anyone seen the "DETAIL:  Abort reason: recovery conflict" in recent years, or ever? If not, let's rip it out.
>

I did a Google search and couldn't find any post reporting the message, which seems to prove that message can be
removed.

>
> 0002: Don't hint that you can reconnect when the database is dropped
> --------------------------------------------------------------------
>
> If you're connected to a database is being dropped, during recovery, you get an error like this:
>
> FATAL:  terminating connection due to conflict with recovery
> DETAIL:  User was connected to a database that must be dropped.
> HINT:  In a moment you should be able to reconnect to the database and repeat your command.
>
> The hint seems misleading. The database is being dropped, you most likely can *not* reconnect to it. Let's remove it.
>

I like this change. Not only removing the misleading error message, the code is also clearer now.

>
> 0003-0004:  Separate RecoveryConflictReasons from procsignals
> -------------------------------------------------------------
>
> We're currently using different PROCSIG_* flags to indicate different kinds of recovery conflicts. We're also abusing
thesame flags in functions like LogRecoveryConflict, which isn't related to inter-process signaling. It seems better to
havea separate enum for the recovery conflict reasons. With this patch, there's just a single PROCSIG_RECOVERY_CONFLICT
towake up a process on a recovery conflict, and the reason is communicated by setting a flag in a bitmask in PGPROC. 
>
> I was inspired to do this in preparation of my project to replaces latches with "interrupts". By having just a single
PROCSIGflag, we reduce the need for "interrupt bits" with that project. But it seems nicer on its own merits too. 
>
>
> 0005: Refactor ProcessRecoveryConflictInterrupt for readability
> ---------------------------------------------------------------
>
> The function had a switch-statement with fallthrough through all the cases. It took me a while to understand how it
works.Once I finally understood it, I refactored it to not rely on the fallthrough. I hope this makes it easier for
otherstoo. 
>
> - Heikki
>
<0001-Remove-useless-errdetail_abort.patch><0002-Don-t-hint-that-you-can-reconnect-when-the-database-.patch><0003-Use-ProcNumber-rather-than-pid-in-ReplicationSlot.patch><0004-Separate-RecoveryConflictReasons-from-procsignals.patch><0005-Refactor-ProcessRecoveryConflictInterrupt-for-readab.patch>

A few comments on 003-0005:

1 - 0003
```
 ReplicationSlotAcquire(const char *name, bool nowait, bool error_if_invalid)
 {
     ReplicationSlot *s;
-    int            active_pid;
+    ProcNumber    active_proc;
+    pid_t        active_pid;
```

Active_pid is only used inside the "if (active_proc != MyProcNumber)” clause, so it can be only defined within the “if”
clause.


2 - 0003
```
                 if (MyBackendType == B_STARTUP)
-                    (void) SendProcSignal(active_pid,
-                                          PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
-                                          INVALID_PROC_NUMBER);
+                    SendProcSignal(active_pid,
+                                   PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT,
+                                   active_proc);
```

Here active_proc!=INVALID_PROC_NUMBER, so this changes the original logic without an explanation. Is the change
intentional?

3 - 0004
```
+     * This is a bitmask of RecoveryConflictReasons
+     */
+    pg_atomic_uint32 pendingRecoveryConflicts;
```

I just feel this comment is a little bit confusing because RecoveryConflictReasons is an enum. Maybe we can say
somethinglike: This is a bitmask; each bit corresponds to one RecoveryConflictReason enum value. 

4 - 0004
```
-                (void) SendProcSignal(pid, sigmode, vxid.procNumber);
+                (void) SendProcSignal(pid, PROCSIG_RECOVERY_CONFLICT, vxid.procNumber);
```

Nit: Here (void) is retained for SendProcSignal, but in the place of commit 2 for 0003, (void) is deleted when calling
SendProcSignal,is there any reason for retaining this one and deleting that one? 

5 - 0004
```
+     * doesn't check for deadlock direcly, because we want to kill one of the
```

Typo: direcly -> directly

6 - 0005
```
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index a2fa98ee971..bbf2254ca67 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -179,11 +179,15 @@ static bool IsTransactionExitStmt(Node *parsetree);
 static bool IsTransactionExitStmtList(List *pstmts);
 static bool IsTransactionStmtList(List *pstmts);
 static void drop_unnamed_stmt(void);
+static void ProcessRecoveryConflictInterrupts(void);
+static void ProcessRecoveryConflictInterrupt(RecoveryConflictReason reason);
+static void report_recovery_conflict(RecoveryConflictReason reason);
 static void log_disconnections(int code, Datum arg);
 static void enable_statement_timeout(void);
 static void disable_statement_timeout(void);


+
 /* ----------------------------------------------------------------
```

Nit: No need to add this empty line.

7 - 0005
```
+static void
+report_recovery_conflict(RecoveryConflictReason reason)
+{
+    bool        fatal;

+    if (RECOVERY_CONFLICT_DATABASE)
+    {
```

I believe this should be if (reason == RECOVERY_CONFLICT_DATABASE).

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: DOCS - "\d mytable" also shows any publications that publish mytable
Next
From: Chao Li
Date:
Subject: Re: DOCS - "\d mytable" also shows any publications that publish mytable