Re: Add errdetail() with PID and UID about source of termination signal - Mailing list pgsql-hackers
| From | jie wang |
|---|---|
| Subject | Re: Add errdetail() with PID and UID about source of termination signal |
| Date | |
| Msg-id | CAJnZyeDui4YFseMKUc93bMHyY8J5cpFXRoVqNPsZQ_kWhLcVjw@mail.gmail.com Whole thread Raw |
| In response to | Re: Add errdetail() with PID and UID about source of termination signal (Andrew Dunstan <andrew@dunslane.net>) |
| Responses |
Re: Add errdetail() with PID and UID about source of termination signal
|
| List | pgsql-hackers |
Andrew Dunstan <andrew@dunslane.net> 于2026年4月7日周二 00:51写道:
On 2026-02-26 Th 4:25 AM, Jakub Wartak wrote:
> On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi Chao,
>
>> I just reviewed v4 again and got a few more comments:
>>
>> 1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID.
> Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
> (for whatever reason), then two scenarios can happen as far as I'm concerned:
> * another SIGTERM comes in from the same or different uid/pid and it wll be
> reported properly
> * different SIGKILL, but in this case we won't report UID/PID at all
>
> am I missing something or do You have any particular scenario in mind?
> The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
> err reporting facilities.
>
>> 2.
>> syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
> Right, let's make it that way.
>
>> 3.
>> ```
>> @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
>> QueryCancelPending = false;
>> ereport(WARNING,
>> (errmsg("canceling wait for synchronous replication due to user request"),
>> - errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
>> + errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
>> + proc_die_sender_pid == 0 ? 0 :
>> + errhint("Signal sent by PID %d, UID %d.",
>> + proc_die_sender_pid, proc_die_sender_uid)
>> + ));
>> SyncRepCancelWait();
>> break;
>> }
>> ```
>>
>> I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
> Right, it was superfluous.
>
> v5 attached.
>
I'd kinda like to sneak this in for pg19, because I think it's useful.
Here's a v6 that changes one or two things:
- changes the globals to sig_atomic_t
- in ProcessInterrupts, copies to local sender_pid/sender_uid, then
zeros the globals before any ereport
- uses errdetail() for all the messages
Plus a few more cosmetic changes like consistent casing.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
sig_atomic_t is underlying int, but pid_t and uid_t are usually unsigned int,
so that while saving pid and uid to ProcDieSenderPid and ProcDieSenderUid,
overflow may happen. But that’s fine, as the data is stored in the same way
and we only want to print them. So, for the print statement:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
and we only want to print them. So, for the print statement:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %d, UID %d.", (int) (pid), (int) (uid)))
Does it make sense to use %u and cast to pid_t and uid_t? Like:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
Does it make sense to use %u and cast to pid_t and uid_t? Like:
#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
((pid) == 0 ? 0 : \
errdetail("Signal sent by PID %u, UID %u.", (pid_t) (pid), (uid_t) (uid)))
Thanks!
--
wang jie
pgsql-hackers by date: