Re: Add errdetail() with PID and UID about source of termination signal - Mailing list pgsql-hackers

From jie wang
Subject Re: Add errdetail() with PID and UID about source of termination signal
Date
Msg-id CAJnZyeDui4YFseMKUc93bMHyY8J5cpFXRoVqNPsZQ_kWhLcVjw@mail.gmail.com
Whole thread Raw
In response to Re: Add errdetail() with PID and UID about source of termination signal  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Add errdetail() with PID and UID about source of termination signal
List pgsql-hackers


Andrew Dunstan <andrew@dunslane.net> 于2026年4月7日周二 00:51写道:

On 2026-02-26 Th 4:25 AM, Jakub Wartak wrote:
> On Thu, Feb 26, 2026 at 4:09 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> Hi Chao,
>
>> I just reviewed v4 again and got a few more comments:
>>
>> 1. This patch only set the global proc_die_sender_pid/uid to 0 at startup, then assign values to them upon receiving SIGTERM, and never reset them, which assumes a process must die upon SIGTERM. Is the assumption true? I guess not. If a process receives SIGTERM and not die immediately, then die for other reason, then it may report a misleading PID and UID.
> Hmm, I'm not sure I follow. If we receive SIGTERM and not die immediately
> (for whatever reason), then two scenarios can happen as far as I'm concerned:
> * another SIGTERM comes in from the same or different uid/pid and it wll be
> reported properly
> * different SIGKILL, but in this case we won't report UID/PID at all
>
> am I missing something or do You have any particular scenario in mind?
> The flow will be wrapper_handler()->die()->SetLatch()->..->directyl to
> err reporting facilities.
>
>> 2.
>> syncrpe.c uses errhint to print PID and UID, and postgres.c uses errdetail. We should keep consistency, maybe all use errhint.
> Right, let's make it that way.
>
>> 3.
>> ```
>> @@ -319,7 +323,11 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit)
>>                          QueryCancelPending = false;
>>                          ereport(WARNING,
>>                                          (errmsg("canceling wait for synchronous replication due to user request"),
>> -                                        errdetail("The transaction has already committed locally, but might not have been replicated to the standby.")));
>> +                                        errdetail("The transaction has already committed locally, but might not have been replicated to the standby."),
>> +                                               proc_die_sender_pid == 0 ? 0 :
>> +                                                       errhint("Signal sent by PID %d, UID %d.",
>> +                                                               proc_die_sender_pid, proc_die_sender_uid)
>> +                                               ));
>>                          SyncRepCancelWait();
>>                          break;
>>                  }
>> ```
>>
>> I don’t think the query cancel case relates to SIGTERM, so we don’t need to log PID and UID here.
> Right, it was superfluous.
>
> v5 attached.
>


I'd kinda like to sneak this in for pg19, because I think it's useful.
Here's a v6 that changes one or two things:


- changes the globals to sig_atomic_t

- in ProcessInterrupts, copies to local sender_pid/sender_uid, then
zeros the globals before any ereport

- uses errdetail() for all the messages


Plus a few more cosmetic changes like consistent casing.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com


I just reviewed v6 and got 1 comment.

sig_atomic_t is underlying int, but pid_t and uid_t are usually unsigned int,
so that while saving pid and uid to ProcDieSenderPid and ProcDieSenderUid,
overflow may happen. But that’s fine, as the data is stored in the same way
and we only want to print them. So, for the print statement:

#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
    ((pid) == 0 ? 0 : \
    errdetail("Signal sent by PID %d, UID %d.", (int) (pid), (int) (uid)))

Does it make sense to use %u and cast to pid_t and uid_t? Like:

#define ERRDETAIL_SIGNAL_SENDER(pid, uid) \
    ((pid) == 0 ? 0 : \
    errdetail("Signal sent by PID %u, UID %u.", (pid_t) (pid), (uid_t) (uid))) 

Thanks!
--
wang jie

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: pgsql: Reduce log level of some logical decoding messages from LOG to D
Next
From: Chao Li
Date:
Subject: Re: Add missing stats_reset column to pg_stat_database_conflicts view