RE: SIGQUIT on archiver child processes maybe not such a hot idea? - Mailing list pgsql-hackers

From Tsunakawa, Takayuki
Subject RE: SIGQUIT on archiver child processes maybe not such a hot idea?
Date
Msg-id 0A3221C70F24FB45833433255569204D1FD0B676@G01JPEXMBYT05
In response to SIGQUIT on archiver child processes maybe not such a hot idea?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: SIGQUIT on archiver child processes maybe not such a hot idea?
List pgsql-hackers
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> After investigation, the mechanism that's causing that is that the
> src/test/recovery/t/010_logical_decoding_timelines.pl test shuts
> down its replica server with a mode-immediate stop, which causes
> that postmaster to shut down all its children with SIGQUIT, and
> in particular that signal propagates to a "cp" command that the
> archiver process is executing.  The "cp" is unsurprisingly running
> with default SIGQUIT handling, which per the signal man page
> includes dumping core.

We experienced this years ago (a core dump left in the data directory by an archive command).  Relatedly, the example of using cp in the PostgreSQL manual is misleading, because cp doesn't reliably persist the WAL archive file.


> This makes me wonder whether we shouldn't be using some other signal
> to shut down archiver subprocesses.  It's not real cool if we're
> spewing cores all over the place.  Admittedly, production servers
> are likely running with "ulimit -c 0" on most modern platforms,
> so this might not be a huge problem in the field; but accumulation
> of core files could be a problem anywhere that's configured to allow
> server core dumps.

We enable core dumps in production, just in case, to help with investigation.


> Ideally, perhaps, we'd be using SIGINT not SIGQUIT to shut down
> non-Postgres child processes.  But redesigning the system's signal
> handling to make that possible seems like a bit of a mess.
> 
> Thoughts?

We're using a shell script and a command that's called in the shell script.  That is:

archive_command = 'call some_shell_script.sh ...'

[some_shell_script.sh]
ulimit -c 0
trap SIGQUIT to just exit on the receipt of the signal
call some_command to copy file

some_command also catches SIGQUIT and just exits.  It copies the file and syncs it.
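
For illustration, a minimal sketch of such a wrapper might look like the following.  This is not our actual script: the function name archive_wal, the ARCHIVE_DIR variable and its default path, and the use of cp plus a plain sync in place of some_command are all assumptions made up for the example, and it assumes a shell such as bash or dash that supports "ulimit -c".  Note that a trap set in the shell is not inherited by child commands, so it is the inherited "ulimit -c 0" that keeps cp from leaving a core file.

```shell
#!/bin/sh
# Hypothetical wrapper; it could be invoked along the lines of
#   archive_command = '/path/to/archive_wal.sh "%p" "%f"'
# All names and paths here are assumptions for illustration only.

ARCHIVE_DIR="${ARCHIVE_DIR:-/mnt/server/archivedir}"

# The body runs in a subshell, so the ulimit and trap stay local to it.
archive_wal() (
    src=$1                  # %p: path of the WAL file to archive
    dst="$ARCHIVE_DIR/$2"   # %f: file name only

    ulimit -c 0             # inherited by children, so cp cannot dump core
    trap 'exit 1' QUIT      # on SIGQUIT, just exit with a failure status

    # Refuse to overwrite a file that is already in the archive.
    [ ! -e "$dst" ] || exit 1

    # Copy under a temporary name, flush, then rename into place.
    cp "$src" "$dst.tmp" || exit 1
    sync                    # coarse stand-in for an fsync of the new file
    mv "$dst.tmp" "$dst" || exit 1
    sync                    # flush the rename as well
)

# Run the function only when the script is called with both arguments.
if [ "$#" -eq 2 ]; then
    archive_wal "$1" "$2"
fi
```

A plain sync flushes everything rather than just the one file, so a real command would more likely fsync the file and its directory itself, as some_command does.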

I proposed something along these lines in the message below, but I couldn't respond to Peter's review comments due to other tasks.  Does anyone think it's worth resuming this?

https://www.postgresql.org/message-id/7E37040CF3804EA5B018D7A022822984@maumau


Regards
Takayuki Tsunakawa





