> Robert Haas <robertmhaas@gmail.com> writes:
>> I'd support back-porting that commit to 9.1 and 9.2 as a fix for this
>> problem. As the commit message says, it's dead simple.
From: "Tom Lane" <tgl@sss.pgh.pa.us>
> While I have no great objection to back-porting Heikki's patch, it seems
> like a very large stretch to call this a root-cause fix. At best it's
> band-aiding one symptom in a rather fragile way.
Thank you, Robert san. I'll be waiting for it to be back-ported to the next
9.1/9.2 release.
Yes, I think this failure is only one potential symptom of the
implementation mistake -- handling both latch wakeup and other tasks that
wait on a latch in the same SIGUSR1 handler. Although there may be no such
tasks now, I'd like to correct and clean up the implementation as follows to
avoid similar problems in the future. I think it's enough to do this only
for 9.5. Please correct me before I go deeper in the wrong direction.
* The SIGUSR1 handler does only latch wakeup. Any other task is done in
other signal handlers such as the one for SIGUSR2. Many postgres daemon
processes already follow this style, but normal backends, autovacuum
daemons, and background workers currently do not.
* InitializeLatchSupport() in unix_latch.c calls pqsignal(SIGUSR1,
latch_sigusr1_handler). Change the argument of latch_sigusr1_handler()
accordingly.
* Remove SIGUSR1 handler registration and process-specific SIGUSR1 handler
functions from all processes. We can eliminate many SIGUSR1 handler
functions which have the same contents.
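
To make the idea concrete, here is a minimal standalone sketch of the split
I have in mind. It is not PostgreSQL code: every sketch_* name, the
other_work_pending flag, and the use of sigaction() instead of pqsignal()
are my assumptions for illustration only. The SIGUSR1 handler takes the
signal number as its argument so that it can be registered directly, and it
does nothing except the self-pipe wakeup; everything else goes to SIGUSR2.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-ins for the latch self-pipe; not the real variables. */
static int sketch_selfpipe_readfd = -1;
static int sketch_selfpipe_writefd = -1;

/* Hypothetical flag for "any other task", set only by the SIGUSR2 handler. */
static volatile sig_atomic_t other_work_pending = 0;

/* SIGUSR1: latch wakeup only. Takes the signal number so that it can be
 * installed directly as a signal handler. */
static void
sketch_latch_sigusr1_handler(int signo)
{
    int     save_errno = errno;
    char    dummy = 0;
    ssize_t rc;

    (void) signo;
    /* write() is async-signal-safe; a full pipe already means "wake up". */
    rc = write(sketch_selfpipe_writefd, &dummy, 1);
    (void) rc;
    errno = save_errno;
}

/* SIGUSR2: every non-latch task is only flagged here and done later. */
static void
sketch_other_sigusr2_handler(int signo)
{
    (void) signo;
    other_work_pending = 1;
}

/* Install one handler; stands in for pqsignal() in this sketch. */
static int
sketch_install(int signo, void (*handler) (int))
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_handler = handler;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESTART;
    return sigaction(signo, &act, NULL);
}

/* One common place that sets up the pipe and registers both handlers,
 * so that individual processes need no SIGUSR1 handler of their own. */
static int
sketch_initialize_latch_support(void)
{
    int fds[2];

    if (pipe(fds) < 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) < 0 ||
        fcntl(fds[1], F_SETFL, O_NONBLOCK) < 0)
        return -1;
    sketch_selfpipe_readfd = fds[0];
    sketch_selfpipe_writefd = fds[1];
    if (sketch_install(SIGUSR1, sketch_latch_sigusr1_handler) < 0)
        return -1;
    return sketch_install(SIGUSR2, sketch_other_sigusr2_handler);
}

/* Drain the self-pipe; returns the number of wakeup bytes consumed. */
static int
sketch_drain_selfpipe(void)
{
    char    buf[16];
    int     total = 0;
    ssize_t n;

    assert(sketch_selfpipe_readfd >= 0);
    while ((n = read(sketch_selfpipe_readfd, buf, sizeof(buf))) > 0)
        total += (int) n;
    return total;
}
```

With this shape, a process calls sketch_initialize_latch_support() once at
startup. SIGUSR1 can then never do anything other than a latch wakeup, and
the main loop checks other_work_pending after each wakeup.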
Regards
MauMau