Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations - Mailing list pgsql-hackers

From MauMau
Subject Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
Date
Msg-id CED78EDAC06B41E8AF82173A00FBD404@maumau
Whole thread Raw
In response to Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [RFC] Should smgrtruncate() avoid sending sinval message for temp relations
List pgsql-hackers
From: "Tom Lane" <tgl@sss.pgh.pa.us>
>> [ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ]

I must add one thing.  After some client processes closed the connection 
without any hang, their server processes were stuck with a stack trace like 
this (I'll look for and show the exact stack trace tomorrow):

WaitLatchOrSocket
SyncRepWaitForLSN
CommitTransaction
CommitTransactionCommand
ProcessCatchupEvent
...
shmem_exit
proc_exit_prepare
proc_exit
PostgresMain
...

The process appears to be hanging during session termination.  So, it's not 
the problem only during client request wait.


> Another line of thought is that we've been way too uncritical about
> shoving different kinds of events into the SIGUSR1 multiplexor.
> It might be a good idea to separate "high level" interrupts from
> "low level" ones, using say SIGUSR1 for the former and SIGUSR2
> for the latter.  However, that doesn't sound very back-patchable,
> even assuming that we can come up with a clean division.

This seems to be one step in the right direction.  There are two issues in 
the current implementation:

[Issue 1]
[ sinval catchup signal -> ProcessCatchupEvent -> WaitLatch -> deadlock ]
This is (partly) because the latch wakeup and other processing use the same 
SIGUSR1 in normal backend, autovacuum launcher/worker, and the background 
worker with database access.  On the other hand, other background daemon 
processes properly use SIGUSR1 only for latch wakeup, and SIGUSR2 for other 
tasks.

[Issue 2]
WaitLatch() returns prematurely due to the sinval catchup signal, even 
though the target event (e.g. reply from standby) hasn't occurred and called 
SetLatch() yet.  This is because procsignal_sigusr1_handler() always calls 
latch_sigusr1_handler().
This is probably not related to the cause of the hang.

So, as you suggest, I think it would be a good idea to separate 
latch_sigusr1_handler() call into a different function solely for it like 
other daemon processes,  and leave the rest in procsignal_sigusr1_handler() 
and rename function to procsignal_sigusr2_handler() or procsignal_handler(). 
Originally, it's not natural that the current procsignal_sigusr1_handler() 
contains latch_sigusr1_handler() call, because latch processing is not based 
on the procsignal mechanism (SetLatch() doesn't call SendProcSignal()).

I'll try the fix tomorrow if possible.  What kind of problems do you hink of 
for back-patching?

Regards
MauMau





pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Use unique index for longer pathkeys.
Next
From: Haribabu Kommi
Date:
Subject: Re: [BUGS] BUG #9652: inet types don't support min/max