Thread: pg_sleep() doesn't work well with recovery conflict interrupts.
Hi,

Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses WaitLatch() to wait. That's fine in itself. But procsignal_sigusr1_handler, which is used e.g. when resolving recovery conflicts, doesn't unconditionally do a SetLatch(). That means that we'll currently not be able to cancel conflicting backends during recovery for up to 10 minutes. Now, I don't think that'll happen too often in practice, but it's still annoying.

As an alternative to doing the PG_TRY / save set_latch_on_sigusr1 / set set_latch_on_sigusr1 / PG_CATCH / reset set_latch_on_sigusr1 dance in pg_sleep(), we could also have RecoveryConflictInterrupt() do an unconditional SetLatch()?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
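To make the 10-minute figure concrete, here is a small C simulation (not PostgreSQL code) of a pg_sleep()-style loop. It assumes, as the thread's "10min" suggests, that each WaitLatch() call is capped at 600 seconds and that pending interrupts are only acted on between waits; `next_wait_ms`, `interrupt_serviced_at` and `MAX_WAIT_MS` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed per-call cap on a single WaitLatch()-style wait: 10 minutes. */
#define MAX_WAIT_MS 600000L

/* Length of the next wait, given how much of the requested sleep remains. */
static long next_wait_ms(long remaining_ms)
{
    if (remaining_ms <= 0)
        return 0;
    return remaining_ms < MAX_WAIT_MS ? remaining_ms : MAX_WAIT_MS;
}

/*
 * Simulate a chunked sleep of sleep_ms milliseconds. A conflict signal
 * arrives at interrupt_at_ms. Pending interrupts are only noticed at the
 * top of each iteration (where CHECK_FOR_INTERRUPTS() would sit). If the
 * signal handler sets the latch, the current wait ends as soon as the
 * signal arrives; otherwise the wait runs to its timeout.
 *
 * Returns the simulated time (ms) at which the interrupt is serviced, or
 * -1 if the sleep completes before the interrupt arrives.
 */
static long interrupt_serviced_at(long sleep_ms, long interrupt_at_ms,
                                  bool handler_sets_latch)
{
    long now = 0;

    assert(sleep_ms >= 0 && interrupt_at_ms >= 0);
    for (;;)
    {
        if (now >= interrupt_at_ms)         /* CHECK_FOR_INTERRUPTS() */
            return now;

        long wait = next_wait_ms(sleep_ms - now);
        if (wait == 0)
            return -1;                      /* sleep finished first */

        long wake = now + wait;             /* timeout-driven wakeup */
        if (handler_sets_latch && interrupt_at_ms < wake)
            wake = interrupt_at_ms;         /* latch set: early wakeup */
        now = wake;
    }
}
```

For a one-hour sleep with a conflict signal arriving after one second, `interrupt_serviced_at(3600000, 1000, false)` returns 600000 (the conflict is noticed only after the first 10-minute wait times out), whereas `interrupt_serviced_at(3600000, 1000, true)` returns 1000.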
On Wed, May 28, 2014 at 8:53 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Hi,
>
> Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
> WaitLatch() to wait. That's fine in itself. But
> procsignal_sigusr1_handler, which is used e.g. when resolving recovery
> conflicts, doesn't unconditionally do a SetLatch().
> That means that we'll currently not be able to cancel conflicting
> backends during recovery for 10min. Now, I don't think that'll happen
> too often in practice, but it's still annoying.
How will such a situation occur? Aren't we using pg_usleep during
the RecoveryConflict functions
(e.g. in ResolveRecoveryConflictWithVirtualXIDs)?
On 2014-05-30 10:30:42 +0530, Amit Kapila wrote:
> On Wed, May 28, 2014 at 8:53 PM, Andres Freund <andres@2ndquadrant.com>
> wrote:
> > Hi,
> >
> > Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
> > WaitLatch() to wait. That's fine in itself. But
> > procsignal_sigusr1_handler, which is used e.g. when resolving recovery
> > conflicts, doesn't unconditionally do a SetLatch().
> > That means that we'll currently not be able to cancel conflicting
> > backends during recovery for 10min. Now, I don't think that'll happen
> > too often in practice, but it's still annoying.
>
> How will such a situation occur, aren't we using pg_usleep during
> RecoveryConflict functions
> (ex. in ResolveRecoveryConflictWithVirtualXIDs)?

I am not sure what you mean. pg_sleep() is the SQL callable function, a
different thing to pg_usleep(). The latter isn't interruptible on all
platforms, but the sleep times should be short enough for that not to
matter.

I am pretty sure by now that the sane fix for this is to add a
SetLatch() call to RecoveryConflictInterrupt(). All the signal handlers
that deal with query cancelation et al. do so, so it seems right that
RecoveryConflictInterrupt() does so as well.

Greetings,

Andres Freund
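The effect of the proposed SetLatch() call can be sketched with a toy latch in POSIX C. This is a simplified stand-in, not PostgreSQL's latch code: it assumes a latch is a flag plus a non-blocking self-pipe that a signal handler can safely write to, and `set_latch`, `wait_latch`, `sigusr1_handler`, `latch_init` and `conflict_wakes_sleeper` are hypothetical names. The handler always records the pending conflict; only when `handler_sets_latch` is on does it also set the latch, mirroring the suggested change to RecoveryConflictInterrupt().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical, simplified latch: a flag plus a non-blocking self-pipe. */
static volatile sig_atomic_t latch_is_set = 0;
static volatile sig_atomic_t conflict_pending = 0;   /* stands in for interrupt flags */
static volatile sig_atomic_t handler_sets_latch = 0; /* toggles the proposed fix */
static int selfpipe[2] = {-1, -1};

/* Async-signal-safe: only sets a flag and write()s one byte. */
static void set_latch(void)
{
    int saved_errno = errno;
    latch_is_set = 1;
    ssize_t rc = write(selfpipe[1], "x", 1);  /* EAGAIN if full is harmless */
    (void) rc;
    errno = saved_errno;
}

static void reset_latch(void)
{
    char buf[16];
    latch_is_set = 0;
    while (read(selfpipe[0], buf, sizeof buf) > 0)
        ;                                      /* drain the non-blocking pipe */
}

/* Returns true if woken by the latch, false on timeout or poll error. */
static bool wait_latch(int timeout_ms)
{
    struct pollfd pfd = { .fd = selfpipe[0], .events = POLLIN };

    for (;;)
    {
        if (latch_is_set)
            return true;
        int rc = poll(&pfd, 1, timeout_ms);
        if (rc > 0)
            return true;
        if (rc == 0 || errno != EINTR)
            return false;
        /* EINTR: loop to re-check the flag (restarts the full timeout). */
    }
}

/* Stand-in for a RecoveryConflictInterrupt()-style SIGUSR1 handler. */
static void sigusr1_handler(int signo)
{
    (void) signo;
    conflict_pending = 1;
    if (handler_sets_latch)
        set_latch();                           /* the proposed fix */
}

static int latch_init(void)
{
    struct sigaction sa;

    if (pipe(selfpipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++)
        if (fcntl(selfpipe[i], F_SETFL, O_NONBLOCK) == -1)
            return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigusr1_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Deliver one conflict signal, then sleep. True if it cut the sleep short. */
static bool conflict_wakes_sleeper(bool fix_applied, int timeout_ms)
{
    assert(selfpipe[0] >= 0);                  /* latch_init() must run first */
    handler_sets_latch = fix_applied;
    conflict_pending = 0;
    reset_latch();
    raise(SIGUSR1);                            /* handler runs before raise() returns */
    bool woke = wait_latch(timeout_ms);
    reset_latch();
    return woke && conflict_pending;
}
```

After `latch_init()`, `conflict_wakes_sleeper(true, 2000)` returns true almost immediately, while `conflict_wakes_sleeper(false, 50)` returns false only after the 50 ms timeout expires, even though the conflict flag was set. Writing to the pipe, rather than relying on poll() being interrupted, is what closes the race where the signal arrives just before the sleeper blocks.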
Andres Freund <andres@2ndquadrant.com> writes:
> I am pretty sure by now that the sane fix for this is to add a
> SetLatch() call to RecoveryConflictInterrupt(). All the signal handlers
> that deal with query cancelation et al. do so, so it seems right that
> RecoveryConflictInterrupt() does so as well.

+1

regards, tom lane
On Sun, Jun 1, 2014 at 1:05 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-05-30 10:30:42 +0530, Amit Kapila wrote:
> > On Wed, May 28, 2014 at 8:53 PM, Andres Freund <andres@2ndquadrant.com>
> > wrote:
> > > Since a64ca63e59c11d8fe6db24eee3d82b61db7c2c83 pg_sleep() uses
> > > WaitLatch() to wait. That's fine in itself. But
> > > procsignal_sigusr1_handler, which is used e.g. when resolving recovery
> > > conflicts, doesn't unconditionally do a SetLatch().
> > > That means that we'll currently not be able to cancel conflicting
> > > backends during recovery for 10min. Now, I don't think that'll happen
> > > too often in practice, but it's still annoying.
> >
> > How will such a situation occur, aren't we using pg_usleep during
> > RecoveryConflict functions
> > (ex. in ResolveRecoveryConflictWithVirtualXIDs)?
>
> I am not sure what you mean. pg_sleep() is the SQL callable function, a
> different thing to pg_usleep().
It was not clear to me how such a situation could occur, but looking at
it a bit more carefully now, I think I understand that any backend calling
pg_sleep() during recovery conflict resolution can face this situation.