Re: Review for GetWALAvailability() - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Review for GetWALAvailability()
Date
Msg-id 20200617.121031.1692393794056067357.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Review for GetWALAvailability()  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Review for GetWALAvailability()
List pgsql-hackers
At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in 
> On 2020-Jun-17, Fujii Masao wrote:
> > On 2020/06/17 3:50, Alvaro Herrera wrote:
> 
> > So InvalidateObsoleteReplicationSlots() can terminate normal backends.
> > But do we want to do this? If we want, we should add the note about this
> > case into the docs? Otherwise the users would be surprised at termination
> > of backends by max_slot_wal_keep_size. I guess that it's basically rarely
> > happen, though.
> 
> Well, if we could distinguish a walsender from a non-walsender process,
> then maybe it would make sense to leave backends alive.  But do we want
> that?  I admit I don't know what would be the reason to have a
> non-walsender process with an active slot, so I don't have a good
> opinion on what to do in this case.

The non-walsender backend is actually doing replication work. It
rather should be killed?

> > > > +        /*
> > > > +         * Signal to terminate the process using the replication slot.
> > > > +         *
> > > > +         * Try to signal every 100ms until it succeeds.
> > > > +         */
> > > > +        if (!killed && kill(active_pid, SIGTERM) == 0)
> > > > +            killed = true;
> > > > +        ConditionVariableTimedSleep(&slot->active_cv, 100,
> > > > +                                    WAIT_EVENT_REPLICATION_SLOT_DROP);
> > > > +    } while (ReplicationSlotIsActive(slot, NULL));
> > > 
> > > Note that here you're signalling only once and then sleeping many times
> > > in increments of 100ms -- you're not signalling every 100ms as the
> > > comment claims -- unless the signal fails, but you don't really expect
> > > that.  On the contrary, I'd claim that the logic is reversed: if the
> > > signal fails, *then* you should stop signalling.
> > 
> > You mean; in this code path, signaling fails only when the target process
> > disappears just before signaling. So if it fails, slot->active_pid is
> > expected to become 0 even without signaling more. Right?
> 
> I guess kill() can also fail if the PID now belongs to a process owned
> by a different user.  I think we've disregarded very quick reuse of
> PIDs, so we needn't concern ourselves with it.

The first time call to ConditionVariableTimedSleep doen't actually
sleep, so the loop works as expected.  But we may make an extra call
to kill(2).  Calling ConditionVariablePrepareToSleep beforehand of the
loop would make it better.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Review for GetWALAvailability()
Next
From: Noah Misch
Date:
Subject: Re: valgrind versus pg_atomic_init()