Re: Review for GetWALAvailability() - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Review for GetWALAvailability() |
Date | |
Msg-id | 20200617.121031.1692393794056067357.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Review for GetWALAvailability() (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: Review for GetWALAvailability()
|
List | pgsql-hackers |
At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > On 2020-Jun-17, Fujii Masao wrote: > > On 2020/06/17 3:50, Alvaro Herrera wrote: > > > So InvalidateObsoleteReplicationSlots() can terminate normal backends. > > But do we want to do this? If we want, we should add the note about this > > case into the docs? Otherwise the users would be surprised at termination > > of backends by max_slot_wal_keep_size. I guess that it's basically rarely > > happen, though. > > Well, if we could distinguish a walsender from a non-walsender process, > then maybe it would make sense to leave backends alive. But do we want > that? I admit I don't know what would be the reason to have a > non-walsender process with an active slot, so I don't have a good > opinion on what to do in this case. The non-walsender backend is actually doing replication work. It rather should be killed? > > > > + /* > > > > + * Signal to terminate the process using the replication slot. > > > > + * > > > > + * Try to signal every 100ms until it succeeds. > > > > + */ > > > > + if (!killed && kill(active_pid, SIGTERM) == 0) > > > > + killed = true; > > > > + ConditionVariableTimedSleep(&slot->active_cv, 100, > > > > + WAIT_EVENT_REPLICATION_SLOT_DROP); > > > > + } while (ReplicationSlotIsActive(slot, NULL)); > > > > > > Note that here you're signalling only once and then sleeping many times > > > in increments of 100ms -- you're not signalling every 100ms as the > > > comment claims -- unless the signal fails, but you don't really expect > > > that. On the contrary, I'd claim that the logic is reversed: if the > > > signal fails, *then* you should stop signalling. > > > > You mean; in this code path, signaling fails only when the target process > > disappears just before signaling. So if it fails, slot->active_pid is > > expected to become 0 even without signaling more. Right? > > I guess kill() can also fail if the PID now belongs to a process owned > by a different user. I think we've disregarded very quick reuse of > PIDs, so we needn't concern ourselves with it. The first time call to ConditionVariableTimedSleep doen't actually sleep, so the loop works as expected. But we may make an extra call to kill(2). Calling ConditionVariablePrepareToSleep beforehand of the loop would make it better. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: