Re: Review for GetWALAvailability() - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: Review for GetWALAvailability() |
Date | |
Msg-id | f898aa30-053e-3598-f1f1-4b3b431f8f30@oss.nttdata.com |
In response to | Re: Review for GetWALAvailability() (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: Review for GetWALAvailability()
Re: Review for GetWALAvailability() |
List | pgsql-hackers |
On 2020/06/17 17:30, Kyotaro Horiguchi wrote:
> At Wed, 17 Jun 2020 17:01:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>>
>> On 2020/06/17 12:10, Kyotaro Horiguchi wrote:
>>> At Tue, 16 Jun 2020 22:40:56 -0400, Alvaro Herrera
>>> <alvherre@2ndquadrant.com> wrote in
>>>> On 2020-Jun-17, Fujii Masao wrote:
>>>>> On 2020/06/17 3:50, Alvaro Herrera wrote:
>>>>
>>>>> So InvalidateObsoleteReplicationSlots() can terminate normal backends.
>>>>> But do we want to do this? If we want, we should add the note about this
>>>>> case into the docs? Otherwise the users would be surprised at termination
>>>>> of backends by max_slot_wal_keep_size. I guess that it's basically rarely
>>>>> happen, though.
>>>>
>>>> Well, if we could distinguish a walsender from a non-walsender process,
>>>> then maybe it would make sense to leave backends alive. But do we want
>>>> that? I admit I don't know what would be the reason to have a
>>>> non-walsender process with an active slot, so I don't have a good
>>>> opinion on what to do in this case.
>>>
>>> The non-walsender backend is actually doing replication work. It
>>> rather should be killed?
>>
>> I have no better opinion about this. So I agree to leave the logic as it is
>> at least for now, i.e., we terminate the process owning the slot whatever
>> the type of process is.
>
> Agreed.
>
>>>>>>> +        /*
>>>>>>> +         * Signal to terminate the process using the replication slot.
>>>>>>> +         *
>>>>>>> +         * Try to signal every 100ms until it succeeds.
>>>>>>> +         */
>>>>>>> +        if (!killed && kill(active_pid, SIGTERM) == 0)
>>>>>>> +            killed = true;
>>>>>>> +        ConditionVariableTimedSleep(&slot->active_cv, 100,
>>>>>>> +                                    WAIT_EVENT_REPLICATION_SLOT_DROP);
>>>>>>> +    } while (ReplicationSlotIsActive(slot, NULL));
>>>>>>
>>>>>> Note that here you're signalling only once and then sleeping many times
>>>>>> in increments of 100ms -- you're not signalling every 100ms as the
>>>>>> comment claims -- unless the signal fails, but you don't really expect
>>>>>> that. On the contrary, I'd claim that the logic is reversed: if the
>>>>>> signal fails, *then* you should stop signalling.
>>>>>
>>>>> You mean; in this code path, signaling fails only when the target process
>>>>> disappears just before signaling. So if it fails, slot->active_pid is
>>>>> expected to become 0 even without signaling more. Right?
>>>>
>>>> I guess kill() can also fail if the PID now belongs to a process owned
>>>> by a different user.
>>
>> Yes. This case means that the PostgreSQL process using the slot disappeared
>> and the same PID was assigned to non-PostgreSQL process. So if kill() fails
>> for this reason, we don't need to kill() again.
>>
>>>> I think we've disregarded very quick reuse of
>>>> PIDs, so we needn't concern ourselves with it.
>>>
>>> The first time call to ConditionVariableTimedSleep doesn't actually
>>> sleep, so the loop works as expected. But we may make an extra call
>>> to kill(2). Calling ConditionVariablePrepareToSleep beforehand of the
>>> loop would make it better.
>>
>> Sorry I failed to understand your point...
>
> My point is the ConditionVariableTimedSleep does *not* sleep on the CV
> first time in this usage. The new version anyway avoids useless
> kill(2) call, but still may make an extra call to
> ReplicationSlotAcquireInternal. I think we should call
> ConditionVariablePrepareToSleep before the surrounding for statement
> block.

OK, so what about the attached patch?
I added ConditionVariablePrepareToSleep() just before entering the
"for" loop in InvalidateObsoleteReplicationSlots().

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
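
For readers following the argument without the attachment, the effect discussed above can be illustrated with a toy model. This is only a sketch and not the attached patch or PostgreSQL code: `ToyCV`, `ToySlot` and every `toy_*` function are hypothetical stand-ins, the "sleep" is simulated by a counter, and the kill(2) call is replaced by incrementing `kill_calls`. It models the one behaviour the thread relies on: the first timed sleep on a condition variable that has not been prepared only registers the waiter and returns at once, so the loop re-checks the slot one extra time unless the prepare step happens before the loop.

```c
#include <stdbool.h>

/* Toy condition variable: only tracks whether the caller is registered
 * as a waiter. Hypothetical stand-in, not PostgreSQL's ConditionVariable. */
typedef struct ToyCV {
    bool prepared;
    int  real_sleeps;        /* calls that actually waited */
} ToyCV;

/* Toy slot: hypothetical stand-in for a replication slot. */
typedef struct ToySlot {
    int   active_pid;        /* 0 once the owning process has exited */
    int   sleeps_until_exit; /* owner exits after this many real sleeps */
    int   acquire_checks;    /* how many times the loop inspected the slot */
    int   kill_calls;        /* how many times a signal was "sent" */
    ToyCV cv;
} ToySlot;

static void toy_prepare_to_sleep(ToyCV *cv) { cv->prepared = true; }
static void toy_cancel_sleep(ToyCV *cv)     { cv->prepared = false; }

/* First call on an unprepared CV registers the waiter and returns
 * without sleeping; later calls count as one 100ms wait each. */
static void toy_timed_sleep(ToySlot *slot)
{
    if (!slot->cv.prepared) {
        slot->cv.prepared = true;
        return;
    }
    slot->cv.real_sleeps++;
    if (slot->cv.real_sleeps >= slot->sleeps_until_exit)
        slot->active_pid = 0;    /* simulated owner has exited */
}

/* Signal the owner once, then wait until the slot is released. */
void toy_invalidate(ToySlot *slot, bool prepare_first)
{
    bool signalled = false;

    if (prepare_first)
        toy_prepare_to_sleep(&slot->cv);

    for (;;) {
        slot->acquire_checks++;
        if (slot->active_pid == 0)
            break;
        if (!signalled) {
            slot->kill_calls++;  /* stands in for kill(pid, SIGTERM) */
            signalled = true;    /* never re-signal, success or not */
        }
        toy_timed_sleep(slot);
    }
    toy_cancel_sleep(&slot->cv);
}

/* Helper: number of slot checks the loop needed. */
int toy_checks(int sleeps_until_exit, bool prepare_first)
{
    ToySlot slot = { .active_pid = 1234,
                     .sleeps_until_exit = sleeps_until_exit };
    toy_invalidate(&slot, prepare_first);
    return slot.acquire_checks;
}

/* Helper: number of signals the loop sent. */
int toy_kills(int sleeps_until_exit, bool prepare_first)
{
    ToySlot slot = { .active_pid = 1234,
                     .sleeps_until_exit = sleeps_until_exit };
    toy_invalidate(&slot, prepare_first);
    return slot.kill_calls;
}
```

In this model, `toy_checks(2, true)` inspects the slot three times while `toy_checks(2, false)` inspects it four times, which corresponds to the extra ReplicationSlotAcquireInternal call mentioned above. The signal count stays at one in both cases.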
Attachment