Home > mailing lists

Re: dropdb --force - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: dropdb --force
Date	November 6, 2019 11:29:19
Msg-id	CAA4eK1+Gy=pxwrFSY65UE1=H3p1jO2Nzca0A=MBGWL-s4iZ20w@mail.gmail.com Whole thread Raw
In response to	Re: dropdb --force (Pavel Stehule <pavel.stehule@gmail.com>)
Responses	Re: dropdb --force (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

Tree view

On Sun, Nov 3, 2019 at 2:08 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>

>> pá 25. 10. 2019 v 4:55 odesílatel Amit Kapila <amit.kapila16@gmail.com> napsal:
>>>
>>> On Thu, Oct 24, 2019 at 8:22 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>>> >
>>> > čt 24. 10. 2019 v 11:10 odesílatel Amit Kapila <amit.kapila16@gmail.com> napsal:
>>> >>
>>> >> While making some changes in the patch, I noticed that in
>>> >> TerminateOtherDBBackends, there is a race condition where after we
>>> >> release the ProcArrayLock, the backend process to which we decided to
>>> >> send a signal exits by itself and the same pid can be assigned to
>>> >> another backend which is connected to some other database.  This leads
>>> >> to terminating a wrong backend.  I think we have some similar race
>>> >> condition at few other places like in pg_terminate_backend(),
>>> >> ProcSleep() and CountOtherDBBackends().  I think here the risk is a
>>> >> bit more because there could be a long list of pids.
>>> >>
>>> >> One idea could be that we write a new function similar to IsBackendPid
>>> >> which takes dbid and ensures that pid belongs to that database and use
>>> >> that before sending kill signal, but still it will not be completely
>>> >> safe.  But, I think it can be closer to cases like we already have in
>>> >> code.
>>> >>
>>> >> Another possible idea could be to use the SendProcSignal mechanism
>>> >> similar to how we have used it in CancelDBBackends() to allow the
>>> >> required backends to exit by themselves.  This might be safer.
>>> >>
>>> >> I am not sure if we can entirely eliminate this race condition and
>>> >> whether it is a good idea to accept such a race condition even though
>>> >> it exists in other parts of code.  What do you think?
>>> >>
>>> >> BTW, I have added/revised some comments in the code and done few other
>>> >> cosmetic changes, the result of which is attached.
>>> >
>>> >
>>> > Tomorrow I'll check variants that you mentioned.
>>> >
>>> > We sure so there are not any new connect to removed database, because we hold lock there.
>>> >
>>>
>>> Right.
>>>
>>> > So check if oid db is same should be enough.
>>> >
>>>
>>> We can do this before sending a kill signal but is it enough?  Because
>>> as soon as we release ProcArrayLock anytime the other process can exit
>>> and a new process can use its pid.  I think this is more of a
>>> theoretical risk and might not be easy to hit, but still, we can't
>>> ignore it.
>>
>>
>> yes, there is a theoretical risk probably - the released pid should near current fresh pid from range 0 .. pid_max.
>>
>> Probably the best solutions is enhancing SendProcSignal and using it here and fix CountOtherDBBackends.
>>
>> Alternative solution can be killing in block 50 processes and recheck. I'll try to write both and you can decide for
one
>
>
> I am sending patch where kill was replaced by SendProcSignal. I tested it on pg_bench with 400 connections, and it
workswithout problems. 
>

I think there is still a window where the same problem can happen, say
the signal has been sent by SendProcSignal to the required process and
it releases the ProcArrayLock.  Now, the target process exits and a
new process gets the same pid before the signal is received.

I am not sure but I think we can avoid such a race condition if we set
a flag (say killPending or something like that on the lines of
recoveryConflictPending) in proc and then check that flag before
allowing the process to die.  If something on these lines is feasible,
then I think there will be no race condition where we will kill the
wrong backend.  If we do this, then probably, just setting the flag
under ProcArrayLock is sufficient.  The signal can be sent outside the
lock.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

From: Peter Eisentraut
Date: 06 November 2019, 11:04:25
Subject: Re: pglz performance

From: Kyotaro Horiguchi
Date: 06 November 2019, 11:29:27
Subject: Re: [HACKERS] WAL logging problem in 9.4.3?

Re: dropdb --force - Mailing list pgsql-hackers

Previous

Next