Re: [PERFORM] Cpu usage 100% on slave. s_lock problem. - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.
Msg-id 528E17BE.7020902@vmware.com
In response to Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: [PERFORM] Cpu usage 100% on slave. s_lock problem.  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 03.10.2013 00:14, Merlin Moncure wrote:
> On Wed, Oct 2, 2013 at 3:43 PM, Ants Aasma <ants@cybertec.at> wrote:
>> On Wed, Oct 2, 2013 at 10:37 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> On Wed, Oct 2, 2013 at 9:45 AM, Ants Aasma <ants@cybertec.at> wrote:
>>>> I haven't reviewed the code in enough detail to say whether there is
>>>> an actual race here (I tend to think there probably isn't), but the
>>>> specific pattern I had in mind is the one in the following actual
>>>> code:
>>>
>>> hm.  I think there *is* a race.  2+ threads could race to the line:
>>>
>>> LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;
>>>
>>> and simultaneously set the value of LocalRecoveryInProgress to false,
>>> and both engage InitXLOGAccess, which is destructive.   The move
>>> operation is atomic, but I don't think there's any guarantee the reads
>>> to xlogctl->SharedRecoveryInProgress are ordered between threads
>>> without a lock.

The backend is single-threaded, so this is moot.
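
For clarity, the check->set->init pattern being discussed looks roughly
like this; a simplified sketch built from the identifiers quoted above
(LocalRecoveryInProgress, xlogctl->SharedRecoveryInProgress,
InitXLOGAccess), not the exact xlog.c source:

    bool
    RecoveryInProgress(void)
    {
        /* Once we have seen recovery end, we never need to recheck. */
        if (!LocalRecoveryInProgress)
            return false;
        else
        {
            /* use volatile pointer to discourage compiler reordering */
            volatile XLogCtlData *xlogctl = XLogCtl;

            /* unlocked read of the shared flag -- the contested line */
            LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;

            /*
             * On first observing that recovery has finished, set up the
             * backend-local XLOG state (timeline, redo pointer).
             */
            if (!LocalRecoveryInProgress)
                InitXLOGAccess();

            return LocalRecoveryInProgress;
        }
    }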

>> InitXLOGAccess only writes backend local variables, so it can't be
>> destructive. In fact, it calls GetRedoRecPtr(), which does a spinlock
>> acquisition cycle, which should be a full memory barrier. A read
>> barrier after accessing SharedRecoveryInProgress is good enough, and
>> it seems to be necessary to avoid a race condition for
>> XLogCtl->ThisTimeLineID.
>>
>> Admittedly the race is so improbable that in practice it probably
>> doesn't matter. I just wanted to spell out the correct way to do
>> unlocked accesses, as it can get quite confusing. I found Herb Sutter's
>> "atomic<> weapons" talk very useful, thinking about the problem in
>> terms of acquire and release makes it much clearer to me.
>
> If we don't care about multiple calls to InitXLOGAccess() (for a
> backend) then we can get away with just a barrier.  That's a pretty
> weird assumption though (even if 'improbable') and I think it should
> be well documented in the code, particularly since previous versions
> went through such trouble to do a proper check->set->init.
>
> I'm still leaning toward my earlier idea (recovery4.patch) since it keeps
> the old semantics by exploiting the fact that the one call site of
> interest does not require a correct answer (that is, it's harmless for
> the backend to think it's in recovery when it's not).  That just defers
> the heap prune for a time and you don't have to mess around with
> barriers at all or be concerned about present or future issues caused
> by spurious extra calls to InitXLOGAccess().  The other code paths to
> RecoveryInProgress() appear not to be interesting from a spinlock
> perspective.
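
To make Ants' read-barrier suggestion above concrete, it would look
roughly like the following (an illustration only, assuming
pg_read_barrier() from storage/barrier.h; not a proposed patch):

    /* unlocked read of the shared flag */
    LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;

    /*
     * Read barrier: later reads (e.g. of XLogCtl->ThisTimeLineID) must not
     * be reordered ahead of the read of SharedRecoveryInProgress above.
     */
    pg_read_barrier();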

Hmm. All callers of RecoveryInProgress() must be prepared to handle the
case where RecoveryInProgress() returns true even though the system is no
longer in recovery. No matter what locking we do in RecoveryInProgress(),
the startup process might finish recovery just after RecoveryInProgress()
has returned.
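
For example, a caller shaped like the following (a hypothetical sketch,
not an actual call site) must treat a stale 'true' as "skip some optional
work for now", never as something correctness depends on; the heap-prune
deferral mentioned above is exactly that kind of case:

    /*
     * Hypothetical caller: RecoveryInProgress() may return a stale 'true',
     * because the startup process can finish recovery at any moment after
     * the check.  The 'true' branch must therefore only be conservative.
     */
    if (RecoveryInProgress())
        return;                 /* just defer the optional cleanup */

    do_optional_cleanup();      /* e.g. an opportunistic prune (hypothetical helper) */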

What about the attached? It reads the shared variable without a lock or
barrier. If it returns 'true' but the system has in fact just exited
recovery, that's OK; as explained above, all the callers must tolerate
that anyway. But if it returns 'false', then it performs a full memory
barrier, which should ensure that it sees all other shared variables as
they are after the startup process cleared SharedRecoveryInProgress
(notably, XLogCtl->ThisTimeLineID).
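
In rough outline, the interesting part of RecoveryInProgress() then
becomes something like this (a sketch of the idea, not necessarily
identical to the attached patch; pg_memory_barrier() is the full barrier
from storage/barrier.h):

    /* unlocked, unordered read: a stale 'true' here is harmless */
    LocalRecoveryInProgress = xlogctl->SharedRecoveryInProgress;

    if (!LocalRecoveryInProgress)
    {
        /*
         * Before acting on 'false' for the first time, issue a full memory
         * barrier so that later reads of other shared variables (notably
         * XLogCtl->ThisTimeLineID) cannot be satisfied by values cached
         * from before the startup process cleared SharedRecoveryInProgress.
         */
        pg_memory_barrier();
        InitXLOGAccess();
    }

    return LocalRecoveryInProgress;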

- Heikki
