Re: the s_lock_stuck on perform_spin_delay - Mailing list pgsql-hackers

From Andy Fan
Subject Re: the s_lock_stuck on perform_spin_delay
Date
Msg-id 871qaxp3ly.fsf@163.com
Whole thread Raw
In response to Re: the s_lock_stuck on perform_spin_delay  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List pgsql-hackers
Hi Matthias and Robert,

Matthias van de Meent <boekewurm+postgres@gmail.com> writes:

> On Thu, 4 Jan 2024 at 08:09, Andy Fan <zhihuifan1213@163.com> wrote:
>>
>> My question is if someone doesn't obey the rule by mistake (everyone
>> can make mistake), shall we PANIC on a production environment? IMO I
>> think it can be a WARNING on a production environment and be a stuck
>> when 'ifdef USE_ASSERT_CHECKING'.
>> [...]
>> I think a experienced engineer like Thomas can make this mistake and the
>> patch was reviewed by 3 peoples, the bug is still there. It is not easy
>> to say just don't do it.
>>
>> the attached code show the prototype in my mind. Any feedback is welcome.
>
> While I understand your point and could maybe agree with the change
> itself (a crash isn't great),

It's great that both of you agree that the crash is not great. 

> I don't think it's an appropriate fix
> for the problem of holding a spinlock while waiting for a LwLock, as
> spin.h specifically mentions the following (and you quoted the same):
>
> """
> Keep in mind the coding rule that spinlocks must not be held for more
> than a few instructions.
> """

Yes, I agree that the known [LW]LockAcquire after holding a Spin lock
should be fixed at the first chance rather than pander to it with my
previous patch. My previous patch just take care of the *unknown*
cases (and I cced thomas in the hope that he knows the bug). I also
agree that the special case about [LW]LockAcquire should be detected
more effective as you suggested below. So v2 comes and commit 2 is for
this suggestion. 

>
> I suspect that we'd be better off with stronger protections against
> waiting for LwLocks while we hold any spin lock. More specifically,
> I'm thinking about something like tracking how many spin locks we
> hold, and Assert()-ing that we don't hold any such locks when we start
> to wait for an LwLock or run CHECK_FOR_INTERRUPTS-related code (with
> potential manual contextual overrides in specific areas of code where
> specific care has been taken to make it safe to hold spin locks while
> doing those operations - although I consider their existence unlikely
> I can't rule them out as I've yet to go through all lock-touching
> code). This would probably work in a similar manner as
> START_CRIT_SECTION/END_CRIT_SECTION.
>
> Kind regards,
>
> Matthias van de Meent
> Neon (https://neon.tech)


-- 
Best Regards
Andy Fan


Attachment

pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: Tomas Vondra
Date:
Subject: Re: index prefetching