Re: the s_lock_stuck on perform_spin_delay - Mailing list pgsql-hackers

From Andy Fan
Subject Re: the s_lock_stuck on perform_spin_delay
Date
Msg-id 87msstvper.fsf@163.com
Whole thread Raw
In response to Re: the s_lock_stuck on perform_spin_delay  (Andres Freund <andres@anarazel.de>)
Responses Re: the s_lock_stuck on perform_spin_delay
List pgsql-hackers
Hi, 

> Hi,
>
> On 2024-01-22 15:18:35 +0800, Andy Fan wrote:
>> I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a
>> quickdie, however this is bad not only because it doesn't work on
>> Windows, but also it has too poor performance even it impacts on
>> USE_ASSERT_CHECKING build only. In v8, I introduced a new global
>> variable quickDieInProgress to handle this.
>
> For reasons you already noted, using sigismember() isn't viable. But I am not
> convinced by quickDieInProgress either. I think we could just reset the state
> for the spinlock check in the code handling PANIC, perhaps via a helper
> function in spin.c.

Handled with the action for your suggestion #3. 

>> +void
>> +VerifyNoSpinLocksHeld(void)
>> +{
>> +#ifdef USE_ASSERT_CHECKING
>> +    if (last_spin_lock_file != NULL)
>> +        elog(PANIC, "A spin lock has been held at %s:%d",
>> +             last_spin_lock_file, last_spin_lock_lineno);
>> +#endif
>> +}
>
> I think the #ifdef for this needs to be in the header, not here. Otherwise we
> add a pointless external function call to a bunch of performance critical
> code.
>
Done.

>> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
>> index 7d601bef6d..c600a113cf 100644
>> --- a/src/backend/storage/buffer/bufmgr.c
>> +++ b/src/backend/storage/buffer/bufmgr.c
>> @@ -5409,6 +5409,7 @@ LockBufHdr(BufferDesc *desc)
>>  
>>      init_local_spin_delay(&delayStatus);
>>  
>> +    START_SPIN_LOCK();
>>      while (true)
>>      {
>>          /* set BM_LOCKED flag */
>
> Seems pretty odd that we now need init_local_spin_delay() and
> START_SPIN_LOCK(). Note that init_local_spin_delay() also wraps handling of
> __FILE__, __LINE__ etc, so it seems we're duplicating state here.

Thanks for catching this! Based on the feedbacks so far, it is not OK to
acquire another spin lock when holding one already. So I refactor the
code like this:

 /*
- * Support for spin delay which is useful in various places where
- * spinlock-like procedures take place.
+ * Support for spin delay and spin misuse detection purpose.
+ *
+ * spin delay which is useful in various places where spinlock-like
+ * procedures take place.
+ *
+ * spin misuse is based on global spinStatus to know if a spin lock
+ * is held when a heavy operation is taking.
  */
 typedef struct
 {
@@ -846,22 +854,40 @@ typedef struct
     const char *file;
     int            line;
     const char *func;
-} SpinDelayStatus;
+    bool        in_panic; /* works for spin lock misuse purpose. */
+} SpinLockStatus;

+extern PGDLLIMPORT SpinLockStatus spinStatus;

Now all the init_local_spin_delay, perform_spin_delay, finish_spin_delay
access the same global variable spinStatus. and just 2 extra functions
added (previously have 3). they are:

extern void VerifyNoSpinLocksHeld(bool check_in_panic);
extern void ResetSpinLockStatus(void);

The panic check stuff is still added into spinLockStatus. 

v9 attached.

-- 
Best Regards
Andy Fan


Attachment

pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Re: Trivial revise for the check of parameterized partial paths
Next
From: Andy Fan
Date:
Subject: Re: the s_lock_stuck on perform_spin_delay