Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager
Date
Msg-id CAD21AoDOmPAf-+P96hLC=mtDhPsA4uRGs53X8jwARAoRfbR=Zw@mail.gmail.com
In response to Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Dec 9, 2017 at 2:00 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Dec 8, 2017 at 3:20 AM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>> The first hunk in monitoring.sgml looks unnecessary.
>>
>> You meant the following hunk?
>>
>> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
>> index 8d461c8..7aa7981 100644
>> --- a/doc/src/sgml/monitoring.sgml
>> +++ b/doc/src/sgml/monitoring.sgml
>> @@ -669,8 +669,8 @@ postgres   27093  0.0  0.0  30096  2752 ? Ss   11:34   0:00 postgres: ser
>>            Heavyweight locks, also known as lock manager locks or simply locks,
>>            primarily protect SQL-visible objects such as tables.  However,
>>            they are also used to ensure mutual exclusion for certain internal
>> -          operations such as relation extension.  <literal>wait_event</literal> will
>> -          identify the type of lock awaited.
>> +          operations such as waiting for a transaction to finish.
>> +          <literal>wait_event</literal> will identify the type of lock awaited.
>>           </para>
>>          </listitem>
>>          <listitem>
>>
>> I think that since the extension locks are no longer part of the
>> heavyweight locks we should change the explanation.
>
> Yes, you are right.
>
>>> +RelationExtensionLockWaiterCount(Relation relation)
>>>
>>> Hmm.  This is sort of problematic, because with the new design we
>>> have no guarantee that the return value is actually accurate.  I don't
>>> think that's a functional problem, but the optics aren't great.
>>
>> Yeah, with this patch we could overestimate it and then add extra
>> blocks to the relation. Since the number of extra blocks is capped at
>> 512, I don't think it would become a serious problem.
>
> How about renaming it EstimateNumberOfExtensionLockWaiters?

Agreed, fixed.

>
>>> +    /* If force releasing, release all locks we're holding */
>>> +    if (force)
>>> +        held_relextlock.nLocks = 0;
>>> +    else
>>> +        held_relextlock.nLocks--;
>>> +
>>> +    Assert(held_relextlock.nLocks >= 0);
>>> +
>>> +    /* Return if we're still holding the lock even after computation */
>>> +    if (held_relextlock.nLocks > 0)
>>> +        return;
>>>
>>> I thought you were going to have the caller adjust nLocks?
>>
>> Yeah, I was going to change that, but since we always release either
>> one lock or all relext locks, I thought it'd be better to pass a bool
>> rather than an int.
>
> I don't see why you need to pass either one.  The caller can set
> held_relextlock.nLocks either with -- or = 0, and then call
> RelExtLockRelease() only if the resulting value is 0.

Fixed.

>
>>> When I took a break from sitting at the computer, I realized that I
>>> think this has a more serious problem: won't it permanently leak
>>> reference counts if someone hits ^C or an error occurs while the lock
>>> is held?  I think it will -- it probably needs to do cleanup at the
>>> places where we do LWLockReleaseAll() that includes decrementing the
>>> shared refcount if necessary, rather than doing cleanup at the places
>>> we release heavyweight locks.
>>> I might be wrong about the details here -- this is off the top of my head.
>>
Good catch. It can leak reference counts if someone hits ^C or an
error occurs while waiting. Fixed in the latest patch. But since
RelExtLockReleaseAll() is called even in such situations, I don't
think we need to change the place where we release all the relext
locks; we just moved it over from the heavyweight locks. Am I missing
something?
>
> Hmm, that might be an OK way to handle it.  I don't see a problem off
> the top of my head.  It might be clearer to rename it to
> RelExtLockCleanup() though, since it is not just releasing the lock
> but also any wait count we hold.

Yeah, it seems better. Fixed.

> +/* Must be greater than MAX_BACKENDS - which is 2^23-1, so we're fine. */
> +#define RELEXT_WAIT_COUNT_MASK    ((uint32) ((1 << 24) - 1))
>
> Let's drop the comment here and instead add a StaticAssertStmt() that
> checks this.

Fixed. I added StaticAssertStmt() to InitRelExtLocks().

>
> I am slightly puzzled, though.  If I read this correctly, bits 0-23
> are used for the waiter count, bit 24 is always 0, bit 25 indicates
> the presence or absence of an exclusive lock, and bits 26+ are always
> 0.  That seems slightly odd.  Shouldn't we either use the highest
> available bit for the locker (bit 31) or the lowest one (bit 24)?  The
> former seems better, in case MAX_BACKENDS changes later.  We could
> make RELEXT_WAIT_COUNT_MASK bigger too, just in case.

I agree with the former. Fixed.

> +        /* Make a lock tag */
> +        tag.dbid = MyDatabaseId;
> +        tag.relid = relid;
>
> What about shared relations?  I bet we need to use 0 in that case.
> Otherwise, if backends in two different databases try to extend the
> same shared relation at the same time, we'll (probably) fail to notice
> that they conflict.
>

You're right. I changed it so that we set tag.dbid to InvalidOid if
the relation is a shared relation.


> + * To avoid unnecessary recomputations of the hash code, we try to do this
> + * just once per function, and then pass it around as needed.  we can
> + * extract the index number of RelExtLockArray.
>
> This is just a copy-and-paste from lock.c, but actually we have a more
> sophisticated scheme here.  I think you can just drop this comment
> altogether, really.

Fixed.

>
> +    return (tag_hash((const void *) locktag, sizeof(RelExtLockTag))
> +            % N_RELEXTLOCK_ENTS);
>
> I would drop the outermost set of parentheses.  Is the cast to (const
> void *) really doing anything?
>

Fixed.

> +                 "cannot acquire relation extension locks for multiple relations at the same");
>
> cannot simultaneously acquire more than one distinct relation lock?
> As you have it, you'd have to add the word "time" at the end, but my
> version is shorter.

I meant to say "cannot acquire relation extension locks for multiple
relations at the same time". Fixed.

>
> +        /* Sleep until the lock is released */
>
> Really, there's no guarantee that the lock will be released when we
> wake up.  I think just /* Sleep until something happens, then recheck */

Fixed.

> +        lock_free = (oldstate & RELEXT_LOCK_BIT) == 0;
> +        if (lock_free)
> +            desired_state += RELEXT_LOCK_BIT;
> +
> +        if (pg_atomic_compare_exchange_u32(&relextlock->state,
> +                                           &oldstate, desired_state))
> +        {
> +            if (lock_free)
> +                return false;
> +            else
> +                return true;
> +        }
>
> Hmm.  If the lock is not free, we attempt to compare-and-swap anyway,
> but then return false?  Why not just lock_free = (oldstate &
> RELEXT_LOCK_BIT) == 0; if (!lock_free) return true; if
> (pg_atomic_compare_exchange(&relextlock->state, &oldstate, oldstate |
> RELEXT_LOCK_BIT)) return false;

Fixed.

>
> +    Assert(IsAnyRelationExtensionLockHeld() == 0);
>
> Since this returns bool now, it should just be
> Assert(!IsAnyRelationExtensionLockHeld()).

Fixed.

Attached is an updated version of the patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
