Thread: Replace known_assigned_xids_lck by memory barrier

Replace known_assigned_xids_lck by memory barrier

From
Michail Nikolaev
Date:
Hello everyone and Tom.

Tom, this is about your idea (1) from 2010 to replace spinlock with a
memory barrier in a known assignment xids machinery.

It was mentioned by you again in (2) and in (3) we have decided to
extract this change into a separate commitfest entry.

So, creating it here with a rebased version of (4).

In a nutshell: KnownAssignedXids as well as the head/tail pointers are
modified only by the startup process, so spinlock is used to ensure
that updates of the array and head/tail pointers are seen in a correct
order. It is enough to pass the barrier after writing to the array
(but before updating the pointers) to achieve the same result.

Best regards.

[1]:
https://github.com/postgres/postgres/commit/2871b4618af1acc85665eec0912c48f8341504c4#diff-8879f0173be303070ab7931db7c757c96796d84402640b9e386a4150ed97b179R2408-R2412

[2]: https://www.postgresql.org/message-id/flat/1249332.1668553589%40sss.pgh.pa.us#19d00eb435340f5c5455e3bf259eccc8

[3]: https://www.postgresql.org/message-id/flat/1225350.1669757944%40sss.pgh.pa.us#23ca1956e694910fd7795a514a3bc79f

[4]:
https://www.postgresql.org/message-id/flat/CANtu0oiPoSdQsjRd6Red5WMHi1E83d2%2B-bM9J6dtWR3c5Tap9g%40mail.gmail.com#cc4827dee902978f93278732435e8521

Attachment

Re: Replace known_assigned_xids_lck by memory barrier

From
Nathan Bossart
Date:
On Sun, Mar 19, 2023 at 12:43:43PM +0300, Michail Nikolaev wrote:
> In a nutshell: KnownAssignedXids as well as the head/tail pointers are
> modified only by the startup process, so spinlock is used to ensure
> that updates of the array and head/tail pointers are seen in a correct
> order. It is enough to pass the barrier after writing to the array
> (but before updating the pointers) to achieve the same result.

What sort of benefits do you see from this patch?  It might be worthwhile
in itself to remove spinlocks when possible, but IME it's much easier to
justify such changes when there is a tangible benefit we can point to.

     /*
-     * Now update the head pointer.  We use a spinlock to protect this
+     * Now update the head pointer.  We use a memory barrier to protect this
      * pointer, not because the update is likely to be non-atomic, but to
      * ensure that other processors see the above array updates before they
      * see the head pointer change.
      *
      * If we're holding ProcArrayLock exclusively, there's no need to take the
-     * spinlock.
+     * barrier.
      */

Are the assignments in question guaranteed to be atomic?  IIUC we assume
that aligned 4-byte loads/stores are atomic, so we should be okay as long
as we aren't handling anything larger.

-        SpinLockAcquire(&pArray->known_assigned_xids_lck);
+        pg_write_barrier();
         pArray->headKnownAssignedXids = head;
-        SpinLockRelease(&pArray->known_assigned_xids_lck);

This use of pg_write_barrier() looks correct to me, but don't we need
corresponding read barriers wherever we obtain the pointers?  FWIW I tend
to review src/backend/storage/lmgr/README.barrier in its entirety whenever
I deal with this stuff.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Replace known_assigned_xids_lck by memory barrier

From
Michail Nikolaev
Date:
Hello, Nathan.

> What sort of benefits do you see from this patch? It might be worthwhile
> in itself to remove spinlocks when possible, but IME it's much easier to
> justify such changes when there is a tangible benefit we can point to.

Oh, it is not an easy question :)

The answer, probably, looks like this:
1) performance benefits of spin lock acquire removing in
KnownAssignedXidsGetOldestXmin and KnownAssignedXidsSearch
2) it is closing 13-year-old tech depth

But in reality, it is not easy to measure performance improvement
consistently for this change.

> Are the assignments in question guaranteed to be atomic? IIUC we assume
> that aligned 4-byte loads/stores are atomic, so we should be okay as long
> as we aren't handling anything larger.

Yes, 4-bytes assignment are atomic, locking is used to ensure memory
write ordering in this place.

> This use of pg_write_barrier() looks correct to me, but don't we need
> corresponding read barriers wherever we obtain the pointers? FWIW I tend
> to review src/backend/storage/lmgr/README.barrier in its entirety whenever
> I deal with this stuff.

Oh, yeah, you're right! (1)
I'll prepare an updated version of the patch soon. I don't why I was
assuming pg_write_barrier is enough (⊙_⊙')


[1]: https://github.com/postgres/postgres/blob/master/src/backend/storage/lmgr/README.barrier#L125



Re: Replace known_assigned_xids_lck by memory barrier

From
Nathan Bossart
Date:
On Tue, Aug 15, 2023 at 12:29:24PM +0200, Michail Nikolaev wrote:
>> What sort of benefits do you see from this patch? It might be worthwhile
>> in itself to remove spinlocks when possible, but IME it's much easier to
>> justify such changes when there is a tangible benefit we can point to.
> 
> Oh, it is not an easy question :)
> 
> The answer, probably, looks like this:
> 1) performance benefits of spin lock acquire removing in
> KnownAssignedXidsGetOldestXmin and KnownAssignedXidsSearch
> 2) it is closing 13-year-old tech depth
> 
> But in reality, it is not easy to measure performance improvement
> consistently for this change.

Okay.  Elsewhere, it seems like folks are fine with patches that reduce
shared memory space via atomics or barriers even if there's no immediate
benefit [0], so I think it's fine to proceed.

>> Are the assignments in question guaranteed to be atomic? IIUC we assume
>> that aligned 4-byte loads/stores are atomic, so we should be okay as long
>> as we aren't handling anything larger.
> 
> Yes, 4-bytes assignment are atomic, locking is used to ensure memory
> write ordering in this place.

Yeah, it looks like both the values that are protected by
known_assigned_xids_lck are integers, so this should be okay.  One
remaining question I have is whether it is okay if we see an updated value
for one of the head/tail variables but not the other.  It looks like the
tail variable is only updated with ProcArrayLock held exclusively, which
IIUC wouldn't prevent such mismatches even today, since we use a separate
spinlock for reading them in some cases.

[0] https://postgr.es/m/20230524214958.mt6f5xokpumvnrio%40awork3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Replace known_assigned_xids_lck by memory barrier

From
Michail Nikolaev
Date:
Hello!

Updated version (with read barriers is attached).

> One remaining question I have is whether it is okay if we see an updated value
> for one of the head/tail variables but not the other.  It looks like the
> tail variable is only updated with ProcArrayLock held exclusively, which
> IIUC wouldn't prevent such mismatches even today, since we use a separate
> spinlock for reading them in some cases.

1) "The convention is that backends must hold shared ProcArrayLock to
examine the array", it is applied to pointers as well
2) Also, "only the startup process modifies the head/tail pointers."

So, the "tail" variable is updated by the startup process with
ProcArrayLock held in exclusive-only mode - so, no issues here.

Regarding "head" variable -  updates by the startup processes are
possible in next cases:
* ProcArrayLock in exclusive mode (like KnownAssignedXidsCompress or
KnownAssignedXidsSearch(remove=true)), no issues here
* ProcArrayLock not taken at all (like
KnownAssignedXidsAdd(exclusive_lock=false)) in such case we rely on
memory barrier machinery

Both head and tail variables are changed only with exclusive lock held.

I'll think more, but can't find something wrong here so far.

Attachment

Re: Replace known_assigned_xids_lck by memory barrier

From
Nathan Bossart
Date:
On Wed, Aug 16, 2023 at 05:30:59PM +0200, Michail Nikolaev wrote:
> Updated version (with read barriers is attached).

Thanks for the updated patch.  I've attached v4 in which I've made a number
of cosmetic edits.

> I'll think more, but can't find something wrong here so far.

IIUC this memory barrier stuff is only applicable to KnownAssignedXidsAdd()
(without an exclusive lock) when we add entries to the end of the array and
then update the head pointer.  Otherwise, appropriate locks are taken when
reading/writing the array.  For example, say we have the following array:

              head
               |
               v
    [ 0, 1, 2, 3 ]

When adding elements, we keep the head pointer where it is:

              head
               |
               v
    [ 0, 1, 2, 3, 4, 5 ]

If another processor sees this intermediate state, it's okay because it
will only inspect elements 0 through 3.  Only at the end do we update the
head pointer:

                    head
                     |
                     v
    [ 0, 1, 2, 3, 4, 5 ]

With weak memory ordering and no barriers, another process may see this
(which is obviously no good):

                    head
                     |
                     v
    [ 0, 1, 2, 3 ]

One thing that I'm still trying to understand is this code in
KnownAssignedXidsSearch():

        /* we hold ProcArrayLock exclusively, so no need for spinlock */
        tail = pArray->tailKnownAssignedXids;
        head = pArray->headKnownAssignedXids;

It's not clear to me why holding ProcArrayLock exclusively means we don't
need to worry about the spinlock/barriers.  If KnownAssignedXidsAdd() adds
entries without a lock, holding ProcArrayLock won't protect you, and I
don't see anything else that acts as a read barrier before the array
entries are inspected.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Replace known_assigned_xids_lck by memory barrier

From
Michail Nikolaev
Date:
Hello, good question!

Thanks for your edits.

As answer: probably we need to change
"If we know that we're holding ProcArrayLock exclusively, we don't
need the read barrier."
to
"If we're removing xid, we don't need the read barrier because only
the startup process can remove and add xids to KnownAssignedXids"

Best regards,
Mikhail.



Re: Replace known_assigned_xids_lck by memory barrier

From
Nathan Bossart
Date:
On Wed, Aug 16, 2023 at 09:29:10PM +0200, Michail Nikolaev wrote:
> As answer: probably we need to change
> "If we know that we're holding ProcArrayLock exclusively, we don't
> need the read barrier."
> to
> "If we're removing xid, we don't need the read barrier because only
> the startup process can remove and add xids to KnownAssignedXids"

Ah, that explains it.  v5 of the patch is attached.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Replace known_assigned_xids_lck by memory barrier

From
Nathan Bossart
Date:
On Wed, Aug 16, 2023 at 01:07:15PM -0700, Nathan Bossart wrote:
> Ah, that explains it.  v5 of the patch is attached.

Barring additional feedback, I plan to commit this patch in the current
commitfest.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Replace known_assigned_xids_lck by memory barrier

From
Robert Haas
Date:
On Fri, Sep 1, 2023 at 3:41 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> On Wed, Aug 16, 2023 at 01:07:15PM -0700, Nathan Bossart wrote:
> > Ah, that explains it.  v5 of the patch is attached.
>
> Barring additional feedback, I plan to commit this patch in the current
> commitfest.

I'm not an expert on this code but I looked at this patch briefly and
it seems OK to me.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Replace known_assigned_xids_lck by memory barrier

From
Nathan Bossart
Date:
On Fri, Sep 01, 2023 at 03:53:54PM -0400, Robert Haas wrote:
> I'm not an expert on this code but I looked at this patch briefly and
> it seems OK to me.

Thanks for taking a look.  Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



Re: Replace known_assigned_xids_lck by memory barrier

From
Michail Nikolaev
Date:
Thanks everyone for help!