Thread: Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"

I was looking into a possible scalability problem with GIN indexes under concurrent insert, but instead I found an uncharacterized bug. One of the processes will occasionally throw an error  "ERROR:  buffer 10112 is not owned by resource owner Portal" where the buffer number changes from run to run.

I've verified this with both 14.9 and 16.1, on Ubuntu 22.04.  I use an AWS m5.4xlarge machine, and haven't tried to verify it on anything else.  I don't currently have any real hardware with enough CPUs to do a meaningful test.

I've attached the "user data" file I feed to AWS to run the test; this one is for v14.9.  The v16.1 one is similar, except that I compile PostgreSQL myself (without JIT) rather than getting it from apt.  I stand up an Ubuntu 22.04 m5.4xlarge machine with all the defaults, except for changing the storage from 8GB to 80GB, and feed it the attached user-data cloud-init file.

If you don't want to parse the meat out of the file, the core of the test is to run this command in a loop at escalating levels of concurrency.  Each call inserts one JSONB object with highly redundant keys (the same 10 keys are present in every row) but distinctive values for each key.

insert into j (j) select jsonb_object_agg(x::text, left(md5(random()::text),5)) from generate_series(1,10) f(x);
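
For reference, the table the insert targets is set up roughly like this (a sketch; the exact DDL is in the attached file):

create table j (id bigserial, j jsonb);
create index on j using gin (j);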

I've never seen the error occur until the concurrency reaches at least 4, but the sample size is too small for that to be definitive.

Unless someone has a better idea, my next step will be to switch the column from jsonb to text[] and see if the problem exists there as well.

I assume synchronous_commit=off is needed because without it you couldn't accumulate enough trials to spot the bug, even though the bug would presumably still exist with it turned on.  I guess I could run the test on a machine with a very fast SSD and leave synchronous_commit=on, but I'm not looking forward to the cost of renting a machine that can do that, or to figuring out how to configure it.  I also haven't tried it with fastupdate on.  I assume the test would not work in that case, because the pending list would grow without bound at high concurrencies (it would grow faster than a single-threaded cleaner could clean it), and so not seeing the bug would not mean it wasn't present.
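
Concretely, the two settings discussed above amount to something like this (again a sketch; the attached file has the exact commands):

alter index j_j_idx set (fastupdate = off);    -- no pending list, inserts go straight into the index
alter system set synchronous_commit to off;    -- only to raise the insert rate; reload the config afterwards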

The test loops the insert for one minute at each concurrency level from 1 to 10, then starts over at -c 1 again.  It seems that if you don't see the bug within the first 20 minutes (the first two 1-to-10 concurrency cycles), you are unlikely to see it at all, but that is more a hunch than a formal analysis.

Cheers,

Jeff
On Fri, Nov 10, 2023 at 9:44 AM Jeff Janes <jeff.janes@gmail.com> wrote:
>
> I was looking into a possible scalability problem with GIN indexes under
> concurrent insert, but instead I found an uncharacterized bug. One of the
> processes will occasionally throw an error  "ERROR:  buffer 10112 is not
> owned by resource owner Portal" where the buffer number changes from run to
> run.

I can reproduce this by checking out e9f075f9a15593fe31c610e15cfc71a5fa281ede,
but master seems OK, since Heikki has some ResourceOwner-related patches
committed after that.

--
Regards
Junwang Zhao



Jeff Janes <jeff.janes@gmail.com> writes:
> I was looking into a possible scalability problem with GIN indexes under
> concurrent insert, but instead I found an uncharacterized bug. One of the
> processes will occasionally throw an error  "ERROR:  buffer 10112 is not
> owned by resource owner Portal" where the buffer number changes from run to
> run.

I am able to reproduce this in HEAD (8bfb231b4) on a not-that-big
machine (M2 Mac Mini):

pgbench: error: client 1 script 0 aborted in command 0 query 0: ERROR:  buffer pin 0x1cab is not owned by resource owner Portal
pgbench: error: Run was aborted; the above results are incomplete.

FTR, I used the attached not-so-ubuntu-specific script, and a
very vanilla assert-enabled build.

However, it seems like there might be more than one bug.  My first
attempt died like this after a few minutes:

TRAP: failed Assert("ref != NULL"), File: "bufmgr.c", Line: 2447, PID: 7177
0   postgres                            0x0000000102caab74 ExceptionalCondition + 108
1   postgres                            0x0000000102b46490 UnpinBuffer + 528
2   postgres                            0x000000010282bae4 ginFinishSplit + 1700
3   postgres                            0x00000001028368ac ginEntryInsert + 904
4   postgres                            0x0000000102837194 gininsert + 472
5   postgres                            0x00000001029bf77c ExecInsertIndexTuples + 552
6   postgres                            0x00000001029ed0d4 ExecInsert + 1948
7   postgres                            0x00000001029ebbfc ExecModifyTable + 1252
8   postgres                            0x00000001029c0dc4 standard_ExecutorRun + 268
9   postgres                            0x0000000102b86450 ProcessQuery + 160
10  postgres                            0x0000000102b85afc PortalRunMulti + 396
11  postgres                            0x0000000102b85478 PortalRun + 476
12  postgres                            0x0000000102b8453c exec_simple_query + 1312
13  postgres                            0x0000000102b82250 PostgresMain + 3452

That was while running a build of commit 9ba9c7074 from 25 October.
After updating to current HEAD (8bfb231b4), I've not yet reproduced it.
But I suspect that's just bad luck, because this doesn't look like
anything we fixed in the last couple weeks.

            regards, tom lane

#!/bin/bash

dropdb test
createdb test

export PGDATABASE=test

psql -c '
create table j (id bigserial, j jsonb);
create index on j using gin (j);
alter index j_j_idx set (fastupdate = off);
'
psql -c '
alter system set synchronous_commit to off;
'
pg_ctl reload

for g in `seq 1 100`; do
  for f in `seq 1 10`; do
    echo "trying with $f clients"
    psql -c 'truncate j'
    pgbench -n -f <(echo 'insert into j (j) select jsonb_object_agg(x::text, left(md5(random()::text),5)) from generate_series(1,10) f(x);') -T 60 -c $f -j $f | grep tps
  done
done

I wrote:
> Jeff Janes <jeff.janes@gmail.com> writes:
>> I was looking into a possible scalability problem with GIN indexes under
>> concurrent insert, but instead I found an uncharacterized bug. One of the
>> processes will occasionally throw an error  "ERROR:  buffer 10112 is not
>> owned by resource owner Portal" where the buffer number changes from run to
>> run.

> I am able to reproduce this in HEAD (8bfb231b4) on a not-that-big
> machine (M2 Mac Mini):

I have tracked down the cause of this: there is one code path in
ginFindParents() that doesn't take the function's own advice to
not release pin on the index's root page.  The attached seems to
fix it.  AFAICS the bug goes clear back to the origin of GIN.

> However, it seems like there might be more than one bug.  My first
> attempt died like this after a few minutes:
> TRAP: failed Assert("ref != NULL"), File: "bufmgr.c", Line: 2447, PID: 7177
> ...
> That was while running a build of commit 9ba9c7074 from 25 October.
> After updating to current HEAD (8bfb231b4), I've not yet reproduced it.

Nothing to see there: that is the same bug.  The change in behavior is
accounted for by the intervening commit b8bff07da, which re-ordered
operations in bufmgr.c so that the lack of a resource owner entry
would be noticed before hitting this Assert.

I find it quite scary that, per the code coverage report,
ginFindParents() isn't reached at all during our regression tests.
And there are several other areas in ginbtree.c that aren't reached
either.  Even while running Jeff's test case, I can find no evidence
that the freestack == false path in ginFinishSplit() is ever reached.
There could be a pile of resource-mismanagement bugs similar to this
one in there, and we'd never know it.

Reaching this particular error requires a concurrent split of the
index's root page, so it's surely not that easy to trigger.  I wonder
if there's anything we could do to make such cases more testable.
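
One untested thought: long, distinct keys fill entry-tree pages after only a few rows, so a sketch along these lines (hypothetical table and index names, not something that's been run), with several sessions inserting at once, might make root-page splits much easier to provoke:

create table t (a text[]);
create index on t using gin (a);
alter index t_a_idx set (fastupdate = off);
-- each row contributes 100 long, distinct keys, so entry-tree pages fill quickly
insert into t (a)
  select array_agg(md5(random()::text) || md5(random()::text))
  from generate_series(1, 100) g(x);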

            regards, tom lane

diff --git a/src/backend/access/gin/ginbtree.c b/src/backend/access/gin/ginbtree.c
index fc694b40f1..2dd3853e70 100644
--- a/src/backend/access/gin/ginbtree.c
+++ b/src/backend/access/gin/ginbtree.c
@@ -272,7 +272,11 @@ ginFindParents(GinBtree btree, GinBtreeStack *stack)
             blkno = GinPageGetOpaque(page)->rightlink;
             if (blkno == InvalidBlockNumber)
             {
-                UnlockReleaseBuffer(buffer);
+                /* Do not release pin on the root buffer */
+                if (buffer != root->buffer)
+                    UnlockReleaseBuffer(buffer);
+                else
+                    LockBuffer(buffer, GIN_UNLOCK);
                 break;
             }
             buffer = ginStepRight(buffer, btree->index, GIN_EXCLUSIVE);