Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner" - Mailing list pgsql-bugs

From Jeff Janes
Subject Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"
Date
Msg-id CAMkU=1yCAKtv86dMrD__Ja-7KzjE=uMeKX8y__cx5W-OEWy2ow@mail.gmail.com
Responses Re: Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"  (Junwang Zhao <zhjwpku@gmail.com>)
Re: Bug with GIN index over JSONB data: "ERROR: buffer 10112 is not owned by resource owner"  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I was looking into a possible scalability problem with GIN indexes under concurrent insert, but instead I found an uncharacterized bug. One of the processes will occasionally throw the error "ERROR:  buffer 10112 is not owned by resource owner Portal", where the buffer number changes from run to run.

I've verified this with both 14.9 and 16.1, on Ubuntu 22.04.  I use an AWS m5.4xlarge machine, and haven't tried to verify it on anything else.  I don't currently have any real hardware with enough CPUs to do a meaningful test.

I've attached the "user data" file I feed to AWS to run the test; this one is for v14.9.  The v16.1 version is similar, except that I compile PostgreSQL myself (without JIT) rather than getting it from apt.  I stood up an Ubuntu 22.04 m5.4xlarge machine with all the defaults, except for changing the storage from 8GB to 80GB, and fed it the attached user data cloud-init file.

If you don't want to parse the meat out of the file, the core of the test is to run this command in a loop at escalating levels of concurrency.  Each call just inserts one JSONB object with highly redundant keys (the same 10 keys present in every row) but a more distinctive value for each key.

insert into j (j) select jsonb_object_agg(x::text, left(md5(random()::text),5)) from generate_series(1,10) f(x);

I've never seen the error occur until the concurrency reaches at least 4, but the sample size is too low for that to be definitive.

Unless someone has a better idea, my next step will be to switch the column from jsonb to text[] and see if the bug exists there as well.

I assume synchronous_commit=off is needed because without it you couldn't accumulate enough trials to spot the bug, even though the bug would still exist with synchronous_commit=on.  I guess I could run the test on a machine with very fast SSD and leave synchronous_commit=on, but I'm not looking forward to the cost of renting a machine that can do that or figuring out how to configure it.  I also haven't tried it with fastupdate on. I assume the test would not work because the pending list would grow without bound at high concurrencies (it would grow faster than a single-threaded cleaner could clean it), and so not seeing the bug would not mean it wasn't present.
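
For readers who don't want to dig through the attachment, here is a rough sketch of a setup consistent with the description above. It is not the attached file: the file names, the database name, and the exact DDL are assumptions. All that the message itself implies is a table j with a jsonb column j, a GIN index with fastupdate off, and synchronous_commit=off. The script only writes the two SQL files; applying them is left as commented-out commands.

```shell
#!/bin/sh
# Sketch only: the real setup lives in the attached cloud-init file.
# File names below are hypothetical.
set -eu

SETUP_SQL=${SETUP_SQL:-setup.sql}
INSERT_SQL=${INSERT_SQL:-insert.sql}

# Assumed schema: one jsonb column, GIN-indexed with the pending list disabled.
cat > "$SETUP_SQL" <<'EOF'
create table j (j jsonb);
create index on j using gin (j) with (fastupdate = off);
EOF

# The single-statement workload quoted earlier in this message.
cat > "$INSERT_SQL" <<'EOF'
insert into j (j) select jsonb_object_agg(x::text, left(md5(random()::text),5)) from generate_series(1,10) f(x);
EOF

echo "wrote $SETUP_SQL and $INSERT_SQL"

# One possible way to apply it (assumes a reachable server and a database
# named postgres, neither of which is specified in the message):
#   psql -d postgres -c "alter system set synchronous_commit = off" -c "select pg_reload_conf()"
#   psql -d postgres -f setup.sql
```

Keeping fastupdate off in the index definition matches the reasoning above: with it on, an unbounded pending list could hide the bug rather than rule it out.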

The test loops the insert for one minute, at each concurrency from 1 to 10, then starts over at -c 1 again.  It seems like if you don't see the bug within the first 20 minutes (the first two 1-to-10 concurrency cycles) you are unlikely to see it at all.  But that is more a hunch than a formal analysis.
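
The loop described above could be sketched roughly as follows. This is an illustration, not the attached script: the use of pgbench, the script file name insert.sql, the database name, and setting -j equal to -c are all assumptions. Unlike the real test, which starts over indefinitely, this version stops after a fixed number of 1-to-10 cycles. By default it only prints the commands; set RUN_PGBENCH=1 to actually execute them.

```shell
#!/bin/sh
# Sketch of the driver loop. Prints the pgbench commands by default;
# set RUN_PGBENCH=1 to execute them. Names below are assumptions.
set -eu

SCRIPT=${SCRIPT:-insert.sql}   # hypothetical file holding the insert statement
DB=${DB:-postgres}             # hypothetical database name
CYCLES=${CYCLES:-2}            # two 1-to-10 cycles cover the first 20 minutes
STEP_SECONDS=60                # one minute per concurrency level

# Build the pgbench command line for a given client count.
pgbench_cmd() {
    echo "pgbench -n -f $SCRIPT -c $1 -j $1 -T $STEP_SECONDS $DB"
}

cycle=1
while [ "$cycle" -le "$CYCLES" ]; do
    c=1
    while [ "$c" -le 10 ]; do
        cmd=$(pgbench_cmd "$c")
        if [ "${RUN_PGBENCH:-0}" = 1 ]; then
            # Keep looping even if pgbench exits non-zero.
            $cmd || echo "pgbench exited non-zero at -c $c" >&2
        else
            echo "$cmd"
        fi
        c=$((c + 1))
    done
    cycle=$((cycle + 1))
done
```

With the defaults this prints 20 command lines, one per minute of the two cycles in which the error has tended to show up.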

Cheers,

Jeff
Attachment
