Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Dmitry Dolgov
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id 20211029145532.kfwqwlrdekunwoa2@localhost
In response to BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (PG Bug reporting form <noreply@postgresql.org>)
Responses Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
List pgsql-bugs
> On Fri, Oct 29, 2021 at 07:00:01AM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      17255
> Logged by:          Alexander Lakhin
> Email address:      exclusion@gmail.com
> PostgreSQL version: 14.0
> Operating system:   Ubuntu 20.04
> Description:
>
> with the following stack:
> Core was generated by `postgres: law regression [local] CREATE INDEX
>                         '.
> Program terminated with signal SIGABRT, Aborted.
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> 50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> (gdb) bt
> #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f8a7f97a859 in __GI_abort () at abort.c:79
> #2  0x0000562dabb49700 in index_delete_sort_cmp (deltid2=<synthetic
> pointer>, deltid1=<optimized out>) at heapam.c:7582
> #3  index_delete_sort (delstate=0x7fff6f609f10, delstate=0x7fff6f609f10) at
> heapam.c:7623
> #4  heap_index_delete_tuples (rel=0x7f8a76523e08, delstate=0x7fff6f609f10)
> at heapam.c:7296
> #5  0x0000562dabc5519a in table_index_delete_tuples
> (delstate=0x7fff6f609f10, rel=0x562dac23d6c2)
>     at ../../../../src/include/access/tableam.h:1327
> #6  _bt_delitems_delete_check (rel=rel@entry=0x7f8a7652cc80,
> buf=buf@entry=191, heapRel=heapRel@entry=0x7f8a76523e08,
>     delstate=delstate@entry=0x7fff6f609f10) at nbtpage.c:1541
> #7  0x0000562dabc4dbe1 in _bt_simpledel_pass (maxoff=<optimized out>,
> minoff=<optimized out>, newitem=<optimized out>,
>     ndeletable=55, deletable=0x7fff6f609f30, heapRel=0x7f8a76523e08,
> buffer=191, rel=0x7f8a7652cc80)
>     at nbtinsert.c:2899
> ...
>
> Discovered while hunting for another bug related to autovacuum (unfortunately
> I still can't produce a reliable reproduction script for that).

Thanks for reporting and for putting effort into the reproduction steps
(in fact I'm impressed by how many issues you've discovered; hopefully
there are at least some "I've found X bugs in PostgreSQL" t-shirts
available as a reward). I believe I've managed to reproduce at least a
similar crash with the same trace.

In my case it crashed on pg_unreachable (which is an abort when asserts
are enabled) inside index_delete_sort_cmp. It seems that the two item
pointers being compared have the same block and offset number. In view
of the recent discussions I thought it could be somehow related to the
issues with duplicated TIDs, but delstate->deltids doesn't in fact have
any duplicated entries -- so I'm not sure about that, and I'm still
investigating the core dump.
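
To make the failure mode above concrete, here is a minimal standalone
sketch of a comparison on (block, offset) pairs plus a brute-force
duplicate check of the kind one might run against an array dumped from
a core file. It is not the PostgreSQL source: SketchTid, sketch_tid_cmp
and sketch_has_duplicate_tid are hypothetical names invented for this
illustration, and unlike a comparator that treats equality as
unreachable, this one returns 0 for identical TIDs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a heap TID: (block number, offset number). */
typedef struct SketchTid
{
    uint32_t block;
    uint16_t offset;
} SketchTid;

/*
 * Three-way comparison on (block, offset).  Returns -1, 0 or 1.
 * A comparator that assumes TIDs are unique would never expect to
 * reach the final "return 0" branch.
 */
static int
sketch_tid_cmp(const SketchTid *a, const SketchTid *b)
{
    if (a->block != b->block)
        return (a->block < b->block) ? -1 : 1;
    if (a->offset != b->offset)
        return (a->offset < b->offset) ? -1 : 1;
    return 0;                   /* same block and same offset */
}

/*
 * Brute-force O(n^2) check for two entries with identical TIDs.
 * Fine for small arrays (the trace above shows ndeletable=55).
 */
static bool
sketch_has_duplicate_tid(const SketchTid *tids, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (sketch_tid_cmp(&tids[i], &tids[j]) == 0)
                return true;
    return false;
}
```

If such a check reports no duplicates while the comparator still sees
two equal TIDs, the equal pair did not come from two distinct array
entries with the same value, which is the puzzling part here.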


