Re: BUG #17406: Segmentation fault on GiST index after 14.2 upgrade - Mailing list pgsql-bugs

From Tomas Vondra
Subject Re: BUG #17406: Segmentation fault on GiST index after 14.2 upgrade
Date
Msg-id 788a0a51-b1e3-3601-5783-3f6488796ec4@enterprisedb.com
In response to Re: BUG #17406: Segmentation fault on GiST index after 14.2 upgrade  (Victor Yegorov <vyegorov@gmail.com>)
Responses Re: BUG #17406: Segmentation fault on GiST index after 14.2 upgrade
List pgsql-bugs

On 2/16/22 09:14, Victor Yegorov wrote:
> On Tue, 15 Feb 2022 at 22:51, Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> 
>     Hmm. So I guess there are three options:
> 
>     1) The index was already broken on 12.9, but for some reason (choice of
>     a different plan, ...) it was not causing any issues.
> 
> 
> Nope, index is actively used on 12.9, plan hasn't changed.
> 

OK, that's valuable information.

>     How large is the table/index? Are you able to run the query with a
>     custom build (without values optimized out)? Any chance you still
>     have a backup from before the pg_upgrade, on which you might run
>     the query?
> 
> 
> Yes, this is a test DB restored from backup in order to test out 14 
> upgrade, production is still running on 12.9.
> v3_region is 2832 kB
> region_ltree_path_idx_gist is 472 kB
> 

With a table and index that small, it should be possible to reproduce
the issue elsewhere by copying the files (and the schema). Is there any
sensitive data that would prevent you from handing them over?

> 
> (gdb) frame 7
> #7  gistScanPage (scan=scan@entry=0x55bbbd4196c8, pageItem=pageItem@entry=0x7ffdac2e3fe0, myDistances=myDistances@entry=0x0, tbm=tbm@entry=0x0, ntids=ntids@entry=0x0) at ./build/../src/backend/access/gist/gistget.c:438
> 438     ./build/../src/backend/access/gist/gistget.c: No such file or directory.
> (gdb) p pageItem->blkno
> $1 = 0
> (gdb) p pageItem->data->heap
> $5 = {heapPtr = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 0}, recheck = false, recheckDistances = false, recontup = 0x55bbbb96927e <palloc+46>, offnum = 38600}
> 
> This query below also crashes with SegFault:
> SELECT * FROM gist_page_items(get_raw_page('region_ltree_path_idx_gist', 0), 'region_ltree_path_idx_gist');
> 

Interesting! What's the backtrace from the crash?


FWIW when I try that query on the gist index from the ltree example in 
our documentation, I get this:

test=# SELECT * FROM gist_page_items(get_raw_page('path_gist_idx', 0), 'path_gist_idx');
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b5ff8
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b6f20
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b7e48
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b8db0
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21b9d18
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b4ae0, chunk 0x21bac80
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b1360
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b2288
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b31f0
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21b0ad0, chunk 0x21b4158
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21aaab0, chunk 0x21adb20
WARNING:  problem in alloc set ExprContext: detected write past chunk end in block 0x21aaab0, chunk 0x21ae5c8
 itemoffset |  ctid  | itemlen | dead |   keys
------------+--------+---------+------+-----------
          1 | (0,1)  |      32 | f    | (path)=()
          2 | (0,2)  |      48 | f    | (path)=()
          3 | (0,3)  |      64 | f    | (path)=()
          4 | (0,4)  |      80 | f    | (path)=()
          5 | (0,5)  |      80 | f    | (path)=()
          6 | (0,6)  |      48 | f    | (path)=()
          7 | (0,7)  |      72 | f    | (path)=()
          8 | (0,8)  |      48 | f    | (path)=()
          9 | (0,9)  |      64 | f    | (path)=()
         10 | (0,10) |      80 | f    | (path)=()
         11 | (0,11) |      88 | f    | (path)=()
         12 | (0,12) |      96 | f    | (path)=()
         13 | (0,13) |      96 | f    | (path)=()
(13 rows)

This is on a debug build with asserts enabled. On a non-assert build it
crashes in AllocSetAlloc, but I'd bet the exact crash location just
depends on which memory we corrupt by writing past the end of a chunk in
ExprContext. The empty paths look wrong too, of course.

The confusing part is that I get similar issues even on 12.10 (after
backporting the pageinspect GiST changes), but only with asserts
enabled. Without asserts the query works, although the paths are still
empty. Perhaps I backported that incorrectly, though.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


