Thread: pg14b2: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", ...

This is crashing repeatedly during insert/update immediately after upgrading an
instance to v14, from v13.3.  In case it matters, the cluster was originally
initdb at 13.2.

TRAP: FailedAssertion("_bt_posting_valid(nposting)", File: "nbtdedup.c", Line: 1062, PID: 28580)
postgres: telsasoft ts 127.0.0.1(52250) INSERT(ExceptionalCondition+0x8d)[0x967d1d]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(_bt_swap_posting+0x2cd)[0x507cdd]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x509a14]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(_bt_doinsert+0xcb7)[0x50d0b7]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(btinsert+0x52)[0x5130f2]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(ExecInsertIndexTuples+0x231)[0x687b81]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x6b8718]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x6b9297]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(standard_ExecutorRun+0x142)[0x688b32]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x82da8a]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x82e673]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x82e936]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(PortalRun+0x2eb)[0x82ec8b]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(PostgresMain+0x1f97)[0x82c777]
postgres: telsasoft ts 127.0.0.1(52250) INSERT[0x48f71a]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(PostmasterMain+0x1138)[0x794c98]
postgres: telsasoft ts 127.0.0.1(52250) INSERT(main+0x6f2)[0x491292]

< 2021-06-27 23:46:43.257 CAT  >DETAIL:  Failed process was running: INSERT INTO alarms(...

#3  0x0000000000507cdd in _bt_swap_posting (newitem=newitem@entry=0x2011c00, oposting=oposting@entry=0x7f6de58e2a78, postingoff=postingoff@entry=62) at nbtdedup.c:1062
        nhtids = <optimized out>
        replacepos = 0x2011dac ""
        nposting = 0x2011c28
        __func__ = "_bt_swap_posting"
#4  0x0000000000509a14 in _bt_insertonpg (rel=rel@entry=0x7f6dfd3cd628, itup_key=itup_key@entry=0x2011b40, buf=15, cbuf=cbuf@entry=0, stack=stack@entry=0x2011bd8, itup=0x2011c00, itup@entry=0x200d608, itemsz=16, newitemoff=2, postingoff=62, split_only_page=split_only_page@entry=false) at nbtinsert.c:1174
        itemid = 0x7f6de58e0e1c
        page = 0x7f6de58e0e00 "\200"
        opaque = 0x7f6de58e2df0
        isleaf = true
        isroot = false
        isrightmost = false
        isonly = false
        oposting = 0x7f6de58e2a78
        origitup = <optimized out>
        nposting = 0x0
        __func__ = "_bt_insertonpg"
#5  0x000000000050d0b7 in _bt_doinsert (rel=rel@entry=0x7f6dfd3cd628, itup=itup@entry=0x200d608, checkUnique=checkUnique@entry=UNIQUE_CHECK_NO, indexUnchanged=indexUnchanged@entry=false, heapRel=heapRel@entry=0x7f6dfd48ba80) at nbtinsert.c:257
        newitemoff = 0
        is_unique = false
        insertstate = {itup = 0x200d608, itemsz = 16, itup_key = 0x2011b40, buf = 15, bounds_valid = true, low = 2, stricthigh = 3, postingoff = 62}
        itup_key = <optimized out>
        checkingunique = <optimized out>
#6  0x00000000005130f2 in btinsert (rel=0x7f6dfd3cd628, values=<optimized out>, isnull=<optimized out>, ht_ctid=0x212e250, heapRel=0x7f6dfd48ba80, checkUnique=UNIQUE_CHECK_NO, indexUnchanged=false, indexInfo=0x200d2e8) at nbtree.c:199
        result = <optimized out>
        itup = 0x200d608
#7  0x0000000000687b81 in ExecInsertIndexTuples (resultRelInfo=resultRelInfo@entry=0x200cd90, slot=slot@entry=0x212e220, estate=estate@entry=0x212c6b0, update=update@entry=false, noDupErr=noDupErr@entry=false, specConflict=specConflict@entry=0x0, arbiterIndexes=arbiterIndexes@entry=0x0) at execIndexing.c:415


(gdb) p *newitem
$2 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 22}, ip_posid = 4}, t_info = 32784}
(gdb) p *oposting
$3 = {t_tid = {ip_blkid = {bi_hi = 0, bi_lo = 16}, ip_posid = 8333}, t_info = 41824}
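For reference, those t_info/t_tid values can be decoded by hand. A quick sketch, using the bit layout I believe pg14 uses (constants copied from memory out of itup.h/nbtree.h, so treat them as an assumption):

```python
# Hand-decode the IndexTuple headers printed above.  Bit-layout constants are
# from PostgreSQL 14's src/include/access/itup.h and access/nbtree.h, written
# from memory -- an assumption, not authoritative.
INDEX_SIZE_MASK    = 0x1FFF
INDEX_ALT_TID_MASK = 0x2000   # t_tid has a non-heap meaning (e.g. posting list)
INDEX_VAR_MASK     = 0x4000   # has variable-width attribute(s)
INDEX_NULL_MASK    = 0x8000   # has NULL attribute(s)
BT_IS_POSTING      = 0x2000   # status bit stored in ip_posid
BT_OFFSET_MASK     = 0x0FFF   # for posting tuples, ip_posid stores nhtids here

def decode(t_info, bi_hi, bi_lo, ip_posid):
    d = {
        "size": t_info & INDEX_SIZE_MASK,
        "has_nulls": bool(t_info & INDEX_NULL_MASK),
        "varwidth": bool(t_info & INDEX_VAR_MASK),
    }
    if (t_info & INDEX_ALT_TID_MASK) and (ip_posid & BT_IS_POSTING):
        d["nhtids"] = ip_posid & BT_OFFSET_MASK       # heap TID count
        d["posting_off"] = (bi_hi << 16) | bi_lo      # list start within tuple
    else:
        d["heap_tid"] = ((bi_hi << 16) | bi_lo, ip_posid)
    return d

print(decode(32784, 0, 22, 4))     # newitem:  16-byte tuple, heap TID (22,4)
print(decode(41824, 0, 16, 8333))  # oposting: 864-byte posting-list tuple
```

If those constants are right, oposting is an 864-byte posting-list tuple holding 141 heap TIDs starting at tuple offset 16 (16 + 141*6 = 862, MAXALIGNed to 864, which matches the decoded size), so postingoff=62 lands inside it, consistent with the _bt_swap_posting call.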

I will save a copy of the data dir and see if reindexing helps.
Let me know if there's anything else I can provide.

-- 
Justin



Can you please amcheck all of the indexes? 

Peter Geoghegan
(Sent from my phone)
Would also be good to get a raw binary copy of the page image in question. Hopefully the data isn't confidential. Same gdb procedure as before. 

Thanks 

Peter Geoghegan
(Sent from my phone)
Did you also change the kernel on upgrade? I recall that that was a factor on the other recent bug thread. 

Peter Geoghegan
(Sent from my phone)
On Sun, Jun 27, 2021 at 03:08:13PM -0700, Peter Geoghegan wrote:
> Can you please amcheck all of the indexes?

ts=# SELECT bt_index_check('child.alarms_null_alarm_clear_time_idx'::regclass);
ERROR:  item order invariant violated for index "alarms_null_alarm_clear_time_idx"
DETAIL:  Lower index tid=(1,77) (points to heap tid=(29,9)) higher index tid=(1,78) (points to heap tid=(29,9)) page lsn=80/4B9C69D0.

ts=# SELECT itemoffset, ctid, itemlen, nulls, vars, dead, htid FROM bt_page_items('child.alarms_null_alarm_clear_time_idx', 1);
 itemoffset |   ctid    | itemlen | nulls | vars | dead |  htid   
------------+-----------+---------+-------+------+------+---------
...
         77 | (29,9)    |      16 | t     | f    | f    | (29,9)
         78 | (29,9)    |      16 | t     | f    | f    | (29,9)
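The invariant being reported: since pg12, the heap TID acts as a tiebreaker key column in btrees, so leaf items with equal index keys must still appear in strictly increasing heap TID order. A toy illustration of why two items with htid (29,9) trip it (my own sketch, not amcheck's actual code; the htid at offset 76 is invented for the example):

```python
# Toy sketch of the leaf-page "item order" invariant: with equal user keys,
# heap TIDs (block, offset) must be strictly increasing.  My illustration,
# not amcheck's implementation.
def check_item_order(htids, first_offset=1):
    """Return item offsets whose htid is <= the previous item's htid."""
    bad = []
    for i in range(1, len(htids)):
        if htids[i] <= htids[i - 1]:      # Python tuple compare: (block, offset)
            bad.append(first_offset + i)
    return bad

# Offsets 76..78 with the htids shown by bt_page_items above; the duplicate
# (29,9) at offset 78 violates strict ordering.
print(check_item_order([(29, 8), (29, 9), (29, 9)], first_offset=76))  # -> [78]
```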

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(get_raw_page('child.alarms_null', 29));
 lp | lp_off | lp_flags | lp_len | t_xmin |  t_xmax  | t_field3 | t_ctid  | t_infomask2 | t_infomask | t_hoff |                  t_bits                  | t_oid 
----+--------+----------+--------+--------+----------+----------+---------+-------------+------------+--------+------------------------------------------+-------
  1 |   6680 |        1 |   1512 |  88669 | 27455486 |       44 | (29,1)  |        8225 |      10691 |     32 | 1100001111111111111111101111111110000000 |      
  2 |      6 |        2 |      0 |        |          |          |         |             |            |        |                                          |      
  3 |   5168 |        1 |   1512 |  87374 | 27455479 |       37 | (29,3)  |        8225 |      10691 |     32 | 1100001111111111111111101111111110000000 |      
  4 |   4192 |        1 |    976 | 148104 | 27574887 |        0 | (29,4)  |        8225 |      10695 |     32 | 1100001111111111111111101111111110000000 |      
  5 |     10 |        2 |      0 |        |          |          |         |             |            |        |                                          |      
  6 |   3216 |        1 |    976 | 148137 | 27574888 |        0 | (29,6)  |       40993 |      10695 |     32 | 1100001111111111111111101111111110000000 |      
  7 |      8 |        2 |      0 |        |          |          |         |             |            |        |                                          |      
  8 |   2240 |        1 |    976 |  47388 | 27574858 |        7 | (29,8)  |       40993 |      10695 |     32 | 1100001111111111111111101111111110000000 |      
  9 |      0 |        3 |      0 |        |          |          |         |             |            |        |                                          |      
 10 |   1264 |        1 |    976 | 148935 | 27574889 |        0 | (29,10) |       40993 |      10695 |     32 | 1100001111111111111111101111111110000000 |      
 11 |      0 |        3 |      0 |        |          |          |         |             |            |        |                                          |      
 12 |      0 |        3 |      0 |        |          |          |         |             |            |        |                                          |      
(12 rows)
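For reading that page: lp_flags classifies each line pointer, per storage/itemid.h (values quoted from memory here, so an assumption). A quick mapping of the rows above:

```python
# lp_flags values per PostgreSQL's src/include/storage/itemid.h (from memory):
# 0 = LP_UNUSED, 1 = LP_NORMAL, 2 = LP_REDIRECT, 3 = LP_DEAD.
# For LP_REDIRECT, lp_off holds the target line pointer number, not a byte offset.
LP_FLAGS = {0: "LP_UNUSED", 1: "LP_NORMAL", 2: "LP_REDIRECT", 3: "LP_DEAD"}

# (lp, lp_flags, lp_off) taken from the heap_page_items output above:
items = [(1, 1, 6680), (2, 2, 6), (3, 1, 5168), (4, 1, 4192), (5, 2, 10),
         (6, 1, 3216), (7, 2, 8), (8, 1, 2240), (9, 3, 0), (10, 1, 1264),
         (11, 3, 0), (12, 3, 0)]

for lp, flags, off in items:
    extra = f" -> lp {off}" if flags == 2 else ""
    print(f"lp {lp}: {LP_FLAGS[flags]}{extra}")
```

If that mapping is right, lp 9 -- the heap TID that both index items at (1,77) and (1,78) point at -- is LP_DEAD, i.e. the duplicate index tuples reference a dead line pointer.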

(gdb) fr 4
#4  0x0000000000509a14 in _bt_insertonpg (rel=rel@entry=0x7f6dfd3cd628, itup_key=itup_key@entry=0x2011b40, buf=15, cbuf=cbuf@entry=0, stack=stack@entry=0x2011bd8, itup=0x2011c00, itup@entry=0x200d608, itemsz=16, newitemoff=2, postingoff=62, split_only_page=split_only_page@entry=false) at nbtinsert.c:1174
1174    in nbtinsert.c
(gdb) p page
$5 = 0x7f6de58e0e00 "\200"
(gdb) dump binary memory /tmp/dump_block.page page (page + 8192)
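A page image dumped that way can also be sanity-checked offline before loading it anywhere. A minimal sketch that unpacks the 24-byte PageHeaderData (layout written from memory per bufpage.h, so an assumption; the demo page below is fabricated for illustration, not the real dump):

```python
import struct

# PageHeaderData per src/include/storage/bufpage.h (pg14), little-endian:
# pd_lsn (2 x uint32: xlogid, xrecoff), pd_checksum, pd_flags, pd_lower,
# pd_upper, pd_special, pd_pagesize_version (uint16 each), pd_prune_xid (uint32).
PAGE_HEADER = struct.Struct("<IIHHHHHHI")

def page_header(page: bytes):
    (xlogid, xrecoff, checksum, flags, lower, upper,
     special, size_ver, prune_xid) = PAGE_HEADER.unpack_from(page)
    return {"lsn": f"{xlogid:X}/{xrecoff:08X}",
            "lower": lower, "upper": upper, "special": special,
            "pagesize": size_ver & 0xFF00,   # page size is stored MAXALIGNed
            "version": size_ver & 0x00FF}

# Fabricated 8 kB demo page with a pg14-style header (layout version 4):
demo = PAGE_HEADER.pack(0x80, 0x4B9C69D0, 0, 0, 360, 5000, 8176, 8192 | 4, 0)
demo += b"\x00" * (8192 - len(demo))
print(page_header(demo))
```

Among other things this recovers the page LSN, which is what amcheck reports in its DETAIL line.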

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page')) WHERE t_xmin IS NOT NULL;
 lp  | lp_off | lp_flags | lp_len | t_xmin  |   t_xmax   | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid 
-----+--------+----------+--------+---------+------------+----------+--------+-------------+------------+--------+--------+-------
   1 |   8152 |        1 |     24 | 1048576 | 2685931521 |        0 | (0,0)  |           0 |        120 |      1 |        |      
   2 |   7288 |        1 |    864 | 1048576 | 2740985997 |        0 | (0,0)  |           0 |          1 |      0 |        |      
  67 |   6368 |        1 |    920 | 1048576 | 2744656022 |        0 | (0,0)  |          33 |          4 |      0 |        |      
 137 |   5056 |        1 |   1312 | 1048576 | 2770346200 |        0 | (0,0)  |          69 |          6 |      0 |        |      
 142 |   4608 |        1 |    448 | 1048576 | 2713722952 |        0 | (0,0)  |         107 |          4 |      0 |        |      
(5 rows)

I didn't change the kernel here, nor on the previous bug report - checking that was going to be my "next step" until I found the stuck autovacuum. I mentioned it for context, but it probably just confused things.

-- 
Justin



I've just realized that this VM has a strange storage configuration.

It's using LVM thin pools, which I don't use anywhere else.
Someone else set this up, and I think I've literally never used pools before.
Some time ago, the pool ran out of space, and I ran LVM repair on it.
It seems very possible that's the issue.
A latent problem might've been tickled by pg_upgrade --link.

That said, the relevant table is the active "alarms" table, and it would've
gotten plenty of DML with no issue for months running v13.

Feel free to dismiss this report if it seems dubious.

-- 
Justin



On Sun, Jun 27, 2021 at 6:47 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I've just realized that this VM has a strange storage configuration.
>
> It's using LVM thin pools, which I don't use anywhere else.
> Someone else set this up, and I think I've literally never used pools before.
> Some time ago, the pool ran out of space, and I ran LVM repair on it.
> It seems very possible that's the issue.
> A latent problem might've been tickled by pg_upgrade --link.

Anything is possible, of course, but even if this is a bug in Postgres
it isn't particularly likely to be a bug in the nbtree code. We see
clear signs of general corruption here, which is apparently not
limited to the one page that you supplied to me privately -- since
AFAICT that's not the page that amcheck throws the error on.

> That said, the relevant table is the active "alarms" table, and it would've
> gotten plenty of DML with no issue for months running v13.

It might not have been visibly broken without assertions enabled,
though. I sprinkled nbtdedup.c with these _bt_posting_valid()
assertions just because it was easy. The assertions were bound to
catch some problem sooner or later, and had acceptable overhead.

-- 
Peter Geoghegan



On Sun, Jun 27, 2021 at 11:08 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > That said, the relevant table is the active "alarms" table, and it would've
> > gotten plenty of DML with no issue for months running v13.
>
> It might not have been visibly broken without assertions enabled,
> though. I sprinkled nbtdedup.c with these _bt_posting_valid()
> assertions just because it was easy. The assertions were bound to
> catch some problem sooner or later, and had acceptable overhead.

Obviously nothing stops you from running amcheck on the original
database that you're running in production. You won't need to have
enabled assertions to catch the same problem that way. This seems like
the best way to isolate the problem. I strongly suspect that it's the
LVM issue for my own reasons: nothing changed during the Postgres 14
cycle that seems truly related.

The index deletion stuff (commit d168b666823) might seem like an
obvious possible culprit, but I consider it unlikely. I added many
defensive assertions to that code too: _bt_bottomupdel_pass() also
uses exactly the same kind of _bt_posting_valid() assertions directly.
Plus _bt_delitems_delete_check() is highly defensive with assertions
when it processes a posting list tuple. If there was a problem with
any of that code it seems very likely that those assertions would have
failed first.

-- 
Peter Geoghegan



On Mon, Jun 28, 2021 at 01:42:25PM -0700, Peter Geoghegan wrote:
> On Sun, Jun 27, 2021 at 11:08 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > That said, the relevant table is the active "alarms" table, and it would've
> > > gotten plenty of DML with no issue for months running v13.
> >
> > It might not have been visibly broken without assertions enabled,
> > though. I sprinkled nbtdedup.c with these _bt_posting_valid()
> > assertions just because it was easy. The assertions were bound to
> > catch some problem sooner or later, and had acceptable overhead.
> 
> Obviously nothing stops you from running amcheck on the original
> database that you're running in production. You won't need to have
> enabled assertions to catch the same problem that way. This seems like
> the best way to isolate the problem. I strongly suspect that it's the
> LVM issue for my own reasons: nothing changed during the Postgres 14
> cycle that seems truly related.

Sorry, but I didn't save the pre-upgrade cluster (just pg_dump).

For now, I moved the table out of the way and re-created it.
I could send you the whole relfilenode if you want to look further.
It seemed like almost any insert on the table caused it to crash.

BTW, on a copy of the v14 cluster, both vacuum and reindex also resolved the
issue (at least enough to avoid the crash).

-- 
Justin