Thread: nbtree: assertion failure in _bt_killitems() for posting tuple
During tests, we caught an assertion failure in _bt_killitems() for a posting tuple in a unique index:

    /* kitem must have matching offnum when heap TIDs match */
    Assert(kitem->indexOffset == offnum);

https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtutils.c#L1809

I struggle to understand the meaning of this assertion. Don't we allow for the possibility that the posting tuple moved right on the page, as the comment says?

    * We match items by heap TID before assuming they are the right ones to
    * delete. We cope with cases where items have moved right due to insertions.

It seems that this is exactly the case for this failure. We expected to find the tuple at offset 121, but instead it is at offset 125 (see the dump details below).

Unfortunately I cannot attach the test and core dump, since they rely on the enterprise multimaster extension code. Here are some details from the core dump that I find essential:

Stack is
    _bt_killitems
    _bt_release_current_position
    _bt_release_scan_state
    btrescan
    index_rescan
    RelationFindReplTupleByIndex

    (gdb) p offnum
    $3 = 125
    (gdb) p *item
    $4 = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200}
    (gdb) p *kitem
    $5 = {heapTid = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200},
      indexOffset = 121, tupleOffset = 32639}

Unless I am missing something, this assertion must be removed.

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
On Thu, Mar 19, 2020 at 9:34 AM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:
> Unfortunately I cannot attach test and core dump, since they rely on the
> enterprise multimaster extension code.
> Here are some details from the core dump, that I find essential:
>
> Stack is
> _bt_killitems
> _bt_release_current_position
> _bt_release_scan_state
> btrescan
> index_rescan
> RelationFindReplTupleByIndex
>
> (gdb) p offnum
> $3 = 125
> (gdb) p *item
> $4 = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200}
> (gdb) p *kitem
> $5 = {heapTid = {ip_blkid = {bi_hi = 0, bi_lo = 2}, ip_posid = 200},
> indexOffset = 121, tupleOffset = 32639}
>
>
> Unless I miss something, this assertion must be removed.

Is this index an unlogged index, under the hood?

--
Peter Geoghegan
On Thu, Mar 19, 2020 at 9:34 AM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:
> During tests, we caught an assertion failure in _bt_killitems() for
> posting tuple in unique index:
>
> /* kitem must have matching offnum when heap TIDs match */
> Assert(kitem->indexOffset == offnum);
>
> https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtutils.c#L1809
>
> I struggle to understand the meaning of this assertion.
> Don't we allow the chance that posting tuple moved right on the page as
> the comment says?

I think you're right. However, it still seems like we should check that "kitem->indexOffset" is consistent among all of the BTScanPosItem entries that we have for each TID that we believe to be from the same posting list tuple.

(Thinks some more...)

Even if the offnum changes when the buffer lock is released, due to somebody inserting onto the same page, I guess that we still expect to observe all of the heap TIDs together in the posting list. Though maybe not. Maybe it's possible for a deduplication pass to occur when the buffer lock is dropped, in which case we should arguably behave in the same way when we see the same heap TIDs (i.e. delete the entire posting list without regard for whether or not the TIDs happened to appear in a posting list initially). I'm not sure, though.

It will make no difference most of the time, since the kill_prior_tuple stuff is generally not applied when the page is changed at all -- the LSN is checked by the logic added by commit 2ed5b87f. That's why I asked about unlogged indexes (we don't do the LSN thing there). But I still think that we need to take a firm position on it.

--
Peter Geoghegan
On 20.03.2020 03:34, Peter Geoghegan wrote:
> On Thu, Mar 19, 2020 at 9:34 AM Anastasia Lubennikova
> <a.lubennikova@postgrespro.ru> wrote:
>> During tests, we caught an assertion failure in _bt_killitems() for
>> posting tuple in unique index:
>>
>> /* kitem must have matching offnum when heap TIDs match */
>> Assert(kitem->indexOffset == offnum);
>>
>> https://github.com/postgres/postgres/blob/master/src/backend/access/nbtree/nbtutils.c#L1809
>>
>> I struggle to understand the meaning of this assertion.
>> Don't we allow the chance that posting tuple moved right on the page as
>> the comment says?
> I think you're right. However, it still seems like we should check
> that "kitem->indexOffset" is consistent among all of the BTScanPosItem
> entries that we have for each TID that we believe to be from the same
> posting list tuple.

What kind of consistency do you mean here? We can probably change this assertion to

    Assert(kitem->indexOffset <= offnum);

Anything else?

> (Thinks some more...)
>
> Even if the offnum changes when the buffer lock is released, due to
> somebody inserting on to the same page, I guess that we still expect
> to observe all of the heap TIDs together in the posting list. Though
> maybe not. Maybe it's possible for a deduplication pass to occur when
> the buffer lock is dropped, in which case we should arguably behave in
> the same way when we see the same heap TIDs (i.e. delete the entire
> posting list without regard for whether or not the TIDs happened to
> appear in a posting list initially). I'm not sure, though.
>
> It will make no difference most of the time, since the
> kill_prior_tuple stuff is generally not applied when the page is
> changed at all -- the LSN is checked by the logic added by commit
> 2ed5b87f. That's why I asked about unlogged indexes (we don't do the
> LSN thing there). But I still think that we need to take a firm
> position on it.
>

It was a logged index, though the failed test setup includes logical replication. Does it handle LSNs differently?

Finally, Alexander Lakhin managed to reproduce this on master. The test is attached as a patch.

Speaking of unlogged indexes: the situation where items have moved left on the page is now legal even if the LSN hasn't changed. Even so, the loop starts from the offset that we saved in the first pass:

    OffsetNumber offnum = kitem->indexOffset;
    while (offnum <= maxoff)
    ...

It still works correctly, but microvacuum probably becomes less efficient if items were concurrently deduplicated. I wonder if this case is worth optimizing?

--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
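
The point about the loop starting from the saved offset can be illustrated with a toy model. This is only a sketch and not the nbtree code: ToyTid, ToyKillItem, and toy_find_kill_target() are hypothetical names, and the "page" is a plain array of heap TIDs rather than index tuples or posting lists. It assumes only what the message describes, namely that the search begins at the remembered offset and moves right, matching by heap TID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the real nbtree types; every name here is hypothetical. */
typedef struct ToyTid
{
    unsigned block;
    unsigned pos;
} ToyTid;

typedef struct ToyKillItem
{
    ToyTid heap_tid;     /* heap TID remembered by the scan */
    size_t index_offset; /* 1-based page offset remembered by the scan */
} ToyKillItem;

static bool
toy_tid_equals(ToyTid a, ToyTid b)
{
    return a.block == b.block && a.pos == b.pos;
}

/*
 * Look for the remembered heap TID on a toy page whose items live at
 * page[1] .. page[maxoff]. The search starts at the remembered offset and
 * only moves right, so an item that moved right is still found, while an
 * item that moved left is silently missed.
 *
 * Returns the offset where the TID was found, or 0 if it was not found.
 */
static size_t
toy_find_kill_target(const ToyTid *page, size_t maxoff, ToyKillItem kitem)
{
    assert(kitem.index_offset >= 1); /* offsets are 1-based in this model */

    for (size_t offnum = kitem.index_offset; offnum <= maxoff; offnum++)
    {
        if (toy_tid_equals(page[offnum], kitem.heap_tid))
            return offnum;
    }
    return 0;
}
```

With the values from the core dump (remembered offset 121, heap TID (2, 200) now at offset 125), this search would still find the item, so a strict equality check on the offset is the only thing that fails. If the same TID had instead moved left, say to offset 118, the search would return 0 and the item would simply not be killed, which matches the "still correct, but less efficient" observation.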
On Tue, Mar 24, 2020 at 1:00 AM Anastasia Lubennikova <a.lubennikova@postgrespro.ru> wrote:
> > I think you're right. However, it still seems like we should check
> > that "kitem->indexOffset" is consistent among all of the BTScanPosItem
> > entries that we have for each TID that we believe to be from the same
> > posting list tuple.

The assertion failure happens in the logical replication worker because it uses a dirty snapshot, which cannot release the pin per commit 2ed5b87f. This means that the leaf page can change between the time that we observe an item is dead, and the time we reach _bt_killitems(), even though _bt_killitems() does get to kill items.

I am thinking about pushing a fix along the lines of the attached patch. This preserves the assertion, while avoiding the check in cases where it doesn't apply, such as when a dirty snapshot is in use.

--
Peter Geoghegan
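
The idea of keeping the assertion but skipping it where it does not apply can be sketched as follows. The attached patch itself is not reproduced in this thread, so this is only an illustration of the general shape, not the actual fix: the function names and the page_may_have_changed flag are hypothetical, and the flag simply stands for "the scan cannot rule out concurrent changes to the leaf page", as in the dirty snapshot case described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical check: is the remembered offset acceptable given where the
 * matching heap TID was actually found?
 *
 * page_may_have_changed is an assumed flag, not a real nbtree field. When
 * it is true, items may have moved, so no claim is made about the offset.
 * When it is false, the strict equality from the original assertion holds.
 */
static bool
toy_offset_is_consistent(size_t remembered, size_t found,
                         bool page_may_have_changed)
{
    if (page_may_have_changed)
        return true;
    return remembered == found;
}

/* Hypothetical caller that turns the check into an assertion. */
static void
toy_check_kill_item(size_t remembered, size_t found,
                    bool page_may_have_changed)
{
    assert(toy_offset_is_consistent(remembered, found,
                                    page_may_have_changed));
}
```

Under this sketch, the reported case (remembered offset 121, found at 125) passes when the page may have changed, and the strict check still catches a mismatch whenever the page is known to be unchanged.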
On Sun, Apr 5, 2020 at 5:15 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I am thinking about pushing a fix along the lines of the attached
> patch. This preserves the assertion, while avoiding the check in cases
> where it doesn't apply, such as when a dirty snapshot is in use.

Pushed. Thanks.

--
Peter Geoghegan