Thread: Concurrency bug in amcheck
Hi! I found concurrency bug in amcheck running on replica. When btree_xlog_unlink_page() replays changes to replica, deleted page is left with no items. But if amcheck steps on such deleted page palloc_btree_page() expects it would have items. (lldb_on_primary) b btbulkdelete primary=# drop table test; primary=# create table test as (select random() x from generate_series(1,1000000) i); primary=# create index test_x_idx on test(x); primary=# delete from test; primary=# vacuum test; (lldb_on_replica) b bt_check_level_from_leftmost replica=# select bt_index_check('test_x_idx'); # skip to internal level (lldb_on_replica) c (lldb_on_replica) b palloc_btree_page # skip to non-leftmost page (lldb_on_replica) c (lldb_on_replica) c # concurrently delete btree pages (lldb_on_primary) c # continue with pages (lldb_on_replica) c Finally replica gets error. ERROR: internal block 289 in index "test_x_idx" lacks high key and/or at least one downlink Proposed fix is attached. Spotted by Konstantin Knizhnik, reproduction case and fix from me. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Tue, Apr 21, 2020 at 12:54 PM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > I found concurrency bug in amcheck running on replica. When > btree_xlog_unlink_page() replays changes to replica, deleted page is > left with no items. But if amcheck steps on such deleted page > palloc_btree_page() expects it would have items. I forgot to mention that I've reproduced it on master. Quick check shows bug should exist since 11. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Hi, Peter, Just thought you might want to see this one... On 2020-04-21 15:31:13 +0300, Alexander Korotkov wrote: > On Tue, Apr 21, 2020 at 12:54 PM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > I found concurrency bug in amcheck running on replica. When > > btree_xlog_unlink_page() replays changes to replica, deleted page is > > left with no items. But if amcheck steps on such deleted page > > palloc_btree_page() expects it would have items. > > I forgot to mention that I've reproduced it on master. Quick check > shows bug should exist since 11. - Andres
On Tue, Apr 21, 2020 at 2:54 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Proposed fix is attached. Spotted by Konstantin Knizhnik, > reproduction case and fix from me. I wonder if we should fix btree_xlog_unlink_page() instead of amcheck. We already know that its failure to be totally consistent with the primary causes problems for backwards scans -- this problem can be fixed at the same time: https://postgr.es/m/CANtu0ohkR-evAWbpzJu54V8eCOtqjJyYp3PQ_SGoBTRGXWhWRw@mail.gmail.com We'd probably still use your patch for the backbranches if we went this way. What do you think? -- Peter Geoghegan
On Wed, Apr 22, 2020 at 7:47 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Apr 21, 2020 at 2:54 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Proposed fix is attached. Spotted by Konstantin Knizhnik, > > reproduction case and fix from me. > > I wonder if we should fix btree_xlog_unlink_page() instead of amcheck. > We already know that its failure to be totally consistent with the > primary causes problems for backwards scans -- this problem can be > fixed at the same time: > > https://postgr.es/m/CANtu0ohkR-evAWbpzJu54V8eCOtqjJyYp3PQ_SGoBTRGXWhWRw@mail.gmail.com > > We'd probably still use your patch for the backbranches if we went this way. > > What do you think? I've skip through the thread. It seems to be quite independent issue from this one. This issue is related to the fact that we leave some items on deleted pages on primary, and on the same time we have no items on deleted pages on standby. This inconsistency cause amcheck pass normally on primary, but fail on standby. BackwardScan on standby issue seems to be related solely on locking protocol and btpo_prev, btpo_next pointers. It wasn't mention on that thread that we might need hikeys on deleted pages. Assuming it doesn't seem we actually need any items on deleted pages, I can propose to delete them on primary as well. That would make contents of primary and standby more consistent. What do you think? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Apr 27, 2020 at 11:51 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > On Wed, Apr 22, 2020 at 7:47 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Tue, Apr 21, 2020 at 2:54 AM Alexander Korotkov > > <a.korotkov@postgrespro.ru> wrote: > > > Proposed fix is attached. Spotted by Konstantin Knizhnik, > > > reproduction case and fix from me. > > > > I wonder if we should fix btree_xlog_unlink_page() instead of amcheck. > > We already know that its failure to be totally consistent with the > > primary causes problems for backwards scans -- this problem can be > > fixed at the same time: > > > > https://postgr.es/m/CANtu0ohkR-evAWbpzJu54V8eCOtqjJyYp3PQ_SGoBTRGXWhWRw@mail.gmail.com > > > > We'd probably still use your patch for the backbranches if we went this way. > > > > What do you think? > > I've skip through the thread. It seems to be quite independent issue > from this one. This issue is related to the fact that we leave some > items on deleted pages on primary, and on the same time we have no > items on deleted pages on standby. This inconsistency cause amcheck > pass normally on primary, but fail on standby. BackwardScan on > standby issue seems to be related solely on locking protocol and > btpo_prev, btpo_next pointers. It wasn't mention on that thread that > we might need hikeys on deleted pages. > > Assuming it doesn't seem we actually need any items on deleted pages, > I can propose to delete them on primary as well. That would make > contents of primary and standby more consistent. What do you think? So, my proposal is following. Backpatch my fix upthread to 11. In master additionally make _bt_unlink_halfdead_page() remove page items on primary as well. Do you have any objections? ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Apr 27, 2020 at 4:17 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > > Assuming it doesn't seem we actually need any items on deleted pages, > > I can propose to delete them on primary as well. That would make > > contents of primary and standby more consistent. What do you think? > > So, my proposal is following. Backpatch my fix upthread to 11. In > master additionally make _bt_unlink_halfdead_page() remove page items > on primary as well. Do you have any objections? What I meant was that we might as well review the behavior of _bt_unlink_halfdead_page() here, since we have to change it anyway. But I think you're right: No matter what happens or doesn't happen to _bt_unlink_halfdead_page(), the fact is that deleted pages may or may not have a single remaining item (which was originally the "top parent" item from the page at offset number P_HIKEY), now and forever. We have to conservatively assume that it could be either state, now and forever. That means that we definitely have to give up on the check, per your patch. So, +1 for backpatching that back to 11. (BTW, I don't think that this is a concurrency issue, except in the sense that a test case that recreates the false positive is sensitive to concurrency -- I imagine you agree with this.) I like your idea of making the primary consistent with the REDO routine on the master branch only. I wonder if that will make it possible to change btree_mask() so that wal_consistency_checking can check deleted pages as well. The contents of a deleted page's special area do matter, and yet we don't currently verify that it matches (we use mask_page_content() within btree_mask() for deleted pages, which seems inappropriately broad). In particular, the left and right sibling links should be consistent with the primary on a deleted page. -- Peter Geoghegan
On Tue, Apr 28, 2020 at 4:05 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Mon, Apr 27, 2020 at 4:17 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > > Assuming it doesn't seem we actually need any items on deleted pages, > > > I can propose to delete them on primary as well. That would make > > > contents of primary and standby more consistent. What do you think? > > > > So, my proposal is following. Backpatch my fix upthread to 11. In > > master additionally make _bt_unlink_halfdead_page() remove page items > > on primary as well. Do you have any objections? > > What I meant was that we might as well review the behavior of > _bt_unlink_halfdead_page() here, since we have to change it anyway. > But I think you're right: No matter what happens or doesn't happen to > _bt_unlink_halfdead_page(), the fact is that deleted pages may or may > not have a single remaining item (which was originally the "top > parent" item from the page at offset number P_HIKEY), now and forever. > We have to conservatively assume that it could be either state, now > and forever. That means that we definitely have to give up on the > check, per your patch. So, +1 for backpatching that back to 11. Thank you. I've worked a bit on comments and commit message. I would appreciate you review. > (BTW, I don't think that this is a concurrency issue, except in the > sense that a test case that recreates the false positive is sensitive > to concurrency -- I imagine you agree with this.) Yes, I agree it's related to concurrency, but not exactly concurrency issue. > I like your idea of making the primary consistent with the REDO > routine on the master branch only. I wonder if that will make it > possible to change btree_mask() so that wal_consistency_checking can > check deleted pages as well. The contents of a deleted page's special > area do matter, and yet we don't currently verify that it matches (we > use mask_page_content() within btree_mask() for deleted pages, which > seems inappropriately broad). In particular, the left and right > sibling links should be consistent with the primary on a deleted page. Thank you. 2nd patch is proposed for master and makes btree page unlink remove all the items from the page being deleted. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
On Mon, May 11, 2020 at 5:56 AM Alexander Korotkov <a.korotkov@postgrespro.ru> wrote: > Thank you. I've worked a bit on comments and commit message. I would > appreciate you review. This looks good to me. > > I like your idea of making the primary consistent with the REDO > > routine on the master branch only. I wonder if that will make it > > possible to change btree_mask() so that wal_consistency_checking can > > check deleted pages as well. The contents of a deleted page's special > > area do matter, and yet we don't currently verify that it matches (we > > use mask_page_content() within btree_mask() for deleted pages, which > > seems inappropriately broad). In particular, the left and right > > sibling links should be consistent with the primary on a deleted page. > > Thank you. 2nd patch is proposed for master and makes btree page > unlink remove all the items from the page being deleted. This looks good, but can we do the wal_consistency_checking/btree_mask() improvement, too? There is no reason why it can't work with fully deleted pages. It already works with half-dead pages. It would be nice to be able to test this patch in that way, and it would be nice to have it in general. -- Peter Geoghegan
On Wed, May 13, 2020 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Mon, May 11, 2020 at 5:56 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Thank you. 2nd patch is proposed for master and makes btree page > > unlink remove all the items from the page being deleted. > > This looks good, but can we do the > wal_consistency_checking/btree_mask() improvement, too? You never got around to committing the second patch (or the wal_consistency_checking stuff). Are you planning on picking it up again? I'm currently working on this bug fix from Michail Nikolaev: https://postgr.es/m/CANtu0ohkR-evAWbpzJu54V8eCOtqjJyYp3PQ_SGoBTRGXWhWRw@mail.gmail.com It would be nice if you could commit your second patch at around the same time. It's related IMV. Thanks -- Peter Geoghegan
Hi, Peter!
On Sat, Aug 1, 2020 at 3:23 AM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, May 13, 2020 at 4:06 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, May 11, 2020 at 5:56 AM Alexander Korotkov
> <a.korotkov@postgrespro.ru> wrote:
> > Thank you. 2nd patch is proposed for master and makes btree page
> > unlink remove all the items from the page being deleted.
>
> This looks good, but can we do the
> wal_consistency_checking/btree_mask() improvement, too?
You never got around to committing the second patch (or the
wal_consistency_checking stuff). Are you planning on picking it up
again?
Thank you for your reminder. Revised patch is attached. Now, the contents of deleted btree pages isn't masked. I've checked that installcheck passes with wal_consistency_checking='Btree'. I'm going to push this if no objections.
Regards,
Alexander Korotkov
Attachment
Hi Alexander, On Tue, Aug 4, 2020 at 7:27 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Thank you for your reminder. Revised patch is attached. Now, the contents of deleted btree pages isn't masked. I'vechecked that installcheck passes with wal_consistency_checking='Btree'. I'm going to push this if no objections. This looks good to me. One small thing, though: maybe the comments should not say anything about the REDO routine -- that seems like a case of "the tail wagging the dog" to me. Perhaps say something like: "Remove the last pivot tuple on the page. This keeps things simple for WAL consistency checking." (Just a suggestion.) Thanks! -- Peter Geoghegan
On Wed, Aug 5, 2020 at 1:58 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Aug 4, 2020 at 7:27 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Thank you for your reminder. Revised patch is attached. Now, the contents of deleted btree pages isn't masked. I'vechecked that installcheck passes with wal_consistency_checking='Btree'. I'm going to push this if no objections. > > This looks good to me. One small thing, though: maybe the comments > should not say anything about the REDO routine -- that seems like a > case of "the tail wagging the dog" to me. Perhaps say something like: > > "Remove the last pivot tuple on the page. This keeps things simple > for WAL consistency checking." Pushed. Comment is changed as you suggested. But I've replaced "last pivot tuple" with "remaining tuples", because the page can also have a high key, which is also a tuple. ------ Regards, Alexander Korotkov
On Tue, Aug 4, 2020 at 4:18 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Pushed. Comment is changed as you suggested. But I've replaced "last > pivot tuple" with "remaining tuples", because the page can also have a > high key, which is also a tuple. You're right, of course. Thanks again -- Peter Geoghegan