Thread: Do not check unlogged indexes on standby
Hi hackers! Currently, if we check indexes on standby we often get man-psbpshn0skhsxynd/xiva_xtable_testing_01 R # select bt_index_check('xiva_loadtest.pk_uid'); ERROR: 58P01: could not open file "base/16453/125407": No such file or directory I think that we should print warning and that's it. Amcheck should not give false positives. Or, maybe, there are some design considerations that I miss? BTW I really want to enable rightlink-leftlink invariant validation on standby.. Thanks! Best regards, Andrey Borodin.
Attachment
On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > Currently, if we check indexes on standby we often get > > man-psbpshn0skhsxynd/xiva_xtable_testing_01 R # select bt_index_check('xiva_loadtest.pk_uid'); > ERROR: 58P01: could not open file "base/16453/125407": No such file or directory > > I think that we should print warning and that's it. Amcheck should not give false positives. I agree -- amcheck should just skip over unlogged tables during recovery, since there is simply nothing to check. I pushed your patch to all branches that have amcheck just now, so now we skip over unlogged relations when in recovery, though I made some revisions. Your patch didn't handle temp tables/indexes that were created in the first session correctly -- we must be careful about the distinction between unlogged tables, and tables that don't require WAL logging (the later includes temp tables). Also, I thought that it was a good idea to actively test for the presence of a main fork when we don't skip (i.e. when the system isn't in recovery and the B-Tree indexes isn't unlogged) -- we now give a clean report of corruption when that happens, rather than letting an ambiguous "can't happen" error get raised by low-level code. This might be possible with system catalog corruption, for example. Finally, I thought that the WARNING was a bit strong -- a NOTICE is more appropriate. Thanks! -- Peter Geoghegan
On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > BTW I really want to enable rightlink-leftlink invariant validation on standby.. That seems very hard. My hope was that bt_check_index() can detect the same problem a different way. The bt_right_page_check_scankey() cross-page check (which needs nothing more than an AccessShareLock) will often detect such problems, because the page image itself will be totally wrong in some way. I'm guessing that you have direct experience with that *not* being good enough, though. Can you share further details? I suppose that bt_right_page_check_scankey() helps with transposed pages, but doesn't help so much when you have WAL-level inconsistencies. -- Peter Geoghegan
> 13 авг. 2019 г., в 3:23, Peter Geoghegan <pg@bowt.ie> написал(а): > > I pushed your patch to all branches that have amcheck just now, so now > we skip over unlogged relations when in recovery, though I made some > revisions. Oh, cool, thanks! > Your patch didn't handle temp tables/indexes that were created in the > first session correctly -- we must be careful about the distinction > between unlogged tables, and tables that don't require WAL logging > (the later includes temp tables). Also, I thought that it was a good > idea to actively test for the presence of a main fork when we don't > skip (i.e. when the system isn't in recovery and the B-Tree indexes > isn't unlogged) -- we now give a clean report of corruption when that > happens, rather than letting an ambiguous "can't happen" error get > raised by low-level code. This might be possible with system catalog > corruption, for example. Finally, I thought that the WARNING was a bit > strong -- a NOTICE is more appropriate. +1 > 13 авг. 2019 г., в 3:36, Peter Geoghegan <pg@bowt.ie> написал(а): > > On Mon, Aug 12, 2019 at 2:58 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: >> BTW I really want to enable rightlink-leftlink invariant validation on standby.. > > That seems very hard. My hope was that bt_check_index() can detect the > same problem a different way. The bt_right_page_check_scankey() > cross-page check (which needs nothing more than an AccessShareLock) > will often detect such problems, because the page image itself will be > totally wrong in some way. > > I'm guessing that you have direct experience with that *not* being > good enough, though. Can you share further details? I suppose that > bt_right_page_check_scankey() helps with transposed pages, but doesn't > help so much when you have WAL-level inconsistencies. We have a bunch of internal testing HA clusters that suffered from corruption conditions. We fixed everything that can be detected with parent-check on primaries or usual check on standbys. (page updates were lost both on primary and during WAL replay) But from time to time when clusters switch primary from one availability zone to another we observe "right sibling's left-link doesn't match: block 32709 links to 37022 instead of expected 40953 in index" We are going to search for these clusters with this [0] tolerating possible fraction of false positives, we have them anyway. But I think I could put some effort into making corruption-detection tooling better. I think if we observe links discrepancy, we can acquire lock of left and right pages and recheck. Best regards, Andrey Borodin. [0] https://github.com/x4m/amcheck/commit/894d8bafb3c9a26bbc168ea5f4f33bcd1fc9f495
On Tue, Aug 13, 2019 at 5:17 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > We have a bunch of internal testing HA clusters that suffered from corruption conditions. > We fixed everything that can be detected with parent-check on primaries or usual check on standbys. > (page updates were lost both on primary and during WAL replay) > But from time to time when clusters switch primary from one availability zone to another we observe > "right sibling's left-link doesn't match: block 32709 links to 37022 instead of expected 40953 in index" That sounds like an issue caused by a failure to replay all available WAL, where only one page happened to get written out by a checkpoint before a crash. It's something like that. That wouldn't be caught by the cross-page bt_index_check() check that we do already. > We are going to search for these clusters with this [0] tolerating possible fraction of false positives, we have them anyway. > But I think I could put some effort into making corruption-detection tooling better. > I think if we observe links discrepancy, we can acquire lock of left and right pages and recheck. That's one possibility. When I first designed amcheck it was important to be conservative, so I invented a general rule about never acquiring multiple buffer locks at once. I still think that that was the correct decision for the bt_downlink_check() check (the main extra bt_index_parent_check() check), but I think that you're right about retrying to verify the sibling links when bt_index_check() is called from SQL. nbtree will often "couple" buffer locks on the leaf level; it will acquire a lock on a leaf page, and not release that lock until it has also acquired a lock on the right sibling page (I'm mostly thinking of _bt_stepright()). I am in favor of a patch that makes amcheck perform sibling link verification within bt_index_check(), by retrying while pessimistically coupling buffer locks. (Though I think that that should just happen on the leaf level. We should not try to be too clever about ignorable/half-dead/deleted pages, to be conservative.) -- Peter Geoghegan
> 13 авг. 2019 г., в 20:30, Peter Geoghegan <pg@bowt.ie> написал(а): > > That's one possibility. When I first designed amcheck it was important > to be conservative, so I invented a general rule about never acquiring > multiple buffer locks at once. I still think that that was the correct > decision for the bt_downlink_check() check (the main extra > bt_index_parent_check() check), but I think that you're right about > retrying to verify the sibling links when bt_index_check() is called > from SQL. > > nbtree will often "couple" buffer locks on the leaf level; it will > acquire a lock on a leaf page, and not release that lock until it has > also acquired a lock on the right sibling page (I'm mostly thinking of > _bt_stepright()). I am in favor of a patch that makes amcheck perform > sibling link verification within bt_index_check(), by retrying while > pessimistically coupling buffer locks. (Though I think that that > should just happen on the leaf level. We should not try to be too > clever about ignorable/half-dead/deleted pages, to be conservative.) PFA V1 of this check retry. Best regards, Andrey Borodin.
Attachment
On 2019-Aug-15, Andrey Borodin wrote: > PFA V1 of this check retry. CFbot complains that this doesn't apply; can you please rebase? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
The patch has been committed already.
Peter Geoghegan
(Sent from my phone)
(Sent from my phone)
On Wed, Sep 11, 2019 at 7:10 PM Peter Geoghegan <pg@bowt.ie> wrote: > The patch has been committed already. Oh, wait. It hasn't. Andrey didn't create a new thread for his largely independent patch, so I incorrectly assumed he created a CF entry for his original bugfix. -- Peter Geoghegan
On 2019-Sep-11, Peter Geoghegan wrote: > On Wed, Sep 11, 2019 at 7:10 PM Peter Geoghegan <pg@bowt.ie> wrote: > > The patch has been committed already. > > Oh, wait. It hasn't. Andrey didn't create a new thread for his largely > independent patch, so I incorrectly assumed he created a CF entry for > his original bugfix. So, I'm confused. There appear to be two bugfix patches in this thread, with no relationship between them, and as far as I can tell only one of them has been addressed. What was applied (6754fe65a4c6) is significantly different from what Andrey submitted. Is that correct? If so, we still have an open bug, right? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Feb 5, 2020 at 1:27 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > So, I'm confused. There appear to be two bugfix patches in this thread, > with no relationship between them, and as far as I can tell only one of > them has been addressed. What was applied (6754fe65a4c6) is > significantly different from what Andrey submitted. Is that correct? > If so, we still have an open bug, right? No. We had two separate patches on this thread: 1. A bugfix patch to make amcheck not do the wrong thing with unlogged indexes when operating on a standby. 2. An unrelated feature/enhancement that would allow amcheck to detect more types of corruption with only an AccessShareLock on the relation. The first item was dealt with way back in August, without controversy -- my commit 6754fe65 was more or less Andrey's bugfix. The second item genereated another thread a little after this thread. Everything was handled on this other thread. Ultimately, I rejected the enhancement on the grounds that it wasn't safe on standbys in the face of concurrent splits and deletions [1]. I believe that all of the items discussed on this thread have been resolved. Did I miss a CF entry or something? [1] https://postgr.es/m/CAH2-Wzmb_QOmHX=uWjCFV4Gf1810kz-yVzK6RA=VS41EFcKh=g@mail.gmail.com -- Peter Geoghegan
On 2020-Feb-05, Peter Geoghegan wrote: > The second item genereated another thread a little after this thread. > Everything was handled on this other thread. Ultimately, I rejected > the enhancement on the grounds that it wasn't safe on standbys in the > face of concurrent splits and deletions [1]. > > I believe that all of the items discussed on this thread have been > resolved. Did I miss a CF entry or something? Nah. I just had one of the messages flagged in my inbox, and I wasn't sure what had happened since the other thread was not referenced in this one. I wasn't looking at any CF entries. Thanks for the explanation, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services