Thread: Verify true root on replicas with amcheck
Hi, we have trouble to detect true root corruptions on replicas. I made a patch for resolving it with the locking meta pageand potential root page. I heard that amcheck has an invariant about locking no more than 1 page at a moment for avoidingdeadlocks. Is there possible a deadlock situation?
Attachment
On Thu, Jan 9, 2020 at 12:55 AM godjan • <g0dj4n@gmail.com> wrote: > Hi, we have trouble to detect true root corruptions on replicas. I made a patch for resolving it with the locking metapage and potential root page. What do you mean by true root corruption? What is the cause of the problem? What symptom does it have in your application? While I was the one that wrote the existing !readonly/parent check for the true root (a check which your patch makes work with the regular bt_check_index() function), I wasn't thinking of any particular corruption scenario at the time. I wrote the check simply because it was easy to do so (with a heavyweight ShareLock on the index). > I heard that amcheck has an invariant about locking no more than 1 page at a moment for avoiding deadlocks. Is there possiblea deadlock situation? This is a conservative principle that I came up with when I wrote the original version of amcheck. It's not strictly necessary, but it seemed like a good idea. It should be safe to "couple" buffer locks in a way that matches the B-Tree code -- as long as it is thought through very carefully. I am probably going to relax the rule for one specific case soon -- see: https://postgr.es/m/F7527087-6E95-4077-B964-D2CAFEF6224B@yandex-team.ru Your patch looks like it gets it right (it won't deadlock with other sessions that access the metapage), but I hesitate to commit it without a strong justification. Acquiring multiple buffer locks concurrently is worth avoiding wherever possible. -- Peter Geoghegan
On 1/9/20 3:55 AM, godjan • wrote: > Hi, we have trouble to detect true root corruptions on replicas. I made a patch for resolving it with the locking metapage and potential root page. I heard that amcheck has an invariant about locking no more than 1 page at a moment foravoiding deadlocks. Is there possible a deadlock situation? This patch no longer applies cleanly: http://cfbot.cputube.org/patch_27_2418.log The CF entry has been updated to Waiting on Author. Also, it would be a good idea to answer Peter's questions down-thread if you are interested in moving this patch forward. Regards, -- -David david@pgmasters.net
On 1/16/20 7:40 PM, Peter Geoghegan wrote: > On Thu, Jan 9, 2020 at 12:55 AM godjan • <g0dj4n@gmail.com> wrote: > >> I heard that amcheck has an invariant about locking no more than 1 page at a moment for avoiding deadlocks. Is there possiblea deadlock situation? > > This is a conservative principle that I came up with when I wrote the > original version of amcheck. It's not strictly necessary, but it > seemed like a good idea. It should be safe to "couple" buffer locks in > a way that matches the B-Tree code -- as long as it is thought through > very carefully. I am probably going to relax the rule for one specific > case soon -- see: > > https://postgr.es/m/F7527087-6E95-4077-B964-D2CAFEF6224B@yandex-team.ru > > Your patch looks like it gets it right (it won't deadlock with other > sessions that access the metapage), but I hesitate to commit it > without a strong justification. Acquiring multiple buffer locks > concurrently is worth avoiding wherever possible. I have marked this patch Returned with Feedback since it has been sitting for a while with no response from the author. Regards, -- -David david@pgmasters.net