Re: BUG #17212: pg_amcheck fails on checking temporary relations - Mailing list pgsql-hackers
From | Mark Dilger |
---|---|
Subject | Re: BUG #17212: pg_amcheck fails on checking temporary relations |
Date | |
Msg-id | 81019B2A-C0EB-4890-BA3A-DB60590AAF43@enterprisedb.com Whole thread Raw |
In response to | Re: BUG #17212: pg_amcheck fails on checking temporary relations (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: BUG #17212: pg_amcheck fails on checking temporary relations
|
List | pgsql-hackers |
> On Oct 4, 2021, at 4:45 PM, Peter Geoghegan <pg@bowt.ie> wrote: > > I'm guessing that you meant REINDEX CONCURRENTLY. Yes. > Since you're talking about the case where it has an error Sorry, I realized after hitting <send> that you might take it that way, but I mean the logs generally, not just postgreslogs. If somebody runs "reindex concurrently" on all tables at midnight every night, and they see one morning inthe (non-postgres) logs from the midnight hour weird error messages about their RAID controller, they may well want tocheck all their indexes to see if any of them were corrupted. This is a totally made-up example, but the idea that a usermight want to check their indexes, tables, or both owing to errors of some kind is not far-fetched. > , the whole > index build must have failed. So the user would get exactly what > they'd expect -- verification of the original index, without any > hindrance from the new/failed index. Right, in that case, but not if hardware errors corrupted the index, and generated logs, without happening to trip up thereindex concurrently statement itself. > (Thinks for a moment...) > > Actually, I think that we'd only verify the original index, even > before the error with CONCURRENTLY (though I've not checked that point > myself). To get back on track, let me say that I'm not taking the position that what pg_amcheck currently does is necessarily correct,but just that I'd like to be careful about what we change, if anything. There are three things that seem to irritatepeople: 1) A non-zero exit code means "not everything was checked and passed" rather than "at least one thing is definitely corrupt". 2) Skipping of indexes is reported to the user with the word 'ERROR' in the report rather than, say, 'NOTICE'. 3) Deadlocks can occur I have resisted changing #1 on the theory that `pg_amcheck --all && ./post_all_checks_pass.sh` should only run the post_all_checks_pass.shif indeed all checks have passed, and I'm interpreting skipping an index check as being contrary tothat. But maybe that's wrong of me. I don't know. There is already sloppiness between the time that pg_amcheck resolveswhich database relations are matched by --all, --table, --index, etc. and the time that all the checks are started,and again between that time and when the last one is complete. Database objects could be created or dropped duringthose spans of time, in which case --all doesn't have quite so well defined a meaning. But the user running pg_amcheckmight also *know* that they aren't running any such DDL, and therefore expect --all to actually result in everythingbeing checked. I find it strange that I should do anything about #2 in pg_amcheck, since it's the function in verify_nbtree that phrasesthe situation as an error. But I suppose I can just ignore that and have it print as a notice. I'm genuinely nottrying to give you grief here -- I simply don't like that pg_amcheck is adding commentary atop what the checking functionsare doing. I see a clean division between what pg_amcheck is doing and what amcheck is doing, and this feels tome to put that on the wrong side of the divide. If refusing to check the index because it is not in the requisite stateis a notice, then why wouldn't verify_nbtree raise it as one and return early rather than raising an error? I also find it strange that #3 is being attributed to pg_amcheck's choice of how to call the checking function, because Ican't think of any other function where we require the SQL caller to do anything like what you are requiring here in orderto prevent deadlocks, and also because the docs for the functions don't say that a deadlock is possible, merely thatthe function may return with an error. I was totally content to get an error back, since errors are how the verify_nbtreefunctions communicate everything else, and the handler for those functions is already prepared to deal withthe error messages so returned. But it clearly annoys you that pg_amcheck is doing this, so I'll go forward with thepatch that I already have written which does otherwise. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: