Thread: Logging corruption error codes
Hi! We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper error code. This makes automatic log analyzer way too smart program. We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret data. PFA patch with cases that we have found in logs and consider evidence of corruption. Best regards, Andrey Borodin.
Attachment
On 2019-Jun-20, Andrey Borodin wrote: > Hi! > > We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper error code. > This makes automatic log analyzer way too smart program. > We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret data. > PFA patch with cases that we have found in logs and consider evidence of corruption. This is not totally insane -- other similar messages such as 'corrupted page pointers' in bufpage.c get the same errcode. I would like to have a separate marking for messages indicating a system-level permanent problem rather than user error ("table/column X does not exist"), retryable condition ("serializability violation"), or resource exhaustion ("out of memory", "too many clients"), but that's probably a separate patch: things like "could not open/read/write file" for a data file, or "xlog flush request XYZ not satisfied", and so on, which also indicate a kind of corruption. As you say, currently we have to have much too smart programs to weed out the serious errors that ought to show up in an alerting system from run-of-the-mill problems. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 20 июня 2019 г., в 22:09, Alvaro Herrera <alvherre@2ndquadrant.com> написал(а): > > On 2019-Jun-20, Andrey Borodin wrote: > >> Hi! >> >> We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper errorcode. >> This makes automatic log analyzer way too smart program. >> We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret data. >> PFA patch with cases that we have found in logs and consider evidence of corruption. > > This is not totally insane -- other similar messages such as 'corrupted > page pointers' in bufpage.c get the same errcode. On master there is only elog(ERROR, "incorrect index offsets supplied"); in bufpage.c. But this indicate misuse, not corrupted data on disk. Others already use ERRCODE_DATA_CORRUPTED. > > I would like to have a separate marking for messages indicating a > system-level permanent problem rather than user error ("table/column X > does not exist"), retryable condition ("serializability violation"), or > resource exhaustion ("out of memory", "too many clients"), Good idea, but there must be standards to which we comply? > but that's > probably a separate patch: things like "could not open/read/write file" > for a data file, or "xlog flush request XYZ not satisfied", and so on, > which also indicate a kind of corruption. I believe we should not report hardware problems as corruption. But this worries us (YC) too. Do you think that this problemdeserve a patch? If we introduce new error code - this is, kind of, new feature. Should I send it to pgsql-hackers? > As you say, currently we have > to have much too smart programs to weed out the serious errors that > ought to show up in an alerting system from run-of-the-mill problems. Thanks! Best regards, Andrey Borodin.
On Fri, Jun 21, 2019 at 03:22:15PM +0500, Andrey Borodin wrote: > 20 июня 2019 г., в 22:09, Alvaro Herrera <alvherre@2ndquadrant.com> написал(а): >> This is not totally insane -- other similar messages such as 'corrupted >> page pointers' in bufpage.c get the same errcode. > > On master there is only > elog(ERROR, "incorrect index offsets supplied"); > in bufpage.c. But this indicate misuse, not corrupted data on disk. > Others already use ERRCODE_DATA_CORRUPTED. Yeah. We may be able to expand the use of ERRCODE_DATA_CORRUPTED a bit more in the area. No objections against that. >> I would like to have a separate marking for messages indicating a >> system-level permanent problem rather than user error ("table/column X >> does not exist"), retryable condition ("serializability violation"), or >> resource exhaustion ("out of memory", "too many clients"), > > Good idea, but there must be standards to which we comply? I am not completely sure what you are going at here? Are you aiming at creating more wrappers which are errno-specific, like errcode_for_file_access() and such? There is a set of categories in errcodes.txt which make a rather clear classification of the different types of failures which can happen: - Corruption cases are part of the "cannot happen" case. - Serializable functions are related to transactionability. - OOM is system-related. - Too many clients is a Postgres-specific limitation. So these look rather clear to me. > I believe we should not report hardware problems as corruption. But > this worries us (YC) too. Do you think that this problem deserve a > patch? The difference may not be easy here. Postgres has no idea if the problem comes from the FS, the kernel or the hardware itself. We just report the problem that the OS tells us about so I don't think that we should try to be smarter than the OS telling us something. elog() points to an internal error which should never happen. By that, it seems to me that we should use elog() for cases where some code paths should never be logically reached. Now, there is no hard line here, sometimes we finish by using elog() on purpose as a sanity check if for example catalog data gets corrupted (you cannot use an assertion if you rely on an on-page state, obviously), so improving things with a case-by-case approach is fine in my opinion.. > If we introduce new error code - this is, kind of, new > feature. Should I send it to pgsql-hackers? Yes, I don't think that new error codes would gain a back-patch. So discussing that on -hackers would be more adapted in my opinion. Looking at the patch you propose, I don't have an argument against applying what you propose FWIW. These suggestions seem sensible. But that's not really a bug as the code works properly even without your changes. Note: the indentation of the patch is incorrect everywhere. A committer applying your patch would apply pgindent anyway. :) -- Michael
Attachment
Hi Michael! > 24 июня 2019 г., в 6:59, Michael Paquier <michael@paquier.xyz> написал(а): >> If we introduce new error code - this is, kind of, new >> feature. Should I send it to pgsql-hackers? > > Yes, I don't think that new error codes would gain a back-patch. > So discussing that on -hackers would be more adapted in my opinion. > > Looking at the patch you propose, I don't have an argument against > applying what you propose FWIW. These suggestions seem sensible. But > that's not really a bug as the code works properly even without your > changes. Ok, if it's not a bug, should I re-send it to pgsql-hackers@ ? Or just register on CF as a new thing... > > Note: the indentation of the patch is incorrect everywhere. A > committer applying your patch would apply pgindent anyway. :) Oh, sorry. I'm trying to write "like a code around", but do not understand clearly how pgindent formats... Best regards, Andrey Borodin.
On Tue, Jun 25, 2019 at 03:01:23PM +0500, Andrey Borodin wrote: > Ok, if it's not a bug, should I re-send it to pgsql-hackers@ ? Or > just register on CF as a new thing... If you wish to discard this patch and begin a more advanced discussion about those errors, a new discussion on -hackers would be adapted. Now, based on what I see you are suggesting for the couple of code paths related to indexes, I don't see any problem in just applying what you have (I'd need a second-look at it obviously), in which case this thread is fine IMO. I don't think that this should be back-patched though as this is no bug fix, as mentioned upthread. >> Note: the indentation of the patch is incorrect everywhere. A >> committer applying your patch would apply pgindent anyway. :) > > Oh, sorry. I'm trying to write "like a code around", but do not > understand clearly how pgindent formats... No problem. Trying to make the code consistent with the surroundings is a good habit. -- Michael
Attachment
On 2019-06-20 11:57, Andrey Borodin wrote: > We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper error code. > This makes automatic log analyzer way too smart program. > We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret data. > PFA patch with cases that we have found in logs and consider evidence of corruption. > > Best regards, Andrey Borodin. Should we use errmsg_internal() in the adjusted calls, so that the error messages are not picked up for translation? I could go either way, but it's something that should be considered. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 22 июля 2019 г., в 16:16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> написал(а): > > On 2019-06-20 11:57, Andrey Borodin wrote: >> We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper errorcode. >> This makes automatic log analyzer way too smart program. >> We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret data. >> PFA patch with cases that we have found in logs and consider evidence of corruption. >> >> Best regards, Andrey Borodin. > > Should we use errmsg_internal() in the adjusted calls, so that the error > messages are not picked up for translation? I could go either way, but > it's something that should be considered. Thanks for looking into this. From my POV these messages provide meaningful information to cope with corruption. But they are definitely internal. Translations already provide some information on toast chunks, mentions btree many times times and many other internal things. So, I'm confused about status of these messages. Such messages should be rare enough and those to whom they are addressed should be familiar with English. We've encountered few more cases of messages, that potentially follow data corruption. In our test environment, we were experimentingwith custom Linux kernel that had page cache bug. The bug manifested itself in reappearing stale page versions.This causes various data corruptions, always undetected by data checksums (do we want Merkle tree?). Besides messages in this patch we also had: could not read block 1751 in file "base/16452/358336": Bad address // Probably mostly not only data corruption, but hardwarefault t_xmin is uncommitted in tuple to be updated // Probably on-disk corruption failed to re-find parent key in index // Probably index corruption left link changed unexpectedly in block // Probably on-disk data corruption right sibling 45056 of block * is not next child * of block * in index // Definitely index corruption Should I add corruption codes for these messages in the patch? Or make a separate discussion about these? Thanks! Best regards, Andrey Borodin.
On Thu, Jul 25, 2019 at 3:45 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > From my POV these messages provide meaningful information to cope with corruption. But they are definitely internal. > Translations already provide some information on toast chunks, mentions btree many times times and many other internalthings. > So, I'm confused about status of these messages. > Such messages should be rare enough and those to whom they are addressed should be familiar with English. I agree that these don't need to be translated, which means you must use errmsg_internal() with ereport(). A message like "failed to re-find parent key in index..." doesn't mean anything to more than a tiny number of experts. It is useful only because you can paste in into a search engine. Users will want to search for the English string anyway. > This causes various data corruptions, always undetected by data checksums (do we want Merkle tree?). I don't think that it's possible to verify the integrity of multiple page images without amcheck support for the access method. It might be possible to do slightly more in a generic way, but I doubt it. > Besides messages in this patch we also had: > could not read block 1751 in file "base/16452/358336": Bad address // Probably mostly not only data corruption, but hardwarefault > t_xmin is uncommitted in tuple to be updated // Probably on-disk corruption > failed to re-find parent key in index // Probably index corruption > left link changed unexpectedly in block // Probably on-disk data corruption > right sibling 45056 of block * is not next child * of block * in index // Definitely index corruption > > Should I add corruption codes for these messages in the patch? Or make a separate discussion about these? I don't think that we need to worry too much about the difference between data corruption and a hardware fault that could theoretically self-correct. There is a cost to making fine distinctions like this in the errcodes we use. -- Peter Geoghegan
Please find attached patch v2: added a little more cases with corruption. > 25 июля 2019 г., в 23:27, Peter Geoghegan <pg@bowt.ie> написал(а): > > On Thu, Jul 25, 2019 at 3:45 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote: >> From my POV these messages provide meaningful information to cope with corruption. But they are definitely internal. >> Translations already provide some information on toast chunks, mentions btree many times times and many other internalthings. >> So, I'm confused about status of these messages. >> Such messages should be rare enough and those to whom they are addressed should be familiar with English. > > I agree that these don't need to be translated, which means you must > use errmsg_internal() with ereport(). A message like "failed to > re-find parent key in index..." doesn't mean anything to more than a > tiny number of experts. It is useful only because you can paste in > into a search engine. Users will want to search for the English string > anyway. We already have translations for messages like "index \"%s\" is not a btree" and "version mismatch in index \"%s\": fileversion %d, ". Personally, I agree that we should try to make these messages googlable in mailing lists. Marking them errmsg_internal willdiscard some work of translators. So I haven't marked them internal in this version. >> This causes various data corruptions, always undetected by data checksums (do we want Merkle tree?). > > I don't think that it's possible to verify the integrity of multiple > page images without amcheck support for the access method. It might be > possible to do slightly more in a generic way, but I doubt it. Well, if you have a fork with LSNs of each page - you can guarantee that that you do not have stale version of single page.And you can have cheap block-level incremental backups, fast catchup of standbys etc. But this comes at a cost. Anyway,it's a discussion for another thread. >> Besides messages in this patch we also had: >> could not read block 1751 in file "base/16452/358336": Bad address // Probably mostly not only data corruption, but hardwarefault >> t_xmin is uncommitted in tuple to be updated // Probably on-disk corruption >> failed to re-find parent key in index // Probably index corruption >> left link changed unexpectedly in block // Probably on-disk data corruption >> right sibling 45056 of block * is not next child * of block * in index // Definitely index corruption >> >> Should I add corruption codes for these messages in the patch? Or make a separate discussion about these? > > I don't think that we need to worry too much about the difference > between data corruption and a hardware fault that could theoretically > self-correct. There is a cost to making fine distinctions like this in > the errcodes we use. Currently, that case with "could not read block" is marked by errcode_for_file_access(). I think that this code is betterthan corruption error code.. Thanks! Best regards, Andrey Borodin.
Attachment
On 2019-07-31 08:07, Andrey Borodin wrote: > Please find attached patch v2: added a little more cases with corruption. Committed with some adjustments. Note that your patch didn't compile at all. Please check that next time. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 1 авг. 2019 г., в 14:26, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> написал(а): > > On 2019-07-31 08:07, Andrey Borodin wrote: >> Please find attached patch v2: added a little more cases with corruption. > > Committed with some adjustments. Many thanks! > > Note that your patch didn't compile at all. Please check that next time. Uh...sorry. I've tried to cope with formatting and finally check-world'ed without saved tab. I see you adjusted ERRCODE_DATA_CORRUPTED -> ERRCODE_INDEX_CORRUPTED in one place, that is also my mistake. Thank you! Best regards, Andrey Borodin.