Thread: Logging corruption error codes

Logging corruption error codes

From

Andrey Borodin

Date:

20 June 2019, 09:57:22

Hi!

We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper error
code.
This makes automatic log analyzer way too smart program.
We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret data.
PFA patch with cases that we have found in logs and consider evidence of corruption.

Best regards, Andrey Borodin.

Attachment

0001-Log-corruption-error-codes.patch

Re: Logging corruption error codes

From

Alvaro Herrera

Date:

20 June 2019, 17:09:45

On 2019-Jun-20, Andrey Borodin wrote:

> Hi!
> 
> We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper error
code.
> This makes automatic log analyzer way too smart program.
> We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret
data.
> PFA patch with cases that we have found in logs and consider evidence of corruption.

This is not totally insane -- other similar messages such as 'corrupted
page pointers' in bufpage.c get the same errcode.

I would like to have a separate marking for messages indicating a
system-level permanent problem rather than user error ("table/column X
does not exist"), retryable condition ("serializability violation"), or
resource exhaustion ("out of memory", "too many clients"), but that's
probably a separate patch: things like "could not open/read/write file"
for a data file, or "xlog flush request XYZ not satisfied", and so on,
which also indicate a kind of corruption.  As you say, currently we have
to have much too smart programs to weed out the serious errors that
ought to show up in an alerting system from run-of-the-mill problems.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Logging corruption error codes

From

Andrey Borodin

Date:

21 June 2019, 10:22:15


> 20 июня 2019 г., в 22:09, Alvaro Herrera <alvherre@2ndquadrant.com> написал(а):
>
> On 2019-Jun-20, Andrey Borodin wrote:
>
>> Hi!
>>
>> We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper
errorcode. 
>> This makes automatic log analyzer way too smart program.
>> We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret
data.
>> PFA patch with cases that we have found in logs and consider evidence of corruption.
>
> This is not totally insane -- other similar messages such as 'corrupted
> page pointers' in bufpage.c get the same errcode.
On master there is only
elog(ERROR, "incorrect index offsets supplied");
in bufpage.c. But this indicate misuse, not corrupted data on disk.
Others already use ERRCODE_DATA_CORRUPTED.
>
> I would like to have a separate marking for messages indicating a
> system-level permanent problem rather than user error ("table/column X
> does not exist"), retryable condition ("serializability violation"), or
> resource exhaustion ("out of memory", "too many clients"),
Good idea, but there must be standards to which we comply?

> but that's
> probably a separate patch: things like "could not open/read/write file"
> for a data file, or "xlog flush request XYZ not satisfied", and so on,
> which also indicate a kind of corruption.
I believe we should not report hardware problems as corruption. But this worries us (YC) too. Do you think that this
problemdeserve a patch? 
If we introduce new error code - this is, kind of, new feature. Should I send it to pgsql-hackers?

>  As you say, currently we have
> to have much too smart programs to weed out the serious errors that
> ought to show up in an alerting system from run-of-the-mill problems.

Thanks!

Best regards, Andrey Borodin.

Re: Logging corruption error codes

From

Michael Paquier

Date:

24 June 2019, 01:59:20

On Fri, Jun 21, 2019 at 03:22:15PM +0500, Andrey Borodin wrote:
> 20 июня 2019 г., в 22:09, Alvaro Herrera <alvherre@2ndquadrant.com> написал(а):
>> This is not totally insane -- other similar messages such as 'corrupted
>> page pointers' in bufpage.c get the same errcode.
>
> On master there is only
> elog(ERROR, "incorrect index offsets supplied");
> in bufpage.c. But this indicate misuse, not corrupted data on disk.
> Others already use ERRCODE_DATA_CORRUPTED.

Yeah.  We may be able to expand the use of ERRCODE_DATA_CORRUPTED a
bit more in the area.  No objections against that.

>> I would like to have a separate marking for messages indicating a
>> system-level permanent problem rather than user error ("table/column X
>> does not exist"), retryable condition ("serializability violation"), or
>> resource exhaustion ("out of memory", "too many clients"),
>
> Good idea, but there must be standards to which we comply?

I am not completely sure what you are going at here?  Are you aiming
at creating more wrappers which are errno-specific, like
errcode_for_file_access() and such?

There is a set of categories in errcodes.txt which make a rather clear
classification of the different types of failures which can happen:
- Corruption cases are part of the "cannot happen" case.
- Serializable functions are related to transactionability.
- OOM is system-related.
- Too many clients is a Postgres-specific limitation.
So these look rather clear to me.

> I believe we should not report hardware problems as corruption. But
> this worries us (YC) too. Do you think that this problem deserve a
> patch?

The difference may not be easy here.  Postgres has no idea if the
problem comes from the FS, the kernel or the hardware itself.  We just
report the problem that the OS tells us about so I don't think that we
should try to be smarter than the OS telling us something.  elog()
points to an internal error which should never happen.  By that, it
seems to me that we should use elog() for cases where some code paths
should never be logically reached.  Now, there is no hard line here,
sometimes we finish by using elog() on purpose as a sanity check if
for example catalog data gets corrupted (you cannot use an assertion
if you rely on an on-page state, obviously), so improving things with
a case-by-case approach is fine in my opinion..

> If we introduce new error code - this is, kind of, new
> feature. Should I send it to pgsql-hackers?

Yes, I don't think that new error codes would gain a back-patch.
So discussing that on -hackers would be more adapted in my opinion.

Looking at the patch you propose, I don't have an argument against
applying what you propose FWIW.  These suggestions seem sensible.  But
that's not really a bug as the code works properly even without your
changes.

Note: the indentation of the patch is incorrect everywhere.  A
committer applying your patch would apply pgindent anyway.  :)
--
Michael

Attachment

signature.asc

Re: Logging corruption error codes

From

Andrey Borodin

Date:

25 June 2019, 10:01:23

Hi Michael!

> 24 июня 2019 г., в 6:59, Michael Paquier <michael@paquier.xyz> написал(а):
>> If we introduce new error code - this is, kind of, new
>> feature. Should I send it to pgsql-hackers?
>
> Yes, I don't think that new error codes would gain a back-patch.
> So discussing that on -hackers would be more adapted in my opinion.
>
> Looking at the patch you propose, I don't have an argument against
> applying what you propose FWIW.  These suggestions seem sensible.  But
> that's not really a bug as the code works properly even without your
> changes.

Ok, if it's not a bug, should I re-send it to pgsql-hackers@ ? Or just register on CF as a new thing...

>
> Note: the indentation of the patch is incorrect everywhere.  A
> committer applying your patch would apply pgindent anyway.  :)
Oh, sorry. I'm trying to write "like a code around", but do not understand clearly how pgindent formats...

Best regards, Andrey Borodin.

Re: Logging corruption error codes

From

Michael Paquier

Date:

26 June 2019, 01:59:57

On Tue, Jun 25, 2019 at 03:01:23PM +0500, Andrey Borodin wrote:
> Ok, if it's not a bug, should I re-send it to pgsql-hackers@ ? Or
> just register on CF as a new thing...

If you wish to discard this patch and begin a more advanced discussion
about those errors, a new discussion on -hackers would be adapted.

Now, based on what I see you are suggesting for the couple of code
paths related to indexes, I don't see any problem in just applying
what you have (I'd need a second-look at it obviously), in which case
this thread is fine IMO.  I don't think that this should be
back-patched though as this is no bug fix, as mentioned upthread.

>> Note: the indentation of the patch is incorrect everywhere.  A
>> committer applying your patch would apply pgindent anyway.  :)
>
> Oh, sorry. I'm trying to write "like a code around", but do not
> understand clearly how pgindent formats...

No problem.  Trying to make the code consistent with the surroundings
is a good habit.
--
Michael

Attachment

signature.asc

Re: Logging corruption error codes

From

Peter Eisentraut

Date:

22 July 2019, 11:16:21

On 2019-06-20 11:57, Andrey Borodin wrote:
> We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper error
code.
> This makes automatic log analyzer way too smart program.
> We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret
data.
> PFA patch with cases that we have found in logs and consider evidence of corruption.
> 
> Best regards, Andrey Borodin.

Should we use errmsg_internal() in the adjusted calls, so that the error
messages are not picked up for translation?  I could go either way, but
it's something that should be considered.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Logging corruption error codes

From

Andrey Borodin

Date:

25 July 2019, 10:45:00

> 22 июля 2019 г., в 16:16, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> написал(а):
>
> On 2019-06-20 11:57, Andrey Borodin wrote:
>> We are fine-tuning our data corruption monitoring and found out that many corruption cases do not report proper
errorcode. 
>> This makes automatic log analyzer way too smart program.
>> We think that corruption error codes should be given in cases when B-tree or TOAST do not know how to interpret
data.
>> PFA patch with cases that we have found in logs and consider evidence of corruption.
>>
>> Best regards, Andrey Borodin.
>
> Should we use errmsg_internal() in the adjusted calls, so that the error
> messages are not picked up for translation?  I could go either way, but
> it's something that should be considered.

Thanks for looking into this.

From my POV these messages provide meaningful information to cope with corruption. But they are definitely internal.
Translations already provide some information on toast chunks, mentions btree many times times and many other internal
things.
So, I'm confused about status of these messages.
Such messages should be rare enough and those to whom they are addressed should be familiar with English.

We've encountered few more cases of messages, that potentially follow data corruption. In our test environment, we were
experimentingwith custom Linux kernel that had page cache bug. The bug manifested itself in reappearing stale page
versions.This causes various data corruptions, always undetected by data checksums (do we want Merkle tree?). 

Besides messages in this patch we also had:
could not read block 1751 in file "base/16452/358336": Bad address  // Probably mostly not only data corruption, but
hardwarefault 
t_xmin is uncommitted in tuple to be updated // Probably on-disk corruption
failed to re-find parent key in index // Probably index corruption
left link changed unexpectedly in block // Probably on-disk data corruption
right sibling 45056 of block * is not next child * of block * in index // Definitely index corruption

Should I add corruption codes for these messages in the patch? Or make a separate discussion about these?

Thanks!

Best regards, Andrey Borodin.

Re: Logging corruption error codes

From

Peter Geoghegan

Date:

25 July 2019, 18:27:02

On Thu, Jul 25, 2019 at 3:45 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> From my POV these messages provide meaningful information to cope with corruption. But they are definitely internal.
> Translations already provide some information on toast chunks, mentions btree many times times and many other
internalthings.

> So, I'm confused about status of these messages.
> Such messages should be rare enough and those to whom they are addressed should be familiar with English.

I agree that these don't need to be translated, which means you must
use errmsg_internal() with ereport(). A message like "failed to
re-find parent key in index..." doesn't mean anything to more than a
tiny number of experts. It is useful only because you can paste in
into a search engine. Users will want to search for the English string
anyway.

> This causes various data corruptions, always undetected by data checksums (do we want Merkle tree?).

I don't think that it's possible to verify the integrity of multiple
page images without amcheck support for the access method. It might be
possible to do slightly more in a generic way, but I doubt it.

> Besides messages in this patch we also had:
> could not read block 1751 in file "base/16452/358336": Bad address  // Probably mostly not only data corruption, but
hardwarefault

> t_xmin is uncommitted in tuple to be updated // Probably on-disk corruption
> failed to re-find parent key in index // Probably index corruption
> left link changed unexpectedly in block // Probably on-disk data corruption
> right sibling 45056 of block * is not next child * of block * in index // Definitely index corruption
>
> Should I add corruption codes for these messages in the patch? Or make a separate discussion about these?

I don't think that we need to worry too much about the difference
between data corruption and a hardware fault that could theoretically
self-correct. There is a cost to making fine distinctions like this in
the errcodes we use.

-- 
Peter Geoghegan

Re: Logging corruption error codes

From

Andrey Borodin

Date:

31 July 2019, 06:07:55

Please find attached patch v2: added a little more cases with corruption.

> 25 июля 2019 г., в 23:27, Peter Geoghegan <pg@bowt.ie> написал(а):
>
> On Thu, Jul 25, 2019 at 3:45 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> From my POV these messages provide meaningful information to cope with corruption. But they are definitely internal.
>> Translations already provide some information on toast chunks, mentions btree many times times and many other
internalthings. 
>> So, I'm confused about status of these messages.
>> Such messages should be rare enough and those to whom they are addressed should be familiar with English.
>
> I agree that these don't need to be translated, which means you must
> use errmsg_internal() with ereport(). A message like "failed to
> re-find parent key in index..." doesn't mean anything to more than a
> tiny number of experts. It is useful only because you can paste in
> into a search engine. Users will want to search for the English string
> anyway.
We already have translations for messages like "index \"%s\" is not a btree" and "version mismatch in index \"%s\":
fileversion %d, ". 
Personally, I agree that we should try to make these messages googlable in mailing lists. Marking them errmsg_internal
willdiscard some work of translators. 
So I haven't marked them internal in this version.

>> This causes various data corruptions, always undetected by data checksums (do we want Merkle tree?).
>
> I don't think that it's possible to verify the integrity of multiple
> page images without amcheck support for the access method. It might be
> possible to do slightly more in a generic way, but I doubt it.

Well, if you have a fork with LSNs of each page - you can guarantee that that you do not have stale version of single
page.And you can have cheap block-level incremental backups, fast catchup of standbys etc. But this comes at a cost.
Anyway,it's a discussion for another thread. 

>> Besides messages in this patch we also had:
>> could not read block 1751 in file "base/16452/358336": Bad address  // Probably mostly not only data corruption, but
hardwarefault 
>> t_xmin is uncommitted in tuple to be updated // Probably on-disk corruption
>> failed to re-find parent key in index // Probably index corruption
>> left link changed unexpectedly in block // Probably on-disk data corruption
>> right sibling 45056 of block * is not next child * of block * in index // Definitely index corruption
>>
>> Should I add corruption codes for these messages in the patch? Or make a separate discussion about these?
>
> I don't think that we need to worry too much about the difference
> between data corruption and a hardware fault that could theoretically
> self-correct. There is a cost to making fine distinctions like this in
> the errcodes we use.

Currently, that case with "could not read block" is marked by errcode_for_file_access(). I think that this code is
betterthan corruption error code.. 

Thanks!

Best regards, Andrey Borodin.

Attachment

0001-Log-corruption-error-codes-v2.patch

Re: Logging corruption error codes

From

Peter Eisentraut

Date:

01 August 2019, 09:26:32

On 2019-07-31 08:07, Andrey Borodin wrote:
> Please find attached patch v2: added a little more cases with corruption.

Committed with some adjustments.

Note that your patch didn't compile at all.  Please check that next time.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Logging corruption error codes

From

Andrey Borodin

Date:

01 August 2019, 09:42:11

> 1 авг. 2019 г., в 14:26, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> написал(а):
>
> On 2019-07-31 08:07, Andrey Borodin wrote:
>> Please find attached patch v2: added a little more cases with corruption.
>
> Committed with some adjustments.
Many thanks!

>
> Note that your patch didn't compile at all.  Please check that next time.

Uh...sorry. I've tried to cope with formatting and finally check-world'ed without saved tab.
I see you adjusted ERRCODE_DATA_CORRUPTED -> ERRCODE_INDEX_CORRUPTED in one place, that is also my mistake.

Thank you!

Best regards, Andrey Borodin.