Re: Print physical file path when checksum check fails - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Print physical file path when checksum check fails
Date
Msg-id 20200220033640.765tzkvwnc5vwbpb@alap3.anarazel.de
Whole thread Raw
In response to Re: Print physical file path when checksum check fails  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Print physical file path when checksum check fails
List pgsql-hackers
Hi,

On 2020-02-19 16:48:45 +0900, Michael Paquier wrote:
> On Wed, Feb 19, 2020 at 03:00:54PM +0900, Kyotaro Horiguchi wrote:
> > I have had support requests related to broken block several times, and
> > (I think) most of *them* had hard time to locate the broken block or
> > even broken file.  I don't think it is useles at all, but I'm not sure
> > it is worth the additional complexity.
>
> I handle stuff like that from time to time, and any reports usually
> go down to people knowledgeable about PostgreSQL enough to know the
> difference.  My point is that in order to know where a broken block is
> physically located on disk, you need to know four things:
> - The block number.
> - The physical location of the relation.
> - The size of the block.
> - The length of a file segment.
> The first two items are printed in the error message, and you can
> guess easily the actual location (file, offset) with the two others.

> I am not necessarily against improving the error message here, but
> FWIW I think that we need to consider seriously if the code
> complications and the maintenance cost involved are really worth
> saving from one simple calculation.

I don't think it's that simple for most.

And if we e.g. ever get the undo stuff merged, it'd get more
complicated, because they segment entirely differently. Similar, if we
ever manage to move SLRUs into the buffer pool and checksummed, it'd
again work differently.

Nor is it architecturally appealing to handle checksums in multiple
places above the smgr layer: For one, that requires multiple places to
compute verify them. But also, as the way checksums are computed depends
on the page format etc, it'll likely change for things like undo/slru -
which then again will require additional smarts if done above the smgr
layer.


> Particularly, quickly reading through the patch, I am rather unhappy
> about the shape of the second patch which pushes down the segment
> number knowledge into relpath.c, and creates more complication around
> the handling of zero_damaged_pages and zero'ed pages.  -- Michael

I do not like the SetZeroDamagedPageInChecksum stuff at all however.

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Clean up some old cruft related to Windows
Next
From: Andres Freund
Date:
Subject: Re: backend type in log_line_prefix?