Re: Print physical file path when checksum check fails - Mailing list pgsql-hackers

From Hubert Zhang
Subject Re: Print physical file path when checksum check fails
Date
Msg-id CAB0yrem+fGBtMB15Jxy64Qoq9iB-pQfy0BqfpSD4roGmWpkKOQ@mail.gmail.com
Whole thread Raw
In response to Re: Print physical file path when checksum check fails  (Hubert Zhang <hzhang@pivotal.io>)
List pgsql-hackers
I have updated the patch based on the previous comments. Sorry for the late patch.

I removed `SetZeroDamagedPageInChecksum` and add `zeroDamagePage` flag in smgrread to determine whether we should zero damage page when an error happens. It depends on the caller.
 
`GetRelationFilePath` is removed as well. We print segno on the fly.

On Thu, Feb 20, 2020 at 2:33 PM Hubert Zhang <hzhang@pivotal.io> wrote:
Thanks,

On Thu, Feb 20, 2020 at 11:36 AM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2020-02-19 16:48:45 +0900, Michael Paquier wrote:
> On Wed, Feb 19, 2020 at 03:00:54PM +0900, Kyotaro Horiguchi wrote:
> > I have had support requests related to broken block several times, and
> > (I think) most of *them* had hard time to locate the broken block or
> > even broken file.  I don't think it is useles at all, but I'm not sure
> > it is worth the additional complexity.
>
> I handle stuff like that from time to time, and any reports usually
> go down to people knowledgeable about PostgreSQL enough to know the
> difference.  My point is that in order to know where a broken block is
> physically located on disk, you need to know four things:
> - The block number.
> - The physical location of the relation.
> - The size of the block.
> - The length of a file segment.
> The first two items are printed in the error message, and you can
> guess easily the actual location (file, offset) with the two others.

> I am not necessarily against improving the error message here, but
> FWIW I think that we need to consider seriously if the code
> complications and the maintenance cost involved are really worth
> saving from one simple calculation.

I don't think it's that simple for most.

And if we e.g. ever get the undo stuff merged, it'd get more
complicated, because they segment entirely differently. Similar, if we
ever manage to move SLRUs into the buffer pool and checksummed, it'd
again work differently.

Nor is it architecturally appealing to handle checksums in multiple
places above the smgr layer: For one, that requires multiple places to
compute verify them. But also, as the way checksums are computed depends
on the page format etc, it'll likely change for things like undo/slru -
which then again will require additional smarts if done above the smgr
layer.


So considering undo staff, it's better to move checksum logic into md.c
I will keep it in the new patch.

On 2020-02-19 16:48:45 +0900, Michael Paquier wrote:
> Particularly, quickly reading through the patch, I am rather unhappy
> about the shape of the second patch which pushes down the segment
> number knowledge into relpath.c, and creates more complication around
> the handling of zero_damaged_pages and zero'ed pages.  -- Michael

I do not like the SetZeroDamagedPageInChecksum stuff at all however.


I'm +1 on the first concern, and I will delete the new added function `GetRelationFilePath`
in relpath.c and append the segno directly in error message inside function `VerifyPage`

As for SetZeroDamagedPageInChecksum, the reason why I introduced it is that when we are doing
smgrread() and one of the damaged page failed to pass the checksum check, we could not directly error
out, since the caller of smgrread() may tolerate this error and just zero all the damaged page plus a warning message.
Also, we could not just use GUC zero_damaged_pages to do this branch, since we also have ReadBufferMode(i.e. RBM_ZERO_ON_ERROR) to control it.

To get rid of SetZeroDamagedPageInChecksum, one idea is to pass zero_damaged_page flag into smgrread(), something like below:
==

extern void smgrread(SMgrRelation reln, ForkNumber forknum,

BlockNumber blocknum, char *buffer, int flag);

===


Any comments?



--
Thanks

Hubert Zhang


--
Thanks

Hubert Zhang
Attachment

pgsql-hackers by date:

Previous
From: "wenjing.zwj"
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: Peter Eisentraut
Date:
Subject: Re: Auxiliary Processes and MyAuxProc