Re: pg_stat_database.checksum_failures vs shared relations - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: pg_stat_database.checksum_failures vs shared relations |
Date | |
Msg-id | Z-YWXul2kEck6UYH@jrouhaud |
In response to | Re: pg_stat_database.checksum_failures vs shared relations (Andres Freund <andres@anarazel.de>) |
Responses | Re: pg_stat_database.checksum_failures vs shared relations |
List | pgsql-hackers |
On Thu, Mar 27, 2025 at 09:02:02PM -0400, Andres Freund wrote:
> Hi,
>
> On 2025-03-28 09:44:58 +0900, Michael Paquier wrote:
> > On Thu, Mar 27, 2025 at 12:06:45PM -0400, Robert Haas wrote:
> > > On Thu, Mar 27, 2025 at 11:58 AM Andres Freund <andres@anarazel.de> wrote:
> > > > So, today we have the weird situation that *some* checksum errors on shared
> > > > relations get attributed to the current database (if they happen in a backend
> > > > normally accessing a shared relation), whereas others get reported to the
> > > > "shared relations" "database" (if they happen during a base backup). That
> > > > seems ... not optimal.
> > > >
> > > > One question is whether we consider this a bug that should be backpatched.
> > >
> > > I think it would be defensible if pg_basebackup reported all errors
> > > with OID 0 and backend connections reported all errors with OID
> > > MyDatabaseId, but it seems hard to justify having pg_basebackup take
> > > care to report things using the correct database OID and individual
> > > backend connections not take care to do the same thing. So I think
> > > this is a bug. If fixing it in the back-branches is too annoying, I
> > > think it would be reasonable to fix it only in master, but
> > > back-patching seems OK too.
> >
> > Being able to get a better reporting for shared relations in back
> > branches would be nice, but that's going to require some invasive
> > chirurgy, isn't it?
>
> Yea, that's what I was worried about too. I think we basically would need a
> PageIsVerifiedExtended2() that backs the current PageIsVerifiedExtended(),
> with optional arguments that the "fixed" callers would use.

While it would be nice, I'm not sure that it would really be worth the
trouble. Maybe that's just me, but if I hit a corruption failure, knowing
whether it's a global relation vs a normal relation is definitely not something
that will radically change the following days / weeks of pain to fully resolve
the issue.
Instead, there are other improvements that I would welcome on top of fixing
those counters, and they would affect any such new API. For instance, one of
the things you need to do in case of a corruption is to understand the reason
for the corruption, and for that, knowing the underlying tablespace rather than
the database seems like far more useful information to track. For the rest,
the relfilelocator, forknum and blocknum should already be reported in the logs,
so you have the full details of what was intercepted even if the
pg_stat_database view is broken in the back branches.

But even if we had all that, there is still no guarantee (at least for now)
that we see all the corruption, as you might not read the "real" version of the
blocks if they are in shared buffers and/or in the OS cache, depending on where
the corruption actually happened. And even if you could actually check what is
physically stored on disk, that probably wouldn't give you any strong guarantee
that the rest of the data is actually ok anyway.

The biggest source of corruption I know of is an old vmware bug usually
referred to as the SEsparse bug, where on some occasions some blocks would get
written at the wrong location. In that case, the checksum can tell me which
blocks the wrong write landed on, but not which blocks the write should have
gone to, and those are entirely inconsistent too. That's clearly out of
postgres scope, but in my opinion that's just one of probably many more
examples that make the current bug in back branches not worth spending too much
effort to fix.
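
As an illustration of the misdirected-write case described above, here is a minimal sketch in Python. It is a toy model under stated assumptions, not PostgreSQL code: the page layout, the `page_checksum` function (CRC32 XORed with the block number) and every name in it are hypothetical. The only property it relies on is that the checksum depends on the block number the page was meant for.

```python
import zlib


def page_checksum(payload, blkno):
    """Toy checksum (an assumption, not PostgreSQL's algorithm):
    CRC32 of the payload, mixed with the intended block number."""
    return (zlib.crc32(payload) ^ blkno) & 0xFFFFFFFF


def make_page(payload, blkno):
    """Build a (payload, checksum) pair for the block it is meant for."""
    return (payload, page_checksum(payload, blkno))


def verify(disk, blkno):
    """Check the page stored at position blkno against its checksum."""
    payload, stored = disk[blkno]
    return stored == page_checksum(payload, blkno)


def find_checksum_failures(disk):
    """Return the block numbers whose stored checksum does not match."""
    return [b for b in range(len(disk)) if not verify(disk, b)]


def simulate_misdirected_write(nblocks=8, intended=3, actual=6):
    """Write a new version of block `intended`, but land it on `actual`."""
    # Every block starts out holding a valid "old" page.
    disk = [make_page(b"old-%02d" % b, b) for b in range(nblocks)]
    # The new page is checksummed for the block it was meant for...
    new_page = make_page(b"new-%02d" % intended, intended)
    # ...but the storage layer puts it at the wrong location.
    disk[actual] = new_page
    return disk


if __name__ == "__main__":
    disk = simulate_misdirected_write()
    # Only the block that was overwritten is reported.
    print("checksum failures:", find_checksum_failures(disk))
    # The block that missed its write still verifies, with stale content.
    print("block 3 verifies:", verify(disk, 3), "content:", disk[3][0])
```

In this model the overwritten block fails verification, while the block that should have received the write keeps its old page and still passes, so a failure counter alone never points at it.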