Thread: PANIC: corrupted item lengths
I've seen a couple of *topic issues lately.

What seems strange about the various errors generated in bufpage.c is that they are marked as ERRORs, yet are executed within a critical section causing the system to PANIC. There are a number of sanity checks that are made prior to any changes taking place, so they need not generate PANICs. *topic is one such error message, but there are others.

There is no need to take down the whole server just because one block has a corruption. I'm not advocating that corruptions are acceptable, but the server provides no way to remove the corruption, which is a problem, and the server really should keep a better grip on its towel in any case.

I would much prefer:

* VACUUMs seeing these errors would perform as with zero_damaged_pages.
* other backends seeing those errors should just ERROR out.

We can do this by having a new function: boolean PageIsValid() which
performs the sanity checks. This can then be executed by
heap_page_prune() prior to entering the critical section. That will then
be called correctly by both VACUUM and other code. VACUUM can then
optionally zero out the block, as is done with PageHeaderIsValid().

Votes?

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
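A rough sketch of what such a PageIsValid() might look like, modelled on the sanity checks PageRepairFragmentation() already performs in bufpage.c. The signature and the exact set of checks are assumptions for illustration, not an actual patch from this thread:

/*
 * Sketch only, not an actual patch.  Runs the same kind of sanity checks
 * that PageRepairFragmentation() applies, but can be called before entering
 * a critical section, so callers may ERROR (or zero the page) rather than
 * PANIC.
 */
#include "postgres.h"
#include "storage/bufpage.h"

bool
PageIsValid(Page page)
{
    PageHeader  phdr = (PageHeader) page;
    OffsetNumber maxoff;
    OffsetNumber offnum;

    /* basic header sanity, similar to what PageHeaderIsValid() verifies */
    if (phdr->pd_lower < SizeOfPageHeaderData ||
        phdr->pd_lower > phdr->pd_upper ||
        phdr->pd_upper > phdr->pd_special ||
        phdr->pd_special > BLCKSZ)
        return false;

    maxoff = PageGetMaxOffsetNumber(page);

    for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
    {
        ItemId      itemid = PageGetItemId(page, offnum);

        /* only line pointers with storage have an offset/length to check */
        if (!ItemIdHasStorage(itemid))
            continue;

        /* the item must lie entirely between pd_upper and pd_special */
        if (ItemIdGetOffset(itemid) < phdr->pd_upper ||
            ItemIdGetOffset(itemid) + ItemIdGetLength(itemid) >
            phdr->pd_special)
            return false;
    }

    return true;
}

heap_page_prune() could call this before starting its critical section and take the ERROR or zero_damaged_pages path depending on the caller.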
Simon Riggs wrote:

> I would much prefer:
>
> * VACUUMs seeing these errors would perform as with zero_damaged_pages.
> * other backends seeing those errors should just ERROR out.
>
> We can do this by having a new function: boolean PageIsValid() which
> performs the sanity checks. This can then be executed by
> heap_page_prune() prior to entering the critical section. That will then
> be called correctly by both VACUUM and other code. VACUUM can then
> optionally zero out the block, as is done with PageHeaderIsValid().

I tend to hate automatic zeroing of pages because there's no way to get the contents later for forensics. I would support your proposal if we had a way of saving the block elsewhere before zeroing it (say, create a corrupted+zeroed directory, similar to lost+found, in the database dir and save the block there, annotated with the OID of the table and the block number).

The main problem I see with this is that if you don't act immediately to examine the data, some of the pg_clog files that you need in order to interpret the saved blocks may be gone.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
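A minimal sketch of the kind of save-aside described above, assuming a "corrupted" directory already exists under the data directory; the function name, path layout, and call site are all invented for illustration, not an existing API:

/*
 * Illustrative sketch only: dump a suspect block into a "corrupted"
 * directory (relative to the data directory here, for brevity) before it
 * gets zeroed, named after the table OID and block number.
 */
#include "postgres.h"
#include "storage/block.h"
#include "storage/fd.h"
#include "utils/rel.h"

static void
save_corrupted_block(Relation rel, BlockNumber blkno, char *page)
{
    char        path[MAXPGPATH];
    FILE       *fp;

    snprintf(path, sizeof(path), "corrupted/%u.%u",
             RelationGetRelid(rel), blkno);

    fp = AllocateFile(path, PG_BINARY_W);
    if (fp == NULL)
    {
        ereport(WARNING,
                (errcode_for_file_access(),
                 errmsg("could not save corrupted block to \"%s\": %m",
                        path)));
        return;
    }

    if (fwrite(page, 1, BLCKSZ, fp) != BLCKSZ)
        ereport(WARNING,
                (errcode_for_file_access(),
                 errmsg("could not write corrupted block to \"%s\": %m",
                        path)));

    FreeFile(fp);
}

Such a function would be called just before the zero_damaged_pages path overwrites the buffer.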
On Thu, Jun 4, 2009 at 2:55 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> I tend to hate automatic zeroing of pages because there's no way to get
> the contents later for forensics.

That's why we default to zero_damaged_pages=false. Saving the pages somewhere seems like it would be a good feature to add for zero_damaged_pages in general, but it's an orthogonal issue.

Simon's suggestion is that there are a few more cases we could catch and send through the zero_damaged_pages code path rather than crashing. If his analysis of the existing code is correct -- I haven't looked, but I assume so -- then that sounds like the right thing to do.

I would be concerned about the overhead of checking all the line pointers if we're just trying to look up a single item. But from the sound of his description it would only fire in HOT pruning, for which we already control the frequency.

--
greg
Simon Riggs <simon@2ndQuadrant.com> writes:
> What seems strange about the various errors generated in bufpage.c is
> that they are marked as ERRORs, yet are executed within a critical
> section causing the system to PANIC.

The reason we PANIC there is to reduce the probability that bad data will be written back to disk. Of course, if the bad data was read off disk in the first place, there's no hope --- but we have checks on incoming pages for that. What seems significantly more likely if we detect a problem here is that we somehow corrupted the page while it sits in shared buffers. So, there's some hope that the corruption will not get back to disk, so long as we PANIC and thereby cause shared-memory contents to be flushed.

> Votes?

+1 for no change.

We could make the page-read-time validation checks stricter, if there's some specific pattern you're seeing that gets past those checks now.

			regards, tom lane
On Thu, 2009-06-04 at 10:02 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > What seems strange about the various errors generated in bufpage.c is
> > that they are marked as ERRORs, yet are executed within a critical
> > section causing the system to PANIC.
>
> The reason we PANIC there is to reduce the probability that bad data
> will be written back to disk. Of course, if the bad data was read off
> disk in the first place, there's no hope --- but we have checks on
> incoming pages for that.

We don't have checks for this on incoming pages: we only ever check the header. I think there is hope if we do have corruption. We don't need to PANIC; we can do what I've suggested instead.

> What seems significantly more likely if we
> detect a problem here is that we somehow corrupted the page while it
> sits in shared buffers. So, there's some hope that the corruption will
> not get back to disk, so long as we PANIC and thereby cause
> shared-memory contents to be flushed.

If the block is marked as dirty, then yes, I can see your point. I would prefer to PANIC than to lose data.

If the block is *not* dirty, i.e. it has been trashed in some other way, then it is not likely to go to disk. Anybody re-reading the block will see the same corruption and die long before they can make the page dirty. So a corrupt, yet clean, block is no reason to PANIC. If the block *is* dirty, it might only be because of a hint bit change, and there are various other ways to dirty a block that don't trigger full page validity checks.

> > Votes?
>
> +1 for no change.
>
> We could make the page-read-time validation checks stricter, if there's
> some specific pattern you're seeing that gets past those checks now.

Don't know what the pattern is, because the bloody things keep PANICing. Seriously, it does look like some kind of memory corruption issue, but I'm still not sure whether it is hardware or software related. But this thread is not about that issue; it's about how we respond in the face of such issues.

The main problem is that there is no easy way to get rid of the corrupt block. You have to select out the good data somehow and then truncate. I like Alvaro's suggestion for the future, but mostly we need to be able to surgically remove the data block. Yes, I can do this without backend changes via normal levels of non-user level skullduggery.

It would be good to have a check_page_validity option so that before we do anything to a page we do full validation, for example on PageAddItem(). With that turned on, I wouldn't mind PANICing, because at least you'd have reasonable proof that the corruption has happened recently and that PANICing may actually prevent the horse from escaping.

Thanks,

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
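A sketch of how such a check_page_validity option might be wired in. The GUC, this helper, and the PageIsValid() it relies on are all hypothetical; only the error code and relation macros are existing PostgreSQL APIs:

/*
 * Hypothetical guard that PageAddItem() call sites could invoke before
 * entering a critical section, so full-page validation failures surface
 * as an ERROR rather than a PANIC.
 */
#include "postgres.h"
#include "storage/bufpage.h"
#include "utils/rel.h"

bool        check_page_validity = false;    /* hypothetical GUC */

extern bool PageIsValid(Page page);         /* as sketched upthread */

void
check_page_before_update(Relation rel, BlockNumber blkno, Page page)
{
    if (check_page_validity && !PageIsValid(page))
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("invalid page contents in block %u of relation \"%s\"",
                        blkno, RelationGetRelationName(rel))));
}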
Hi,

On 4 Jun 09, at 15:55, Alvaro Herrera wrote:
> I tend to hate automatic zeroing of pages because there's no way to get
> the contents later for forensics. I would support your proposal if we
> had a way of saving the block elsewhere before zeroing it (say, create a
> corrupted+zeroed directory, similar to lost+found, in the database dir
> and save the block there, annotated with the OID of the table and the
> block number).

What about creating a special-purpose fork for this? It could contain some metadata plus the original (maybe corrupted) block content.

> The main problem I see with this is that if you don't act immediately to
> examine the data, some of the pg_clog files that you need in order to
> interpret the saved blocks may be gone.

Maybe the necessary clog segments could be part of the special-fork metadata associated with each corrupted block that is saved aside?

Regards,
--
dim
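A rough sketch of the per-block metadata such a fork might carry alongside each saved block image; every name here is invented for illustration:

/*
 * Hypothetical metadata record stored next to each saved-aside block in a
 * special-purpose "corrupted blocks" fork.  The xid range would tell a
 * later forensic pass which pg_clog segments have to be preserved.
 */
#include "postgres.h"
#include "storage/block.h"
#include "utils/timestamp.h"

typedef struct CorruptedBlockMeta
{
    Oid             relid;       /* table the block belongs to */
    BlockNumber     blkno;       /* block number within that table */
    TimestampTz     saved_at;    /* when the block was saved aside */
    TransactionId   oldest_xid;  /* oldest xid seen on the page */
    TransactionId   newest_xid;  /* newest xid seen on the page */
} CorruptedBlockMeta;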