Re: [GENERAL] PANIC: heap_update_redo: no block - Mailing list pgsql-hackers
From: Simon Riggs
Subject: Re: [GENERAL] PANIC: heap_update_redo: no block
Date:
Msg-id: 1143540087.3839.304.camel@localhost.localdomain
In response to: Re: [GENERAL] PANIC: heap_update_redo: no block (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: [GENERAL] PANIC: heap_update_redo: no block
List: pgsql-hackers
On Mon, 2006-03-27 at 22:03 -0500, Tom Lane wrote:
> Greg Stark <gsstark@mit.edu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> I think what's happened here is that VACUUM FULL moved the only tuple
> >> off page 1 of the relation, then truncated off page 1, and now
> >> heap_update_redo is panicking because it can't find page 1 to replay the
> >> move. Curious that we've not seen a case like this before, because it
> >> seems like a generic hazard for WAL replay.
>
> > This sounds familiar
> > http://archives.postgresql.org/pgsql-hackers/2005-05/msg01369.php

Yes, I remember that also.

> After further review I've concluded that there is not a systemic bug
> here, but there are several nearby local bugs.

IMHO it is remarkable that a code review of existing production code
turned up so many bugs. Cool.

> The reason it's not a systemic bug is that this scenario is supposed
> to be handled by the same mechanism that prevents torn-page writes:
> the first XLOG record that touches a given page after a checkpoint is
> supposed to rewrite the entire page, rather than update it
> incrementally. Since XLOG replay always begins at a checkpoint, this
> means we should always be able to write a fresh copy of the page,
> even after relation deletion or truncation. Furthermore, during XLOG
> replay we are willing to create a table (or even a whole tablespace
> or database directory) if it's not there when touched. The subsequent
> replay of the deletion or truncation will get rid of any unwanted
> data again.

That will all work, agreed.

> The subsequent replay of the deletion or truncation will get rid of
> any unwanted data again.

Trouble is, the assumption that there *will be* a subsequent truncation
is not watertight, even if it is a strong one. If no later truncation
arrives, we simply ignore what we ought by then to recognise as an
error and carry on as if the database were fine, which it is not.

The overall problem is that the automatic extension done during replay
neither takes corrective action nor gives any notification when it is
in fact papering over filesystem corruption. Clearly we would like
xlog replay to keep working even in the face of severe file
corruption, but we should try to identify that situation and notify
people that it has occurred.

I'd suggest both WARNING messages in the log and something more
extreme still: anyone touching a corrupt table should receive a NOTICE
saying "database recovery displayed errors for this table", with
"HINT: check the database logfiles for specific messages"; indexes
should get a log WARNING saying "database recovery displayed errors
for this index", with "HINT: use REINDEX to rebuild this index".
(A rough sketch of such messages follows below.) So I guess I had
better help, if we agree this is beneficial.

> Therefore, there is no systemic bug --- unless you are running with
> full_page_writes=off. I assert that that GUC variable is broken and
> must be removed.

On this analysis, I would agree for current production systems. But
this tells us something deeper: we must log full pages not because we
fear a partial page write has occurred, but because the xlog mechanism
intrinsically depends upon those full-page images existing after each
checkpoint. Writing full pages in this way is a serious performance
cost that it would be good to improve upon. Perhaps this is the spur
to discuss a new xlog format that would support higher-performance
logging as well as log mining for replication?
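To make the full-page-image rule quoted above concrete, here is a minimal standalone C sketch of the decision as I understand it: a WAL record must carry a complete copy of a page whenever that page has not been WAL-logged since the redo point of the most recent checkpoint, because replay may otherwise find a stale, torn, or (as in this thread) truncated-away page on disk. All type and variable names here are illustrative stand-ins, not the actual backend code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real backend types */
typedef uint64_t XLogRecPtr;            /* byte position in the WAL stream */

typedef struct PageSketch
{
    XLogRecPtr  pd_lsn;                 /* LSN of last WAL record touching this page */
    /* ... tuple data etc. ... */
} PageSketch;

/*
 * Redo pointer of the most recent checkpoint.  Replay always starts here,
 * so anything logged before it is irrelevant for crash recovery.
 */
static XLogRecPtr RedoRecPtr;

/*
 * Decide whether a WAL record touching 'page' must carry a full copy of
 * the page rather than an incremental update.  If the page has not been
 * WAL-logged since the last checkpoint's redo point, replay may find an
 * arbitrarily stale (or torn, or already truncated-away) page on disk,
 * so the record must be able to rebuild the page from scratch.
 */
static bool
needs_full_page_image(const PageSketch *page, bool full_page_writes)
{
    if (!full_page_writes)
        return false;                   /* the GUC Tom wants removed */

    return page->pd_lsn <= RedoRecPtr;
}
```

The full_page_writes test is the branch under discussion: with it off, only the incremental record exists, and if the on-disk block has since been truncated away there is nothing for replay to apply it to.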
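And as a rough illustration of the WARNING/HINT messages suggested above, a hypothetical replay-side helper might look something like the following. ereport, errmsg and errhint are the normal backend reporting calls; the function itself, its callers, and the idea of recording damaged relations persistently are assumptions of this sketch, not existing code.

```c
#include "postgres.h"   /* backend-internal sketch: assumes the usual backend environment */

/*
 * Hypothetical hook for the redo code: called when replay had to invent a
 * block (or a whole relation) that should already have existed on disk.
 * Relation identification is simplified to a plain name for clarity.
 */
static void
report_recovery_corruption(const char *relname, bool is_index)
{
    if (is_index)
        ereport(WARNING,
                (errmsg("database recovery displayed errors for index \"%s\"",
                        relname),
                 errhint("Use REINDEX to rebuild this index.")));
    else
        ereport(WARNING,
                (errmsg("database recovery displayed errors for table \"%s\"",
                        relname),
                 errhint("Check the database logfiles for specific messages.")));

    /*
     * Hypothetical follow-up: remember the damaged relation somewhere
     * persistent, so that later access can raise the per-table NOTICE
     * proposed above.
     */
    /* record_damaged_relation(relname, is_index); */
}
```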
> There are, however, a bunch of local bugs, including these: ...

> Notice that these are each, individually, pretty low-probability
> scenarios, which is why we've not seen many bug reports.

Most people don't file bug reports. And if we have a recovery mode
that silently ignores filesystem corruption we'll get even fewer,
because any errors that do occur will be put down to gamma rays or
some other excuse.

> a systemic bug

Perhaps we do have one systemic problem: systems documentation. The
xlog code is distinct from other parts of the codebase in that it
carries almost no comments, and the overall mechanisms are relatively
poorly documented in README form. Methinks there are very few people
who could attempt such a code review, and even fewer who would find
any bugs by inspection. I'll think some more on that...

Best Regards,
Simon Riggs