RE: heap page corruption not easy - Mailing list pgsql-hackers
From | Mikheev, Vadim |
---|---|
Subject | RE: heap page corruption not easy |
Date | |
Msg-id | 8F4C99C66D04D4118F580090272A7A234D3201@sectorbase1.sectorbase.com |
In response to | heap page corruption not easy (Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>) |
List | pgsql-hackers |
> The point is, that the heap page is only modified in places that were
> previously empty (except header). All previous row data stays exactly
> in the same place. Thus if a page is only partly written
> (any order of page segments) only a new row is affected.

Exception: PageRepairFragmentation() and PageIndexTupleDelete() are
called during vacuum - they change the layout of tuples.

> But those rows will be fixed during redo anyway.

We can't count on this for non-atomic 8K page writes: each page keeps
an LSN (log sequence number - the offset of the end of the log record
for the last page modification). If page LSN >= LSN of the redo record
then the recoverer assumes that the changes were already applied and
doesn't try to redo the op.

We could change this - i.e. force applying changes. This requires a new
format of log records (we couldn't use PageAddItem in redo anymore):

- for heap we would set pd_lower, pd_upper and the line pointer (LP)
  and copy tuple data from the record into page space;
- for indices: set pd_lower, pd_upper, copy LPs from the newly inserted
  index tuple's LP through the last one and copy tuple data from the
  record into page space (in the split case it seems better to log the
  contents of both left and right siblings).

We would also have to log the entire page for the two ops above (which
change page layout) if the op occurs for the first time after a
checkpoint or after insert/update/delete ops (because redo for
insert/update/delete may otherwise be forced against an improper page
layout).

Well, this will probably decrease the required full page logging, but I
would need to think more about this approach. For example, I didn't
consider the upcoming undo op...

> The only source of serious problems is thus a bogus write of a page
> segment (100 bytes ok 412 bytes chunk actually written to disk),
> but this case is imho sufficiently guarded or at least detected
> by disk hardware.

With full page logging after checkpoint we would be safe from this
case...

> (I assume that the page header fits into one atomic block and
> has no problem with being one step behind or ahead of redo).
>
> I thus doubt that we really need "physical log" for heap
> pages in PostgreSQL with the current non-overwrite smgr.

As you see, we still need full page backup when we want to reuse space
and change page layout for this, so it's mostly not an issue of smgr
type. I don't know about Informix page design, but an overwriting smgr
itself doesn't require physically removing a tuple from the page (i.e.
changing page layout) - something like turning LP_USED off would be
enough, and page layout could be changed on the first insertion of a
new tuple. Nevertheless they do full page backup.

> If we could detect corruption in index pages we would not need
> physical log at all, since an index can always be recreated.

I don't like to follow this way in the long term - reindex is not an
option for 24x7x365 usage when index creation takes several minutes.

Comments?

- full page backup on first modification after checkpoint
or
- forcing redo, and full page backup when changing page layout for the
  first time after checkpoint/insert/update/delete

Vadim
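
To make the two options above concrete, here is a minimal C sketch of
the redo decision discussed in this message. It is a toy model under
stated assumptions, not PostgreSQL source: DemoLSN, DemoPage,
DemoRecord, demo_redo_needed() and demo_redo() are hypothetical names,
the page body is 16 bytes rather than 8K, and the "forced" record is
assumed to carry an absolute offset so that replaying it twice is
harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_DATA 16          /* toy page body size (assumption) */

typedef uint64_t DemoLSN;          /* hypothetical log position type */

typedef struct {
    DemoLSN lsn;                   /* LSN of last change applied to page */
    char    data[DEMO_PAGE_DATA];  /* toy page body */
} DemoPage;

typedef struct {
    DemoLSN lsn;                   /* end position of this log record */
    bool    has_full_page;         /* record carries a full page image */
    char    image[DEMO_PAGE_DATA]; /* page body, if has_full_page */
    size_t  off;                   /* absolute offset of the change */
    size_t  len;                   /* length of the change */
    char    bytes[DEMO_PAGE_DATA]; /* changed bytes */
} DemoRecord;

/* The rule from the message: page LSN >= record LSN means "already applied". */
static bool demo_redo_needed(const DemoPage *page, const DemoRecord *rec)
{
    return page->lsn < rec->lsn;
}

/* Replays one record; returns true if the page was modified. */
static bool demo_redo(DemoPage *page, const DemoRecord *rec, bool force)
{
    if (rec->has_full_page) {
        /* Option 1: a full page image replaces the on-disk page, torn or not. */
        memcpy(page->data, rec->image, DEMO_PAGE_DATA);
        page->lsn = rec->lsn;
        return true;
    }
    /* Option 2: with force set, the LSN check is ignored. */
    if (!force && !demo_redo_needed(page, rec))
        return false;
    assert(rec->off + rec->len <= DEMO_PAGE_DATA);
    memcpy(page->data + rec->off, rec->bytes, rec->len);
    page->lsn = rec->lsn;
    return true;
}
```

In this model a torn write is a page whose lsn field reached disk but
whose data did not: the plain LSN check then skips the record and the
stale body survives, while either a full page image or force = true
repairs it.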