RE: heap page corruption not easy - Mailing list pgsql-hackers

From Mikheev, Vadim
Subject RE: heap page corruption not easy
Date
Msg-id 8F4C99C66D04D4118F580090272A7A234D3201@sectorbase1.sectorbase.com
In response to heap page corruption not easy  (Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at>)
List pgsql-hackers
> The point is, that the heap page is only modified in places that were
> previously empty (except header). All previous row data stays exactly 
> in the same place. Thus if a page is only partly written 
> (any order of page segments) only a new row is affected.

Exception: PageRepairFragmentation() and PageIndexTupleDelete() are
called during vacuum - they change the layout of tuples.

> But those rows will be fixed during redo anyway.

We can't count on this for non-atomic 8K page writes: each page keeps
an LSN (log sequence number - offset of the end of the log record for
the last page modification) - if page LSN >= LSN of the redo record
then the recoverer assumes that the changes are already applied and
doesn't try to redo the op.
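
A minimal sketch of that check, in C. The type and function names
(Lsn, PageHeaderSketch, redo_needed) are hypothetical and simplified
for illustration; they are not the real PostgreSQL definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified types; not PostgreSQL's real definitions. */
typedef uint64_t Lsn;            /* byte offset into the log */

typedef struct {
    Lsn page_lsn;                /* end of the log record that last modified the page */
} PageHeaderSketch;

/*
 * Returns true when recovery would replay the record against this page.
 * If page LSN >= record LSN, the change is assumed to be on the page
 * already and redo is skipped.
 */
static bool redo_needed(const PageHeaderSketch *page, Lsn record_end_lsn)
{
    return page->page_lsn < record_end_lsn;
}
```

With a partial write, the block holding the header (and its new LSN)
may reach disk while the block holding the row does not. The check
above then returns false and the missing row is never redone.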

We could change this - i.e. force applying changes. This requires
a new format of log records (we couldn't use PageAddItem in redo
anymore):

- for heap we would set pd_lower, pd_upper and line pointer (LP) and
  copy tuple data from the record into page space;
- for indices: set pd_lower, pd_upper, copy LPs from the newly inserted
  index tuple's LP till the last one and copy tuple data from the record
  into page space (in the split case it seems better to log contents of
  both left and right siblings).
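
The heap case above could be sketched roughly as follows. This is only
an illustration under assumed, simplified layouts: HeaderSketch,
ItemIdSketch, HeapInsertRecordSketch and forced_heap_redo are
hypothetical names, and the real page header and line pointer formats
differ. The point is that every value comes from the log record, so
applying the record twice gives the same page.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192

/* Hypothetical, simplified page header: only the two free-space offsets. */
typedef struct {
    uint16_t pd_lower;           /* offset to start of free space */
    uint16_t pd_upper;           /* offset to end of free space */
} HeaderSketch;

/* Hypothetical line pointer: where the tuple lives and how long it is. */
typedef struct {
    uint16_t lp_off;
    uint16_t lp_len;
} ItemIdSketch;

/* Hypothetical log record carrying absolute values, not "append" intent. */
typedef struct {
    uint16_t pd_lower, pd_upper; /* header values after the insert */
    uint16_t lp_index;           /* which line pointer slot to set */
    ItemIdSketch lp;             /* value of that line pointer */
    const unsigned char *tuple;  /* tuple data, lp.lp_len bytes */
} HeapInsertRecordSketch;

/* Overwrite header, line pointer and tuple bytes from the record. */
static void forced_heap_redo(unsigned char *page, const HeapInsertRecordSketch *rec)
{
    HeaderSketch hdr = { rec->pd_lower, rec->pd_upper };

    memcpy(page, &hdr, sizeof hdr);
    memcpy(page + sizeof(HeaderSketch) + rec->lp_index * sizeof(ItemIdSketch),
           &rec->lp, sizeof rec->lp);
    memcpy(page + rec->lp.lp_off, rec->tuple, rec->lp.lp_len);
}
```

Unlike a PageAddItem-style redo, nothing here depends on the current
pd_lower/pd_upper found on the page, which is why the page layout
must be the one the record was written against.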
 

We would also have to log the entire page for the two ops above (which
change page layout) if the op occurs for the first time after a
checkpoint or after insert/update/delete ops (because redo for
insert/update/delete may be forced against an improper page layout).
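
One possible reading of this rule, as a small C sketch. The per-page
flag and all names (PageLogStateSketch, needs_full_page_image,
on_checkpoint) are assumptions made for illustration, not an existing
interface.

```c
#include <stdbool.h>

/* Hypothetical op kinds; OP_LAYOUT_CHANGE stands for the two vacuum ops. */
typedef enum { OP_INSERT, OP_UPDATE, OP_DELETE, OP_LAYOUT_CHANGE } PageOpSketch;

/* Hypothetical per-page state tracked while writing the log. */
typedef struct {
    /* true if a full page image was logged since the last checkpoint
     * and since the last insert/update/delete on this page */
    bool layout_logged;
} PageLogStateSketch;

/* A checkpoint invalidates earlier full page images. */
static void on_checkpoint(PageLogStateSketch *st)
{
    st->layout_logged = false;
}

/* Returns true when the op must log the entire page. */
static bool needs_full_page_image(PageLogStateSketch *st, PageOpSketch op)
{
    if (op != OP_LAYOUT_CHANGE) {
        /* a later layout change must be logged in full again */
        st->layout_logged = false;
        return false;
    }
    if (st->layout_logged)
        return false;            /* already covered by an earlier image */
    st->layout_logged = true;
    return true;
}
```

Under this reading, two layout changes in a row with no checkpoint or
tuple op between them cost only one full page image.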

Well, this will probably decrease the amount of full page logging
required, but I would need to think more about this approach. For
example, I didn't consider the upcoming undo op...

> The only source of serious problems is thus a bogus write of a page
> segment (100 bytes ok 412 bytes chunk actually written to disk),
> but this case is imho sufficiently guarded or at least detected
> by disk hardware. 

With full page logging after checkpoint we would be safe from this
case...

> (I assume that the page header fits into one atomic block and 
> has no problem with being one step behind or ahead of redo).
> 
> I thus doubt that we really need "physical log" for heap 
> pages in PostgreSQL with the current non-overwrite smgr.

As you see, we still need full page backup when we want to reuse
space and change page layout for this, so it's mostly not an issue of
smgr type. I don't know about Informix page design, but an overwriting
smgr itself doesn't require physically removing a tuple from the page
(i.e. changing page layout) - something like turning LP_USED off would
be enough, and page layout could be changed on first insertion of a
new tuple. Nevertheless they do full page backup.

> If we could detect corruption in index pages we would not need
> physical log at all, since an index can always be recreated.

I don't like to follow this way in the long term - reindex is not an
option for 24x7x365 usage when index creation takes several minutes.

Comments?

- full page backup on first modification after checkpoint

or

- forcing redo and full page backup when changing page layout first time after checkpoint/insert/update/delete

Vadim

