Thread: heap page corruption not easy

heap page corruption not easy

From: Zeugswetter Andreas SB
Date: 18 December 2000, 08:00:23

A heap page corruption is not very likely in PostgreSQL because of the
underlying page design, not even on flaky hardware or OS software.
(I once read a page design note from pg 4 but don't remember exactly
where or when.)

The point is that the heap page is only modified in places that were
previously empty (except the header). All previous row data stays exactly
in the same place. Thus if a page is only partly written
(in any order of page segments), only a new row is affected. But those rows
will be fixed during redo anyway. The only source of serious problems is
thus a bogus write of a page segment (100 bytes ok, 412 bytes chunk
actually written to disk), but this case is imho sufficiently guarded against,
or at least detected, by disk hardware.
(I assume that the page header fits into one atomic block and has no problem
with being one step behind or ahead of redo.)
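
A minimal sketch of the argument above, under loudly stated assumptions: a 512-byte atomic sector, a page modelled as a flat byte array with no header or line pointers, and hypothetical helper names (add_row, torn_write). It only illustrates that when new rows go into previously empty space, a partial write cannot damage bytes of rows that were already there.

```python
PAGE_SIZE = 8192   # page size discussed in the thread
SECTOR = 512       # assumed atomic disk write unit (an assumption, not from the thread)

def add_row(page: bytes, upper: int, row: bytes):
    """Copy `row` into the free space just below `upper`; no existing byte moves."""
    start = upper - len(row)
    buf = bytearray(page)
    buf[start:upper] = row
    return bytes(buf), start

def torn_write(old: bytes, new: bytes, written: set) -> bytes:
    """On-disk image when only the sectors listed in `written` reached disk."""
    disk = bytearray(old)
    for s in written:
        disk[s * SECTOR:(s + 1) * SECTOR] = new[s * SECTOR:(s + 1) * SECTOR]
    return bytes(disk)

OLD_ROW = b"A" * 600
NEW_ROW = b"B" * 600

# The old row is already on disk; the new row is added below it in free space.
old_page, old_start = add_row(bytes(PAGE_SIZE), PAGE_SIZE, OLD_ROW)
new_page, new_start = add_row(old_page, old_start, NEW_ROW)

def old_row_intact(written: set) -> bool:
    """Is the pre-existing row unchanged after a partial write?"""
    disk = torn_write(old_page, new_page, written)
    return disk[old_start:PAGE_SIZE] == OLD_ROW

def new_row_complete(written: set) -> bool:
    """Did the new row reach disk in full?"""
    disk = torn_write(old_page, new_page, written)
    return disk[new_start:old_start] == NEW_ROW
```

In this model old_row_intact() holds for any subset of sectors, while new_row_complete() depends on which sectors made it to disk. The sketch does not model the header or a sector that is itself written incorrectly.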

I thus doubt that we really need a "physical log" for heap pages in PostgreSQL
with the current non-overwrite smgr. If we could detect corruption in index pages
we would not need a physical log at all, since an index can always be recreated.

What do you think? I ask because a "physical log" is a substantial amount of
additional IO that we imho only want if it is absolutely necessary.

Andreas

PS: reposted, did this not make it to the list ?

RE: heap page corruption not easy

From: "Mikheev, Vadim"
Date: 18 December 2000, 20:26:05

> The point is, that the heap page is only modified in places that were
> previously empty (except header). All previous row data stays exactly 
> in the same place. Thus if a page is only partly written 
> (any order of page segments) only a new row is affected.

Exception: PageRepairFragmentation() and PageIndexTupleDelete() are
called during vacuum - they change the layout of tuples.

> But those rows will be fixed during redo anyway.

We can't count on this for non-atomic 8K page writes: each page keeps an
LSN (log sequence number - the offset of the end of the log record for the
last page modification). If page LSN >= LSN of the redo record, the recoverer
assumes that the changes are already applied and doesn't try to redo the op.
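
A minimal sketch of that check, with hypothetical Page and RedoRecord types that stand in for the real structures. It shows why a torn write is a problem: if the header (with the new LSN) reached disk but the tuple data did not, redo skips the record and the row stays missing.

```python
from dataclasses import dataclass, field

@dataclass
class Page:
    lsn: int = 0                               # LSN of the last modification
    rows: list = field(default_factory=list)   # simplified tuple storage

@dataclass
class RedoRecord:
    lsn: int
    row: str

def redo(page: Page, rec: RedoRecord) -> bool:
    """Apply `rec` unless the page LSN says it is already applied.

    Returns True if the change was applied, False if it was skipped.
    """
    if page.lsn >= rec.lsn:
        return False
    page.rows.append(rec.row)
    page.lsn = rec.lsn
    return True

# Torn write: the header with LSN 100 is on disk, the tuple data is not.
torn = Page(lsn=100, rows=[])
applied = redo(torn, RedoRecord(lsn=100, row="new row"))

# A page that never received the write at all is repaired normally.
stale = Page(lsn=90, rows=[])
repaired = redo(stale, RedoRecord(lsn=100, row="new row"))
```

Here applied is False and torn.rows stays empty, whereas repaired is True.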

We could change this, i.e. force applying changes. This requires a
new format of log records (we couldn't use PageAddItem in redo
anymore):

- for heap we would set pd_lower, pd_upper and line pointer (LP) and copy tuple data from the record into page space;
- for indices: set pd_lower, pd_upper, copy LPs from the newly inserted index tuple's LP through the last one, and copy tuple data
from the record into page space (in the split case it seems better to log the contents of both left and right siblings).
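
The heap case could be sketched roughly as follows. The record layout (HeapInsertRecord) and the page model are hypothetical, invented only for illustration; the point is that redo writes absolute values instead of calling an allocator like PageAddItem, so applying the same record twice leaves the page in the same state.

```python
from dataclasses import dataclass

@dataclass
class HeapInsertRecord:        # hypothetical log record layout
    pd_lower: int              # header values after the insert
    pd_upper: int
    lp_index: int              # which line pointer to set
    tuple_offset: int          # where the tuple data lives in the page
    tuple_data: bytes

class HeapPage:                # simplified page model, not the real structure
    def __init__(self, size: int = 8192):
        self.pd_lower = 0
        self.pd_upper = size
        self.line_pointers = {}          # lp_index -> (offset, length)
        self.data = bytearray(size)

def force_redo_insert(page: HeapPage, rec: HeapInsertRecord) -> None:
    """Unconditionally overwrite header fields, the LP and the tuple bytes."""
    page.pd_lower = rec.pd_lower
    page.pd_upper = rec.pd_upper
    page.line_pointers[rec.lp_index] = (rec.tuple_offset, len(rec.tuple_data))
    end = rec.tuple_offset + len(rec.tuple_data)
    page.data[rec.tuple_offset:end] = rec.tuple_data

def snapshot(page: HeapPage):
    """Comparable copy of the page state, used to check idempotence."""
    return (page.pd_lower, page.pd_upper,
            dict(page.line_pointers), bytes(page.data))

rec = HeapInsertRecord(pd_lower=4, pd_upper=8092, lp_index=0,
                       tuple_offset=8092, tuple_data=b"x" * 100)
page = HeapPage()
force_redo_insert(page, rec)
once = snapshot(page)
force_redo_insert(page, rec)   # replaying the same record changes nothing
twice = snapshot(page)
```

This only works if the page layout at redo time matches the layout when the record was written, which is why the layout-changing ops need separate treatment.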

We would also have to log the entire page for the two ops above (which change
page layout) if the op occurs for the first time after a checkpoint or after
insert/update/delete ops (because redo for insert/update/delete
may be forced on an improper page layout).

Well, this will probably decrease the required full page logging, but
I want to think more about this approach. For example, I haven't considered
the upcoming undo op...

> The only source of serious problems is thus a bogus write of a page
> segment (100 bytes ok 412 bytes chunk actually written to disk),
> but this case is imho sufficiently guarded or at least detected
> by disk hardware. 

With full page logging after checkpoint we would be safe from this
case...

> (I assume that the page header fits into one atomic block and 
> has no problem with being one step behind or ahead of redo).
> 
> I thus doubt that we really need "physical log" for heap 
> pages in PostgreSQL with the current non-overwrite smgr.

As you see, we still need full page backup when we want to reuse
space and change page layout for this, so it's mostly not an issue of
smgr type. I don't know about Informix page design, but an overwriting
smgr itself doesn't require physically removing a tuple from the page (i.e.
changing page layout) - something like turning LP_USED off would be
enough, and page layout could be changed on the first insertion of a new
tuple. Nevertheless they do full page backup.

> If we could detect corruption in index pages we would not need
> physical log at all, since an index can always be recreated.

I don't like to follow this way in the long term - reindex is not an option
for 24x7x365 usage when index creation takes several minutes.

Comments?

- full page backup on first modification after checkpoint

or

- forcing redo, plus full page backup when changing page layout for the first time after checkpoint/insert/update/delete
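
A minimal sketch of the first option. It assumes that "first modification after checkpoint" can be detected by comparing the page LSN with the LSN of the last checkpoint; that test and all names here are assumptions for illustration, not something stated in this thread.

```python
def needs_full_page_image(page_lsn: int, checkpoint_lsn: int) -> bool:
    """True if the page has not been modified since the last checkpoint."""
    return page_lsn <= checkpoint_lsn

def log_modification(page_lsn: int, checkpoint_lsn: int,
                     next_lsn: int, log: list) -> int:
    """Append a log entry for one page change and return the new page LSN."""
    if needs_full_page_image(page_lsn, checkpoint_lsn):
        kind = "full_page"   # whole page image goes into the log
    else:
        kind = "delta"       # only the change itself is logged
    log.append((next_lsn, kind))
    return next_lsn

log = []
checkpoint_lsn = 100
lsn = log_modification(50, checkpoint_lsn, 101, log)    # first change after checkpoint
lsn = log_modification(lsn, checkpoint_lsn, 102, log)   # later change to the same page
```

After these two calls the log holds one full page image followed by one delta, so the extra IO is paid once per page per checkpoint interval.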

Vadim

Re: heap page corruption not easy

From: Hiroshi Inoue
Date: 18 December 2000, 22:57:51

"Mikheev, Vadim" wrote:
> 
> > The point is, that the heap page is only modified in places that were
> > previously empty (except header). All previous row data stays exactly
> > in the same place. Thus if a page is only partly written
> > (any order of page segments) only a new row is affected.
> 
> Exception: PageRepairFragmentation() and PageIndexTupleDelete() are
> called during vacuum - they change layout of tuples.
>

Is it guaranteed that the result of PageRepairFragmentation()
has already been written to disk when tuple movement is logged ?

Regards.
Hiroshi Inoue

RE: heap page corruption not easy

From: "Mikheev, Vadim"
Date: 19 December 2000, 22:03:17

> > > The point is, that the heap page is only modified in 
> > > places that were previously empty (except header).
> > > All previous row data stays exactly in the same place.
> > > Thus if a page is only partly written
> > > (any order of page segments) only a new row is affected.
> > 
> > Exception: PageRepairFragmentation() and PageIndexTupleDelete() are
> > called during vacuum - they change layout of tuples.
> >
> 
> Is it guaranteed that the result of PageRepairFragmentation()
> has already been written to disk when tuple movement is logged ?

No.

Vadim