WAL format - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject WAL format
Msg-id 4B1D5766.6070905@enterprisedb.com
Responses Re: WAL format  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: WAL format  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: WAL format  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: WAL format  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
While looking at the streaming replication patch, I can't help but
wonder why our WAL format is so complicated.

WAL is divided into WAL segments, each 16 MB by default. Each WAL
segment is divided into pages, 8k by default. At the beginning of each
WAL page, there's a page header, but the header at the first page of
each WAL segment contains a few extra fields.
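To make the layout concrete, here is a small sketch of the arithmetic behind it, assuming the default sizes above. It treats a WAL position as a flat 64-bit byte offset, which is a simplification assumed here, and all names (wal_pos, wal_locate, wal_page_has_long_header) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Defaults from the text: 16 MB segments, 8 kB pages. */
#define WAL_SEG_SIZE  (16u * 1024u * 1024u)
#define WAL_PAGE_SIZE 8192u

/* A segment must hold a whole number of pages. */
static_assert(WAL_SEG_SIZE % WAL_PAGE_SIZE == 0, "segment not page-aligned");

/* Hypothetical: a WAL position as a flat byte offset into the log. */
typedef uint64_t wal_pos;

typedef struct {
    uint64_t segno;        /* which segment file */
    uint32_t page_in_seg;  /* which page within that segment */
    uint32_t page_offset;  /* byte offset within that page */
} wal_location;

static wal_location wal_locate(wal_pos pos)
{
    wal_location loc;
    uint64_t seg_offset = pos % WAL_SEG_SIZE;

    loc.segno = pos / WAL_SEG_SIZE;
    loc.page_in_seg = (uint32_t)(seg_offset / WAL_PAGE_SIZE);
    loc.page_offset = (uint32_t)(seg_offset % WAL_PAGE_SIZE);
    return loc;
}

/* The first page of each segment carries the header with extra fields. */
static int wal_page_has_long_header(wal_pos pos)
{
    return wal_locate(pos).page_in_seg == 0;
}
```

With these defaults every segment holds 2048 pages, and every 2048th page header is the longer one.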

If a WAL record crosses a page boundary, we write as much of it as fits
onto the first page, and write the rest of the data on subsequent pages
as so-called continuation records.
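A rough sketch of how a record gets chopped up under the current scheme. The header sizes here are made-up placeholders rather than the real struct sizes, alignment padding is ignored, and split_record is a hypothetical helper: every page after the first loses room to both a page header and a continuation header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WAL_PAGE_SIZE 8192u
#define PAGE_HDR_SIZE 24u  /* hypothetical page header size */
#define CONT_HDR_SIZE 4u   /* hypothetical continuation header (remaining length) */

/*
 * Split a record of rec_len bytes that starts at byte offset `start`
 * within a page. Writes the number of record bytes placed on each page
 * into chunks[] and returns how many pages the record touches (first
 * piece plus continuation pieces), or 0 if max_chunks is too small.
 */
static size_t split_record(uint32_t start, uint32_t rec_len,
                           uint32_t *chunks, size_t max_chunks)
{
    assert(start >= PAGE_HDR_SIZE && start < WAL_PAGE_SIZE && rec_len > 0);

    size_t n = 0;
    uint32_t free_space = WAL_PAGE_SIZE - start;
    uint32_t remaining = rec_len;

    while (remaining > 0) {
        if (n == max_chunks)
            return 0;
        uint32_t piece = remaining < free_space ? remaining : free_space;
        chunks[n++] = piece;
        remaining -= piece;
        /* Each following page starts with a page header and a
         * continuation header before the rest of the record data. */
        free_space = WAL_PAGE_SIZE - PAGE_HDR_SIZE - CONT_HDR_SIZE;
    }
    return n;
}
```

For example, a 500-byte record starting at offset 8000 gets 192 bytes on the first page and the remaining 308 bytes in a continuation record on the next page.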

In particular I wonder why we bother with the page headers. A much
simpler format would be:

- get rid of page headers, except for the header at the beginning of
each WAL segment
- get rid of continuation records
- at the end of a WAL segment, when there's not enough space to write the
next WAL record, always write an XLOG SWITCH record to fill the rest of
the segment.
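Roughly, record placement in the proposed format could look like the sketch below. It assumes a single fixed-size header per segment (the size is a placeholder), a hypothetical place_record helper, and that every record fits in one segment, since there would be no continuation records to split it.

```c
#include <assert.h>
#include <stdint.h>

#define WAL_SEG_SIZE (16u * 1024u * 1024u)
#define SEG_HDR_SIZE 64u  /* hypothetical size of the one header per segment */

typedef uint64_t wal_pos;

/*
 * Given the current insert position and the size of the next record,
 * return where the record starts. If it does not fit in the rest of the
 * segment, *switch_len gets the number of bytes the XLOG SWITCH filler
 * must cover (from `insert` to the segment end); otherwise it is 0.
 */
static wal_pos place_record(wal_pos insert, uint32_t rec_len,
                            uint32_t *switch_len)
{
    uint64_t seg_start = insert - insert % WAL_SEG_SIZE;
    uint64_t seg_end = seg_start + WAL_SEG_SIZE;

    /* With no continuation records, a record must fit in one segment. */
    assert(rec_len <= WAL_SEG_SIZE - SEG_HDR_SIZE);

    if (insert == seg_start)
        insert += SEG_HDR_SIZE;  /* skip the segment header */

    if (insert + rec_len <= seg_end) {
        *switch_len = 0;
        return insert;
    }
    *switch_len = (uint32_t)(seg_end - insert);
    return seg_end + SEG_HDR_SIZE;  /* first byte after the next header */
}
```

This glosses over one detail: the switch record itself has a minimum size, so in practice the writer would have to switch a little earlier, while there is still room for it.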

The page address stored in the WAL page header gives some extra
protection for detecting the end of valid WAL correctly, but we rely on
the prev-links and CRCs for that anyway, so I wouldn't mind losing it.
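To illustrate why prev-links plus CRCs are enough on their own, here is a sketch of the end-of-WAL test. The record header is hypothetical (the real record layout differs, and its CRC also covers header fields), and a plain bitwise CRC-32 stands in for PostgreSQL's own CRC code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), bit by bit. */
static uint32_t crc32_bytes(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Hypothetical record header, for illustration only. */
typedef struct {
    uint64_t prev;  /* start position of the previous record */
    uint32_t len;   /* payload length in bytes */
    uint32_t crc;   /* CRC-32 of the payload */
} rec_header;

/*
 * Accept a record only if it points back at the record we just read and
 * its payload checksum matches. Leftover bytes from a recycled segment
 * will almost certainly fail one of these checks, which marks the end
 * of valid WAL.
 */
static int record_is_valid(const rec_header *hdr, const uint8_t *payload,
                           uint64_t expected_prev, size_t avail)
{
    assert(hdr != NULL && payload != NULL);
    if (hdr->prev != expected_prev)
        return 0;
    if (hdr->len == 0 || hdr->len > avail)
        return 0;
    return crc32_bytes(payload, hdr->len) == hdr->crc;
}
```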

The changes to ReadRecord in the streaming replication patch feel a bit
awkward, because it has to work around the fact that WAL is sent as a
stream of bytes, while ReadRecord works one page at a time. I'd like to
replace ReadRecord with a simpler ring buffer approach, but handling the
continuation records makes that hard.
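A minimal sketch of what the ring buffer approach might look like if WAL had no page headers or continuation records. It assumes, purely for illustration, that each record is framed by a 4-byte little-endian length; the capacity and all names are hypothetical. Bytes are pushed in whatever chunks the stream delivers, and whole records are popped once they have fully arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_CAP 4096u  /* hypothetical capacity */

typedef struct {
    uint8_t data[RING_CAP];
    size_t head;   /* index of the oldest unread byte */
    size_t count;  /* number of unread bytes */
} ring;

/* Append streamed bytes; returns how many were accepted. */
static size_t ring_push(ring *r, const uint8_t *src, size_t len)
{
    size_t n = 0;
    while (n < len && r->count < RING_CAP) {
        r->data[(r->head + r->count) % RING_CAP] = src[n++];
        r->count++;
    }
    return n;
}

/* Copy len bytes starting `skip` bytes past head, without consuming them. */
static void ring_peek(const ring *r, size_t skip, uint8_t *dst, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = r->data[(r->head + skip + i) % RING_CAP];
}

/*
 * Pop one whole record if all of its bytes have arrived. Returns the
 * payload length, or 0 if the record is still incomplete (empty records
 * are not supported in this sketch).
 */
static size_t ring_pop_record(ring *r, uint8_t *out, size_t out_cap)
{
    uint8_t lenbuf[4];
    if (r->count < 4)
        return 0;
    ring_peek(r, 0, lenbuf, 4);
    uint32_t len = (uint32_t)lenbuf[0] | (uint32_t)lenbuf[1] << 8 |
                   (uint32_t)lenbuf[2] << 16 | (uint32_t)lenbuf[3] << 24;
    assert(len <= out_cap && (size_t)len + 4 <= RING_CAP);
    if (r->count < 4 + (size_t)len)
        return 0;
    ring_peek(r, 4, out, len);
    r->head = (r->head + 4 + len) % RING_CAP;
    r->count -= 4 + len;
    return len;
}
```

The reader never needs to know where pages begin: a record split across two network messages is simply not popped until its second half has been pushed.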

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

