Re: 16-bit page checksums for 9.2 - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: 16-bit page checksums for 9.2
Date
Msg-id 20120206175959.GD19450@momjian.us
In response to Re: 16-bit page checksums for 9.2  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: 16-bit page checksums for 9.2
List pgsql-hackers
On Mon, Feb 06, 2012 at 09:05:19AM +0000, Simon Riggs wrote:
> > In any case, I think it's a very bad idea to remove the page version field.
> > How are we supposed to ever be able to change the page format again if we
> > don't even have a version number on the page? I strongly oppose removing it.
> 
> Nobody is removing the version field, nor is anybody suggesting not
> being able to tell which page version we are looking at.

Agreed.  I thought the idea was that we have a 16-bit page _version_
number and 8+ free page _flag_ bits, which are all currently zero.  The
idea was to move the version number from the 16-bit field into the unused
flag bits, and use the 16-bit field for the checksum.  I would like to
have some logic that would allow tools inspecting the page to tell if
they should look for the version number in the bits at the beginning of
the page or at the end.

Specifically, this becomes the checksum:
uint16          pd_pagesize_version;

and this holds the page version, if we have updated the page to the new
format:
uint16          pd_flags;               /* flag bits, see below */

Of the 16 bits of pd_flags, these are the only ones used:

#define PD_HAS_FREE_LINES       0x0001  /* are there any unused line pointers? */
#define PD_PAGE_FULL            0x0002  /* not enough free space for new
                                         * tuple? */
#define PD_ALL_VISIBLE          0x0004  /* all tuples on page are visible to
                                         * everyone */

#define PD_VALID_FLAG_BITS      0x0007  /* OR of all valid pd_flags bits */
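
To make the "only ones used" point concrete, here is a minimal sketch of
which pd_flags bits remain unassigned given the three flags above.  The
helper names unused_flag_bits() and flags_are_valid() are hypothetical
and exist only for illustration; the four macros are the ones quoted
above.

```c
#include <stdint.h>

#define PD_HAS_FREE_LINES   0x0001
#define PD_PAGE_FULL        0x0002
#define PD_ALL_VISIBLE      0x0004
#define PD_VALID_FLAG_BITS  0x0007  /* OR of the three flags above */

/* Hypothetical helper: mask of pd_flags bits not assigned to any flag. */
static uint16_t
unused_flag_bits(void)
{
    return (uint16_t) ~PD_VALID_FLAG_BITS;
}

/* Hypothetical helper: true if no unassigned bit is set in pd_flags. */
static int
flags_are_valid(uint16_t pd_flags)
{
    return (pd_flags & ~PD_VALID_FLAG_BITS) == 0;
}
```

With only the low three bits assigned, the unused mask is 0xFFF8, so 13
bits are available for a checksum indicator and a relocated version
number.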


> > I'm also not very comfortable with the idea of having flags on the page
> > indicating whether it has a checksum or not. It's not hard to imagine a
> > software or firmware bug or hardware failure that would cause pd_flags field
> > to be zeroed out altogether. It would be more robust if the expected
> > bit-pattern was not 0-0, but 1-0 or 0-1, but then you need to deal with that
> > at upgrade somehow. And it still feels a bit whacky anyway.

> Good idea. Let's use
> 
> 0-0-0 to represent upgraded from previous version, needs a bit set
> 0-0-1 to represent new version number of page, no checksum
> 1-1-1 to represent new version number of page, with checksum
> 
> So we have 1 bit dedicated to the page version, 2 bits to the checksum indicator

Interesting point that we would not be guarding against a bit flip from
1 to 0 for the checksum bit; I agree using two bits is the way to go.  I
don't see how upgrade figures into this.
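
A rough sketch of the two-bit idea, to show what the redundancy buys.
The mask names and bit positions here are assumptions made up for
illustration, as is the ChecksumState type; nothing like this exists in
the tree.

```c
#include <stdint.h>

/* Hypothetical masks: two pd_flags bits that together mean "checksum stored". */
#define PD_CHECKSUM1      0x0008
#define PD_CHECKSUM2      0x0010
#define PD_CHECKSUM_BITS  (PD_CHECKSUM1 | PD_CHECKSUM2)

typedef enum
{
    CHECKSUM_ABSENT,        /* both bits clear */
    CHECKSUM_PRESENT,       /* both bits set */
    CHECKSUM_BITS_DAMAGED   /* bits disagree, e.g. after a single bit flip */
} ChecksumState;

/* Classify a page by its two checksum indicator bits. */
static ChecksumState
checksum_state(uint16_t pd_flags)
{
    uint16_t    bits = pd_flags & PD_CHECKSUM_BITS;

    if (bits == PD_CHECKSUM_BITS)
        return CHECKSUM_PRESENT;
    if (bits == 0)
        return CHECKSUM_ABSENT;
    return CHECKSUM_BITS_DAMAGED;
}
```

A single flip from 1 to 0 on a checksummed page leaves the two bits in
disagreement, so it can be reported instead of silently skipping the
checksum test.  A pd_flags field that is zeroed out entirely would still
read as "no checksum", which is the case raised in the quoted text above.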

However, I am unclear how Simon's idea above actually works.  We need
two bits for redundancy, both 1, to mark a page as having a checksum.  I
don't think mixing the idea of a new page version and checksum enabled
really makes sense, especially since we have to plan for future page
version changes.

I think we dedicate 2 bits to say we have computed a checksum, and 3
bits to mark up to 8 possible page versions, so the logic is, in
pd_flags, we use bits 0x0008 and 0x0010 to indicate that a checksum is
stored on the page, and we use 0x0020 and later for the page version
number.  We can assume all the remaining bits are for the page version
number until we need to define new bits, and we can start storing them at
the end first, and work forward.  If all the version bits are zero, it
means the page version number is still stored in pd_pagesize_version.
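
A sketch of the lookup rule a page inspection tool could follow under
this proposal.  The three-bit version field starting at 0x0020, the
struct PageHeaderSketch, and the function page_version() are all
assumptions for illustration.  The only existing behavior relied on is
that the old format keeps the layout version in the low byte of
pd_pagesize_version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: three version bits starting at 0x0020 in pd_flags. */
#define PD_VERSION_SHIFT  5
#define PD_VERSION_MASK   0x00E0    /* 0x0020 | 0x0040 | 0x0080 */

/* Only the two header fields discussed here; not the real page header. */
typedef struct
{
    uint16_t    pd_flags;
    uint16_t    pd_pagesize_version;    /* old: size | version; new: checksum */
} PageHeaderSketch;

/*
 * Return the page version and report where it was found.  All-zero
 * version bits in pd_flags mean the page still uses the old location.
 */
static unsigned
page_version(const PageHeaderSketch *hdr, bool *from_flags)
{
    unsigned    v = (hdr->pd_flags & PD_VERSION_MASK) >> PD_VERSION_SHIFT;

    *from_flags = (v != 0);
    if (v != 0)
        return v;               /* new scheme: version lives in pd_flags */
    return hdr->pd_pagesize_version & 0x00FF;  /* old scheme: low byte */
}
```

The from_flags output is there because the two locations may number
versions differently, so a caller needs to know which scheme the
returned value belongs to.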

> >> I wonder if we should just dedicate 3 page header bits, call that the
> >> page version number, and set this new version number to 1, and assume
> >> all previous versions were zero, and have them look in the old page
> >> version location if the new version number is zero.  I am basically
> >> thinking of how we can plan ahead to move the version number to a new
> >> location and have a defined way of finding the page version number using
> >> old and new schemes.
> >
> >
> > Three bits seems short-sighted, but yeah, something like 6-8 bits should be
> > enough. On the whole, though. I think we should bite the bullet and invent a
> > way to extend the page header at upgrade.
> 
> There are currently many spare bits. I don't see any need to allocate
> them to this specific use ahead of time - especially since that is the
> exact decision we took last time when we reserved 16 bits for the
> version.

Right, but I am thinking we should set things up so we can grow the page
version number into the unused bits, rather than box it between bits we
are already using.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

