Re: 16-bit page checksums for 9.2 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: 16-bit page checksums for 9.2
Date
Msg-id CA+U5nMJRGj9+P8ENgtWeo-3ZSreJWdB4_VUzteKmKXe4VVjpQA@mail.gmail.com
In response to Re: 16-bit page checksums for 9.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, Mar 2, 2012 at 2:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> One thing I'm not too sure about is how to extend the page format to
>> handle optional features.  For example, suppose we want to add 2 bytes
>> to the page header for a checksum (or 4 bytes, or any other number).
>> Ideally, I'd like to not use up those 2 bytes on pages that aren't
>> checksummed.  What do we do about that?
>
> I'm having a hard time getting excited about adding that requirement on
> top of all the others that we have for this feature.  In particular, if
> we insist on that, there is exactly zero chance of turning checksums on
> on-the-fly, because you won't be able to do it if the page is full.
>
> The scheme that seems to me to make the most sense for checksums,
> if we grant Simon's postulate that a 2-byte checksum is enough, is
> (1) repurpose the top N bits of pd_flags as a page version number,
>    for some N;
> (2) repurpose pd_pagesize_version as the checksum, with the
>    understanding that you can't have a checksum in version-0 pages.

I'd say N = 1, for now. N > 1 in future, as needed. We can document the
intention to reserve the high-order bits of pd_flags for that purpose, and
defer the decision about what each bit is used for until it is needed.
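
To make the proposal concrete, here is a minimal sketch of how the top N
bits of pd_flags could carry a page version, with pd_pagesize_version
reinterpreted as a checksum on non-zero versions. The struct is a cut-down
stand-in for the real page header, and every Sketch*/sketch_*/PD_VERSION_*
name is hypothetical, not taken from any patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: N = 1, so only the top bit of pd_flags is used. */
#define PD_VERSION_BITS   1
#define PD_VERSION_SHIFT  (16 - PD_VERSION_BITS)
#define PD_VERSION_MASK   ((uint16_t) (((1u << PD_VERSION_BITS) - 1u) << PD_VERSION_SHIFT))

/* Simplified stand-in for the page header; only the two fields discussed. */
typedef struct SketchPageHeader
{
    uint16_t pd_flags;             /* low bits: existing flags; top N bits: page version */
    uint16_t pd_pagesize_version;  /* version 0: size/version; version >= 1: checksum */
} SketchPageHeader;

/* Extract the page version stored in the top N bits of pd_flags. */
static inline unsigned
sketch_page_version(const SketchPageHeader *p)
{
    return (unsigned) (p->pd_flags & PD_VERSION_MASK) >> PD_VERSION_SHIFT;
}

/* Set the page version without disturbing the existing low-order flag bits. */
static inline void
sketch_set_page_version(SketchPageHeader *p, unsigned version)
{
    assert(version < (1u << PD_VERSION_BITS));
    p->pd_flags = (uint16_t) ((p->pd_flags & ~PD_VERSION_MASK) |
                              (version << PD_VERSION_SHIFT));
}

/* Version-0 pages cannot carry a checksum; any later version is assumed to. */
static inline int
sketch_page_has_checksum(const SketchPageHeader *p)
{
    return sketch_page_version(p) != 0;
}
```

Raising PD_VERSION_BITS later only widens the mask, which is the sense in
which N can grow as needed.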


> (Simon's current patch seems to be an overengineered version of that.

Happy to make changes. Having 3 bits instead of 1 is just for robustness,
but I would accept the opinion that this is not worth having.

One suggestion from Heikki is that we don't change the bits at all;
we just have a database level setting that says whether checksums are
enabled. We use VACUUM FREEZE to enable checksumming, rewrite the blocks
and then enable at db level. Once fully enabled we check the checksums
on read. If we do that, we don't need to record that the page format
has changed and we could dispense with page-level flags entirely, but
of course that then means you can't inspect a block and know whether it has
actually got a checksum on it or whether it's just a version field.
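
A rough sketch of that database-level approach, as I read it: the page
carries no marker, and the read path consults only a database-wide switch.
The function names and the db_checksums_enabled parameter are hypothetical,
and Fletcher-16 is used purely as a placeholder 16-bit checksum, not the
algorithm from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder 16-bit checksum (Fletcher-16); stands in for the real algorithm. */
static uint16_t
sketch_checksum16(const uint8_t *data, size_t len)
{
    uint32_t a = 0;
    uint32_t b = 0;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
    {
        a = (a + data[i]) % 255;
        b = (b + a) % 255;
    }
    return (uint16_t) ((b << 8) | a);
}

/*
 * Verify a block on read.  With no page-level flag, the only thing that says
 * how to interpret the stored 16-bit field is the database-wide setting.
 */
static bool
sketch_verify_on_read(bool db_checksums_enabled,
                      const uint8_t *body, size_t len, uint16_t stored)
{
    if (!db_checksums_enabled)
        return true;    /* field may still be a version value; nothing to check */
    return sketch_checksum16(body, len) == stored;
}
```

Note that the last argument is ambiguous by construction: the same 16 bits
are a checksum or a version value depending only on the setting passed in.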

If we want to know the difference between page formats we need to use
flag bits to indicate that, which in some ways could be called fragile.

Personally, I don't mind what we do there.


> Possibly we should also ask ourselves how much we really need pd_tli
> and whether that couldn't be commandeered as the checksum field.)

pd_tli has slightly more utility than pd_pagesize_version, but we could
easily live without either.

The question is whether a 32-bit value, split into 2 pieces, is a
faster and better way than the current proposal. Splitting the value in two
doesn't sound like it would lead to good performance or sensible code,
but someone may have a reason why 32-bit checksums are important.
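
For reference, a minimal sketch of what splitting would involve, assuming
the high half went into pd_tli and the low half into pd_pagesize_version.
That assignment, the struct, and the function names are all hypothetical;
the two fields are shown side by side only for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Only the two 16-bit fields that would be commandeered (illustrative). */
typedef struct SketchSplitFields
{
    uint16_t pd_tli;               /* assumed to hold the high 16 bits */
    uint16_t pd_pagesize_version;  /* assumed to hold the low 16 bits */
} SketchSplitFields;

/* Split a 32-bit checksum across the two 16-bit fields. */
static inline void
sketch_store_checksum32(SketchSplitFields *f, uint32_t sum)
{
    assert(f != NULL);
    f->pd_tli = (uint16_t) (sum >> 16);
    f->pd_pagesize_version = (uint16_t) (sum & 0xFFFFu);
}

/* Reassemble the 32-bit checksum from its two halves. */
static inline uint32_t
sketch_load_checksum32(const SketchSplitFields *f)
{
    assert(f != NULL);
    return ((uint32_t) f->pd_tli << 16) | f->pd_pagesize_version;
}
```

Every store and every verify needs two field accesses plus a shift, which is
the extra code being weighed against a single 16-bit field.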

IMHO we have a viable mechanism for checksums; it's all down to what user
interface we would prefer and related details.

I'm happy to implement whatever we decide.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

