Re: Enabling Checksums - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Enabling Checksums
Date
Msg-id 5135BC52.3020609@vmware.com
Whole thread Raw
In response to Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
Re: Enabling Checksums  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Enabling Checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 04.03.2013 09:11, Simon Riggs wrote:
> On 3 March 2013 18:24, Greg Smith<greg@2ndquadrant.com>  wrote:
>
>> The 16-bit checksum feature seems functional, with two sources of overhead.
>> There's some CPU time burned to compute checksums when pages enter the
>> system.  And there's extra overhead for WAL logging hint bits.  I'll
>> quantify both of those better in another message.
>
> It's crunch time. Do you and Jeff believe this patch should be
> committed to Postgres core?
>
> Are there objectors?

In addition to my hostility towards this patch in general, there are 
some specifics in the patch I'd like to raise (read out in a grumpy voice):

If you enable checksums, the free space map never gets updated in a 
standby. It will slowly drift to be completely out of sync with reality, 
which could lead to significant slowdown and bloat after failover.

Since the checksums are an all-or-nothing cluster-wide setting, the 
three extra flags in the page header, PD_CHECKSUMS1, PD_CHECKSUM2 and 
PD_HEADERCHECK, are not needed. Let's leave them out. That keeps the 
code simpler, and leaves the bits free for future use. If we want to 
enable such per-page setting in the future, we can add it later. For a 
per-relation scheme, they're not needed.

> + * The checksum algorithm is a modified Fletcher 64-bit (which is
> + * order-sensitive). The modification is because, at the end, we have two
> + * 64-bit sums, but we only have room for a 16-bit checksum. So, instead of
> + * using a modulus of 2^32 - 1, we use 2^8 - 1; making it also resemble a
> + * Fletcher 16-bit. We don't use Fletcher 16-bit directly, because processing
> + * single bytes at a time is slower.

How does the error detection rate of this compare with e.g CRC-16? Is 
there any ill effect from truncating the Fletcher sums like this?

> +    /*
> +     * Store the sums as bytes in the checksum. We add one to shift the range
> +     * from 0..255 to 1..256, to make zero invalid for checksum bytes (which
> +     * seems wise).
> +     */
> +    p8Checksum[0] = (sum1 % 255) + 1;
> +    p8Checksum[1] = (sum2 % 255) + 1;

That's a bit odd. We don't avoid zero in the WAL crc, and I don't recall 
seeing that in other checksum implementations either. 16-bits is not 
very wide for a checksum, and this eats about 1% of the space of valid 
values.

I can see that it might be a handy debugging aid to avoid 0. But there's 
probably no need to avoid 0 in both bytes, it seems enough to avoid a 
completely zero return value.

XLogCheckBuffer() and XLogCheckBufferNeedsBackup() read the page LSN 
without a lock. That's not atomic, so it could incorrectly determine 
that a page doesn't need to be backed up. We used to always hold an 
exclusive lock on the buffer when it's called, which prevents 
modifications to the LSN, but that's no longer the case.

Shouldn't SetBufferCommitInfoNeedsSave() check the BM_PERMANENT flag? I 
think it will generate WAL records for unlogged tables as it is.

- Heikki



pgsql-hackers by date:

Previous
From: Kyotaro HORIGUCHI
Date:
Subject: Re: 9.2.3 crashes during archive recovery
Next
From: Andres Freund
Date:
Subject: Re: Support for REINDEX CONCURRENTLY