Re: emergency outage requiring database restart - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: emergency outage requiring database restart
Date
Msg-id CA+CSw_v2moKAxgfkVZOitXU3EJubmqRnAkbarzLue+n-h0Pj+Q@mail.gmail.com
In response to Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: emergency outage requiring database restart  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-hackers
On Wed, Oct 26, 2016 at 8:43 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> /var/lib/pgsql/9.5/data/pg_log/postgresql-26.log | grep "page
> verification"
> 2016-10-26 11:26:42 CDT [postgres@castaging]: WARNING:  page
> verification failed, calculated checksum 37251 but expected 37244
> 2016-10-26 11:27:55 CDT [postgres@castaging]: WARNING:  page
> verification failed, calculated checksum 37249 but expected 37244
> 2016-10-26 12:16:44 CDT [postgres@castaging]: WARNING:  page
> verification failed, calculated checksum 44363 but expected 44364
> 2016-10-26 12:18:58 CDT [postgres@castaging]: WARNING:  page
> verification failed, calculated checksum 49525 but expected 49539
> 2016-10-26 12:19:12 CDT [postgres@castaging]: WARNING:  page
> verification failed, calculated checksum 37345 but expected 37340

The checksum values are improbably close. The checksum algorithm mixes
all bits of the page reasonably well, so having the high byte match in
all 5 calculated/expected pairs would be about 1:2^40 improbable for
unrelated page contents. What is not mixed in properly is the block
number: it only gets xor'ed in before the value is packed into 16 bits
using modulo 0xFFFF. So I'm pretty sure different block numbers were
used for writing out and reading in the page. Either the blocknum gets
corrupted between calculating the checksum and writing the page out
(unlikely given the proximity), or the pages are somehow getting
transposed in the storage.
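
To make the argument above concrete, here is a minimal Python sketch of the
final folding step as described: xor the block number into the page hash,
then reduce with modulo 0xFFFF. The names fold_checksum, LOGGED_PAIRS and
PAGE_HASH are made up for illustration, the page hash value is arbitrary,
and the "+ 1" offset is an assumption about the real implementation rather
than something stated in this message.

```python
# Sketch only: models the last step of the page checksum as described in
# the message. The "+ 1" offset and all names here are assumptions.

def fold_checksum(page_hash: int, blkno: int) -> int:
    """XOR the block number into a 32-bit page hash, then pack to 16 bits."""
    return ((page_hash ^ blkno) % 0xFFFF) + 1


# (calculated, expected) pairs copied from the quoted log lines.
LOGGED_PAIRS = [
    (37251, 37244),
    (37249, 37244),
    (44363, 44364),
    (49525, 49539),
    (37345, 37340),
]


def high_bytes_match(pairs) -> bool:
    """True if the upper 8 bits agree in every (calculated, expected) pair."""
    return all((calc >> 8) == (exp >> 8) for calc, exp in pairs)


# Hypothetical page hash, folded with two block numbers that differ only
# in their low bits: same page contents, different blkno.
PAGE_HASH = 0x12345678
written_as = fold_checksum(PAGE_HASH, 100)
read_as = fold_checksum(PAGE_HASH, 103)

if __name__ == "__main__":
    print("high bytes match in all logged pairs:", high_bytes_match(LOGGED_PAIRS))
    print("same page, blkno 100 vs 103:", written_as, read_as)
```

Because the block number only enters through the xor, identical page
contents checked against a nearby block number give a checksum that differs
only in the low bits, which is the pattern in the log.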

Regards,
Ants Aasma


