On 3/8/19 4:19 PM, Julien Rouhaud wrote:
> On Thu, Mar 7, 2019 at 7:00 PM Andres Freund <andres@anarazel.de> wrote:
>>
>> On 2019-03-07 12:53:30 +0100, Tomas Vondra wrote:
>>>
>>> But then again, we could just hack a special version of
>>> ReadBuffer_common() which would just
>>>
>>> (a) check if a page is in shared buffers, and if it is then consider the
>>> checksum correct (because in memory it may be stale, and it was read
>>> successfully so it was OK at that moment)
>>>
>>> (b) if it's not in shared buffers already, try reading it and verify the
>>> checksum, and then just evict it right away (not to spoil sb)
>>
>> This'd also make sense and make the whole process more efficient. OTOH,
>> it might actually be worthwhile to check the on-disk page even if
>> there's in-memory state. Unless IO is in progress the on-disk page
>> always should be valid.
>
> Definitely. I already saw servers with all-frozen-read-only blocks
> popular enough to never get evicted in months, and then a minor
> upgrade / restart having catastrophic consequences.
>
Do I understand correctly that the "catastrophic consequences" here are
due to data corruption / broken checksums on those on-disk pages?
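
To make sure we are talking about the same thing, here is a toy model of
the two variants discussed above. It is plain Python, not PostgreSQL
code: the names (make_page, verify_block, check_disk_always), the CRC32
"checksum" and the dict-based "shared buffers" are all assumptions made
up for illustration. It only shows why a block that stays cached can
hide a broken on-disk copy.

```python
import zlib


def make_page(payload: bytes) -> bytes:
    """Build a toy page: 4-byte CRC32 header followed by the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def checksum_ok(page: bytes) -> bool:
    """Recompute the CRC32 of the payload and compare with the header."""
    stored = int.from_bytes(page[:4], "big")
    return stored == zlib.crc32(page[4:])


def verify_block(blkno, shared_buffers, disk,
                 io_in_progress=frozenset(), check_disk_always=False):
    """Return 'skipped', 'ok' or 'corrupt' for one block.

    shared_buffers and disk are hypothetical dicts mapping block
    numbers to page bytes; io_in_progress is a set of block numbers.
    """
    if blkno in shared_buffers:
        # Variant (a): a cached page was read successfully at some
        # point, so do not look at the on-disk copy at all.
        if not check_disk_always:
            return "skipped"
        # Stricter variant: check the on-disk copy anyway, unless IO
        # is in progress and the on-disk page may be mid-write.
        if blkno in io_in_progress:
            return "skipped"
    # Variant (b): read into a private buffer and verify. The page is
    # never added to shared_buffers, so the cache is not disturbed.
    page = disk[blkno]
    return "ok" if checksum_ok(page) else "corrupt"
```

With a block whose cached copy is fine but whose on-disk copy is broken,
the default call returns 'skipped' while check_disk_always=True returns
'corrupt', which I assume is the scenario described above.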

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services