
Re: regression test failed when enabling checksum - Mailing list pgsql-hackers

From	Jeff Janes
Subject	Re: regression test failed when enabling checksum
Date	April 3, 2013 16:49:05
Msg-id	CAMkU=1x=261iP1rJz8Z1YJBqnnNUGtJ9yMUaLcQqxKkVKu8iDg@mail.gmail.com
In response to	Re: regression test failed when enabling checksum (Andres Freund <andres@2ndquadrant.com>)
Responses	Re: regression test failed when enabling checksum Re: regression test failed when enabling checksum
List	pgsql-hackers


On Wed, Apr 3, 2013 at 2:31 AM, Andres Freund <andres@2ndquadrant.com> wrote:

> I just checked and unfortunately your dump doesn't contain all that much
> valid WAL:
> ...
>
> So just two checkpoint records.
>
> Unfortunately I fear that won't be enough to diagnose the problem,
> could you reproduce it with a higher wal_keep_segments?

I've been trying, but see message "commit dfda6ebaec67 versus wal_keep_segments".

Looking more closely at some of the log files, I see that vacuum is involved, but in some way I don't understand. The crash always happens on the test cycle immediately after the sleep that allows autovac to kick in and finish. So the events go something like this:

...

run the frantic updating of "foo" until crash

recovery

query "foo" and verify the results are consistent with expectations

sleep to allow autovac to do its job.

truncate "foo" and repopulate it.

run the frantic updating of "foo" until crash

recovery

attempt to query "foo" but get the checksum failure.

What the vacuum is doing that corrupts the system in a way that survives the truncate is a mystery to me.

Also, at one point I had the harness itself exit as soon as it detected the problem, but I failed to have it shut down the server. So the server kept running idle with autovac doing its thing, which produced some interesting log output:

WARNING: relation "foo" page 45 is uninitialized --- fixing

WARNING: relation "foo" page 46 is uninitialized --- fixing

...

WARNING: relation "foo" page 72 is uninitialized --- fixing

WARNING: relation "foo" page 73 is uninitialized --- fixing

WARNING: page verification failed, calculated checksum 54570 but expected 34212

ERROR: invalid page in block 74 of relation base/16384/4931589

This happened 3 times. Every time, the warnings started on page 45 and continued up until the invalid page was found (which varied, being 74, 86, and 74 again).

I wonder if the bug is in checksums, or if the checksums are doing their job by finding some other bug. And why did those uninitialized pages trigger warnings when they were autovacced, but not when they were seq scanned in a query?

Cheers,

Jeff
