pgstattuple triggered checkpoint failure and database outage? - Mailing list pgsql-general

From Stuart Bishop
Subject pgstattuple triggered checkpoint failure and database outage?
Msg-id 6bc73d4c0903292342u3c18acfu25c21baeafe140be@mail.gmail.com
Responses Re: pgstattuple triggered checkpoint failure and database outage?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I just had a brief outage on a production server running 8.3.6, which
I suspect was triggered by me running a table bloat report making lots
of pgstattuple calls.

The first I knew of it was when the script I'd just kicked off died with:

could not open segment 1 of relation 1663/16409/11088101 (target block 131292): No such file or directory
CONTEXT:  writing block 131292 of relation 1663/16409/11088101

More alerts came in; it looked like everything was failing with similar errors.

Checking the logs, the first indication of the problem is:

<@:6160> 2009-03-30 06:49:27 BST LOG:  checkpoint starting: time
[...]
<@:6160> 2009-03-30 06:49:58 BST ERROR:  could not open segment 1 of relation 1663/16409/11088101 (target block 131072): No such file or directory
<@:6160> 2009-03-30 06:49:58 BST CONTEXT:  writing block 131072 of relation 1663/16409/11088101
<@:6160> 2009-03-30 06:49:59 BST LOG:  checkpoint starting: time
<@:6160> 2009-03-30 06:49:59 BST ERROR:  could not open segment 1 of relation 1663/16409/11088101 (target block 134984): No such file or directory
<@:6160> 2009-03-30 06:49:59 BST CONTEXT:  writing block 134984 of relation 1663/16409/11088101
<@:6160> 2009-03-30 06:50:00 BST LOG:  checkpoint starting: time
<@:6160> 2009-03-30 06:50:01 BST ERROR:  could not open segment 1 of relation 1663/16409/11088101 (target block 135061): No such file or directory
<@:6160> 2009-03-30 06:50:01 BST CONTEXT:  writing block 135061 of relation 1663/16409/11088101
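The relation and block numbers in these errors can be decoded into an on-disk file and offset. The relation is printed as tablespace OID / database OID / relfilenode, and each relation is split into segment files. The sketch below assumes a stock build with the default 8 kB block size and 1 GB segments (131072 blocks per segment); the post does not say which build is in use, and the function name `locate_block` is purely illustrative.

```python
# Minimal sketch: decode "could not open segment N of relation A/B/C (target block X)".
# Assumption: default PostgreSQL build, BLCKSZ = 8192 bytes and 1 GB segment files.
BLCKSZ = 8192
RELSEG_SIZE = (1024 ** 3) // BLCKSZ  # 131072 blocks per segment file


def locate_block(rel_path: str, block: int) -> tuple[str, int, int]:
    """Return (segment file name, segment number, byte offset within that file).

    rel_path is "<tablespace OID>/<database OID>/<relfilenode>" as printed in the log.
    locate_block is a hypothetical helper written for illustration only.
    """
    _tablespace, _database, relfilenode = rel_path.split("/")
    segno = block // RELSEG_SIZE
    offset = (block % RELSEG_SIZE) * BLCKSZ
    # Segment 0 is stored as "<relfilenode>"; later segments as "<relfilenode>.<segno>".
    name = relfilenode if segno == 0 else f"{relfilenode}.{segno}"
    return name, segno, offset


if __name__ == "__main__":
    # Target blocks taken from the log lines above.
    for blk in (131072, 131292, 134984, 135061):
        print(blk, locate_block("1663/16409/11088101", blk))
```

Under those assumptions, every target block in the log falls in segment 1 (file `11088101.1`), with block 131072 being its very first page, so all the failing checkpoint writes point at the same missing file rather than at scattered locations.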


Doing an immediate shutdown and restart seems to have brought
everything back online. I don't think there is any corruption (not
that I can tell easily...), and I'm not worried if I lost a
transaction or three.

Can anyone think of what happened here? I suspect pgstattuple, as it was
the only unusual activity happening at that time; as far as I'm
aware we have no hardware alerts, and the box has been running smoothly
for quite some time.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/
