Re: the un-vacuumable table - Mailing list pgsql-hackers

From Andrew Hammond
Subject Re: the un-vacuumable table
Msg-id 5a0a9d6f0807032257l7217d1efx79453e06407774f3@mail.gmail.com
In response to Re: the un-vacuumable table  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: the un-vacuumable table  ("Andrew Hammond" <andrew.george.hammond@gmail.com>)
List pgsql-hackers
On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Andrew Hammond" <andrew.george.hammond@gmail.com> writes:
>> On Thu, Jul 3, 2008 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> The whole thing is pretty mystifying, especially the ENOSPC write
>>> failure on what seems like it couldn't have been a full disk.
>
>> Yes, I've passed along the task of explaining why PG thought the disk
>> was full to the sysadmin responsible for the box. I'll post the answer
>> here, when and if we have one.
>
> I just noticed something even more mystifying: you said that the ENOSPC
> error occurred once a day during vacuuming.

Actually, the ENOSPC happened once. After that first error, we got

vacuumdb: vacuuming of database "adecndb" failed: ERROR:  failed to re-find parent key in "ledgerdetail_2008_03_idx2" for deletion target page 64767

repeatedly.

> That doesn't make any
> sense, because a write error would leave the shared buffer still marked
> dirty, and so the next checkpoint would try to write it again.  If
> there's a persistent write error on a particular block, you should see
> it being complained of at least once per checkpoint interval.
>
> If you didn't see that, it suggests that the ENOSPC was transient,
> which isn't unreasonable --- but why would it recur for the exact
> same block each night?
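
To make the distinction between a persistent and a transient write error concrete, here is a toy Python model. It is not PostgreSQL code: the Buffer class, checkpoint(), make_writer() and run_checkpoints() are all invented for illustration, and the block number is arbitrary. It only shows that a buffer left dirty after a failed write is retried, and complained about, on every later checkpoint until a write succeeds.

```python
import errno


class Buffer:
    """Toy stand-in for one shared buffer (hypothetical, not PostgreSQL's struct)."""

    def __init__(self, block_no):
        self.block_no = block_no
        self.dirty = True


def checkpoint(buffers, write_block, log):
    """Try to flush every dirty buffer; a failed write leaves it dirty."""
    for buf in buffers:
        if not buf.dirty:
            continue
        try:
            write_block(buf.block_no)
        except OSError as exc:
            # The buffer is still marked dirty, so the next checkpoint retries it.
            log.append((buf.block_no, exc.errno))
        else:
            buf.dirty = False


def make_writer(bad_block, failures):
    """Return a write function that fails `failures` times on `bad_block`."""
    remaining = {"n": failures}

    def write_block(block_no):
        if block_no == bad_block and remaining["n"] > 0:
            remaining["n"] -= 1
            raise OSError(errno.ENOSPC, "No space left on device")

    return write_block


def run_checkpoints(failures, rounds=3, bad_block=1):
    """Run `rounds` checkpoints and return how many write errors were logged."""
    buffers = [Buffer(bad_block)]
    log = []
    writer = make_writer(bad_block, failures)
    for _ in range(rounds):
        checkpoint(buffers, writer, log)
    return len(log)
```

In this model, a write that keeps failing (failures >= rounds) is logged once per checkpoint, while a write that fails a single time (failures=1) is logged once and then flushed cleanly. That matches the point above: one complaint per checkpoint interval for a persistent error, a single complaint for a transient one.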
>
> Have you looked into the machine's kernel log to see if there is any
> evidence of low-level distress (hardware or filesystem level)?  I'm
> wondering if ENOSPC is being reported because it is the closest
> available errno code, but the real problem is something different than
> the error message text suggests.  Other than the errno the symptoms
> all look quite a bit like a bad-sector problem ...

I will pass this along to the sysadmin in charge of this box.
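
For the kernel-log check, a rough Python sketch of the kind of scan that might help is below. The pattern list is only a guess at typical Linux messages for disk or filesystem trouble, not an authoritative or exhaustive set, and suspicious_lines() is a made-up helper name.

```python
import re

# Illustrative patterns only (an assumption, not an authoritative list).
SUSPECT = re.compile(
    r"I/O error|medium error|bad sector|"
    r"remounting filesystem read-only|EXT3-fs error",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Return the log lines that match any of the suspect patterns."""
    return [line.rstrip("\n") for line in lines if SUSPECT.search(line)]
```

It takes any iterable of lines, so it can be fed an open log file (for example /var/log/messages, though the path varies by system) or the captured output of dmesg.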

