Thread: Unlinked files in PGDATA/base following unclean shutdown

Unlinked files in PGDATA/base following unclean shutdown

From
Jack Orenstein
Date:
Our application is running Postgres 7.4, (working on conversion to 8.3
right now). Our testing involves various forms of violence, including
shutting off power and kill -9 postmaster.

Occasionally we observe a form of database corruption in which one of
the files storing a table or index disappears. The logs will contain
ERRORs that look like this:

     could not open relation "some_table_name": No such file or directory

When this happens, and I cross-reference the pg_class.oid with the
expected file under PGDATA, the file is missing (and does not appear
to be in lost+found).

I have fsync set to true, and wal_sync_method set to fsync.

A few questions about this:

1) Why is this happening?

2) To help investigate this problem, I've written a script to
cross-reference pg_class and the files in PGDATA/base. (I know that I
should use pg_class.relfilenode instead of pg_class.oid -- I'll fix
that.) The question is how to check for consistency in the case of
large tables, which are split into multiple segments, (e.g. 123456.1,
123456.2). I.e., how can I find out how many segments there should be?
Any chance it's as simple as (pg_class.relpages + SUITABLE_CONSTANT -
1) / SUITABLE_CONSTANT?

Jack



Re: Unlinked files in PGDATA/base following unclean shutdown

From
Tom Lane
Date:
Jack Orenstein <jack.orenstein@hds.com> writes:
> Our application is running Postgres 7.4, (working on conversion to 8.3
> right now). Our testing involves various forms of violence, including
> shutting off power and kill -9 postmaster.

Do you have reason to trust either your kernel or your disk drives under
such abuse?  In general I'd bet on the drives being at fault a lot
sooner than any other part of the food chain.  Complete disappearance of
files that should be there is a fault at the filesystem level or lower,
anyway.

> The question is how to check for consistency in the case of
> large tables, which are split into multiple segments, (e.g. 123456.1,
> 123456.2). I.e., how can I find out how many segments there should be?

The kernel-defined EOF is the truth, the whole truth, and nothing but
the truth.  There is no other authority.

All segments before the last one should be exactly 1GB, but the last
one can be anything up to that.  Consult the comments in md.c for
more details.  (I think 7.4 may treat some corner cases differently
from 8.3 anyway.)

            regards, tom lane

Re: Unlinked files in PGDATA/base following unclean shutdown

From
Jack Orenstein
Date:
Tom Lane wrote:
> Jack Orenstein <jack.orenstein@hds.com> writes:

>> The question is how to check for consistency in the case of
>> large tables, which are split into multiple segments, (e.g. 123456.1,
>> 123456.2). I.e., how can I find out how many segments there should be?
>
> The kernel-defined EOF is the truth, the whole truth, and nothing but
> the truth.  There is no other authority.
>
> All segments before the last one should be exactly 1GB, but the last
> one can be anything up to that.  Consult the comments in md.c for
> more details.  (I think 7.4 may treat some corner cases differently
> from 8.3 anyway.)
>
>             regards, tom lane
>

Just to be clear, you mean 2**30 (1,073,741,824) bytes, not 10**9?

Jack

Re: Unlinked files in PGDATA/base following unclean shutdown

From
Alvaro Herrera
Date:
Jack Orenstein wrote:
> Tom Lane wrote:

>> All segments before the last one should be exactly 1GB, but the last
>> one can be anything up to that.  Consult the comments in md.c for
>> more details.  (I think 7.4 may treat some corner cases differently
>> from 8.3 anyway.)
>
> Just to be clear, you mean 2**30 (1,073,741,824) bytes, not 10**9?

It's RELSEG_SIZE * BLCKSZ, where

#define RELSEG_SIZE 131072
#define BLCKSZ 8192

(both from pg_config.h)

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support