Re: Creation of 10000's of empty table segments and more... - Mailing list pgsql-bugs

From Philip Poles
Subject Re: Creation of 10000's of empty table segments and more...
Msg-id 002901c00c4b$66a8e880$26ab6bcf@surfen.com
In response to Creation of 10000's of empty table segments and more...  ("Philip Poles" <philip@surfen.com>)
Responses Re: Creation of 10000's of empty table segments and more...
List pgsql-bugs
Greetings...

Tom, I don't have a way to reproduce this from scratch, unfortunately.
The problem occurred on a production server which had been running trouble-free
for the past two months.
Also, there are no core files lying around anywhere, which is somewhat
surprising.
Does it make any difference that I compiled with a BLCKSZ of 32768 and
NAMEDATALEN of 64?
All I do have is the intact contents of the database directory of the problem
database.
Is there any way to move this to another installation so that I can have a look
at it, and maybe get you a core dump or at least a detailed log on another
machine? pg_dump obviously won't do the trick here.
The tarball of the entire directory (including the thousands of empty files) is
38.5M, so I don't know if you'd be interested in looking at that.
Also, would upgrading to 7.0.2 help, or are the latest fixes to the btree code
and md.c in current sources only?

Is there anything else I can do to help (other than actually digging into the
code, of course)?

Thank you,

    -Philip

----- Original Message -----
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Philip Poles <philip@surfen.net>
Cc: <pgsql-bugs@postgresql.org>
Sent: Friday, August 18, 2000 11:57 PM
Subject: Re: [BUGS] Creation of 10000's of empty table segments and more...


"Philip Poles" <philip@surfen.com> writes:
> Basically, the backend has created a bunch of empty files of the name
> <table_name>.<n>, ~32500 for one table, ~50000 for another, ~44000 for
> a third, and ~250 for a fourth.  From reading the old thread on this,
> I suspect it's being caused by the nightly vacuum we run, and is due
> to a corrupted index.

Probably so.  The proximate cause of that behavior was that if the
low-level file access code (md.c) was handed a ridiculously large
intra-file block number, it would try to access the file segment
containing that block number --- and merrily create all the intervening
segments, though not populate them with any data.  So a bad block number
is being injected from somewhere, and a corrupted index is the most
likely source.
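
To make the arithmetic above concrete, here is a minimal sketch of how a block number maps to a segment file. It is not taken from md.c. It assumes each segment file holds 1 GB (the assumption behind SEGMENT_BYTES) and uses the BLCKSZ of 32768 from the report; the function names are hypothetical.

```python
BLCKSZ = 32768                         # block size quoted in the report
SEGMENT_BYTES = 1 << 30                # assumption: 1 GB per segment file
RELSEG_SIZE = SEGMENT_BYTES // BLCKSZ  # blocks per segment under that assumption


def segment_for_block(blocknum):
    """Return (segment number, block offset inside that segment)."""
    return divmod(blocknum, RELSEG_SIZE)


def segments_created_on_the_way(blocknum, existing_segments=1):
    """Segment numbers that would be created when every segment up to the
    target is opened, as in the behavior described above (hypothetical helper)."""
    target_segment, _ = segment_for_block(blocknum)
    return list(range(existing_segments, target_segment + 1))
```

Under these assumptions a single bogus block number of roughly one billion would be enough to account for about 32500 empty segment files on one table.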

In current sources, md.c will barf promptly if handed a block number
more than one segment beyond the current EOF, so that
sorcerer's-apprentice behavior should be fixed.  The more interesting
question is whether the original cause of the index corruption has been
fixed.  (I cleaned up some problems in the btree index code not long
ago, but have no way to tell if this is related.)  I don't suppose you
have a way of reproducing the problem from a cold start?
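
The range check described above could look roughly like the following. This is only a sketch of the idea (reject a block number that falls more than one segment past the current end of file), not the code in current sources; the exception class, function name, and parameters are all hypothetical.

```python
class BlockOutOfRange(Exception):
    """Raised when a requested block lies implausibly far past end of file."""


def check_block_number(blocknum, current_nblocks, blocks_per_segment):
    """Return the target segment number, or raise if the block is more than
    one segment beyond the segment that holds the current end of file."""
    eof_segment = current_nblocks // blocks_per_segment
    target_segment = blocknum // blocks_per_segment
    if target_segment > eof_segment + 1:
        raise BlockOutOfRange(
            "block %d is in segment %d, but end of file is in segment %d"
            % (blocknum, target_segment, eof_segment)
        )
    return target_segment
```

Failing early like this turns a corrupted block number into one error message instead of thousands of empty files.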

> Also, during the day before the dump/vacuum began to fail, the backend was
> resetting itself every few minutes with the message:
> Server process (pid 25155) exited with status 11 at Fri Aug 11 11:47:47 2000
> Terminating any active server processes...

> I'm not sure what status 11 means.

SEGV crash.  There should have been a core dump from that --- is there
a core file in the old database directory, and if so can you get a
backtrace from it?
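
For readers unsure how "status 11" becomes "SEGV", a small sketch of decoding a Unix wait status follows. It assumes the number in the log is the raw status returned by wait() and that signal 11 is SIGSEGV, as it is on Linux and most Unix systems; describe_wait_status is a hypothetical helper name.

```python
import os
import signal


def describe_wait_status(status):
    """Decode a raw Unix wait status into a small dict."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        return {
            "signal": signal.Signals(signum).name,
            "core_dumped": os.WCOREDUMP(status),
        }
    if os.WIFEXITED(status):
        return {"exit_code": os.WEXITSTATUS(status)}
    return {"raw": status}
```

If the logged value really is a raw wait status, 11 has the core-dump flag clear, which would be consistent with no core files being found; a crash that did dump core would normally show up as 139 (11 plus the 0x80 flag).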

regards, tom lane
