Thread: large duplicated files

large duplicated files

From
"Ryan D. Enos"
Date:
Hi,
I am very new to postgresql and am not really a programmer of any type.
I use pgsql to manage very large voter databases for political science
research.  My problem is that my database is creating large duplicate
files, i.e.: 17398.1, 17398.2, 17398.3, etc.  Each is about 1GB in size.
I understand that each of these is probably a part of a file that pgsql
created because of a limit on file size and that they may be large
indexes.  However, I don't know where these files came from or how to
reclaim the disk space.
I have extensively searched the archives and found that I am not the
first to have this problem.  I have followed the suggestions to previous
posters, using a VACUUM FULL command and REINDEX.  But nothing reclaims
the disk space.  I have tried to see the type of file by using:
select * from pg_class where relfilenode =""
but this returns 0 rows.
How can I reclaim this space and prevent these files from being created
in the future?
Any help would be greatly appreciated.
Thanks.
Ryan

Re: large duplicated files

From
"Ryan D. Enos"
Date: Fri, 17 Aug 2007 00:15:13 -0700
Well, I feel like the guy who goes to the doctor and then finds the pain
is suddenly gone when he gets there.  I have discovered that my
previously described problem was almost certainly the result of
temporary tables that were not dropped after a crash of an ODBC
connection (at least I hope that's where those files were coming from).
However, I am still curious if anybody knows how I can find and destroy
those tables in the event of a crash?
Thanks.
Ryan

Re: large duplicated files

From
Tom Lane
Date:
"Ryan D. Enos" <renos@ucla.edu> writes:
> My problem is that my database is creating large duplicate
> files, i.e.: 17398.1, 17398.2, 17398.3, etc.  Each is about 1GB in size.

These are not "duplicates", they are sections of a very large table
(or possibly a very large index).

I gather from your followup that the files are now gone, meaning that
someone dropped the table.  If they were still there you could determine
which table they belonged to by looking at pg_class.relfilenode --- see
http://www.postgresql.org/docs/8.2/static/storage.html
for information about the physical layout of a PG database.
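
For example, something along these lines (a minimal sketch; 17398 is
just the base file number you quoted) would have shown which relation
the files belonged to, while they still existed:

    -- look up the relation that owns data files 17398, 17398.1, ...
    SELECT relname, relkind, relpages
    FROM pg_class
    WHERE relfilenode = 17398;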

My best guess about what happened is that a client did something like
CREATE TEMP TABLE foo AS SELECT ...
with an erroneous query that generated a huge table --- maybe it had
a missing join condition or something like that.

            regards, tom lane

Re: large duplicated files

From
Christoph Frick
Date:
On Fri, Aug 17, 2007 at 12:15:13AM -0700, Ryan D. Enos wrote:

> Well, I feel like the guy who goes to the doctor and then finds the
> pain is suddenly gone when he gets there.  I have discovered that my
> previously described problem was almost certainly the result of
> temporary tables that were not dropped after a crash of an ODBC
> connection (at least I hope that's where those files were coming
> from).  However, I am still curious if anybody knows how I can find
> and destroy those tables in the event of a crash?

There are lots of scripts out there (Google them) to find out which
table/index actually uses up your hard disk space. In an older
PostgreSQL version, for example, an index went nuts and kept growing
without reason - reindexing helped there.
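
If you don't want to hunt for a script, a quick sketch like this
(assuming 8.1 or later, where the pg_size_pretty and
pg_total_relation_size functions exist) lists the biggest relations:

    -- ten largest relations by total on-disk size (table + indexes + toast)
    SELECT relname,
           pg_size_pretty(pg_total_relation_size(oid)) AS total_size
    FROM pg_class
    WHERE relkind IN ('r', 'i', 't')
    ORDER BY pg_total_relation_size(oid) DESC
    LIMIT 10;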

If you delete lots of data, also be sure to vacuum the database
(depending on your version of the db) and have enough FSM configured.
Do a verbose vacuum to find out whether there is enough free space map
(FSM) room - it shows up at the end of the report.
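
For example, a minimal sketch:

    -- the end of the VACUUM VERBOSE output reports how many free space
    -- map pages are needed; compare that against these settings
    VACUUM VERBOSE;
    SHOW max_fsm_pages;
    SHOW max_fsm_relations;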

--
cu
