Re: [HACKERS] Orphaned files in base/[oid] - Mailing list pgsql-hackers

From Chris Travers
Subject Re: [HACKERS] Orphaned files in base/[oid]
Date
Msg-id CAN-RpxAshr3Nmsht7Bva6cgBXJdFg6amXeBetZ9L4hVq=-y+vQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Orphaned files in base/[oid]  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Orphaned files in base/[oid]  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers


On Mon, Aug 14, 2017 at 6:33 PM, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2017-08-14 14:12:22 +0200, Chris Travers wrote:
> Problem:
> The system this came up on is PostgreSQL 9.6.3 and has had repeated trouble
> with disk space.  Querying pg_database_size, as well as du on the
> subdirectory of base/ show total usage to be around 3.8TB.  Summing up the
> size of the relations in pg_class though shows around 2.1TB.
>
> Initial troubleshooting found around 150 GB of space in pg_temp which had
> never been cleared and was at least several days old.  Restarting the
> server cleared these up.
>
> Poking around the base/[oid] directory, I found a large number of files
> which did not correspond with a pg_class entry.  One of the apparent
> relations was nearly 1TB in size.
>
> What I think happened:
> I think various pg_temp/* and orphaned relation files (In base/[oid]) were
> created when PostgreSQL crashed due to running out of space in various
> operations including creating materialised views.
>
> So my question is if there is a way we can safely clean these up on server
> restart?  If not does it make sense to try to create a utility that can
> connect to PostgreSQL, seek out valid files, and delete the rest?

I think the fix here is to call RemovePgTempFiles() during
crash-restarts, instead of just full starts. The previously stated need
to be able to inspect temp files after a crash can be less impactfully
fulfilled with restart_after_crash = false.

But that only clears temp files right?

I am less concerned about the temp files because a restart clears them.

The bigger issue I see are with the orphaned base files.  It looks like files in base/[oid] don't get cleaned up either if I read my output correctly and it would explain why we saw 1.7TB of discrepancy between relations and database size.  Safety-wise it seems like the best way out of that is a dump/restore but doing that with a 2.1TB database is annoying.
 
Greetings,

Andres Freund



--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com 
Saarbrücker Straße 37a, 10405 Berlin

pgsql-hackers by date:

Previous
From: David Fetter
Date:
Subject: Re: [HACKERS] shared memory based stat collector (was: Sharingrecord typmods between backends)
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Orphaned files in base/[oid]