Thread: Removing unreferenced files

Removing unreferenced files

From
Ron Farrer
Date:
Hello,

This is regarding the patch[0] posted in 2006, which was based on
previous work[1]. Below is a summary of the outstanding issues as I
currently understand them, along with some questions.

Initial questions that had no consensus in previous discussions:
1. Approach on file handling undecided
2. Startup vs standalone tool
3. If startup: how to determine when to run

Outstanding problems (point-for-point with original 2005 post):
1. Does not work with non-standard tablespaces. The check will even
report all files as stale.
2. Has issues with stale subdirs of a tablespace (subdirs corresponding
to a nonexistent database) [appears related to #1 because of maintenance
mode and not failing]
3. Assumes relfilenode is unique database-wide when it’s only safe
tablespace-wide
4. Does not examine table segment files such as “nnn.1” - it should
instead complain when “nnn” does not match a hash entry
5. It loads every value of relfilenode in pg_class into the hash table
without checking that it is meaningful or not - needs to check.
6. strtol vs strspn (or other) [not sure what the problem here is. If
errors are handled correctly this should not be an issue]
7. No checks for readdir failure [this should be easy to check for]
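
To make points 4, 6 and 7 concrete, here is a minimal sketch in C. It is
not taken from the patch; parse_relfile_name() and
count_relfile_entries() are hypothetical names, and the sketch assumes
file names of the form "nnn" or "nnn.seg" only. Other names (for
example fork files such as "nnn_fsm" in newer releases) are simply
rejected here. It uses strspn to validate the digits before converting,
and separates a readdir failure from end-of-directory by clearing errno
before each call.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split "nnn" or "nnn.seg" into a relfilenode and a
 * segment number. Returns false for any other name, so the caller can
 * decide whether to ignore the file or complain about it.
 */
static bool
parse_relfile_name(const char *name, unsigned long *relfilenode,
                   unsigned long *segno)
{
    size_t      ndigits;
    const char *rest;

    assert(name != NULL && relfilenode != NULL && segno != NULL);

    /* strspn tells us how many leading characters are decimal digits */
    ndigits = strspn(name, "0123456789");
    if (ndigits == 0)
        return false;
    rest = name + ndigits;

    errno = 0;
    *relfilenode = strtoul(name, NULL, 10);
    if (errno != 0 || *relfilenode > 0xFFFFFFFFUL)
        return false;           /* does not fit in a 32-bit OID */

    *segno = 0;
    if (*rest == '\0')
        return true;            /* plain "nnn": first segment */
    if (*rest != '.')
        return false;           /* e.g. "nnn_fsm" or a stray suffix */

    rest++;
    ndigits = strspn(rest, "0123456789");
    if (ndigits == 0 || rest[ndigits] != '\0')
        return false;           /* "nnn." or "nnn.1x" */

    errno = 0;
    *segno = strtoul(rest, NULL, 10);
    return errno == 0;
}

/*
 * Hypothetical helper: count directory entries that look like relation
 * files. Returns -1 (with errno set) if opendir or readdir fails.
 */
static long
count_relfile_entries(const char *dirpath)
{
    DIR        *dir = opendir(dirpath);
    long        count = 0;

    if (dir == NULL)
        return -1;

    for (;;)
    {
        struct dirent *de;
        unsigned long  node, seg;

        errno = 0;              /* so NULL + errno != 0 means failure */
        de = readdir(dir);
        if (de == NULL)
            break;
        if (parse_relfile_name(de->d_name, &node, &seg))
            count++;
    }

    if (errno != 0)
    {
        int         saved = errno;

        closedir(dir);
        errno = saved;
        return -1;
    }
    closedir(dir);
    return count;
}
```

With this shape, a segment file such as "16384.1" yields relfilenode
16384 and segment 1, so the caller can look up "16384" in the hash table
rather than skipping the file. For point 3, the lookup key would
presumably need to include the tablespace as well as the relfilenode.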

Other thoughts:
1. What to do if a problem happens during drop table/index and the files
that should have been removed are still there. The DBA needs to be told
somehow when this happens.
2. What happened to pgfsck? Was that a better approach? Why was it
abandoned?
3. What to do about stale files and missing files

References:
0 -
http://www.postgresql.org/message-id/200606081508.k58F85m29270@candle.pha.pa.us
1 - http://www.postgresql.org/message-id/8291.1115340924@sss.pgh.pa.us


Ron

-- 
Command Prompt, Inc. http://www.commandprompt.com/ +1-800-492-2240
PostgreSQL Centered full stack support, consulting, and development.




Re: Removing unreferenced files

From
"Joshua D. Drake"
Date:
On 08/05/2015 11:44 AM, Ron Farrer wrote:

> Initial questions that had no consensus in previous discussions:
> 1. Approach on file handling undecided
> 2. Startup vs standalone tool

I think it should run at startup, and perhaps there should also be a
function that can do it from user space. If this problem persists, we
shouldn't expect users to take the server down to clean it up.

> 3. If startup: how to determine when to run
>
> Outstanding problems (point-for-point with original 2005 post):
> 1. Does not work with non-standard tablespaces. The check will even
> report all files as stale.
> 2. Has issues with stale subdirs of a tablespace (subdirs corresponding
> to a nonexistent database) [appears related to #1 because of maintenance
> mode and not failing]
> 3. Assumes relfilenode is unique database-wide when it’s only safe
> tablespace-wide
> 4. Does not examine table segment files such as “nnn.1” - it should
> instead complain when “nnn” does not match a hash entry
> 5. It loads every value of relfilenode in pg_class into the hash table
> without checking that it is meaningful or not - needs to check.
> 6. strtol vs strspn (or other) [not sure what the problem here is. If
> errors are handled correctly this should not be an issue]
> 7. No checks for readdir failure [this should be easy to check for]

Ron,

Do you have suggestions on how to resolve any of items 1-7?

>
> Other thoughts:
> 1. What to do if a problem happens during drop table/index and the files
> that should have been removed are still there. The DBA needs to be told
> somehow when this happens.

Why? If the drop table/index happens, should it not just remove those files?

> 2. What happened to pgfsck? Was that a better approach? Why was it
> abandoned?

> 3. What to do about stale files and missing files

IMO, unless there is a compelling reason not to, we should remove them.

JD

-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.