Re: Cleaning up unreferenced table files - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: Cleaning up unreferenced table files
Date
Msg-id 200505021830.j42IU3l21583@candle.pha.pa.us
Whole thread Raw
In response to Cleaning up unreferenced table files  (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses Re: Cleaning up unreferenced table files  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
Applied.

---------------------------------------------------------------------------

pgman wrote:
> Tom Lane wrote:
> > Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > > On Tue, 26 Apr 2005, Tom Lane wrote:
> > >> Heikki Linnakangas <hlinnaka@iki.fi> writes:
> > >>> I feel that crashes that leaves behind stale files are rare.
> > >>
> > >> Indeed, and getting more so all the time ...
> >
> > > How so? Have changes been made in those parts of the code?
> >
> > No, just that the overall reliability of Postgres keeps improving.
> >
> > At the time that TODO entry was created, I don't think we even had the
> > ability to roll back table creates/drops properly, so there were
> > scenarios in which unreferenced files could be left behind without even
> > assuming any software error.  And the prevalence of backend crashes was
> > way higher than it is now, too.  So I think a good argument could be
> > made that the TODO item isn't nearly as important as it was at the time.
>
> Agreed, but I don't think we are ever going to have the file system obey
> the same semantics as the database, so there will always be corner cases
> where we have the possibility for unreferenced files.
>
> > > If nobody ever runs into this issue in production, and this whole exercise
> > > turns out to be completely unnecessary, at least we'll know. That alone
> > > makes me feel better.
> >
> > We will know no such thing, unless the patch is made to announce the
> > problem so intrusively that people are certain to see it *and report it
> > to us*.  Which I don't think will be acceptable.
>
> You are right that checking only after a server crash isn't going to
> help us very much, because the odds someone is going to look at the logs
> just at that moment are slim, and if they stop and start the server
> again, the message will not appear so they will think it is fixed.
>
> Though the problem is caused by a server crash, it remains even after a
> clean startup.  My new version of the patch checks not just after a
> server crash, but every time the server starts up.  Here are some
> timings:
>
>     3dbs    without patch    0m0.230s  300 files
>     3dbs    with patch    0m0.230s  300 files
>     100dbs    with patch    0m0.270s  10000 files
>
> (The timing test started the server and waited for the message "database
> system is ready".)
>
> You can see that for three empty databases (only system tables/indexes),
> the patch has no impact.  For 100 databases, or 10,000 files, there is a
> 17% increase in startup time, which seems acceptable.
>
> The message printed in the logs is:
>
>     LOG:  The table or index file "/u/pgsql/data/base/1/100001" is stale and can be safely removed
>
> I made a few changes to the patch. I changed 'clean' to 'check' because
> I don't think we are ever going to remove the files on startup (If we
> did we could do the cleanup just on crash recovery startup.)  I changed
> the message from NOTICE to LOG, I made it run on all startups and not
> just crash recovery, and it was missing FreeDir() calls.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

> Index: doc/src/sgml/maintenance.sgml
> ===================================================================
> RCS file: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v
> retrieving revision 1.41
> diff -c -c -r1.41 maintenance.sgml
> *** doc/src/sgml/maintenance.sgml    20 Feb 2005 02:21:26 -0000    1.41
> --- doc/src/sgml/maintenance.sgml    1 May 2005 04:30:48 -0000
> ***************
> *** 474,479 ****
> --- 474,496 ----
>     </para>
>    </sect1>
>
> +  <sect1 id="check-files-after-crash">
> +   <title>Check files after crash</title>
> +
> +   <indexterm zone="check-files-after-crash">
> +    <primary>stale file</primary>
> +   </indexterm>
> +
> +   <para>
> +    <productname>PostgreSQL</productname> recovers automatically after crash
> +    using the write-ahead log (see <xref linkend="wal">) and no manual
> +    operations are normally needed. However, if there was a transaction running
> +    when the crash occured that created or dropped a relation, the
> +    transaction might have left a stale file in the data directory. If this
> +    happens, you will get a notice in the log file stating which files can be
> +    deleted.
> +   </para>
> +  </sect1>
>
>    <sect1 id="logfile-maintenance">
>     <title>Log File Maintenance</title>
> Index: src/backend/access/transam/xlog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
> retrieving revision 1.189
> diff -c -c -r1.189 xlog.c
> *** src/backend/access/transam/xlog.c    28 Apr 2005 21:47:10 -0000    1.189
> --- src/backend/access/transam/xlog.c    1 May 2005 04:30:52 -0000
> ***************
> *** 43,48 ****
> --- 43,49 ----
>   #include "utils/builtins.h"
>   #include "utils/guc.h"
>   #include "utils/relcache.h"
> + #include "utils/flatfiles.h"
>
>
>   /*
> ***************
> *** 4525,4530 ****
> --- 4526,4533 ----
>
>           CreateCheckPoint(true, true);
>
> +         CheckStaleRelFiles();
> +
>           /*
>            * Close down recovery environment
>            */
> ***************
> *** 4536,4541 ****
> --- 4539,4550 ----
>            */
>           remove_backup_label();
>       }
> +     else
> +     {
> +         XLogInitRelationCache();
> +         CheckStaleRelFiles();
> +         XLogCloseRelationCache();
> +     }
>
>       /*
>        * Preallocate additional log files, if wanted.
> Index: src/backend/catalog/catalog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/catalog/catalog.c,v
> retrieving revision 1.59
> diff -c -c -r1.59 catalog.c
> *** src/backend/catalog/catalog.c    14 Apr 2005 20:03:23 -0000    1.59
> --- src/backend/catalog/catalog.c    1 May 2005 04:30:52 -0000
> ***************
> *** 106,111 ****
> --- 106,144 ----
>       return path;
>   }
>
> + /*
> +  * GetTablespacePath    - construct path to a tablespace symbolic link
> +  *
> +  * Result is a palloc'd string.
> +  *
> +  * XXX this must agree with relpath and GetDatabasePath!
> +  */
> + char *
> + GetTablespacePath(Oid spcNode)
> + {
> +     int            pathlen;
> +     char       *path;
> +
> +     Assert(spcNode != GLOBALTABLESPACE_OID);
> +
> +     if (spcNode == DEFAULTTABLESPACE_OID)
> +     {
> +         /* The default tablespace is {datadir}/base */
> +         pathlen = strlen(DataDir) + 5 + 1;
> +         path = (char *) palloc(pathlen);
> +         snprintf(path, pathlen, "%s/base",
> +                  DataDir);
> +     }
> +     else
> +     {
> +         /* All other tablespaces have symlinks in pg_tblspc */
> +         pathlen = strlen(DataDir) + 11 + OIDCHARS + 1;
> +         path = (char *) palloc(pathlen);
> +         snprintf(path, pathlen, "%s/pg_tblspc/%u",
> +                  DataDir, spcNode);
> +     }
> +     return path;
> + }
>
>   /*
>    * IsSystemRelation
> Index: src/backend/commands/tablespace.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
> retrieving revision 1.17
> diff -c -c -r1.17 tablespace.c
> *** src/backend/commands/tablespace.c    14 Apr 2005 20:03:24 -0000    1.17
> --- src/backend/commands/tablespace.c    1 May 2005 04:30:53 -0000
> ***************
> *** 341,348 ****
>       /*
>        * All seems well, create the symlink
>        */
> !     linkloc = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
> !     sprintf(linkloc, "%s/pg_tblspc/%u", DataDir, tablespaceoid);
>
>       if (symlink(location, linkloc) < 0)
>           ereport(ERROR,
> --- 341,347 ----
>       /*
>        * All seems well, create the symlink
>        */
> !     linkloc = GetTablespacePath(tablespaceoid);
>
>       if (symlink(location, linkloc) < 0)
>           ereport(ERROR,
> ***************
> *** 495,502 ****
>       char       *subfile;
>       struct stat st;
>
> !     location = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
> !     sprintf(location, "%s/pg_tblspc/%u", DataDir, tablespaceoid);
>
>       /*
>        * Check if the tablespace still contains any files.  We try to rmdir
> --- 494,500 ----
>       char       *subfile;
>       struct stat st;
>
> !     location = GetTablespacePath(tablespaceoid);
>
>       /*
>        * Check if the tablespace still contains any files.  We try to rmdir
> ***************
> *** 1036,1043 ****
>           set_short_version(location);
>
>           /* Create the symlink if not already present */
> !         linkloc = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
> !         sprintf(linkloc, "%s/pg_tblspc/%u", DataDir, xlrec->ts_id);
>
>           if (symlink(location, linkloc) < 0)
>           {
> --- 1034,1040 ----
>           set_short_version(location);
>
>           /* Create the symlink if not already present */
> !         linkloc = GetTablespacePath(xlrec->ts_id);
>
>           if (symlink(location, linkloc) < 0)
>           {
> Index: src/backend/utils/adt/misc.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v
> retrieving revision 1.40
> diff -c -c -r1.40 misc.c
> *** src/backend/utils/adt/misc.c    31 Dec 2004 22:01:22 -0000    1.40
> --- src/backend/utils/adt/misc.c    1 May 2005 04:30:53 -0000
> ***************
> *** 26,31 ****
> --- 26,32 ----
>   #include "funcapi.h"
>   #include "catalog/pg_type.h"
>   #include "catalog/pg_tablespace.h"
> + #include "catalog/catalog.h"
>
>   #define atooid(x)  ((Oid) strtoul((x), NULL, 10))
>
> ***************
> *** 144,154 ****
>
>           fctx = palloc(sizeof(ts_db_fctx));
>
> -         /*
> -          * size = path length + tablespace dirname length + 2 dir sep
> -          * chars + oid + terminator
> -          */
> -         fctx->location = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
>           if (tablespaceOid == GLOBALTABLESPACE_OID)
>           {
>               fctx->dirdesc = NULL;
> --- 145,150 ----
> ***************
> *** 157,168 ****
>           }
>           else
>           {
> !             if (tablespaceOid == DEFAULTTABLESPACE_OID)
> !                 sprintf(fctx->location, "%s/base", DataDir);
> !             else
> !                 sprintf(fctx->location, "%s/pg_tblspc/%u", DataDir,
> !                         tablespaceOid);
> !
>               fctx->dirdesc = AllocateDir(fctx->location);
>
>               if (!fctx->dirdesc)
> --- 153,159 ----
>           }
>           else
>           {
> !             fctx->location = GetTablespacePath(tablespaceOid);
>               fctx->dirdesc = AllocateDir(fctx->location);
>
>               if (!fctx->dirdesc)
> Index: src/backend/utils/init/Makefile
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/utils/init/Makefile,v
> retrieving revision 1.18
> diff -c -c -r1.18 Makefile
> *** src/backend/utils/init/Makefile    20 Feb 2005 02:22:00 -0000    1.18
> --- src/backend/utils/init/Makefile    1 May 2005 04:30:53 -0000
> ***************
> *** 12,18 ****
>   top_builddir = ../../../..
>   include $(top_builddir)/src/Makefile.global
>
> ! OBJS = flatfiles.o globals.o miscinit.o postinit.o
>
>   all: SUBSYS.o
>
> --- 12,18 ----
>   top_builddir = ../../../..
>   include $(top_builddir)/src/Makefile.global
>
> ! OBJS = flatfiles.o globals.o miscinit.o postinit.o checkfiles.o
>
>   all: SUBSYS.o
>
> Index: src/include/catalog/catalog.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/catalog/catalog.h,v
> retrieving revision 1.30
> diff -c -c -r1.30 catalog.h
> *** src/include/catalog/catalog.h    31 Dec 2004 22:03:24 -0000    1.30
> --- src/include/catalog/catalog.h    1 May 2005 04:30:54 -0000
> ***************
> *** 19,24 ****
> --- 19,25 ----
>
>   extern char *relpath(RelFileNode rnode);
>   extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
> + extern char *GetTablespacePath(Oid spcNode);
>
>   extern bool IsSystemRelation(Relation relation);
>   extern bool IsToastRelation(Relation relation);
> Index: src/include/utils/flatfiles.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/utils/flatfiles.h,v
> retrieving revision 1.1
> diff -c -c -r1.1 flatfiles.h
> *** src/include/utils/flatfiles.h    20 Feb 2005 02:22:07 -0000    1.1
> --- src/include/utils/flatfiles.h    1 May 2005 04:30:54 -0000
> ***************
> *** 30,33 ****
> --- 30,36 ----
>
>   extern Datum flatfile_update_trigger(PG_FUNCTION_ARGS);
>
> + /* from checkfiles.c */
> + extern void CheckStaleRelFiles(void);
> +
>   #endif   /* FLATFILES_H */

> /*-------------------------------------------------------------------------
>  *
>  * checkfiles.c
>  *      support to clean up stale relation files on crash recovery
>  *
>  * If a backend crashes while in a transaction that has created or
>  * deleted a relfilenode, a stale file can be left over in the data
>  * directory. This file contains routines to clean up those stale
>  * files on recovery.
>  *
>  * $PostgreSQL$
>  *
>  *-------------------------------------------------------------------------
>  */
> #include "postgres.h"
>
> #include "storage/fd.h"
>
> #include "utils/flatfiles.h"
> #include "miscadmin.h"
> #include "catalog/pg_tablespace.h"
> #include "catalog/catalog.h"
> #include "access/skey.h"
> #include "utils/fmgroids.h"
> #include "access/relscan.h"
> #include "access/heapam.h"
> #include "utils/resowner.h"
>
> static void CheckStaleRelFilesFrom(Oid tablespaceoid, Oid dboid);
> static void CheckStaleRelFilesFromTablespace(Oid tablespaceoid);
>
> /* Like AllocateDir, but ereports on failure */
> static DIR *
> AllocateDirChecked(char *path)
> {
>     DIR *dirdesc = AllocateDir(path);
>     if (dirdesc == NULL)
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not open directory \"%s\": %m",
>                         path)));
>     return dirdesc;
> }
>
> /*
>  * Scan through all tablespaces for relations left over
>  * by aborted transactions.
>  *
>  * For example, if a transaction issues
>  * BEGIN; CREATE TABLE foobar ();
>  * and then the backend crashes, the file is left in the
>  * tablespace until CheckStaleRelFiles deletes it.
>  */
> void
> CheckStaleRelFiles(void)
> {
>     DIR                *dirdesc;
>     struct dirent *de;
>     char       *path;
>     int pathlen;
>
>     pathlen = strlen(DataDir) + 11 + 1;
>     path = (char *) palloc(pathlen);
>     snprintf(path, pathlen, "%s/pg_tblspc/", DataDir);
>     dirdesc = AllocateDirChecked(path);
>     while ((de = readdir(dirdesc)) != NULL)
>     {
>         char *invalid;
>         Oid tablespaceoid;
>
>         /* Check that the directory name looks like valid tablespace link.    */
>         tablespaceoid = (Oid) strtol(de->d_name, &invalid, 10);
>         if(invalid[0] == '\0')
>             CheckStaleRelFilesFromTablespace(tablespaceoid);
>     }
>     FreeDir(dirdesc);
>     pfree(path);
>
>     CheckStaleRelFilesFromTablespace(DEFAULTTABLESPACE_OID);
> }
>
> /* Scan a specific tablespace for stale relations */
> static void
> CheckStaleRelFilesFromTablespace(Oid tablespaceoid)
> {
>     DIR                *dirdesc;
>     struct dirent *de;
>     char       *path;
>
>     path = GetTablespacePath(tablespaceoid);
>
>     dirdesc = AllocateDirChecked(path);
>     while ((de = readdir(dirdesc)) != NULL)
>     {
>         char *invalid;
>         Oid dboid;
>
>         dboid = (Oid) strtol(de->d_name, &invalid, 10);
>         if(invalid[0] == '\0')
>             CheckStaleRelFilesFrom(tablespaceoid, dboid);
>     }
>     FreeDir(dirdesc);
>     pfree(path);
> }
>
> /* Scan a specific database in a specific tablespace for stale relations.
>  *
>  * First, pg_class for the database is opened, and the relfilenodes of all
>  * relations mentioned there are stored in a hash table.
>  *
>  * Then the directory is scanned. Every file in the directory that's not
>  * found in pg_class (the hash table) is logged.
>  */
> static void
> CheckStaleRelFilesFrom(Oid tablespaceoid, Oid dboid)
> {
>     DIR *dirdesc;
>     struct dirent *de;
>     HASHCTL hashctl;
>     HTAB *relfilenodeHash;
>     MemoryContext mcxt;
>     RelFileNode rnode;
>     char *path;
>
>     /* We create a private memory context so that we can easily deallocate
>      * the hash table and its contents
>      */
>     mcxt = AllocSetContextCreate(TopMemoryContext, "CheckStaleRelFiles",
>                                  ALLOCSET_DEFAULT_MINSIZE,
>                                  ALLOCSET_DEFAULT_INITSIZE,
>                                  ALLOCSET_DEFAULT_MAXSIZE);
>
>     hashctl.hash = tag_hash;
>     /* The entry contents is not used for anything, we just check
>      * if an oid is in the hash table or not. */
>     hashctl.keysize = sizeof(Oid);
>     hashctl.entrysize = 1;
>     hashctl.hcxt = mcxt;
>     relfilenodeHash = hash_create("relfilenodeHash", 100, &hashctl,
>                                   HASH_FUNCTION
>                                   | HASH_ELEM | HASH_CONTEXT);
>
>     /* Read all relfilenodes from pg_class into the hash table */
>     {
>         ResourceOwner owner, oldowner;
>         Relation        rel;
>         HeapScanDesc scan;
>         HeapTuple tuple;
>
>         /* Need a resowner to keep the heapam and buffer code happy */
>         owner = ResourceOwnerCreate(NULL, "CheckStaleRelFiles");
>         oldowner = CurrentResourceOwner;
>         CurrentResourceOwner = owner;
>
>         rnode.spcNode = tablespaceoid;
>         rnode.dbNode = dboid;
>         rnode.relNode = RelationRelationId;
>         rel = XLogOpenRelation(true, 0 , rnode);
>
>         scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
>         while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
>         {
>             Form_pg_class classform = (Form_pg_class) GETSTRUCT(tuple);
>
>             hash_search(relfilenodeHash, &classform->relfilenode,
>                         HASH_ENTER, NULL);
>         }
>         heap_endscan(scan);
>
>         XLogCloseRelation(rnode);
>         CurrentResourceOwner = oldowner;
>         ResourceOwnerDelete(owner);
>     }
>
>     /* Scan the directory */
>     path = GetDatabasePath(dboid, tablespaceoid);
>
>     dirdesc = AllocateDirChecked(path);
>     while ((de = readdir(dirdesc)) != NULL)
>     {
>         char *invalid;
>         Oid relfilenode;
>
>         relfilenode = strtol(de->d_name, &invalid, 10);
>         if(invalid[0] == '\0')
>         {
>             /* Filename was a valid number, check if pg_class knows
>                about it */
>             if(hash_search(relfilenodeHash, &relfilenode,
>                            HASH_FIND, NULL) == NULL)
>             {
>                 char *filepath;
>                 rnode.spcNode = tablespaceoid;
>                 rnode.dbNode = dboid;
>                 rnode.relNode = relfilenode;
>
>                 filepath = relpath(rnode);
>
>                 ereport(LOG,
>                         (errcode_for_file_access(),
>                          errmsg("The table or index file \"%s\" is stale and can be safely removed",
>                                 filepath)));
>                 pfree(filepath);
>             }
>         }
>     }
>     FreeDir(dirdesc);
>     pfree(path);
>     hash_destroy(relfilenodeHash);
>     MemoryContextDelete(mcxt);
> }

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

pgsql-patches by date:

Previous
From: Neil Conway
Date:
Subject: Re: Problem with Create Domain example
Next
From: "Magnus Hagander"
Date:
Subject: Added columns to pg_stat_activity