Re: DROP DATABASE vs patch to not remove files right away - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: DROP DATABASE vs patch to not remove files right away
Date
Msg-id 4807A4C4.4050601@enterprisedb.com
Whole thread Raw
In response to Re: DROP DATABASE vs patch to not remove files right away  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: DROP DATABASE vs patch to not remove files right away  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> ISTM that we must fix the bgwriter so that ForgetDatabaseFsyncRequests
>>> causes PendingUnlinkEntrys for the doomed DB to be thrown away too.
>
>> Because of the asynchronous nature of ForgetDatabaseFsyncRequests, this
>> still isn't enough, I'm afraid.
>
> Hm.  I notice that there is no bug on Windows because dropdb forces a
> checkpoint before it starts to remove files.  Maybe the sanest solution
> is to just do that on all platforms.

Yeah, I don't see any other simple solution.

Patch attached that does the three changes we've talked about:'
- make ForgetDatabaseFsyncRequests forget unlink requests as well
- make rmtree() not fail on ENOENT
- force checkpoint on in dropdb on all platforms

plus some comment changes reflecting what we now know. I will apply this
tomorrow if there's no objections.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
Index: src/backend/commands/dbcommands.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/commands/dbcommands.c,v
retrieving revision 1.205
diff -c -r1.205 dbcommands.c
*** src/backend/commands/dbcommands.c    26 Mar 2008 21:10:37 -0000    1.205
--- src/backend/commands/dbcommands.c    17 Apr 2008 18:53:17 -0000
***************
*** 696,713 ****
      pgstat_drop_database(db_id);

      /*
!      * Tell bgwriter to forget any pending fsync requests for files in the
!      * database; else it'll fail at next checkpoint.
       */
      ForgetDatabaseFsyncRequests(db_id);

      /*
!      * On Windows, force a checkpoint so that the bgwriter doesn't hold any
!      * open files, which would cause rmdir() to fail.
       */
- #ifdef WIN32
      RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);
- #endif

      /*
       * Remove all tablespace subdirs belonging to the database.
--- 696,714 ----
      pgstat_drop_database(db_id);

      /*
!      * Tell bgwriter to forget any pending fsync and unlink requests for files
!      * in the database; else it'll fail at next checkpoint, or worse it will
!      * delete files that belong to a newly created database with the same OID.
       */
      ForgetDatabaseFsyncRequests(db_id);

      /*
!      * Force a checkpoint to make sure the bgwriter has received the message
!      * sent by ForgetDatabaseFsyncRequests. On Windows, this also ensures that
!      * the bgwriter doesn't hold any open files, which would cause rmdir() to
!      * fail.
       */
      RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

      /*
       * Remove all tablespace subdirs belonging to the database.
Index: src/backend/storage/smgr/md.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.136
diff -c -r1.136 md.c
*** src/backend/storage/smgr/md.c    10 Mar 2008 20:06:27 -0000    1.136
--- src/backend/storage/smgr/md.c    17 Apr 2008 19:26:16 -0000
***************
*** 1196,1203 ****
          if (unlink(path) < 0)
          {
              /*
!              * ENOENT shouldn't happen either, but it doesn't really matter
!              * because we would've deleted it now anyway.
               */
              if (errno != ENOENT)
                  ereport(WARNING,
--- 1196,1206 ----
          if (unlink(path) < 0)
          {
              /*
!              * There's a race condition, when the database is dropped at the
!              * same time that we process the pending unlink requests. If the
!              * DROP DATABASE deletes the file before we do, we will get ENOENT
!              * here. rmtree() also has to ignore ENOENT errors, to deal with
!              * the possibility that we delete the file first.
               */
              if (errno != ENOENT)
                  ereport(WARNING,
***************
*** 1321,1327 ****
--- 1324,1334 ----
          /* Remove any pending requests for the entire database */
          HASH_SEQ_STATUS hstat;
          PendingOperationEntry *entry;
+         ListCell   *cell,
+                    *prev,
+                    *next;

+         /* Remove fsync requests */
          hash_seq_init(&hstat, pendingOpsTable);
          while ((entry = (PendingOperationEntry *) hash_seq_search(&hstat)) != NULL)
          {
***************
*** 1331,1336 ****
--- 1338,1359 ----
                  entry->canceled = true;
              }
          }
+
+         /* Remove unlink requests */
+         prev = NULL;
+         for (cell = list_head(pendingUnlinks); cell; cell = next)
+         {
+             PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(cell);
+
+             next = lnext(cell);
+             if (entry->rnode.dbNode == rnode.dbNode)
+             {
+                 pendingUnlinks = list_delete_cell(pendingUnlinks, cell, prev);
+                 pfree(entry);
+             }
+             else
+                 prev = cell;
+         }
      }
      else if (segno == UNLINK_RELATION_REQUEST)
      {
***************
*** 1386,1392 ****
  }

  /*
!  * ForgetRelationFsyncRequests -- ensure any fsyncs for a rel are forgotten
   */
  void
  ForgetRelationFsyncRequests(RelFileNode rnode)
--- 1409,1415 ----
  }

  /*
!  * ForgetRelationFsyncRequests -- forget any fsyncs for a rel
   */
  void
  ForgetRelationFsyncRequests(RelFileNode rnode)
***************
*** 1419,1425 ****
  }

  /*
!  * ForgetDatabaseFsyncRequests -- ensure any fsyncs for a DB are forgotten
   */
  void
  ForgetDatabaseFsyncRequests(Oid dbid)
--- 1442,1448 ----
  }

  /*
!  * ForgetDatabaseFsyncRequests -- forget any fsyncs and unlinks for a DB
   */
  void
  ForgetDatabaseFsyncRequests(Oid dbid)
Index: src/port/dirmod.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/port/dirmod.c,v
retrieving revision 1.53
diff -c -r1.53 dirmod.c
*** src/port/dirmod.c    11 Apr 2008 23:53:00 -0000    1.53
--- src/port/dirmod.c    17 Apr 2008 19:08:15 -0000
***************
*** 406,413 ****
      {
          snprintf(filepath, MAXPGPATH, "%s/%s", path, *filename);

          if (lstat(filepath, &statbuf) != 0)
!             goto report_and_fail;

          if (S_ISDIR(statbuf.st_mode))
          {
--- 406,429 ----
      {
          snprintf(filepath, MAXPGPATH, "%s/%s", path, *filename);

+         /*
+          * It's ok if the file is not there anymore; we were just about to
+          * delete it anyway.
+          *
+          * This is not an academic possibility. One scenario where this
+          * happens is when bgwriter has a pending unlink request for a file
+          * in a database that's being dropped. In dropdb(), we call
+          * ForgetDatabaseFsyncRequests() to flush out any such pending unlink
+          * requests, but because that's asynchronous, it's not guaranteed
+          * that the bgwriter receives the message in time.
+          */
          if (lstat(filepath, &statbuf) != 0)
!         {
!             if (errno != ENOENT)
!                 goto report_and_fail;
!             else
!                 continue;
!         }

          if (S_ISDIR(statbuf.st_mode))
          {
***************
*** 422,428 ****
          else
          {
              if (unlink(filepath) != 0)
!                 goto report_and_fail;
          }
      }

--- 438,447 ----
          else
          {
              if (unlink(filepath) != 0)
!             {
!                 if (errno != ENOENT)
!                     goto report_and_fail;
!             }
          }
      }


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: RFD: hexstring(n) data type
Next
From: Heikki Linnakangas
Date:
Subject: Re: Patch for Prevent pg_dump/pg_restore from being affected by statement_timeout