Re: Lots of FSM-related fragility in transaction commit - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Lots of FSM-related fragility in transaction commit
Date
Msg-id 5430.1324337163@sss.pgh.pa.us
Whole thread Raw
In response to Re: Lots of FSM-related fragility in transaction commit  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 08.12.2011 08:20, Tom Lane wrote:
>> I thought that removing the unreadable file would let me restart,
>> but after "rm 52860_fsm" and trying again to start the server,
>> there's a different problem:
>> LOG:  consistent recovery state reached at 0/5D71BA8
>> LOG:  redo starts at 0/5100050
>> WARNING:  page 18 of relation base/27666/52860 is uninitialized
>> CONTEXT:  xlog redo visible: rel 1663/27666/52860; blk 18
>> PANIC:  WAL contains references to invalid pages

> The bug here is that we consider having immediately reached consistency
> during crash recovery.

That fixes only part of the problem I was on about, though: I don't
believe that this type of situation should occur in the first place.
We should not be throwing ERROR during post-commit attempts to unlink
the physical files of deleted relations.

After some investigation I believe that the problem was introduced
by the changes to support multiple "forks" in a relation.  Specifically,
instead of just doing "smgrdounlink()" during post-commit, we now
do something like this:

        for (fork = 0; fork <= MAX_FORKNUM; fork++)
        {
            if (smgrexists(srel, fork))
                smgrdounlink(srel, fork, false);
        }

and if you look into mdexists() you'll find that it throws ERROR for any
unexpected errno.  This makes smgrdounlink's attempts to be robust
rather pointless.

I'm not entirely sure whether that behavior of mdexists is a good thing,
but it strikes me that the smgrexists call is pretty unnecessary here in
the first place.  We should just get rid of it and do smgrdounlink
unconditionally.  The only reason it's there is to keep smgrdounlink
from whinging about ENOENT on non-primary forks, which is simply stupid
for smgrdounlink to be doing anyway --- it is actually *more* willing to
complain about ENOENT on non-primary than primary forks, which has no
sense that I can detect.

Accordingly I propose the attached patch for HEAD, and similar changes
in all branches that have multiple forks.  Note that this makes the
warning-report conditions identical for both code paths in mdunlink.

            regards, tom lane


diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index d2fecb1ecb9171fc2c6dd2887d846c8a16779bb0..c5a2541807e148c29c790b0fe12a52c0389f4626 100644
*** a/src/backend/access/transam/twophase.c
--- b/src/backend/access/transam/twophase.c
*************** FinishPreparedTransaction(const char *gi
*** 1366,1373 ****

          for (fork = 0; fork <= MAX_FORKNUM; fork++)
          {
!             if (smgrexists(srel, fork))
!                 smgrdounlink(srel, fork, false);
          }
          smgrclose(srel);
      }
--- 1366,1372 ----

          for (fork = 0; fork <= MAX_FORKNUM; fork++)
          {
!             smgrdounlink(srel, fork, false);
          }
          smgrclose(srel);
      }
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index c383011b5f6538fa2b90fa5f7778da7ff59fa679..4177a6c5cf95d5e1a1b49b9f11c17098b2158529 100644
*** a/src/backend/access/transam/xact.c
--- b/src/backend/access/transam/xact.c
*************** xact_redo_commit_internal(TransactionId
*** 4595,4605 ****

          for (fork = 0; fork <= MAX_FORKNUM; fork++)
          {
!             if (smgrexists(srel, fork))
!             {
!                 XLogDropRelation(xnodes[i], fork);
!                 smgrdounlink(srel, fork, true);
!             }
          }
          smgrclose(srel);
      }
--- 4595,4602 ----

          for (fork = 0; fork <= MAX_FORKNUM; fork++)
          {
!             XLogDropRelation(xnodes[i], fork);
!             smgrdounlink(srel, fork, true);
          }
          smgrclose(srel);
      }
*************** xact_redo_abort(xl_xact_abort *xlrec, Tr
*** 4738,4748 ****

          for (fork = 0; fork <= MAX_FORKNUM; fork++)
          {
!             if (smgrexists(srel, fork))
!             {
!                 XLogDropRelation(xlrec->xnodes[i], fork);
!                 smgrdounlink(srel, fork, true);
!             }
          }
          smgrclose(srel);
      }
--- 4735,4742 ----

          for (fork = 0; fork <= MAX_FORKNUM; fork++)
          {
!             XLogDropRelation(xlrec->xnodes[i], fork);
!             smgrdounlink(srel, fork, true);
          }
          smgrclose(srel);
      }
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c
index fff933d403e7ca5e7b551b987e1684cd6a6e98a7..450e292f4013f9be5418f482654cd33c14c6d344 100644
*** a/src/backend/catalog/storage.c
--- b/src/backend/catalog/storage.c
*************** smgrDoPendingDeletes(bool isCommit)
*** 361,368 ****
                  srel = smgropen(pending->relnode, pending->backend);
                  for (i = 0; i <= MAX_FORKNUM; i++)
                  {
!                     if (smgrexists(srel, i))
!                         smgrdounlink(srel, i, false);
                  }
                  smgrclose(srel);
              }
--- 361,367 ----
                  srel = smgropen(pending->relnode, pending->backend);
                  for (i = 0; i <= MAX_FORKNUM; i++)
                  {
!                     smgrdounlink(srel, i, false);
                  }
                  smgrclose(srel);
              }
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index a761369d39c2f7aa4c71131d79ef6a0edc61b4ea..7150a00407e3b28dae1cca2e6063251f2c5b681c 100644
*** a/src/backend/storage/smgr/md.c
--- b/src/backend/storage/smgr/md.c
*************** mdcreate(SMgrRelation reln, ForkNumber f
*** 323,329 ****
   * number until it's safe, because relfilenode assignment skips over any
   * existing file.
   *
!  * If isRedo is true, it's okay for the relation to be already gone.
   * Also, we should remove the file immediately instead of queuing a request
   * for later, since during redo there's no possibility of creating a
   * conflicting relation.
--- 323,335 ----
   * number until it's safe, because relfilenode assignment skips over any
   * existing file.
   *
!  * All the above applies only to the relation's main fork; other forks can
!  * just be removed immediately, since they are not needed to prevent the
!  * relfilenode number from being recycled.  Also, we do not carefully
!  * track whether other forks have been created or not, but just attempt to
!  * unlink them unconditionally; so we should never complain about ENOENT.
!  *
!  * If isRedo is true, it's unsurprising for the relation to be already gone.
   * Also, we should remove the file immediately instead of queuing a request
   * for later, since during redo there's no possibility of creating a
   * conflicting relation.
*************** mdunlink(RelFileNodeBackend rnode, ForkN
*** 351,363 ****
      if (isRedo || forkNum != MAIN_FORKNUM)
      {
          ret = unlink(path);
!         if (ret < 0)
!         {
!             if (!isRedo || errno != ENOENT)
!                 ereport(WARNING,
!                         (errcode_for_file_access(),
!                          errmsg("could not remove file \"%s\": %m", path)));
!         }
      }
      else
      {
--- 357,366 ----
      if (isRedo || forkNum != MAIN_FORKNUM)
      {
          ret = unlink(path);
!         if (ret < 0 && errno != ENOENT)
!             ereport(WARNING,
!                     (errcode_for_file_access(),
!                      errmsg("could not remove file \"%s\": %m", path)));
      }
      else
      {

pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: JSON for PG 9.2
Next
From: "David E. Wheeler"
Date:
Subject: Re: JSON for PG 9.2