Thread: 7.4RC2 PANIC: insufficient room in FSM

From: "Arthur Ward"

I was a bit stunned last night when I found this in the server logs for a
7.4RC2 installation:

Nov 24 20:37:18 x pg_autovacuum: [2003-11-24 08:37:18 PM] Performing:
VACUUM ANALYZE "clients"."x"
Nov 24 20:37:19 x postgres: [13904] PANIC:  insufficient room in FSM
Nov 24 20:37:19 x postgres: STATEMENT:  VACUUM ANALYZE "clients"."x"

Following this is of course the fallout of backends shutting down and PG
recycling itself with no other problems. Did I miss something along the
way about the FSM needing to be sufficiently large to hold all free pages
no matter what?

I plan to bump up the FSM size anyhow (perhaps tonight I can get some FSM
stats from manually vacuuming), but my gosh, that's some bad behavior for
a presumably minor situation. IMO, that's a significant bug.

Re: 7.4RC2 PANIC: insufficient room in FSM

From: Tom Lane

"Arthur Ward" <award@dominionsciences.com> writes:
> I was a bit stunned last night when I found this in the server logs for a
> 7.4RC2 installation:

> Nov 24 20:37:18 x pg_autovacuum: [2003-11-24 08:37:18 PM] Performing:
> VACUUM ANALYZE "clients"."x"
> Nov 24 20:37:19 x postgres: [13904] PANIC:  insufficient room in FSM

We have seen reports of similar things in situations where the real
problem was that the lock table had gotten too big --- is it possible
that you had something going on in parallel that would have acquired
lots of locks?  If so, raising max_locks_per_transaction should avoid
the problem.

I'll look at whether we couldn't downgrade the failure to something
less than a PANIC, too ...

            regards, tom lane

Re: 7.4RC2 PANIC: insufficient room in FSM

From: "Arthur Ward"

> "Arthur Ward" <award@dominionsciences.com> writes:
>> I was a bit stunned last night when I found this in the server logs for a
>> 7.4RC2 installation:
>
>> Nov 24 20:37:18 x pg_autovacuum: [2003-11-24 08:37:18 PM] Performing:
>> VACUUM ANALYZE "clients"."x"
>> Nov 24 20:37:19 x postgres: [13904] PANIC:  insufficient room in FSM
>
> We have seen reports of similar things in situations where the real
> problem was that the lock table had gotten too big --- is it possible
> that you had something going on in parallel that would have acquired
> lots of locks?  If so, raising max_locks_per_transaction should avoid
> the problem.

I've combed through the system logs, our data-acquisition daemon's log,
and web logs, and there's nothing indicating that there would be any more
activity than there is normally all workday. It normally gets auto-vacuum
hits during the day when there is a little more large-transaction activity
with no problems. In the wee hours of the morning, I have a process doing
bulk loads that locks about a dozen tables explicitly to avoid unnecessary
rollbacks, but that was at least four hours in the future (or finished
19-ish hours in the past). That load also runs without issue. So, no, I
can't say that there was anything out of the ordinary happening to cause
the panic.

Re: 7.4RC2 PANIC: insufficient room in FSM

From: Tom Lane

"Arthur Ward" <award@dominionsciences.com> writes:
> [ 7.4RC2 produced this: ]
> Nov 24 20:37:19 x postgres: [13904] PANIC:  insufficient room in FSM

After further study I've concluded that this means the fix I put in
place here:

2003-10-29 12:36  tgl

    * src/backend/storage/freespace/freespace.c: compact_fsm_storage()
    does need to handle the case where a relation's FSM data has to be
    both moved down and compressed.  Per report from Dror Matalon.

was incomplete, and that in fact there is no can't-happen case for this
routine.  I've applied the attached patch for 7.4.1.

            regards, tom lane
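
The key behavioral change in the patch that follows is that the second
"insufficient room in FSM" check no longer PANICs; it shrinks the rel's
allocation until it fits. A minimal sketch of that clamp, pulled out into
a standalone function for illustration. The helper name
clamp_alloc_chunks is hypothetical and does not exist in freespace.c;
the parameter names are borrowed from the patch's local variables.

```c
/*
 * Hypothetical helper (not part of freespace.c): given where a rel's data
 * is about to be placed (newChunkIndex), how many chunks it currently
 * holds (curChunks), and the first chunk it must not touch
 * (limitChunkIndex), return how many chunks of data can be kept.
 */
int
clamp_alloc_chunks(int newChunkIndex, int curChunks, int limitChunkIndex)
{
    int     newAlloc;

    /* Everything fits below the limit: keep all current chunks. */
    if (newChunkIndex + curChunks <= limitChunkIndex)
        return curChunks;

    /* Otherwise cut the allocation down to the space actually available. */
    newAlloc = limitChunkIndex - newChunkIndex;

    /*
     * A negative value means the new start lies past the limit; as the
     * patch comment notes, that is tolerable as long as no data is copied,
     * so the rel keeps zero chunks.
     */
    if (newAlloc < 0)
        newAlloc = 0;
    return newAlloc;
}
```

For example, with newChunkIndex = 10 and curChunks = 4, a limit of 12
keeps 2 chunks and a limit of 8 keeps none. In the real patch the
clamped value is then converted back to a page count and stored in
fsmrel->storedPages, which is what drops pages from the end of the
rel's data.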


*** src/backend/storage/freespace/freespace.c.orig    Wed Oct 29 12:36:57 2003
--- src/backend/storage/freespace/freespace.c    Wed Nov 26 13:43:16 2003
***************
*** 1394,1399 ****
--- 1394,1400 ----
  compact_fsm_storage(void)
  {
      int            nextChunkIndex = 0;
+     bool        did_push = false;
      FSMRelation *fsmrel;

      for (fsmrel = FreeSpaceMap->firstRel;
***************
*** 1419,1434 ****
              newAllocPages = newAlloc * INDEXCHUNKPAGES;
          else
              newAllocPages = newAlloc * CHUNKPAGES;
-         newChunkIndex = nextChunkIndex;
-         nextChunkIndex += newAlloc;

          /*
           * Determine current size, current and new locations
           */
          curChunks = fsm_current_chunks(fsmrel);
          oldChunkIndex = fsmrel->firstChunk;
-         newLocation = FreeSpaceMap->arena + newChunkIndex * CHUNKBYTES;
          oldLocation = FreeSpaceMap->arena + oldChunkIndex * CHUNKBYTES;

          /*
           * It's possible that we have to move data down, not up, if the
--- 1420,1434 ----
              newAllocPages = newAlloc * INDEXCHUNKPAGES;
          else
              newAllocPages = newAlloc * CHUNKPAGES;

          /*
           * Determine current size, current and new locations
           */
          curChunks = fsm_current_chunks(fsmrel);
          oldChunkIndex = fsmrel->firstChunk;
          oldLocation = FreeSpaceMap->arena + oldChunkIndex * CHUNKBYTES;
+         newChunkIndex = nextChunkIndex;
+         newLocation = FreeSpaceMap->arena + newChunkIndex * CHUNKBYTES;

          /*
           * It's possible that we have to move data down, not up, if the
***************
*** 1440,1449 ****
           * more than once, so pack everything against the end of the arena
           * if so.
           *
!          * In corner cases where roundoff has affected our allocation, it's
!          * possible that we have to move down and compress our data too.
!          * Since this case is extremely infrequent, we do not try to be smart
!          * about it --- we just drop pages from the end of the rel's data.
           */
          if (newChunkIndex > oldChunkIndex)
          {
--- 1440,1455 ----
           * more than once, so pack everything against the end of the arena
           * if so.
           *
!          * In corner cases where we are on the short end of a roundoff choice
!          * that we were formerly on the long end of, it's possible that we
!          * have to move down and compress our data too.  In fact, even after
!          * pushing down the following rels, there might not be as much space
!          * as we computed for this rel above --- that would imply that some
!          * following rel(s) are also on the losing end of roundoff choices.
!          * We could handle this fairly by doing the per-rel compactions
!          * out-of-order, but that seems like way too much complexity to deal
!          * with a very infrequent corner case.  Instead, we simply drop pages
!          * from the end of the current rel's data until it fits.
           */
          if (newChunkIndex > oldChunkIndex)
          {
***************
*** 1455,1475 ****
                  fsmrel->storedPages = newAllocPages;
                  curChunks = fsm_current_chunks(fsmrel);
              }
              if (fsmrel->nextPhysical != NULL)
                  limitChunkIndex = fsmrel->nextPhysical->firstChunk;
              else
                  limitChunkIndex = FreeSpaceMap->totalChunks;
              if (newChunkIndex + curChunks > limitChunkIndex)
              {
!                 /* need to push down additional rels */
!                 push_fsm_rels_after(fsmrel);
!                 /* recheck for safety */
                  if (fsmrel->nextPhysical != NULL)
                      limitChunkIndex = fsmrel->nextPhysical->firstChunk;
                  else
                      limitChunkIndex = FreeSpaceMap->totalChunks;
                  if (newChunkIndex + curChunks > limitChunkIndex)
!                     elog(PANIC, "insufficient room in FSM");
              }
              memmove(newLocation, oldLocation, curChunks * CHUNKBYTES);
          }
--- 1461,1504 ----
                  fsmrel->storedPages = newAllocPages;
                  curChunks = fsm_current_chunks(fsmrel);
              }
+             /* is there enough space? */
              if (fsmrel->nextPhysical != NULL)
                  limitChunkIndex = fsmrel->nextPhysical->firstChunk;
              else
                  limitChunkIndex = FreeSpaceMap->totalChunks;
              if (newChunkIndex + curChunks > limitChunkIndex)
              {
!                 /* not enough space, push down following rels */
!                 if (!did_push)
!                 {
!                     push_fsm_rels_after(fsmrel);
!                     did_push = true;
!                 }
!                 /* now is there enough space? */
                  if (fsmrel->nextPhysical != NULL)
                      limitChunkIndex = fsmrel->nextPhysical->firstChunk;
                  else
                      limitChunkIndex = FreeSpaceMap->totalChunks;
                  if (newChunkIndex + curChunks > limitChunkIndex)
!                 {
!                     /* uh-oh, forcibly cut the allocation to fit */
!                     newAlloc = limitChunkIndex - newChunkIndex;
!                     /*
!                      * If newAlloc < 0 at this point, we are moving the rel's
!                      * firstChunk into territory currently assigned to a later
!                      * rel.  This is okay so long as we do not copy any data.
!                      * The rels will be back in nondecreasing firstChunk order
!                      * at completion of the compaction pass.
!                      */
!                     if (newAlloc < 0)
!                         newAlloc = 0;
!                     if (fsmrel->isIndex)
!                         newAllocPages = newAlloc * INDEXCHUNKPAGES;
!                     else
!                         newAllocPages = newAlloc * CHUNKPAGES;
!                     fsmrel->storedPages = newAllocPages;
!                     curChunks = fsm_current_chunks(fsmrel);
!                 }
              }
              memmove(newLocation, oldLocation, curChunks * CHUNKBYTES);
          }
***************
*** 1504,1509 ****
--- 1533,1539 ----
              memmove(newLocation, oldLocation, curChunks * CHUNKBYTES);
          }
          fsmrel->firstChunk = newChunkIndex;
+         nextChunkIndex += newAlloc;
      }
      Assert(nextChunkIndex <= FreeSpaceMap->totalChunks);
      FreeSpaceMap->usedChunks = nextChunkIndex;
***************
*** 1544,1551 ****
          oldChunkIndex = fsmrel->firstChunk;
          if (newChunkIndex < oldChunkIndex)
          {
!             /* trouble... */
!             elog(PANIC, "insufficient room in FSM");
          }
          else if (newChunkIndex > oldChunkIndex)
          {
--- 1574,1581 ----
          oldChunkIndex = fsmrel->firstChunk;
          if (newChunkIndex < oldChunkIndex)
          {
!             /* we're pushing down, how can it move up? */
!             elog(PANIC, "inconsistent entry sizes in FSM");
          }
          else if (newChunkIndex > oldChunkIndex)
          {
***************
*** 1758,1771 ****
  {
      int            chunkCount;

      /* Convert page count to chunk count */
      if (fsmrel->isIndex)
          chunkCount = (fsmrel->storedPages - 1) / INDEXCHUNKPAGES + 1;
      else
          chunkCount = (fsmrel->storedPages - 1) / CHUNKPAGES + 1;
-     /* Make sure storedPages==0 produces right answer */
-     if (chunkCount < 0)
-         chunkCount = 0;
      return chunkCount;
  }

--- 1788,1801 ----
  {
      int            chunkCount;

+     /* Make sure storedPages==0 produces right answer */
+     if (fsmrel->storedPages <= 0)
+         return 0;
      /* Convert page count to chunk count */
      if (fsmrel->isIndex)
          chunkCount = (fsmrel->storedPages - 1) / INDEXCHUNKPAGES + 1;
      else
          chunkCount = (fsmrel->storedPages - 1) / CHUNKPAGES + 1;
      return chunkCount;
  }
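
The last hunk moves the storedPages==0 guard ahead of the division. The
likely reason, sketched below, is that the old guard could not fire:
with storedPages = 0 the expression is (-1) / CHUNKPAGES + 1, and since
C integer division truncates toward zero (guaranteed since C99) that is
0 + 1 = 1, not a negative number. The two functions are simplified
stand-ins for the before and after shapes of the code, and the
CHUNKPAGES value here is an assumed illustrative constant, not taken
from the source tree.

```c
/* Assumed value for illustration only; the real one lives in freespace.c. */
#define CHUNKPAGES 16

/* Pre-patch shape: divide first, then test for a negative result. */
int
chunks_old(int storedPages)
{
    int     chunkCount = (storedPages - 1) / CHUNKPAGES + 1;

    /* Never true for storedPages == 0, because (-1) / 16 truncates to 0. */
    if (chunkCount < 0)
        chunkCount = 0;
    return chunkCount;
}

/* Post-patch shape: handle the empty case before dividing. */
int
chunks_new(int storedPages)
{
    if (storedPages <= 0)
        return 0;
    return (storedPages - 1) / CHUNKPAGES + 1;
}
```

With the old shape an empty rel is reported as occupying one chunk; with
the new shape it occupies none, which matters once the compaction code
can legitimately cut a rel's storedPages to zero.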