Thread: ERROR: out of free buffers: time to abort !

ERROR: out of free buffers: time to abort !

From
Ed Loehr
Date:
This problem was posted 12/24/1999 to GENERAL with no
answer...hoping one of you knows the easy answer...

I am seeing the following error during a DB rebuild.  It is
occurring during the execution of a PL/pgSQL procedure which is
called from a trigger procedure on an AFTER INSERT trigger...
   ERROR:  out of free buffers: time to abort !

The insert fails.  This is under pgsql 6.5.2, redhat 6.1, built
from tgz, running under "postmaster -i -N 15 -o -F -S 4096"...

Any ideas?


Cheers,
Ed Loehr




Re: ERROR: out of free buffers: time to abort !

From
Ed Loehr
Date:
> I am seeing the following error during a DB rebuild.  It is
> occurring during the execution of a PL/pgSQL procedure which is
> called from a trigger procedure on an AFTER INSERT trigger...
>
>     ERROR:  out of free buffers: time to abort !
>
> The insert fails.  This is under pgsql 6.5.2, redhat 6.1, built
> from tgz, running under "postmaster -i -N 15 -o -F -S 4096"...
>
> Any ideas?

This problem disappears when I raise the number of shared memory buffers
with the -B flag from the default of 64 to 256.

Cheers,
Ed Loehr



Re: [HACKERS] Re: ERROR: out of free buffers: time to abort !

From
Tom Lane
Date:
Ed Loehr <eloehr@austin.rr.com> writes:
>> I am seeing the following error during a DB rebuild.  It is
>> occurring during the execution of a PL/pgSQL procedure which is
>> called from a trigger procedure on an AFTER INSERT trigger...
>> 
>> ERROR:  out of free buffers: time to abort !
>> 
>> The insert fails.  This is under pgsql 6.5.2, redhat 6.1, built
>> from tgz, running under "postmaster -i -N 15 -o -F -S 4096"...

> This problem disappears when I up the number of shared mem buffers
> with the -B flag from default of 64 to 256.

That's the message you get if all the disk buffers are marked as
"in use" (ref count > 0) so that there is no place to read in another
database page.  I fixed several nasty buffer-ref-count-leakage bugs
a couple of months ago, so I think this problem may be gone in current
sources.  (I'd appreciate it if you'd try this test case as soon as
we are ready for 7.0 beta...)

In the meantime, upping the number of buffers will at least postpone the
problem.  But I'm worried that it may not solve it completely --- you
may still find that the error occurs after you've been running long
enough.
        regards, tom lane


Re: [HACKERS] Re: ERROR: out of free buffers: time to abort !

From
Ed Loehr
Date:
Tom Lane wrote:

> >> I am seeing the following error during a DB rebuild.

> >> ERROR:  out of free buffers: time to abort !
> >>
> > This problem disappears when I up the number of shared mem buffers
> > with the -B flag from default of 64 to 256.
>
> That's the message you get if all the disk buffers are marked as
> "in use" (ref count > 0) so that there is no place to read in another
> database page.  I fixed several nasty buffer-ref-count-leakage bugs
> a couple of months ago, so I think this problem may be gone in current
> sources.  (I'd appreciate it if you'd try this test case as soon as
> we are ready for 7.0 beta...)

Great.  Thanks again, Tom.

> In the meantime, upping the number of buffers will at least postpone the
> problem.  But I'm worried that it may not solve it completely --- you
> may still find that the error occurs after you've been running long
> enough.

Can I postpone or work around the problem with periodic server restarts to
reset the counts?  Or does this persist across server restarts?

Cheers,
Ed Loehr



Re: [HACKERS] Re: ERROR: out of free buffers: time to abort !

From
Tom Lane
Date:
Ed Loehr <eloehr@austin.rr.com> writes:
>> In the meantime, upping the number of buffers will at least postpone the
>> problem.  But I'm worried that it may not solve it completely --- you
>> may still find that the error occurs after you've been running long
>> enough.

> Can I postpone/workaround the problem by periodic server restarts to reset
> the counts?  Or is this a persistent thing across server restarts?

Yes, a postmaster restart would clean up the buffer reference counts.
I think there were also some less drastic code paths that would clean
them up --- you might try something as simple as deliberately inducing
an SQL error now and then, so that error cleanup runs.
        regards, tom lane


Re: [HACKERS] Re: ERROR: out of free buffers: time to abort !

From
Ed Loehr
Date:
Tom Lane wrote:

> > Can I postpone/workaround the problem by periodic server restarts to reset
> > the counts?  Or is this a persistent thing across server restarts?
>
> Yes, a postmaster restart would clean up the buffer reference counts.
> I think there were also some less drastic code paths that would clean
> them up --- you might try something as simple as deliberately inducing
> an SQL error now and then, so that error cleanup runs.

What *kind* of SQL error would trigger the cleanup?  I've certainly had
numerous SQL errors prior to this problem showing up (parse errors, misnamed
attributes, ...), but that apparently didn't fix the problem system-wide.

Also, are these buffer counts per backend or per postmaster?  In other words,
does a particular kind of SQL error need to occur on each backend?

Cheers,
Ed Loehr



Re: [HACKERS] Re: ERROR: out of free buffers: time to abort !

From
Tom Lane
Date:
Ed Loehr <eloehr@austin.rr.com> writes:
> Tom Lane wrote:
>> Yes, a postmaster restart would clean up the buffer reference counts.
>> I think there were also some less drastic code paths that would clean
>> them up --- you might try something as simple as deliberately inducing
>> an SQL error now and then, so that error cleanup runs.

> What *kind* of SQL error would trigger the cleanup?

Actually, on looking at the code it doesn't seem that error recovery
will fix things --- nothing short of a postmaster restart will do it.

Instead of hacking up your application code to work around this problem,
why don't you try applying the following patch to the 6.5.3 sources?
You may get some "Buffer Leak" notice messages, but it ought to work
better than it does now.  (I think --- this is off-the-cuff and not
tested ... but the complete changes that I put into current sources are
much too large to risk back-patching.)

Keep us posted.
        regards, tom lane

*** src/backend/storage/buffer/bufmgr.c~    Sat Jan  8 17:44:58 2000
--- src/backend/storage/buffer/bufmgr.c    Sat Jan  8 17:49:15 2000
***************
*** 1202,1213 ****
      for (i = 1; i <= NBuffers; i++)
      {
          CommitInfoNeedsSave[i - 1] = 0;
          if (BufferIsValid(i))
          {
              while (PrivateRefCount[i - 1] > 0)
                  ReleaseBuffer(i);
          }
-         LastRefCount[i - 1] = 0;
      }
  
      ResetLocalBufferPool();
--- 1202,1218 ----
      for (i = 1; i <= NBuffers; i++)
      {
          CommitInfoNeedsSave[i - 1] = 0;
+         /*
+          * quick hack: any refcount still being held in LastRefCount
+          * needs to be released.
+          */
+         PrivateRefCount[i - 1] += LastRefCount[i - 1];
+         LastRefCount[i - 1] = 0;
          if (BufferIsValid(i))
          {
              while (PrivateRefCount[i - 1] > 0)
                  ReleaseBuffer(i);
          }
      }
  
      ResetLocalBufferPool();
***************
*** 1228,1233 ****
--- 1233,1244 ----
      for (i = 1; i <= NBuffers; i++)
      {
+         /*
+          * quick hack: any refcount still being held in LastRefCount
+          * needs to be released.
+          */
+         PrivateRefCount[i - 1] += LastRefCount[i - 1];
+         LastRefCount[i - 1] = 0;
          if (BufferIsValid(i))
          {
              BufferDesc *buf = &(BufferDescriptors[i- 1]);
 


Re: [HACKERS] Re: ERROR: out of free buffers: time to abort !

From
Ed Loehr
Date:
Tom Lane wrote:

> Instead of hacking up your application code to work around this problem,
> why don't you try applying the following patch to the 6.5.3 sources.

I am running 6.5.2.  Were there any other pertinent changes from 6.5.2 to 6.5.3
that would make you uncomfortable about applying that patch to 6.5.2?

Cheers,
Ed Loehr




Re: [HACKERS] Re: ERROR: out of free buffers: time to abort !

From
Tom Lane
Date:
Ed Loehr <eloehr@austin.rr.com> writes:
> Tom Lane wrote:
>> Instead of hacking up your application code to work around this problem,
>> why don't you try applying the following patch to the 6.5.3 sources.

> I am running 6.5.2.  Were there any other pertinent changes from 6.5.2 to 6.5.3
> that would make you uncomfortable about applying that patch to 6.5.2?

No, but I would recommend trying it in a playpen installation, in any
case, not straight into production servers ;-)
        regards, tom lane