Thread: ERROR: out of free buffers: time to abort !
This problem was posted 12/24/1999 to GENERAL with no answer... hoping one of you knows the easy answer...

I am seeing the following error during a DB rebuild. It is occurring during the execution of a PL/pgSQL procedure which is called from a trigger procedure on an AFTER INSERT trigger...

ERROR: out of free buffers: time to abort !

The insert fails. This is under pgsql 6.5.2, redhat 6.1, built from tgz, running under "postmaster -i -N 15 -o -F -S 4096"...

Any ideas?

Cheers,
Ed Loehr
> I am seeing the following error during a DB rebuild. It is
> occurring during the execution of a PL/pgSQL procedure which is
> called from a trigger procedure on an AFTER INSERT trigger...
>
> ERROR: out of free buffers: time to abort !
>
> The insert fails. This is under pgsql 6.5.2, redhat 6.1, built
> from tgz, running under "postmaster -i -N 15 -o -F -S 4096"...
>
> Any ideas?

This problem disappears when I up the number of shared mem buffers with the -B flag from the default of 64 to 256.

Cheers,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
>> I am seeing the following error during a DB rebuild. It is
>> occurring during the execution of a PL/pgSQL procedure which is
>> called from a trigger procedure on an AFTER INSERT trigger...
>>
>> ERROR: out of free buffers: time to abort !
>>
>> The insert fails. This is under pgsql 6.5.2, redhat 6.1, built
>> from tgz, running under "postmaster -i -N 15 -o -F -S 4096"...

> This problem disappears when I up the number of shared mem buffers
> with the -B flag from default of 64 to 256.

That's the message you get if all the disk buffers are marked as "in use" (ref count > 0) so that there is no place to read in another database page. I fixed several nasty buffer-ref-count-leakage bugs a couple of months ago, so I think this problem may be gone in current sources. (I'd appreciate it if you'd try this test case as soon as we are ready for 7.0 beta...)

In the meantime, upping the number of buffers will at least postpone the problem. But I'm worried that it may not solve it completely --- you may still find that the error occurs after you've been running long enough.

regards, tom lane
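The failure mode described above (every buffer has a ref count > 0, so no page can be read in) can be sketched with a toy model. This is only an illustration, not the PostgreSQL buffer manager: ToyPool, toy_pin_free_buffer, toy_unpin and toy_ops_until_exhausted are hypothetical names, and the "leak" is simulated by a caller that never unpins.

```c
#include <assert.h>

/* Toy model only; this is NOT PostgreSQL's bufmgr.c. */
#define TOY_MAX_BUFFERS 256

typedef struct {
    int nbuffers;                   /* pool size, playing the role of -B */
    int refcount[TOY_MAX_BUFFERS];  /* pin count per buffer */
} ToyPool;

void toy_pool_init(ToyPool *pool, int nbuffers)
{
    assert(nbuffers > 0 && nbuffers <= TOY_MAX_BUFFERS);
    pool->nbuffers = nbuffers;
    for (int i = 0; i < nbuffers; i++)
        pool->refcount[i] = 0;
}

/*
 * Find a buffer with refcount 0, pin it, and return its index.
 * Returns -1 when every buffer is pinned, which is the condition
 * the "out of free buffers" message reports.
 */
int toy_pin_free_buffer(ToyPool *pool)
{
    for (int i = 0; i < pool->nbuffers; i++) {
        if (pool->refcount[i] == 0) {
            pool->refcount[i]++;
            return i;
        }
    }
    return -1;
}

/* Correct callers drop their pin when they are done with the page. */
void toy_unpin(ToyPool *pool, int buf)
{
    assert(buf >= 0 && buf < pool->nbuffers);
    assert(pool->refcount[buf] > 0);
    pool->refcount[buf]--;
}

/*
 * Simulate a caller that leaks one pin per operation and count how
 * many operations succeed before the pool is exhausted.
 */
int toy_ops_until_exhausted(int nbuffers)
{
    ToyPool pool;
    int ops = 0;

    toy_pool_init(&pool, nbuffers);
    while (toy_pin_free_buffer(&pool) >= 0)
        ops++;                      /* modeled bug: toy_unpin is never called */
    return ops;
}
```

In this model toy_ops_until_exhausted(64) fails after 64 leaked pins and toy_ops_until_exhausted(256) after 256: a larger pool only moves the point of failure further out, which matches the concern that raising -B postpones the problem without removing it.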
Tom Lane wrote:
> >> I am seeing the following error during a DB rebuild.
> >> ERROR: out of free buffers: time to abort !
> >>
> > This problem disappears when I up the number of shared mem buffers
> > with the -B flag from default of 64 to 256.
>
> That's the message you get if all the disk buffers are marked as
> "in use" (ref count > 0) so that there is no place to read in another
> database page. I fixed several nasty buffer-ref-count-leakage bugs
> a couple of months ago, so I think this problem may be gone in current
> sources. (I'd appreciate it if you'd try this test case as soon as
> we are ready for 7.0 beta...)

Great. Thanks again, Tom.

> In the meantime, upping the number of buffers will at least postpone the
> problem. But I'm worried that it may not solve it completely --- you
> may still find that the error occurs after you've been running long
> enough.

Can I postpone or work around the problem by periodic server restarts to reset the counts? Or is this a persistent thing across server restarts?

Cheers,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
>> In the meantime, upping the number of buffers will at least postpone the
>> problem. But I'm worried that it may not solve it completely --- you
>> may still find that the error occurs after you've been running long
>> enough.

> Can I postpone/workaround the problem by periodic server restarts to reset
> the counts? Or is this a persistent thing across server restarts?

Yes, a postmaster restart would clean up the buffer reference counts. I think there were also some less drastic code paths that would clean them up --- you might try something as simple as deliberately inducing an SQL error now and then, so that error cleanup runs.

regards, tom lane
Tom Lane wrote:
> > Can I postpone/workaround the problem by periodic server restarts to reset
> > the counts? Or is this a persistent thing across server restarts?
>
> Yes, a postmaster restart would clean up the buffer reference counts.
> I think there were also some less drastic code paths that would clean
> them up --- you might try something as simple as deliberately inducing
> an SQL error now and then, so that error cleanup runs.

What *kind* of SQL error would trigger the cleanup? I've certainly had numerous SQL errors prior to this problem showing up (parse errors, misnamed attributes, ...), but that apparently didn't fix the problem system-wide.

Also, are these buffer counts per backend or per postmaster? In other words, does a particular kind of SQL error need to occur on each backend?

Cheers,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
> Tom Lane wrote:
>> Yes, a postmaster restart would clean up the buffer reference counts.
>> I think there were also some less drastic code paths that would clean
>> them up --- you might try something as simple as deliberately inducing
>> an SQL error now and then, so that error cleanup runs.

> What *kind* of SQL error would trigger the cleanup?

Actually, on looking at the code it doesn't seem that error recovery will fix things --- nothing short of a postmaster restart will do it.

Instead of hacking up your application code to work around this problem, why don't you try applying the following patch to the 6.5.3 sources. You may get some "Buffer Leak" notice messages, but it ought to work better than it does now. (I think --- this is off-the-cuff and not tested ... but the complete changes that I put into current sources are much too large to risk back-patching.) Keep us posted.

regards, tom lane

*** src/backend/storage/buffer/bufmgr.c~	Sat Jan  8 17:44:58 2000
--- src/backend/storage/buffer/bufmgr.c	Sat Jan  8 17:49:15 2000
***************
*** 1202,1213 ****
  	for (i = 1; i <= NBuffers; i++)
  	{
  		CommitInfoNeedsSave[i - 1] = 0;
  		if (BufferIsValid(i))
  		{
  			while (PrivateRefCount[i - 1] > 0)
  				ReleaseBuffer(i);
  		}
- 		LastRefCount[i - 1] = 0;
  	}
  
  	ResetLocalBufferPool();
--- 1202,1218 ----
  	for (i = 1; i <= NBuffers; i++)
  	{
  		CommitInfoNeedsSave[i - 1] = 0;
+ 		/*
+ 		 * quick hack: any refcount still being held in LastRefCount
+ 		 * needs to be released.
+ 		 */
+ 		PrivateRefCount[i - 1] += LastRefCount[i - 1];
+ 		LastRefCount[i - 1] = 0;
  		if (BufferIsValid(i))
  		{
  			while (PrivateRefCount[i - 1] > 0)
  				ReleaseBuffer(i);
  		}
  	}
  
  	ResetLocalBufferPool();
***************
*** 1228,1233 ****
--- 1233,1244 ----
  
  	for (i = 1; i <= NBuffers; i++)
  	{
+ 		/*
+ 		 * quick hack: any refcount still being held in LastRefCount
+ 		 * needs to be released.
+ 		 */
+ 		PrivateRefCount[i - 1] += LastRefCount[i - 1];
+ 		LastRefCount[i - 1] = 0;
  		if (BufferIsValid(i))
  		{
  			BufferDesc *buf = &(BufferDescriptors[i- 1]);
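The idea behind the patch above (fold whatever is still held in LastRefCount into PrivateRefCount before the release loop, instead of just zeroing it) can be illustrated with a small toy model. This is a sketch under stated assumptions, not the real bufmgr.c: the toy_* arrays and functions are hypothetical, the BufferIsValid check is omitted, and the rule that the shared count is dropped only when both per-backend counts reach zero is an assumption made for the illustration.

```c
#include <assert.h>

/* Toy model only; names and the release rule are assumptions. */
#define TOY_NBUFFERS 4

static int toy_shared[TOY_NBUFFERS];   /* "in use" count seen by everyone */
static int toy_private[TOY_NBUFFERS];  /* stands in for PrivateRefCount */
static int toy_last[TOY_NBUFFERS];     /* stands in for LastRefCount */

/* Assumed rule: the shared count drops only when both counts are zero. */
void toy_release_buffer(int i)
{
    assert(toy_private[i] > 0);
    toy_private[i]--;
    if (toy_private[i] == 0 && toy_last[i] == 0)
        toy_shared[i]--;
}

/* Put buffer 0 into the problem state: pins parked in toy_last. */
void toy_setup(void)
{
    for (int i = 0; i < TOY_NBUFFERS; i++)
        toy_shared[i] = toy_private[i] = toy_last[i] = 0;
    toy_shared[0] = 1;
    toy_private[0] = 1;
    toy_last[0] = 2;
}

/* Pre-patch order: release private pins, then zero toy_last. */
int toy_shared_after_old_reset(void)
{
    toy_setup();
    for (int i = 0; i < TOY_NBUFFERS; i++) {
        while (toy_private[i] > 0)
            toy_release_buffer(i);
        toy_last[i] = 0;            /* parked pins are forgotten, not released */
    }
    return toy_shared[0];
}

/* Patched order: fold toy_last into toy_private first, then release. */
int toy_shared_after_patched_reset(void)
{
    toy_setup();
    for (int i = 0; i < TOY_NBUFFERS; i++) {
        toy_private[i] += toy_last[i];
        toy_last[i] = 0;
        while (toy_private[i] > 0)
            toy_release_buffer(i);
    }
    return toy_shared[0];
}
```

In this model the pre-patch order leaves buffer 0 with a shared count of 1 (still "in use" with nobody holding it), while the patched order brings it back to 0.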
Tom Lane wrote:
> Instead of hacking up your application code to work around this problem,
> why don't you try applying the following patch to the 6.5.3 sources.

I am running 6.5.2. Were there any other pertinent changes from 6.5.2 to 6.5.3 that would make you uncomfortable about applying that patch to 6.5.2?

Cheers,
Ed Loehr
Ed Loehr <eloehr@austin.rr.com> writes:
> Tom Lane wrote:
>> Instead of hacking up your application code to work around this problem,
>> why don't you try applying the following patch to the 6.5.3 sources.

> I am running 6.5.2. Were there any other pertinent changes from 6.5.2 to 6.5.3
> that would make you uncomfortable about applying that patch to 6.5.2?

No, but I would recommend trying it in a playpen installation, in any case, not straight into production servers ;-)

regards, tom lane