Thread: VACUUM hanging on idle system
I have a completely idle postgresql system (all backends "idle", none "in transaction"); every time I attempt to vacuum a particular table, it hangs after a while. Here is the output of vacuumdb: INFO: vacuuming "public.ledgerdetail" INFO: index "ledgerdetail_pkey" now contains 11574842 row versions in 33032 pages DETAIL: 1 index row versions were removed. 42 index pages have been deleted, 42 are currently reusable. CPU 1.23s/1.57u sec elapsed 12.13 sec. The table has another index besides that one. In the system error log there is this line: could not write block 32756 of relation 1663/134097/385649401: No space left on device The referenced block is in the table's index file. The disk has plenty of space. I've seen that semaphore operations can give an "out of space" error. I wrote a program to dump the state of the semaphores, and ran it with the system idle and with the vacuum hanging. There are about 25 semaphore sets in the system, but only one changes. Here is the before and after: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 pid ... value ... ncnt ... zcnt ... Set 131091 47542 47395 47376 47375 47374 47373 47372 47371 84426 47370 47541 45292 46849 46810 46809 46808 530 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 537 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Set 131091 47542 47395 47376 47375 47374 47373 47372 47371 84426 47370 47541 45381 46849 46810 46809 46808 530 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 537 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 The pid of the backend doing the vacuum is 45381, and it seems to be waiting on semaphore #11. (I have the complete dump if it's of interest.) Someone tried to drop the index and got this: ERROR: FlushRelationBuffers("idx_ld1", 0): block 32756 is referenced (private 0, global 1) I'm going to restart postgresql and see if any possible semaphore problem goes away. Any ideas about what the problem is here? Thanks.
"Clarence" <clarence1126@gmail.com> writes: > I have a completely idle postgresql system (all backends "idle", none > "in transaction"); every time I attempt to vacuum a particular table, > it hangs after a while. > ... > In the system error log there is this line: > could not write block 32756 of relation 1663/134097/385649401: > No space left on device > The referenced block is in the table's index file. The disk has plenty > of space. ... but at one time not so much, right? > Someone tried to drop the index and got this: > ERROR: FlushRelationBuffers("idx_ld1", 0): block 32756 is referenced > (private 0, global 1) What it looks like to me is that some backend tried to write out a dirty page, failed for lack of disk space, and neglected to release the buffer pin during error recovery. The extra pin would block VACUUM but not much of anything else. A postmaster restart should clear the problem. I seem to recall having fixed a bug like this in the past. What PG version are you running? regards, tom lane
Tom Lane wrote: > "Clarence" <clarence1126@gmail.com> writes: > > I have a completely idle postgresql system (all backends "idle", none > > "in transaction"); every time I attempt to vacuum a particular table, > > it hangs after a while. > > ... > > In the system error log there is this line: > > could not write block 32756 of relation 1663/134097/385649401: > > No space left on device > > The referenced block is in the table's index file. The disk has plenty > > of space. > > ... but at one time not so much, right? At the time of the error, there was over 1000 gigabytes free. The least there has ever been was about 200 gigs free. > > > Someone tried to drop the index and got this: > > ERROR: FlushRelationBuffers("idx_ld1", 0): block 32756 is referenced > > (private 0, global 1) > > What it looks like to me is that some backend tried to write out a dirty > page, failed for lack of disk space, and neglected to release the buffer > pin during error recovery. The extra pin would block VACUUM but not > much of anything else. A postmaster restart should clear the problem. Yes, vacuuming was the only operation affected, and restarting did fix the problem. > > I seem to recall having fixed a bug like this in the past. What PG > version are you running? We're running 8.0.3
"Clarence" <clarence1126@gmail.com> writes: > Tom Lane wrote: >> I seem to recall having fixed a bug like this in the past. What PG >> version are you running? > We're running 8.0.3 Um. Extract from the 8.0.5 CVS logs: 2005-12-08 14:19 tgl * src/backend/: postmaster/bgwriter.c, utils/resowner/resowner.c (REL8_0_STABLE): Fix bgwriter's failure to release buffer pins and open files after an error. This probably explains bug #2099 and could also account for mysterious VACUUM hangups. 8.0 branch is currently at 8.0.8 ... regards, tom lane
Tom Lane wrote: > Um. Extract from the 8.0.5 CVS logs: > > 2005-12-08 14:19 tgl > > * src/backend/: postmaster/bgwriter.c, utils/resowner/resowner.c > (REL8_0_STABLE): Fix bgwriter's failure to release buffer pins and > open files after an error. This probably explains bug #2099 and > could also account for mysterious VACUUM hangups. > > 8.0 branch is currently at 8.0.8 ... Thanks, Tom