Thread: VACUUM hanging on idle system

VACUUM hanging on idle system

From
"Clarence"
Date:
I have a completely idle postgresql system (all backends "idle", none
"in transaction"); every time I attempt to vacuum a particular table,
it hangs
after a while. Here is the output of vacuumdb:

INFO:  vacuuming "public.ledgerdetail"
INFO:  index "ledgerdetail_pkey" now contains 11574842 row versions in
33032
pages
DETAIL:  1 index row versions were removed.
42 index pages have been deleted, 42 are currently reusable.
CPU 1.23s/1.57u sec elapsed 12.13 sec.

The table has another index besides that one. In the system error log
there
is this line:
   could not write block 32756 of relation 1663/134097/385649401:
   No space left on device
The referenced block is in the table's index file. The disk has plenty
of space.
I've seen that semaphore operations can give an "out of space" error. I
wrote
a program to dump the state of the semaphores, and ran it with the
system
idle and with the vacuum hanging. There are about 25 semaphore sets in
the
system, but only one changes. Here is the before and after:

    0     1     2     3     4     5     6     7     8     9    10    11
   12    13    14    15    16
pid  ...
value ...
ncnt  ...
zcnt  ...

Set 131091
47542 47395 47376 47375 47374 47373 47372 47371 84426 47370 47541 45292
46849 46810 46809 46808   530
    0     0     0     0     0     0     0     0     0     0     0     0
    0     0     0     0   537
    0     0     0     0     0     0     0     0     0     0     0     0
    0     0     0     0     0
    0     0     0     0     0     0     0     0     0     0     0     0
    0     0     0     0     0

Set 131091
47542 47395 47376 47375 47374 47373 47372 47371 84426 47370 47541 45381
46849 46810 46809 46808   530
    0     0     0     0     0     0     0     0     0     0     0     0
    0     0     0     0   537
    0     0     0     0     0     0     0     0     0     0     0     1
    0     0     0     0     0
    0     0     0     0     0     0     0     0     0     0     0     0
    0     0     0     0     0

The pid of the backend doing the vacuum is 45381, and it seems to be
waiting
on semaphore #11. (I have the complete dump if it's of interest.)

Someone tried to drop the index and got this:
ERROR:  FlushRelationBuffers("idx_ld1", 0): block 32756 is referenced
(private
0, global 1)

I'm going to restart postgresql and see if any possible semaphore
problem
goes away.


Any ideas about what the problem is here?

Thanks.


Re: VACUUM hanging on idle system

From
Tom Lane
Date:
"Clarence" <clarence1126@gmail.com> writes:
> I have a completely idle postgresql system (all backends "idle", none
> "in transaction"); every time I attempt to vacuum a particular table,
> it hangs after a while.
> ...
> In the system error log there is this line:
>    could not write block 32756 of relation 1663/134097/385649401:
>    No space left on device
> The referenced block is in the table's index file. The disk has plenty
> of space.

... but at one time not so much, right?

> Someone tried to drop the index and got this:
> ERROR:  FlushRelationBuffers("idx_ld1", 0): block 32756 is referenced
> (private 0, global 1)

What it looks like to me is that some backend tried to write out a dirty
page, failed for lack of disk space, and neglected to release the buffer
pin during error recovery.  The extra pin would block VACUUM but not
much of anything else.  A postmaster restart should clear the problem.

I seem to recall having fixed a bug like this in the past.  What PG
version are you running?

            regards, tom lane

Re: VACUUM hanging on idle system

From
"Clarence"
Date:
Tom Lane wrote:
> "Clarence" <clarence1126@gmail.com> writes:
> > I have a completely idle postgresql system (all backends "idle", none
> > "in transaction"); every time I attempt to vacuum a particular table,
> > it hangs after a while.
> > ...
> > In the system error log there is this line:
> >    could not write block 32756 of relation 1663/134097/385649401:
> >    No space left on device
> > The referenced block is in the table's index file. The disk has plenty
> > of space.
>
> ... but at one time not so much, right?

At the time of the error, there was over 1000 gigabytes free. The least
there has ever been was about 200 gigs free.

>
> > Someone tried to drop the index and got this:
> > ERROR:  FlushRelationBuffers("idx_ld1", 0): block 32756 is referenced
> > (private 0, global 1)
>
> What it looks like to me is that some backend tried to write out a dirty
> page, failed for lack of disk space, and neglected to release the buffer
> pin during error recovery.  The extra pin would block VACUUM but not
> much of anything else.  A postmaster restart should clear the problem.

Yes, vacuuming was the only operation affected, and restarting did fix
the problem.

>
> I seem to recall having fixed a bug like this in the past.  What PG
> version are you running?

We're running 8.0.3


Re: VACUUM hanging on idle system

From
Tom Lane
Date:
"Clarence" <clarence1126@gmail.com> writes:
> Tom Lane wrote:
>> I seem to recall having fixed a bug like this in the past.  What PG
>> version are you running?

> We're running 8.0.3

Um.  Extract from the 8.0.5 CVS logs:

2005-12-08 14:19  tgl

    * src/backend/: postmaster/bgwriter.c, utils/resowner/resowner.c
    (REL8_0_STABLE): Fix bgwriter's failure to release buffer pins and
    open files after an error.  This probably explains bug #2099 and
    could also account for mysterious VACUUM hangups.

8.0 branch is currently at 8.0.8 ...

            regards, tom lane

Re: VACUUM hanging on idle system

From
"Clarence"
Date:
Tom Lane wrote:
> Um.  Extract from the 8.0.5 CVS logs:
>
> 2005-12-08 14:19  tgl
>
>     * src/backend/: postmaster/bgwriter.c, utils/resowner/resowner.c
>     (REL8_0_STABLE): Fix bgwriter's failure to release buffer pins and
>     open files after an error.  This probably explains bug #2099 and
>     could also account for mysterious VACUUM hangups.
>
> 8.0 branch is currently at 8.0.8 ...


Thanks, Tom