Thread: FlushRelationBuffers returned -2

FlushRelationBuffers returned -2

From
Magnus Hagander
Date:
I noticed my nightly vacuum process has started dying on a certain relation
last night. When I try to vaccum verbose it, I get the following output.

I can still do "SELECT * FROM filelist".

filelist is a table which has 12146 rows in it, with some 15 columns. Two
fields in approx 8-9000 of these rows are updated each night. No other
activity in the database while I vacuum (nobody even connected).

Restarting the postmaster corrected the problem.

Database is version 7.0.2 running on Linux 2.2.16/libc5. Started with
parameter "-B 512".

//Magnus



NOTICE:  --Relation filelist--
NOTICE:  Pages 847: Changed 0, reaped 821, Empty 0, New 0; Tup 12146: Vac
10733, Keep/VTL 0/0, Crash 0, UnUsed 11548, MinLen 92, MaxLen 1165;
Re-using: Free/Avail. Space 4329756/5420; EndEmpty/Avail. Pages 537/50. CPU
0.36s/0.09u sec.
NOTICE:  Index filelist_date_index: Pages 129; Tuples 12146: Deleted 0. CPU
0.04s/0.12u sec.
NOTICE:  Index filelist_dirid_index: Pages 134; Tuples 12146: Deleted 0. CPU
0.03s/0.18u sec.
NOTICE:  Index filelist_id_index: Pages 129; Tuples 12146: Deleted 0. CPU
0.06s/0.16u sec.
NOTICE:  Index filelist_name_index: Pages 247; Tuples 12146: Deleted 0. CPU
0.17s/0.15u sec.
NOTICE:  Rel filelist: Pages: 847 --> 310; Tuple(s) moved: 0. CPU
0.10s/0.07u sec.
NOTICE:  FlushRelationBuffers(filelist, 310): block 0 is referenced (private
0, global 5)
FATAL 1:  VACUUM (vc_repair_frag): FlushRelationBuffers returned -2
pqReadData() -- backend closed the channel unexpectedly.       This probably means the backend terminated abnormally
  before or while processing the request.
 


Re: FlushRelationBuffers returned -2

From
Tom Lane
Date:
Magnus Hagander <mha@sollentuna.net> writes:
> I noticed my nightly vacuum process has started dying on a certain relation
> last night. When I try to vaccum verbose it, I get the following output.

> NOTICE:  FlushRelationBuffers(filelist, 310): block 0 is referenced (private
> 0, global 5)
> FATAL 1:  VACUUM (vc_repair_frag): FlushRelationBuffers returned -2

> Restarting the postmaster corrected the problem.

That's what I was about to suggest trying.  It sounds like something
crashed and left the buffer reference count incremented above zero for
one of the pages of the relation.  In fact, several somethings, five of
them to be exact.

Have you had any interesting backend crashes lately?  Are you doing
anything unusual with that table? It would seem that whatever is causing
this is at least moderately reproducible in your environment, since it's
happened more than once.  It'd be easier to track down if you could
identify what sequence of operations causes the buffer refcount to be
left incremented.

BTW, don't be afraid of the fact VACUUM aborts --- it's just being
paranoid about the possibility that someone else is using this table
that it's supposed to have an exclusive lock on.
        regards, tom lane


RE: FlushRelationBuffers returned -2

From
Magnus Hagander
Date:
> Magnus Hagander <mha@sollentuna.net> writes:
> > I noticed my nightly vacuum process has started dying on a 
> certain relation
> > last night. When I try to vaccum verbose it, I get the 
> following output.
> 
> > NOTICE:  FlushRelationBuffers(filelist, 310): block 0 is 
> referenced (private
> > 0, global 5)
> > FATAL 1:  VACUUM (vc_repair_frag): FlushRelationBuffers returned -2
> 
> > Restarting the postmaster corrected the problem.
> 
> That's what I was about to suggest trying.  It sounds like something
> crashed and left the buffer reference count incremented above zero for
> one of the pages of the relation.  In fact, several 
> somethings, five of
> them to be exact.
> 
> Have you had any interesting backend crashes lately?  Are you doing
> anything unusual with that table? It would seem that whatever 
> is causing
> this is at least moderately reproducible in your environment, 
> since it's
> happened more than once.  It'd be easier to track down if you could
> identify what sequence of operations causes the buffer refcount to be
> left incremented.
I don't *think* there should be anything like that. Looking at my log, there
are the following entries (other than the usual "unexpected EOF on client
connection" from scripts that are aborted...

ERROR:  regcomp failed with error empty (sub)expression
pq_flush: send() failed: Broken pipe
ERROR:  Named portals may only be used in begin/end transaction blocks

Apart from that, it's just the usual EXPLAIN output and misspelled queries
(complaning about tables taht don't exist, etc, when somebody misspelled
them). All these queries worked with other tables, but in the same database
(a partially different project).

There is no "core" file anywhere in the pgsql directory structure, and
nothing in the log about any backend crash.

I'll keep my eyes open if it happens again, and try to find a pattern.


> BTW, don't be afraid of the fact VACUUM aborts --- it's just being
> paranoid about the possibility that someone else is using this table
> that it's supposed to have an exclusive lock on.
Ok. Better safe than sorry :-) Good to know.

//Magnus