Thread: FlushRelationBuffers returned -2
I noticed my nightly vacuum process has started dying on a certain relation last night. When I try to vaccum verbose it, I get the following output. I can still do "SELECT * FROM filelist". filelist is a table which has 12146 rows in it, with some 15 columns. Two fields in approx 8-9000 of these rows are updated each night. No other activity in the database while I vacuum (nobody even connected). Restarting the postmaster corrected the problem. Database is version 7.0.2 running on Linux 2.2.16/libc5. Started with parameter "-B 512". //Magnus NOTICE: --Relation filelist-- NOTICE: Pages 847: Changed 0, reaped 821, Empty 0, New 0; Tup 12146: Vac 10733, Keep/VTL 0/0, Crash 0, UnUsed 11548, MinLen 92, MaxLen 1165; Re-using: Free/Avail. Space 4329756/5420; EndEmpty/Avail. Pages 537/50. CPU 0.36s/0.09u sec. NOTICE: Index filelist_date_index: Pages 129; Tuples 12146: Deleted 0. CPU 0.04s/0.12u sec. NOTICE: Index filelist_dirid_index: Pages 134; Tuples 12146: Deleted 0. CPU 0.03s/0.18u sec. NOTICE: Index filelist_id_index: Pages 129; Tuples 12146: Deleted 0. CPU 0.06s/0.16u sec. NOTICE: Index filelist_name_index: Pages 247; Tuples 12146: Deleted 0. CPU 0.17s/0.15u sec. NOTICE: Rel filelist: Pages: 847 --> 310; Tuple(s) moved: 0. CPU 0.10s/0.07u sec. NOTICE: FlushRelationBuffers(filelist, 310): block 0 is referenced (private 0, global 5) FATAL 1: VACUUM (vc_repair_frag): FlushRelationBuffers returned -2 pqReadData() -- backend closed the channel unexpectedly. This probably means the backend terminated abnormally before or while processing the request.
Magnus Hagander <mha@sollentuna.net> writes: > I noticed my nightly vacuum process has started dying on a certain relation > last night. When I try to vaccum verbose it, I get the following output. > NOTICE: FlushRelationBuffers(filelist, 310): block 0 is referenced (private > 0, global 5) > FATAL 1: VACUUM (vc_repair_frag): FlushRelationBuffers returned -2 > Restarting the postmaster corrected the problem. That's what I was about to suggest trying. It sounds like something crashed and left the buffer reference count incremented above zero for one of the pages of the relation. In fact, several somethings, five of them to be exact. Have you had any interesting backend crashes lately? Are you doing anything unusual with that table? It would seem that whatever is causing this is at least moderately reproducible in your environment, since it's happened more than once. It'd be easier to track down if you could identify what sequence of operations causes the buffer refcount to be left incremented. BTW, don't be afraid of the fact VACUUM aborts --- it's just being paranoid about the possibility that someone else is using this table that it's supposed to have an exclusive lock on. regards, tom lane
> Magnus Hagander <mha@sollentuna.net> writes: > > I noticed my nightly vacuum process has started dying on a > certain relation > > last night. When I try to vaccum verbose it, I get the > following output. > > > NOTICE: FlushRelationBuffers(filelist, 310): block 0 is > referenced (private > > 0, global 5) > > FATAL 1: VACUUM (vc_repair_frag): FlushRelationBuffers returned -2 > > > Restarting the postmaster corrected the problem. > > That's what I was about to suggest trying. It sounds like something > crashed and left the buffer reference count incremented above zero for > one of the pages of the relation. In fact, several > somethings, five of > them to be exact. > > Have you had any interesting backend crashes lately? Are you doing > anything unusual with that table? It would seem that whatever > is causing > this is at least moderately reproducible in your environment, > since it's > happened more than once. It'd be easier to track down if you could > identify what sequence of operations causes the buffer refcount to be > left incremented. I don't *think* there should be anything like that. Looking at my log, there are the following entries (other than the usual "unexpected EOF on client connection" from scripts that are aborted... ERROR: regcomp failed with error empty (sub)expression pq_flush: send() failed: Broken pipe ERROR: Named portals may only be used in begin/end transaction blocks Apart from that, it's just the usual EXPLAIN output and misspelled queries (complaning about tables taht don't exist, etc, when somebody misspelled them). All these queries worked with other tables, but in the same database (a partially different project). There is no "core" file anywhere in the pgsql directory structure, and nothing in the log about any backend crash. I'll keep my eyes open if it happens again, and try to find a pattern. > BTW, don't be afraid of the fact VACUUM aborts --- it's just being > paranoid about the possibility that someone else is using this table > that it's supposed to have an exclusive lock on. Ok. Better safe than sorry :-) Good to know. //Magnus