Thread: FlushRelationBuffers error
Hi all, I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs: WARNING: FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1) ERROR: FlushRelationBuffers returned -2 DEBUG: AbortCurrentTransaction PANIC: cannot abort transaction 354676201, it was already committed after the recovery: ERROR: could not access status of transaction 352975274 DEBUG: AbortCurrentTransaction this messages for 5 hours I had my verbosity equal to terse ( I run the server with debug2 level ) so I didn't see the exactly reason for this, after putting verbosity to "verbose" I got the entire message: ERROR: 58P01: could not access status of transaction 352975274 DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/0150": No such file or directory LOCATION: SlruReportIOError, slru.c:609 DEBUG: 00000: AbortCurrentTransaction LOCATION: PostgresMain, postgres.c:2721 In the pg_clog directory I had only the file 0152 ! I had to create a 8k file with zeroes and I discover the offset: ERROR: XX000: could not access status of transaction 352975274 DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0150" at offset 155648: Success LOCATION: SlruReportIOError, slru.c:630 DEBUG: 00000: AbortCurrentTransaction LOCATION: PostgresMain, postgres.c:2721 After creating that file till to cover that offset the problem seems be fixed. Info for hackers: exp_provider is an index and during that message a reindex was in place. Some questions: What about the 0151 file? Don't you think that even with verbosity terse the message about the file missing shall appear ? Why emit the offset only if the file was found ? I have to thank Neil Conway that was helping me on IRC about this error. If you need further infos, please let me know. Regards Gaetano Mendola
Any chance for bad memory? Jan On 9/30/2004 6:16 AM, Gaetano Mendola wrote: > Hi all, > I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs: > > WARNING: FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1) > ERROR: FlushRelationBuffers returned -2 > DEBUG: AbortCurrentTransaction > PANIC: cannot abort transaction 354676201, it was already committed > > after the recovery: > > ERROR: could not access status of transaction 352975274 > DEBUG: AbortCurrentTransaction > > this messages for 5 hours > > > > I had my verbosity equal to terse ( I run the server with debug2 level ) so I didn't see the > exactly reason for this, after putting verbosity to "verbose" I got the entire message: > > ERROR: 58P01: could not access status of transaction 352975274 > DETAIL: could not open file "/var/lib/pgsql/data/pg_clog/0150": No such file or directory > LOCATION: SlruReportIOError, slru.c:609 > DEBUG: 00000: AbortCurrentTransaction > LOCATION: PostgresMain, postgres.c:2721 > > In the pg_clog directory I had only the file 0152 ! > > > I had to create a 8k file with zeroes and I discover the offset: > > ERROR: XX000: could not access status of transaction 352975274 > DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0150" at offset 155648: Success > LOCATION: SlruReportIOError, slru.c:630 > DEBUG: 00000: AbortCurrentTransaction > LOCATION: PostgresMain, postgres.c:2721 > > After creating that file till to cover that offset the problem seems be fixed. > > Info for hackers: exp_provider is an index and during that message a reindex was in place. > > Some questions: > What about the 0151 file? > Don't you think that even with verbosity terse the message about the file missing shall appear ? > Why emit the offset only if the file was found ? > > I have to thank Neil Conway that was helping me on IRC about this error. > > If you need further infos, please let me know. > > Regards > Gaetano Mendola > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Gaetano Mendola <mendola@bigfoot.com> writes: > I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs: > WARNING: FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1) > ERROR: FlushRelationBuffers returned -2 > Info for hackers: exp_provider is an index and during that message a reindex was in place. AFAICS, the only place that FlushRelationBuffers is called on an index with a nonzero second argument is in btvacuumcleanup, ie, shortening an index at the conclusion of VACUUM FULL. REINDEX wouldn't have anything to do with it. The message appears to indicate that some other backend was holding a pin on a buffer containing one of the blocks of the index. Which should definitely not happen, because VACUUM has an exclusive lock at the table level, and so no other backend should be touching the index at all. One thing I am wondering about is that VACUUM FULL relies on an exclusive lock on a table to prevent outside access to the table's indexes as well as the table itself. It might be better to spend a few more cycles and obtain locks on the indexes too. I don't think there are any situations where the main backend could access an index without having any lock on the parent table, but there are some contrib functions like pgstattuple that I'm not so sure about. Are you using any contrib or locally written C functions in your database? regards, tom lane
Jan Wieck wrote:> Any chance for bad memory?> I'll say near 0. However who never knows ? Now the server is again up and running without glitches. I suspect a race condition somewhere for the reindex operation. I had with the engine 7.3 ( see in the archives ) a duplicate error during reindexes at least one each month, for instance the server was another one, and at that time I solved it not reindexing the DB daily ( so I decreased the chances ). With the 7.4 is the first time, since November 2003, that I see this error ( and for coincidence during a reindex too ) so I suspect that the race condition is still there but with less chance to pops up. Is it so dangerous teach the postmaster to solve this kind of problems without a direct user intervention ? Regards Gaetano Mendola > On 9/30/2004 6:16 AM, Gaetano Mendola wrote:>>> Hi all,>> I'm running postgres 7.4.5 on a linux box, this morning I gotthis>> error on my logs:>>>> WARNING: FlushRelationBuffers("exp_provider", 1836): block 1460 is>> referenced (private0, global 1)>> ERROR: FlushRelationBuffers returned -2>> DEBUG: AbortCurrentTransaction>> PANIC: cannot aborttransaction 354676201, it was already committed>>>> after the recovery:>>>> ERROR: could not access status of transaction352975274>> DEBUG: AbortCurrentTransaction>>>> this messages for 5 hours>>>>>>>> I had my verbosity equal toterse ( I run the server with debug2 level>> ) so I didn't see the>> exactly reason for this, after putting verbosity to"verbose" I got>> the entire message:>>>> ERROR: 58P01: could not access status of transaction 352975274>> DETAIL: couldnot open file "/var/lib/pgsql/data/pg_clog/0150": No>> such file or directory>> LOCATION: SlruReportIOError, slru.c:609>>DEBUG: 00000: AbortCurrentTransaction>> LOCATION: PostgresMain, postgres.c:2721>>>> In the pg_clog directoryI had only the file 0152 !>>>>>> I had to create a 8k file with zeroes and I discover the offset:>>>> ERROR: XX000: could not access status of transaction 352975274>> DETAIL: could not read from file "/var/lib/pgsql/data/pg_clog/0150">>at offset 155648: Success>> LOCATION: SlruReportIOError, slru.c:630>> DEBUG: 00000:AbortCurrentTransaction>> LOCATION: PostgresMain, postgres.c:2721>>>> After creating that file till to cover thatoffset the problem seems>> be fixed.>>>> Info for hackers: exp_provider is an index and during that message a>> reindexwas in place.>>>> Some questions:>> What about the 0151 file?>> Don't you think that even with verbosity terse themessage about the>> file missing shall appear ?>> Why emit the offset only if the file was found ?>>>> I have to thankNeil Conway that was helping me on IRC about this error.>>>> If you need further infos, please let me know.>>>> Regards>>Gaetano Mendola>>>>>> ---------------------------(end of broadcast)--------------------------->> TIP 3: if posting/readingthrough Usenet, please send an appropriate>> subscribe-nomail command to majordomo@postgresql.org sothat your>> message can get through to the mailing list cleanly>>>
Tom Lane wrote: > Gaetano Mendola <mendola@bigfoot.com> writes: > >>I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs: > > >>WARNING: FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1) >>ERROR: FlushRelationBuffers returned -2 > > >>Info for hackers: exp_provider is an index and during that message a reindex was in place. > > > AFAICS, the only place that FlushRelationBuffers is called on an index > with a nonzero second argument is in btvacuumcleanup, ie, shortening an > index at the conclusion of VACUUM FULL. REINDEX wouldn't have anything > to do with it. > > The message appears to indicate that some other backend was holding a > pin on a buffer containing one of the blocks of the index. Which should > definitely not happen, because VACUUM has an exclusive lock at the table > level, and so no other backend should be touching the index at all. > > One thing I am wondering about is that VACUUM FULL relies on an > exclusive lock on a table to prevent outside access to the table's > indexes as well as the table itself. It might be better to spend a few > more cycles and obtain locks on the indexes too. I don't think there > are any situations where the main backend could access an index without > having any lock on the parent table, but there are some contrib > functions like pgstattuple that I'm not so sure about. Are you using > any contrib or locally written C functions in your database? Unfortunately no contrib and even no C functions at all. Looking better at what was happening this morning indeed was a vacuum full, I have in the morning a vacuum followed by some reindex, and the failure was during the vacuum: INFO: vacuuming "public.ua_user_data_exp" INFO: "ua_user_data_exp": found 15232 removable, 33609 nonremovable row versions in 6073 pages [SNIP] CPU 0.07s/0.05u sec elapsed 0.24 sec. INFO: index "exp_mac_address_normal" now contains 33609 row versions in 644 pages DETAIL: 3275 index row versions were removed. 1 index pages have been deleted, 1 are currently reusable. CPU 0.00s/0.04u sec elapsed 0.25 sec. WARNING: FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1) vacuumdb: vacuuming of database "empdb" failed: ERROR: FlushRelationBuffers returned -2 PANIC: cannot abort transaction 354676201, it was already committed server closed the connection unexpectedly This probably means the server terminated abnormally before or whileprocessing the request. Regards Gaetano Mendola