Thread: FlushRelationBuffers error

FlushRelationBuffers error

From
Gaetano Mendola
Date:
Hi all,
I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs:

WARNING:  FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1)
ERROR:  FlushRelationBuffers returned -2
DEBUG:  AbortCurrentTransaction
PANIC:  cannot abort transaction 354676201, it was already committed

after the recovery:

ERROR:  could not access status of transaction 352975274
DEBUG:  AbortCurrentTransaction

this messages for 5 hours



I had my verbosity equal to terse ( I run the server with debug2 level ) so I didn't see the
exactly reason for this, after putting verbosity to "verbose" I got the entire message:

ERROR:  58P01: could not access status of transaction 352975274
DETAIL:  could not open file "/var/lib/pgsql/data/pg_clog/0150": No such file or directory
LOCATION:  SlruReportIOError, slru.c:609
DEBUG:  00000: AbortCurrentTransaction
LOCATION:  PostgresMain, postgres.c:2721

In the pg_clog directory I had only the  file   0152 !


I had to create a 8k file with zeroes and I discover the offset:

ERROR:  XX000: could not access status of transaction 352975274
DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/0150" at offset 155648: Success
LOCATION:  SlruReportIOError, slru.c:630
DEBUG:  00000: AbortCurrentTransaction
LOCATION:  PostgresMain, postgres.c:2721

After creating that file till to cover that offset the problem seems be fixed.

Info for hackers: exp_provider is an index and during that message a reindex was in place.

Some questions:
What about the 0151  file?
Don't you think that even with verbosity terse the message about the file missing shall appear ?
Why emit the offset only if the file was found ?

I have to thank Neil Conway that was helping me on IRC about this error.

If you need further infos, please let me know.

Regards
Gaetano Mendola



Re: FlushRelationBuffers error

From
Jan Wieck
Date:
Any chance for bad memory?


Jan


On 9/30/2004 6:16 AM, Gaetano Mendola wrote:

> Hi all,
> I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs:
> 
> WARNING:  FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1)
> ERROR:  FlushRelationBuffers returned -2
> DEBUG:  AbortCurrentTransaction
> PANIC:  cannot abort transaction 354676201, it was already committed
> 
> after the recovery:
> 
> ERROR:  could not access status of transaction 352975274
> DEBUG:  AbortCurrentTransaction
> 
> this messages for 5 hours
> 
> 
> 
> I had my verbosity equal to terse ( I run the server with debug2 level ) so I didn't see the
> exactly reason for this, after putting verbosity to "verbose" I got the entire message:
> 
> ERROR:  58P01: could not access status of transaction 352975274
> DETAIL:  could not open file "/var/lib/pgsql/data/pg_clog/0150": No such file or directory
> LOCATION:  SlruReportIOError, slru.c:609
> DEBUG:  00000: AbortCurrentTransaction
> LOCATION:  PostgresMain, postgres.c:2721
> 
> In the pg_clog directory I had only the  file   0152 !
> 
> 
> I had to create a 8k file with zeroes and I discover the offset:
> 
> ERROR:  XX000: could not access status of transaction 352975274
> DETAIL:  could not read from file "/var/lib/pgsql/data/pg_clog/0150" at offset 155648: Success
> LOCATION:  SlruReportIOError, slru.c:630
> DEBUG:  00000: AbortCurrentTransaction
> LOCATION:  PostgresMain, postgres.c:2721
> 
> After creating that file till to cover that offset the problem seems be fixed.
> 
> Info for hackers: exp_provider is an index and during that message a reindex was in place.
> 
> Some questions:
> What about the 0151  file?
> Don't you think that even with verbosity terse the message about the file missing shall appear ?
> Why emit the offset only if the file was found ?
> 
> I have to thank Neil Conway that was helping me on IRC about this error.
> 
> If you need further infos, please let me know.
> 
> Regards
> Gaetano Mendola
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: FlushRelationBuffers error

From
Tom Lane
Date:
Gaetano Mendola <mendola@bigfoot.com> writes:
> I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs:

> WARNING:  FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1)
> ERROR:  FlushRelationBuffers returned -2

> Info for hackers: exp_provider is an index and during that message a reindex was in place.

AFAICS, the only place that FlushRelationBuffers is called on an index
with a nonzero second argument is in btvacuumcleanup, ie, shortening an
index at the conclusion of VACUUM FULL.  REINDEX wouldn't have anything
to do with it.

The message appears to indicate that some other backend was holding a
pin on a buffer containing one of the blocks of the index.  Which should
definitely not happen, because VACUUM has an exclusive lock at the table
level, and so no other backend should be touching the index at all.

One thing I am wondering about is that VACUUM FULL relies on an
exclusive lock on a table to prevent outside access to the table's
indexes as well as the table itself.  It might be better to spend a few
more cycles and obtain locks on the indexes too.  I don't think there
are any situations where the main backend could access an index without
having any lock on the parent table, but there are some contrib
functions like pgstattuple that I'm not so sure about.  Are you using
any contrib or locally written C functions in your database?
        regards, tom lane


Re: FlushRelationBuffers error

From
Gaetano Mendola
Date:
Jan Wieck wrote:> Any chance for bad memory?>

I'll say near 0. However who never knows ? Now the server is again up and
running without glitches.

I suspect a race condition somewhere for the reindex operation.

I had with the engine 7.3 ( see in the archives ) a duplicate error during
reindexes at least one each month, for instance the server was another one,
and at that time I solved it not reindexing the DB daily ( so I decreased the
chances ).

With the 7.4 is the first time, since November 2003, that I see this error
( and for coincidence during a reindex too ) so I suspect that the race condition
is still there but with less chance to pops up.

Is it so dangerous teach the postmaster to solve this kind of problems without
a direct user intervention ?

Regards
Gaetano Mendola






> On 9/30/2004 6:16 AM, Gaetano Mendola wrote:>>> Hi all,>> I'm running postgres 7.4.5 on a linux box, this morning I
gotthis>> error on my logs:>>>> WARNING:  FlushRelationBuffers("exp_provider", 1836): block 1460 is>> referenced
(private0, global 1)>> ERROR:  FlushRelationBuffers returned -2>> DEBUG:  AbortCurrentTransaction>> PANIC:  cannot
aborttransaction 354676201, it was already committed>>>> after the recovery:>>>> ERROR:  could not access status of
transaction352975274>> DEBUG:  AbortCurrentTransaction>>>> this messages for 5 hours>>>>>>>> I had my verbosity equal
toterse ( I run the server with debug2 level>> ) so I didn't see the>> exactly reason for this, after putting verbosity
to"verbose" I got>> the entire message:>>>> ERROR:  58P01: could not access status of transaction 352975274>> DETAIL:
couldnot open file "/var/lib/pgsql/data/pg_clog/0150": No>> such file or directory>> LOCATION:  SlruReportIOError,
slru.c:609>>DEBUG:  00000: AbortCurrentTransaction>> LOCATION:  PostgresMain, postgres.c:2721>>>> In the pg_clog
directoryI had only the  file   0152 !>>>>>> I had to create a 8k file with zeroes and I discover the offset:>>>>
ERROR: XX000: could not access status of transaction 352975274>> DETAIL:  could not read from file
"/var/lib/pgsql/data/pg_clog/0150">>at offset 155648: Success>> LOCATION:  SlruReportIOError, slru.c:630>> DEBUG:
00000:AbortCurrentTransaction>> LOCATION:  PostgresMain, postgres.c:2721>>>> After creating that file till to cover
thatoffset the problem seems>> be fixed.>>>> Info for hackers: exp_provider is an index and during that message a>>
reindexwas in place.>>>> Some questions:>> What about the 0151  file?>> Don't you think that even with verbosity terse
themessage about the>> file missing shall appear ?>> Why emit the offset only if the file was found ?>>>> I have to
thankNeil Conway that was helping me on IRC about this error.>>>> If you need further infos, please let me know.>>>>
Regards>>Gaetano Mendola>>>>>> ---------------------------(end of broadcast)--------------------------->> TIP 3: if
posting/readingthrough Usenet, please send an appropriate>>       subscribe-nomail command to majordomo@postgresql.org
sothat your>>       message can get through to the mailing list cleanly>>>
 


Re: FlushRelationBuffers error

From
Gaetano Mendola
Date:
Tom Lane wrote:
> Gaetano Mendola <mendola@bigfoot.com> writes:
> 
>>I'm running postgres 7.4.5 on a linux box, this morning I got this error on my logs:
> 
> 
>>WARNING:  FlushRelationBuffers("exp_provider", 1836): block 1460 is referenced (private 0, global 1)
>>ERROR:  FlushRelationBuffers returned -2
> 
> 
>>Info for hackers: exp_provider is an index and during that message a reindex was in place.
> 
> 
> AFAICS, the only place that FlushRelationBuffers is called on an index
> with a nonzero second argument is in btvacuumcleanup, ie, shortening an
> index at the conclusion of VACUUM FULL.  REINDEX wouldn't have anything
> to do with it.
> 
> The message appears to indicate that some other backend was holding a
> pin on a buffer containing one of the blocks of the index.  Which should
> definitely not happen, because VACUUM has an exclusive lock at the table
> level, and so no other backend should be touching the index at all.
> 
> One thing I am wondering about is that VACUUM FULL relies on an
> exclusive lock on a table to prevent outside access to the table's
> indexes as well as the table itself.  It might be better to spend a few
> more cycles and obtain locks on the indexes too.  I don't think there
> are any situations where the main backend could access an index without
> having any lock on the parent table, but there are some contrib
> functions like pgstattuple that I'm not so sure about.  Are you using
> any contrib or locally written C functions in your database?

Unfortunately no contrib and even no C functions at all.

Looking better at what was happening this morning indeed was a vacuum 
full, I have in the morning a vacuum followed by some reindex, and the
failure was during the vacuum:

INFO:  vacuuming "public.ua_user_data_exp"
INFO:  "ua_user_data_exp": found 15232 removable, 33609 nonremovable row 
versions in 6073 pages

[SNIP]

CPU 0.07s/0.05u sec elapsed 0.24 sec.
INFO:  index "exp_mac_address_normal" now contains 33609 row versions in 
644 pages
DETAIL:  3275 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.04u sec elapsed 0.25 sec.
WARNING:  FlushRelationBuffers("exp_provider", 1836): block 1460 is 
referenced (private 0, global 1)
vacuumdb: vacuuming of database "empdb" failed: ERROR: 
FlushRelationBuffers returned -2
PANIC:  cannot abort transaction 354676201, it was already committed
server closed the connection unexpectedly        This probably means the server terminated abnormally        before or
whileprocessing the request.
 




Regards
Gaetano Mendola