Thread: Vaccuum Failure w/7.1beta4 on Linux/Sparc

Vaccuum Failure w/7.1beta4 on Linux/Sparc

From
Ryan Kirkpatrick
Date:
While testing some existing database applications on 7.1beta4 on
my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
attempting to do a vacuum of a table:

NOTICE:  FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
ERROR! Can't vacuum table Jobs! ERROR:  VACUUM (repair_frag): FlushRelationBuffers returned -2

The first line is the error message from pgsql, while the second line is
the error message from my application (using perl Pg module) reporting the
error message returned. It appears that this should only be a warning
(i.e. NOTICE, not FATAL or ERROR), but it caused the Pg module to throw an
error anyway. My application of course checks for errors, sees the error
thrown by Pg, and dies, assuming the error was fatal.

This error occurred after a load of about 50k records into the
referenced table, a load of 50k records total into a few other tables, and
then a few cleanup queries. The part of the application I was testing is
a database load from another (old, closed source) database. The vacuum
was at the end of the database load, as part of the final cleanup
routines.

So, is this a problem with pgsql in general, specific to
Linux/Sparc, or a bug in Pg causing it to be too paranoid? Thanks.

---------------------------------------------------------------------------
|   "For to me to live is Christ, and to die is gain."                    |
|                                            --- Philippians 1:21 (KJV)   |
---------------------------------------------------------------------------
|   Ryan Kirkpatrick  |  Boulder, Colorado  |  http://www.rkirkpat.net/   |
---------------------------------------------------------------------------




Re: Vaccuum Failure w/7.1beta4 on Linux/Sparc

From
Tom Lane
Date:
Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:
>     While testing some existing database applications on 7.1beta4 on
> my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> attempting to do a vacuum of a table:

> NOTICE:  FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> ERROR! Can't vacuum table Jobs! ERROR:  VACUUM (repair_frag): FlushRelationBuffers returned -2

This is undoubtedly a backend bug.  Can you generate a reproducible test
case?

>     So, is this a problem with pgsql in general, specific to
> Linux/Sparc, or a bug in Pg causing it to be too paranoid? Thanks.

Pg did get an ERROR from the vacuum command (note second line).  Yes,
there is paranoia right up the line here, but I think that's a good
thing.  Somewhere someone is failing to release a buffer refcount,
and we don't know what other consequences that bug might have.  Better
to err on the side of caution.
        regards, tom lane
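
The distinction drawn above (a NOTICE is informational, while an ERROR means the command itself failed) can be illustrated with a minimal sketch. It is written in Python rather than the Perl used by the original application, and it only classifies message text by its severity prefix. The helper names `parse_message` and `command_failed` are hypothetical and are not part of the Pg module or of PostgreSQL.

```python
import re

# Severity prefixes as they appear in the messages quoted in this thread,
# plus FATAL, which the original post also mentions.
ABORTING = {"ERROR", "FATAL"}

_PREFIX = re.compile(r"^(NOTICE|ERROR|FATAL):\s+(.*)$")


def parse_message(line):
    """Split a backend message into (severity, text).

    Severity is None when the line has no recognized prefix.
    """
    stripped = line.strip()
    match = _PREFIX.match(stripped)
    if match is None:
        return None, stripped
    return match.group(1), match.group(2)


def command_failed(lines):
    """Return True if any message carries a severity that aborts the command."""
    return any(parse_message(line)[0] in ABORTING for line in lines)


# The two backend messages from the report, without the application's own prefix.
messages = [
    "NOTICE:  FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)",
    "ERROR:  VACUUM (repair_frag): FlushRelationBuffers returned -2",
]

for line in messages:
    print(parse_message(line)[0])
print(command_failed(messages))
```

With only the NOTICE line present, `command_failed` would report no failure. It is the second message, the ERROR, that makes treating the vacuum as failed the correct reaction.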


Re: Vaccuum Failure w/7.1beta4 on Linux/Sparc

From
Ryan Kirkpatrick
Date:
On Mon, 12 Mar 2001, Tom Lane wrote:

> Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:
> >     While testing some existing database applications on 7.1beta4 on
> > my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> > attempting to do a vacuum of a table:
> 
> > NOTICE:  FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> > ERROR! Can't vacuum table Jobs! ERROR:  VACUUM (repair_frag): FlushRelationBuffers returned -2
> 
> This is undoubtedly a backend bug.  Can you generate a reproducible test
> case?
I will work on it... The code that eventually caused it does a lot
of different things so it will take me a little while to pare it down to
a small, self-contained test case. I should have it by this weekend.

Also, two other details I forgot to put in my first
email:

a) Running 'vacuumdb -t Jobs {dbname}' about 24 hours after the error (the
backend had been completely idle during this time) completed
without error.

b) The disk space where the pgsql database is located is NFS mounted from
my Alpha (running Linux of course :). [0] Might this cause the error?

[0] Yes, I know running pgsql on an NFS mount is probably not the greatest
idea, but the system only has 1GB of local disk space (almost all used for
the system) and is running as development server only. No valuable data is
entrusted to it. Hopefully I will have more local disk space in the near
future.

> Pg did get an ERROR from the vacuum command (note second line).  Yes,
> there is paranoia right up the line here, but I think that's a good
> thing.  Somewhere someone is failing to release a buffer refcount,
> and we don't know what other consequences that bug might have.  Better
> to err on the side of caution.
A reasonable amount of paranoia is indeed always healthy. :) Just
wanted to know if this might have been a known and harmless warning. I
guess not. I will work on a test case and get back hopefully by the
weekend. Thanks for your help.




Re: Vaccuum Failure w/7.1beta4 on Linux/Sparc -- FALSE ALARM

From
Ryan Kirkpatrick
Date:
On Mon, 12 Mar 2001, Ryan Kirkpatrick wrote:

>     While testing some existing database applications on 7.1beta4 on
> my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> attempting to do a vacuum of a table:
> 
> NOTICE:  FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> ERROR! Can't vacuum table Jobs! ERROR:  VACUUM (repair_frag): FlushRelationBuffers returned -2
I moved the data directory to a local partition (from the NFS
mounted one it was on) and reran my application. It worked fine this time,
vacuuming tables without errors, and the above error was never seen. Looks
like pgsql is not NFS safe, or at least not with Linux's implementation.

This is good news in that it is not a serious issue, but bad news
in that now I really do have to hurry up and get more local space for this
box to do anything useful with it. :)

Thanks for everyone's help. TTYL.
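
When ruling a network mount in or out, it can help to confirm which filesystem a data directory actually lives on. The following is a minimal Python sketch, assuming a Linux-style mount table in the `/proc/mounts` layout (device, mount point, type, options). The function name `filesystem_type`, the sample table, and the paths are all hypothetical and do not describe the machines in this thread.

```python
import posixpath


def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains path.

    mounts_text uses the /proc/mounts layout: device, mount point, type, ...
    The longest mount point that is a path prefix of `path` wins.
    Returns None if no mount point matches.
    """
    path = posixpath.normpath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        point, fstype = fields[1], fields[2]
        # Compare whole path components so /mnt/alpha does not match /mnt/alphabet.
        prefix = point.rstrip("/") + "/"
        if (path + "/").startswith(prefix) and len(point) > len(best_point):
            best_point, best_type = point, fstype
    return best_type


# Hypothetical mount table for illustration only.
SAMPLE_MOUNTS = """\
/dev/sda1 / ext2 rw 0 0
alpha:/export/pgsql /mnt/alpha nfs rw 0 0
"""

print(filesystem_type("/mnt/alpha/pgsql/data", SAMPLE_MOUNTS))
print(filesystem_type("/var/lib/postgres/data", SAMPLE_MOUNTS))
```

On a real Linux system the table could be read with `open("/proc/mounts").read()`. Mount points containing spaces are escaped in that file, which this sketch does not handle.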




Re: Re: Vaccuum Failure w/7.1beta4 on Linux/Sparc -- FALSE ALARM

From
Tom Lane
Date:
Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:
> On Mon, 12 Mar 2001, Ryan Kirkpatrick wrote:
>> While testing some existing database applications on 7.1beta4 on
>> my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
>> attempting to do a vacuum of a table:
>> 
>> NOTICE:  FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
>> ERROR! Can't vacuum table Jobs! ERROR:  VACUUM (repair_frag): FlushRelationBuffers returned -2

This is probably explained by the problem we found a few days ago with
BufferSync acquiring locks it shouldn't.
        regards, tom lane


Re: Re: Vaccuum Failure w/7.1beta4 on Linux/Sparc -- FALSE ALARM

From
Ryan Kirkpatrick
Date:
On Mon, 26 Mar 2001, Tom Lane wrote:

> Ryan Kirkpatrick <pgsql@rkirkpat.net> writes:
> > On Mon, 12 Mar 2001, Ryan Kirkpatrick wrote:
> >> While testing some existing database applications on 7.1beta4 on
> >> my Sparc 20 running Debian GNU/Linux 2.2, I got the following error on
> >> attempting to do a vacuum of a table:
> >> 
> >> NOTICE:  FlushRelationBuffers(jobs, 1399): block 953 is referenced (private 0, global 1)
> >> ERROR! Can't vacuum table Jobs! ERROR:  VACUUM (repair_frag): FlushRelationBuffers returned -2
> 
> This is probably explained by the problem we found a few days ago with
> BufferSync acquiring locks it shouldn't.
Yea, it was. I just tried RC1 on the Sparc with my application,
with the data directory NFS mounted, and it ran without errors
now. Thanks. :)
