Thread: Really odd corruption problem: cannot open pg_aggregate: No such file or directory

So, one of the many machines that I support seems to have developed
an incredibly odd and specific corruption that I've never seen before.

Whenever a query requiring an aggregate is attempted, it spits out:
cannot open pg_aggregate: No such file or directory
and fails.

If I do:
select * from pg_class where relname='pg_aggregate';
I see that the relation exists.

If I check for the relfilenode in the data directory, the file exists
and seems to be an object file containing what should be the basic
aggregate functions.
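
A relation's data file is named after its pg_class.relfilenode and lives in the directory of the database it belongs to, i.e. $PGDATA/base/<database OID>/<relfilenode> (the same form that appears in the strace output in this thread). Each database keeps its own copy of catalogs such as pg_aggregate. The file therefore has to be checked in the directory of the database the failing query runs in, whose OID comes from pg_database, and not in template1's. Below is a minimal Python sketch of that mapping, assuming this 7.x-era layout. `relation_path` and `relation_file_exists` are hypothetical helper names, and only the first segment file of a relation is considered:

```python
import os


def relation_path(pgdata: str, database_oid: int, relfilenode: int) -> str:
    """Path of a relation's first data file.

    Assumes the layout $PGDATA/base/<database OID>/<relfilenode>; larger
    relations also have segment files named <relfilenode>.1, .2, ...,
    which this helper ignores.
    """
    return os.path.join(pgdata, "base", str(database_oid), str(relfilenode))


def relation_file_exists(pgdata: str, database_oid: int, relfilenode: int) -> bool:
    """True if the file exists in *this* database's directory.

    A file with the same relfilenode under another database (for example
    template1) says nothing about this one.
    """
    return os.path.isfile(relation_path(pgdata, database_oid, relfilenode))
```

With the values from the strace line, relation_path("/var/lib/pgsql/data", 16556, 16406) gives /var/lib/pgsql/data/base/16556/16406.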

version:  PostgreSQL 7.2.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2 20020903 (Red Hat Linux 8.0 3.2-7)

The system ran for a few weeks before anything odd happened, and
then suddenly this.  Does anyone have any ideas?  Now that I look at
the above string, I realize that the system /is/ an Athlon processor.
Does anyone know if there could be an issue between the i686 and
athlon optimizations?


-- 
Adam Haberlach         |  "When your product is stolen by thieves, you
adam@mediariffic.com   |  have a police problem.  When it is stolen by
http://mediariffic.com |  millions of honest customers, you have a
                       |  marketing problem."  -George Gilder
 


Adam Haberlach <adam@newsnipple.com> writes:

>     So, one of the many machines that I support seems to have developed
> an incredibly odd and specific corruption that I've never seen before.
> 
> Whenever a query requiring an aggregate is attempted, it spits out:
> cannot open pg_aggregate: No such file or directory
> and fails.

Why not use 'strace' to see what file the backend is actually trying
to open?  

-Doug


Re: Really odd corruption problem: cannot open pg_aggregate: No such file or directory

From: "scott.marlowe"

On Thu, 24 Jul 2003, Adam Haberlach wrote:

>     The system ran for a few weeks before anything odd happened, and
> then suddenly this.  Does anyone have any ideas?  Now that I look at
> the above string, I realize that the system /is/ an Athlon processor.
> Does anyone know if there could be an issue between the i686 and
> athlon optimizations?

Test your memory and drive subsystems first.  memtest86.com has a nice
free tester, and on Linux badblocks can do a decent job (not great,
just decent) of finding bad blocks.

PostgreSQL is good, but it can't make up for bad hardware.



Adam Haberlach <adam@newsnipple.com> writes:
> Whenever a query requiring an aggregate is attempted, it spits out:
> cannot open pg_aggregate: No such file or directory
> and fails.

Weird.  It would be useful to find out exactly what pathname it's trying
to open.  strace'ing the backend might be the easiest way.

> Does anyone know if there could be an issue between the i686 and
> athlon optimizations?

Seems unlikely that it would manifest this way, if so.  The error is
coming from a low-level routine that would also be used for opening
any other table ...
        regards, tom lane


On Thu, Jul 24, 2003 at 10:17:06AM -0700, Adam Haberlach wrote:
> Whenever a query requiring an aggregate is attempted, it spits out:
> cannot open pg_aggregate: No such file or directory
> and fails.

I'd like to thank everyone for the quick responses and the suggestion
to strace the postmaster.

open("/var/lib/pgsql/data/base/16556/16406", O_RDWR) = -1 ENOENT (No such file or directory)

It looks like a file /was/ missing, and I had been looking in the
wrong place to verify that it was there (the template database).  I'm
going to chalk this one up to bad hardware and hope it doesn't happen
again.  Thanks again...
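
Picking failed opens out of a long strace log by eye is tedious. Below is a minimal Python sketch, assuming output in the form of the line above. It also accepts openat(), which newer C libraries use in place of open(), and splits the path back into database OID and relfilenode when the path has the base/<database OID>/<relfilenode> shape. `failed_opens` and `FailedOpen` are hypothetical names, not part of any PostgreSQL tool:

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional

# A failed open()/openat() call, e.g.
#   open("/var/lib/pgsql/data/base/16556/16406", O_RDWR) = -1 ENOENT (No such file or directory)
# The optional "AT_FDCWD, " style first argument covers openat().
FAILED_OPEN = re.compile(
    r'\bopen(?:at)?\((?:[^,"]+,\s*)?"(?P<path>[^"]+)",[^)]*\)\s*=\s*-1\s+(?P<errno>E[A-Z]+)'
)

# Assumed data-file layout: .../base/<database OID>/<relfilenode>
RELATION_FILE = re.compile(r"/base/(?P<db>\d+)/(?P<rel>\d+)$")


class FailedOpen(NamedTuple):
    path: str
    errno: str
    database_oid: Optional[int]  # None if the path is not a relation file
    relfilenode: Optional[int]


def failed_opens(lines: Iterable[str]) -> Iterator[FailedOpen]:
    """Yield one FailedOpen per strace line that reports a failed open."""
    for line in lines:
        match = FAILED_OPEN.search(line)
        if match is None:
            continue  # successful call or some other syscall
        path = match.group("path")
        rel = RELATION_FILE.search(path)
        yield FailedOpen(
            path=path,
            errno=match.group("errno"),
            database_oid=int(rel.group("db")) if rel else None,
            relfilenode=int(rel.group("rel")) if rel else None,
        )
```

Fed the single line above, it yields one entry with errno ENOENT, database_oid 16556 and relfilenode 16406. A successful open such as `open("/etc/passwd", O_RDONLY) = 3` is skipped because it returns a descriptor rather than -1.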



Re: Really odd corruption problem: cannot open pg_aggregate: No such file or directory

From: "Balaji Gadhiraju"

I too got this error. It happened with Postgres 7.2.3 and Linux 2.4.20 on a VIA processor, and not on just
one box but on around a dozen boxes, so this may not be a hardware problem.

In our case, we create the table, use it, and then delete it. This happens often, about once a day, and we also run
VACUUM. The problem happened on one such table: the entry for the table exists in pg_class, but the actual file is
missing. Once it gets to this state, the table cannot be dropped.

Were there any bug fixes related to this in later versions of Postgres? I searched Google for this error and found
some cases, but not much information about why it happens.
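
The state described here, a pg_class row whose file is gone, can be checked for directly by comparing relfilenode values against the file names in the database's directory. Below is a minimal Python sketch. It assumes the $PGDATA/base/<database OID>/<relfilenode> layout, and assumes the (relname, relfilenode) pairs come from a query such as `select relname, relfilenode from pg_class where relkind in ('r', 'i', 't', 'S');`. That query restricts the check to tables, indexes, TOAST tables and sequences, and which kinds to check is itself an assumption. Shared catalogs live under $PGDATA/global rather than base/, so any that slip into the input will show up as false positives. The function names are hypothetical:

```python
import os
from typing import Iterable, List, Tuple

Relation = Tuple[str, int]  # (relname, relfilenode)


def missing_relation_files(
    filenames: Iterable[str], relations: Iterable[Relation]
) -> List[Relation]:
    """Return the relations whose first segment file is not among filenames.

    Only the bare relfilenode counts: an extra segment such as "16406.1"
    does not stand in for a missing "16406".
    """
    present = set(filenames)
    return [(name, node) for name, node in relations if str(node) not in present]


def check_database_dir(db_dir: str, relations: Iterable[Relation]) -> List[Relation]:
    """Compare pg_class rows against one database directory, assumed to be
    $PGDATA/base/<database OID> (hypothetical helper)."""
    return missing_relation_files(os.listdir(db_dir), relations)
```

Keeping the comparison in a pure function separate from the directory listing makes it easy to run against a saved file list as well as a live data directory.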

http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%22RelationBuildDesc%3A+can%27t+open%22


Thanks,
balaji.


-----Original Message-----
From:    Adam Haberlach [mailto:adam@newsnipple.com]
Sent:    Thu 7/24/2003 11:07 AM
To:    pgsql-hackers@postgresql.org
Cc:
Subject:    Re: [HACKERS] Really odd corruption problem: cannot open pg_aggregate: No such file or directory