Thread: invalid OID warning after disk failure

invalid OID warning after disk failure

From
Gabriel Sánchez Martínez
Date:
I'm running PostgreSQL 9.3.5 on Ubuntu 14.04 on x86_64.  The database
directory is on linux mdadm RAID10, using 4 4TB disks and a far=2
layout.  While the RAID tolerates 1 drive failure nicely, I had the
misfortune of 2 drives failing consecutively, one of which had many
sectors reallocated and began failing SMART criteria.  That one is out
now.  As a result of this some files were corrupted.

I was getting the following errors on some tables:

ERROR:  could not read block 0 in file "base/27810/3995569":
Input/output error

but after dropping those tables the errors are gone.

The situation appears to be stable now, but upon running REINDEX and
VACUUM on one of the databases, I get the following:

WARNING:  relation "pg_attrdef" TID 1/1: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/2: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup?  My most recent
backup is from late September, so I would lose some data.  I also backed
up what I could as soon as the disks started giving errors, but I don't
know if I can trust that.

Should I drop the entire cluster?

Regarding hardware, I'm going to add hot standby drives to prevent this
from happening in the future.

Thanks in advance for your advice.

Regards,
Gabriel


Re: invalid OID warning after disk failure

From
Raghu Ram
Date:
On Mon, Nov 17, 2014 at 10:20 AM, Gabriel Sánchez Martínez <gabrielesanchez@gmail.com> wrote:

The situation appears to be stable now, but upon running REINDEX and VACUUM on one of the databases, I get the following:

WARNING:  relation "pg_attrdef" TID 1/1: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/2: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup?  My most recent backup is from late September, so I would lose some data.  I also backed up what I could as soon as the disks started giving errors, but I don't know if I can trust that.

Should I drop the entire cluster?



are you receiving any kind of error messages,while taking database dump with "pg_dump" utility. 

If you are not receiving any kind of error message,try to take a database dump and restore database dump file in another database server and perform all sanity checks. If all sanity checks are working fine,you can create a new cluster and restore all database dump's.

Thanks & Regards
Raghu Ram
 

Re: invalid OID warning after disk failure

From
Gabriel Sánchez Martínez
Date:


On Nov 17, 2014, at 3:28, Raghu Ram <raghuchennuru@gmail.com> wrote:

On Mon, Nov 17, 2014 at 10:20 AM, Gabriel Sánchez Martínez <gabrielesanchez@gmail.com> wrote:

The situation appears to be stable now, but upon running REINDEX and VACUUM on one of the databases, I get the following:

WARNING:  relation "pg_attrdef" TID 1/1: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/2: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup?  My most recent backup is from late September, so I would lose some data.  I also backed up what I could as soon as the disks started giving errors, but I don't know if I can trust that.

Should I drop the entire cluster?



are you receiving any kind of error messages,while taking database dump with "pg_dump" utility. 

I am getting these when running vacuum on the live database. Earlier I ran pg_dump. It was not working for the whole database due to I/O errors, so i did a series of more selective dumps, schema by schema and excluding the offending tables, so that I didn't get any errors or warnings. In the live database I dropped the few tables that were responsible for the pg_dump errors.


If you are not receiving any kind of error message,try to take a database dump and restore database dump file in another database server and perform all sanity checks. If all sanity checks are working fine,you can create a new cluster and restore all database dump's.

Thanks & Regards
Raghu Ram
 

Re: invalid OID warning after disk failure

From
Gabriel Sánchez Martínez
Date:


On Nov 17, 2014, at 8:10, Gabriel Sánchez Martínez <gabrielesanchez@gmail.com> wrote:



On Nov 17, 2014, at 3:28, Raghu Ram <raghuchennuru@gmail.com> wrote:

On Mon, Nov 17, 2014 at 10:20 AM, Gabriel Sánchez Martínez <gabrielesanchez@gmail.com> wrote:

The situation appears to be stable now, but upon running REINDEX and VACUUM on one of the databases, I get the following:

WARNING:  relation "pg_attrdef" TID 1/1: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/2: OID is invalid
WARNING:  relation "pg_attrdef" TID 1/3: OID is invalid
...

Should I drop the database and restore it from a backup?  My most recent backup is from late September, so I would lose some data.  I also backed up what I could as soon as the disks started giving errors, but I don't know if I can trust that.

Should I drop the entire cluster?



are you receiving any kind of error messages,while taking database dump with "pg_dump" utility. 

I am getting these when running vacuum on the live database. Earlier I ran pg_dump. It was not working for the whole database due to I/O errors, so i did a series of more selective dumps, schema by schema and excluding the offending tables, so that I didn't get any errors or warnings. In the live database I dropped the few tables that were responsible for the pg_dump errors.


If you are not receiving any kind of error message,try to take a database dump and restore database dump file in another database server and perform all sanity checks. If all sanity checks are working fine,you can create a new cluster and restore all database dump's.

Any suggestions on what sanity checks I should run?  For now I am doing reindex and vacuum. I think that forces everything to be read. I am assuming that if vacuum completes without error then the table is fine. 


Thanks & Regards
Raghu Ram
 

Re: invalid OID warning after disk failure

From
Alvaro Herrera
Date:
Gabriel Sánchez Martínez wrote:

> Any suggestions on what sanity checks I should run?  For now I am
> doing reindex and vacuum. I think that forces everything to be read. I
> am assuming that if vacuum completes without error then the table is
> fine.

The main sanity check is to make sure the disks are now in the recycling
center and you have rebuilt the array with new ones.  Then restore the
whole database in a freshly initdb'd system, and praise yourself that
you were able to get them before it all blew to pieces.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services