Thread: Write errors in postgres log

Write errors in postgres log

From: "CAJ CAJ"
Hello,

We have 2 servers running postgres database 8.0.3 serving a web application. Recently, we started having problems with the web application, and diagnosis led to the following errors, which are repeated in the postgres log files on both servers.

==========================================
CONTEXT:  writing block 754 of relation 1663/17230/17443
WARNING:  could not write block 754 of 1663/17230/17443
DETAIL:  Multiple failures --- write error may be permanent.
ERROR:  xlog flush request 2/66B19020 is not satisfied --- flushed only to
2/5F8F95A2 ...
===========================================
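For readers following the log above: the ERROR line reports two WAL positions in `xlogid/xrecoff` hexadecimal form, and the message says the requested flush position lies beyond what was actually flushed. A minimal sketch of how the gap between the two can be measured follows. The function names are hypothetical, and the sketch only handles positions that share the same `xlogid`, as the two in this log do.

```python
def parse_lsn(text):
    """Split an 'xlogid/xrecoff' WAL position into two integers (both hex)."""
    xlogid, xrecoff = text.split("/")
    return int(xlogid, 16), int(xrecoff, 16)


def wal_gap_bytes(requested, flushed):
    """Bytes by which the requested flush position exceeds the flushed one.

    Assumption: both positions are in the same xlogid, so the offsets can be
    subtracted directly. Crossing an xlogid boundary is not handled here.
    """
    req_id, req_off = parse_lsn(requested)
    fl_id, fl_off = parse_lsn(flushed)
    if req_id != fl_id:
        raise ValueError("positions are in different xlogids; not handled in this sketch")
    return req_off - fl_off


# Values taken from the log excerpt above.
gap = wal_gap_bytes("2/66B19020", "2/5F8F95A2")
print("requested position is %d bytes (%.1f MiB) past the flushed position"
      % (gap, gap / 2.0 ** 20))
```

A positive gap means the page being written carries a WAL position that the server's WAL has not reached, which is consistent with the "is not satisfied" wording of the error.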

Both servers run independently and are geographically apart. We don't think there is anything wrong with the hardware, and this error seems to get worse over time.

Trying to recover from the error,

1. pg_resetxlog did not work.
2. Recent backups are corrupted as well.

Any advice on how to recover and pinpoint the cause of the error will be appreciated.

Thanks in advance

Re: Write errors in postgres log

From: Tom Lane
"CAJ CAJ" <pguser@gmail.com> writes:
> We have 2 servers running postgres database 8.0.3 serving a web application.

You do realize we are up to 8.0.12 in that branch?  You're missing
nearly two years worth of bug fixes.

> ERROR:  xlog flush request 2/66B19020 is not satisfied --- flushed only to
> 2/5F8F95A2 ...
> CONTEXT:  writing block 754 of relation 1663/17230/17443

Looks a bit ugly --- might be worth looking at that block with
pg_filedump to see what the extent of the corruption is.
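
As a rough aid for locating that block on disk: the relation identifier in the log, 1663/17230/17443, is tablespace OID / database OID / relfilenode, and 1663 is the default tablespace. The sketch below computes the data file path and byte offset for a block. It assumes the default 8 kB block size and 1 GB segment files. The data directory path and the function name are hypothetical, and non-default tablespaces are not handled.

```python
import os

BLOCK_SIZE = 8192               # assumption: default block size
SEGMENT_BLOCKS = 131072         # assumption: 1 GB segment files / 8 kB blocks
DEFAULT_TABLESPACE_OID = 1663   # pg_default


def locate_block(pgdata, relation, block):
    """Return (file_path, byte_offset) for a block of 'tablespace/db/relfilenode'."""
    spc, db, rel = (int(part) for part in relation.split("/"))
    if spc != DEFAULT_TABLESPACE_OID:
        raise ValueError("only the default tablespace is handled in this sketch")
    # Relations larger than one segment continue in files named <rel>.1, <rel>.2, ...
    segment, block_in_segment = divmod(block, SEGMENT_BLOCKS)
    name = str(rel) if segment == 0 else "%d.%d" % (rel, segment)
    path = os.path.join(pgdata, "base", str(db), name)
    return path, block_in_segment * BLOCK_SIZE


# "/var/lib/pgsql/data" is a placeholder for the real data directory.
path, offset = locate_block("/var/lib/pgsql/data", "1663/17230/17443", 754)
print(path, offset)
```

The resulting path is the file to hand to pg_filedump. The offset is only needed if the raw 8 kB page is to be copied out separately for inspection.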

> 1. pg_resetxlog did not work.

Define "did not work".  What did you do exactly, and what results did
you get?

> 2. Recent backups are corrupted as well.

It's not possible for a pg_dump backup to be affected by this problem.
How exactly are you making your backups, and what happens when you try
to use them?

            regards, tom lane

Re: Write errors in postgres log

From: "CAJ CAJ"
Hello Tom,

Thanks for the response. My replies inline...

On 2/18/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "CAJ CAJ" <pguser@gmail.com> writes:
> > We have 2 servers running postgres database 8.0.3 serving a web application.
>
> You do realize we are up to 8.0.12 in that branch?  You're missing
> nearly two years worth of bug fixes.

Yes, we're aware of it. The web application is from a 3rd party vendor and comes bundled with postgresql 8.0.3. Is there a specific fix in the recent releases that might address the data corruption I described? We can then ask the vendor to upgrade their software stack, including Pg. I'm also concerned about the security fixes.

> > ERROR:  xlog flush request 2/66B19020 is not satisfied --- flushed only to
> > 2/5F8F95A2 ...
> > CONTEXT:  writing block 754 of relation 1663/17230/17443
>
> Looks a bit ugly --- might be worth looking at that block with
> pg_filedump to see what the extent of the corruption is.

Will try pg_filedump and let you know what happens.

> > 1. pg_resetxlog did not work.
>
> Define "did not work".  What did you do exactly, and what results did
> you get?

I apologize for the lack of information. I will get that to you as soon as I can. In brief, we ran pg_resetxlog, which identified the last good WAL address. Postgres successfully recovered at start, but pg_dump ran into similar errors.

> > 2. Recent backups are corrupted as well.
>
> It's not possible for a pg_dump backup to be affected by this problem.
> How exactly are you making your backups, and what happens when you try
> to use them?

We shut down the database and make a copy of the pgdata directory. pg_dump/pg_restore takes too long to be used for backups. We are exploring the PITR method (a little too late).

Since the data corruption goes way back, our recent backup is corrupted as well (we see the same errors when we restore the old pgdata backup).

I appreciate your response. Feel free to ask for any information that might help.

Thanks