Thread: Reg: PostgreSQL Server "base/data" recovery

Reg: PostgreSQL Server "base/data" recovery

From
Prajilal KP
Date:
Hi all,

Thank you for reading this mail. I need your help to fix the issue with my PostgreSQL server.

Recently i have noticed that my PostgreSQL server is returning the error messages like “Could not read block 6160722 in file “base/data/16384/11033028.47”. 
When i have checked the "base/data/16384" directory, some of the file's data size is zero and exactly from that time i have started receiving the said error messages.

Could you please advise the way to recover the Database without loosing any data?

PostgreSQL version : 9.0.4
OS : RHEL 5.5

regards,
prajilal

Re: Reg: PostgreSQL Server "base/data" recovery

From
Michael Paquier
Date:
On Thu, Mar 19, 2015 at 5:19 PM, Prajilal KP <prajilal.kp@gmail.com> wrote:
> Recently i have noticed that my PostgreSQL server is returning the error
> messages like "Could not read block 6160722 in file
> "base/data/16384/11033028.47".

Don't you have more details to share? There should be some hint here
as well, and usually you may be facing such things because of OS or
hardware problems.

> When i have checked the "base/data/16384" directory, some of the file's data
> size is zero and exactly from that time i have started receiving the said
> error messages.
>
> Could you please advise the way to recover the Database without loosing any
> data?

If you are facing hardware problems, this may be a time to deploy a
backup and replay WAL up to where you wish to on clean disks.

> PostgreSQL version : 9.0.4

9.0.4 has been released in 2011, so you are missing 4 years worth of
bug fixes, the latest minor version of the 9.0.X release being 9.0.19.
Note as well that 9.0 will be EOL at the end of the year, hence you
could do even better: an upgrade to a newer major version.

Regards,
--
Michael


Re: Reg: PostgreSQL Server "base/data" recovery

From
Prajilal KP
Date:
Thank you Michael for the reply.

Here is the other log from pg_log directory

ERROR:  could not open file "base/16384/7969143.26" (target block 13148261): No such file or directory
CONTEXT:  SQL statement "select min(id), max(id)                             from requests where id > last_max_id"
PL/pgSQL function "f_raw_logs_check_for_new_data" line 33 at SQL statement
STATEMENT:  select f_raw_logs_check_for_new_data(10000, 10001)
ERROR:  function f_etl_task_end(integer, integer, integer, integer, unknown) does not exist at character 8
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
STATEMENT:  select f_etl_task_end(7161174, 1, 0, 3, E'could not open file "base/16384/7969143.26" (target block 13148261): No such file or directory
CONTEXT:  SQL statement "select min(id), max(id)                             from requests where id > last_max_id"
PL/pgSQL function "f_raw_logs_check_for_new_data" line 33 at SQL statement
')
ERROR:  could not open file "base/16384/7969143.26" (target block 13148261): No such file or directory

When i see check the this file, the file itself exists but the size is "0" byte.  

The server is writing the whole log in to the mounted network storage, NFS. I have scanned the storage for any errors and nothing found.

I will consider your suggestion to upgrade the PostgreSQL version.


Regards,
Prajilal

On Thu, Mar 19, 2015 at 5:46 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Mar 19, 2015 at 5:19 PM, Prajilal KP <prajilal.kp@gmail.com> wrote:
> Recently i have noticed that my PostgreSQL server is returning the error
> messages like "Could not read block 6160722 in file
> "base/data/16384/11033028.47".

Don't you have more details to share? There should be some hint here
as well, and usually you may be facing such things because of OS or
hardware problems.

> When i have checked the "base/data/16384" directory, some of the file's data
> size is zero and exactly from that time i have started receiving the said
> error messages.
>
> Could you please advise the way to recover the Database without loosing any
> data?

If you are facing hardware problems, this may be a time to deploy a
backup and replay WAL up to where you wish to on clean disks.

> PostgreSQL version : 9.0.4

9.0.4 has been released in 2011, so you are missing 4 years worth of
bug fixes, the latest minor version of the 9.0.X release being 9.0.19.
Note as well that 9.0 will be EOL at the end of the year, hence you
could do even better: an upgrade to a newer major version.

Regards,
--
Michael

Re: Reg: PostgreSQL Server "base/data" recovery

From
Andrew Sullivan
Date:
On Thu, Mar 19, 2015 at 07:02:28PM +0900, Prajilal KP wrote:
>
> When i see check the this file, the file itself exists but the size is "0"
> byte.

That suggests you have data corruption, and that you need to restore from
backup.

> The server is writing the whole log in to the mounted network storage, NFS.

There are reasons that people get nervous about databases on NFS.  Are
you ensuring that Postgres fsync() calls (like when COMMIT happens)
are not being handled asynchronously?

Also, a trivial scan of the release notes in the 9.0.x series shows a
number of data corruption fixes since 9.0.4.  You should always try to
stay on the latest minor release of your version of Postgres.

Best regards,

A

--
Andrew Sullivan
ajs@crankycanuck.ca


Re: Reg: PostgreSQL Server "base/data" recovery

From
Prajilal KP
Date:
Thank you Andrew for your reply.

I see, As there are many reasons exists for the data corruption i couldn't figure out it exactly. 
Unfortunately there is no backup for this system and is not possible to restore from the backup. 

I have been using the NFS storage since 2013 and didn't experience this issue before, now i am afraid.
We haven't set to run the fysnc() in our environment. 

If i set "fysnc to on" now, did it make any impact to current flow? 

Also i will considering the suggestion to upgrade the PostgreSQL.

Regards,
Prajilal

On Thu, Mar 19, 2015 at 7:39 PM, Andrew Sullivan <ajs@crankycanuck.ca> wrote:
On Thu, Mar 19, 2015 at 07:02:28PM +0900, Prajilal KP wrote:
>
> When i see check the this file, the file itself exists but the size is "0"
> byte.

That suggests you have data corruption, and that you need to restore from
backup.

> The server is writing the whole log in to the mounted network storage, NFS.

There are reasons that people get nervous about databases on NFS.  Are
you ensuring that Postgres fsync() calls (like when COMMIT happens)
are not being handled asynchronously?

Also, a trivial scan of the release notes in the 9.0.x series shows a
number of data corruption fixes since 9.0.4.  You should always try to
stay on the latest minor release of your version of Postgres.

Best regards,

A

--
Andrew Sullivan
ajs@crankycanuck.ca


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Reg: PostgreSQL Server "base/data" recovery

From
Michael Paquier
Date:
On Thu, Mar 19, 2015 at 8:12 PM, Prajilal KP <prajilal.kp@gmail.com> wrote:
> I see, As there are many reasons exists for the data corruption i couldn't
> figure out it exactly.
> Unfortunately there is no backup for this system and is not possible to
> restore from the backup.

Well, then, lost data is lost.

> I have been using the NFS storage since 2013 and didn't experience this
> issue before, now i am afraid.
> We haven't set to run the fysnc() in our environment.
>
> If i set "fysnc to on" now, did it make any impact to current flow?

Having fsync = off in postgresql.conf is a perfect method to corrupt
your data, you should really not set it to off.
--
Michael


Re: Reg: PostgreSQL Server "base/data" recovery

From
Prajilal KP
Date:


> > I see, As there are many reasons exists for the data corruption i couldn't
> > figure out it exactly.
> > Unfortunately there is no backup for this system and is not possible to
> > restore from the backup.
>
> Well, then, lost data is lost.

Oh.. My bad ...
>
> > I have been using the NFS storage since 2013 and didn't experience this
> > issue before, now i am afraid.
> > We haven't set to run the fysnc() in our environment.
> >
> > If i set "fysnc to on" now, did it make any impact to current flow?
>
> Having fsync = off in postgresql.conf is a perfect method to corrupt
> your data, you should really not set it to off.

Okay, I will take action immediately to set the fync=on. I hope it with save me for a period.

I will consider upgrading to a later version at the earliest..

Thank you very much
Prajilal

On Thu, Mar 19, 2015 at 8:12 PM, Prajilal KP <prajilal.kp@gmail.com> wrote:
> I see, As there are many reasons exists for the data corruption i couldn't
> figure out it exactly.
> Unfortunately there is no backup for this system and is not possible to
> restore from the backup.

Well, then, lost data is lost.

> I have been using the NFS storage since 2013 and didn't experience this
> issue before, now i am afraid.
> We haven't set to run the fysnc() in our environment.
>
> If i set "fysnc to on" now, did it make any impact to current flow?

Having fsync = off in postgresql.conf is a perfect method to corrupt
your data, you should really not set it to off.
--
Michael

Re: Reg: PostgreSQL Server "base/data" recovery

From
Prajilal KP
Date:


> > I see, As there are many reasons exists for the data corruption i couldn't
> > figure out it exactly.
> > Unfortunately there is no backup for this system and is not possible to
> > restore from the backup.
>
> Well, then, lost data is lost.

Oh.. My bad ...
>
> > I have been using the NFS storage since 2013 and didn't experience this
> > issue before, now i am afraid.
> > We haven't set to run the fysnc() in our environment.
> >
> > If i set "fysnc to on" now, did it make any impact to current flow?
>
> Having fsync = off in postgresql.conf is a perfect method to corrupt
> your data, you should really not set it to off.

Okay, I will take action immediately to set the fync=on. I hope it with save me for a period.

I will consider upgrading to a later version at the earliest..

Thank you very much
Prajilal

On Thu, Mar 19, 2015 at 8:24 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Mar 19, 2015 at 8:12 PM, Prajilal KP <prajilal.kp@gmail.com> wrote:
> I see, As there are many reasons exists for the data corruption i couldn't
> figure out it exactly.
> Unfortunately there is no backup for this system and is not possible to
> restore from the backup.

Well, then, lost data is lost.

> I have been using the NFS storage since 2013 and didn't experience this
> issue before, now i am afraid.
> We haven't set to run the fysnc() in our environment.
>
> If i set "fysnc to on" now, did it make any impact to current flow?

Having fsync = off in postgresql.conf is a perfect method to corrupt
your data, you should really not set it to off.
--
Michael