Thread: [GENERAL] could not fdatasync log file: Input/output error

[GENERAL] could not fdatasync log file: Input/output error

From
said assemlal
Date:
Hi,

PostgreSQL crashed on Friday due to I/O errors. It seems that the filesystem puked.

PANIC: could not fdatasync log file 000000010000017600000077: Input/output error
LOG: database system was interrupted; last known up at 2017-10-13 15:26:28 EDT
WARNING: terminating connection because of crash of another server process
FATAL: the database system is in recovery mode

Some errors from the kernel given by the sysadmin:

Buffer I/O error on device dm-3, logical block 2900794
lost page write due to I/O error on dm-3
Buffer I/O error on device dm-3, logical block 2900795
lost page write due to I/O error on dm-3
Buffer I/O error on device dm-3, logical block 2900796
lost page write due to I/O error on dm-3
Buffer I/O error on device dm-3, logical block 2900797
lost page write due to I/O error on dm-3
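
A minimal sketch, assuming the kernel messages are available as plain text in the format shown above: it groups the failing logical block numbers by device, which makes it easier to see whether the errors cover one contiguous range. The function name `failed_blocks` is illustrative only and not part of any tool mentioned in this thread.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Buffer I/O error on device dm-3, logical block 2900794
BUFFER_ERROR = re.compile(
    r"Buffer I/O error on device (?P<dev>\S+), logical block (?P<block>\d+)"
)


def failed_blocks(log_text):
    """Return {device: sorted list of failing logical block numbers}."""
    blocks = defaultdict(set)
    for line in log_text.splitlines():
        match = BUFFER_ERROR.search(line)
        if match:
            blocks[match.group("dev")].add(int(match.group("block")))
    return {dev: sorted(nums) for dev, nums in blocks.items()}


sample = """\
Buffer I/O error on device dm-3, logical block 2900794
lost page write due to I/O error on dm-3
Buffer I/O error on device dm-3, logical block 2900795
lost page write due to I/O error on dm-3
"""
print(failed_blocks(sample))
```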

We have restarted the server and everything seems to be fine. Should I run other tests, or should I restore the database from a backup?

Thanks for your advice.
Saïd

Re: [GENERAL] could not fdatasync log file: Input/output error

From
said assemlal
Date:
Just before we restarted the server today, I found only this one line:

PANIC:  could not fdatasync log file 000000010000017600000083: Input/output error
the database system is in recovery mode
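
The two PANIC messages in this thread name different WAL segment files. As a hedged illustration, the sketch below splits a 24-hex-digit segment name into its timeline, log and segment fields and counts how many segments separate two names. The helper names are hypothetical, and the default of 256 segments per log file assumes the default 16 MB WAL segment size, which the thread does not confirm for this server.

```python
def parse_wal_name(name):
    """Split a 24-hex-digit WAL segment file name into (timeline, log, segment)."""
    if len(name) != 24:
        raise ValueError("expected 24 hexadecimal digits")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    segment = int(name[16:24], 16)
    return timeline, log_id, segment


def segment_distance(first, second, segments_per_log=256):
    """Number of segments between two WAL file names.

    segments_per_log=256 is an assumption (16 MB segments).
    """
    _, log_a, seg_a = parse_wal_name(first)
    _, log_b, seg_b = parse_wal_name(second)
    return (log_b * segments_per_log + seg_b) - (log_a * segments_per_log + seg_a)


print(parse_wal_name("000000010000017600000077"))
print(segment_distance("000000010000017600000077",
                       "000000010000017600000083"))
```

Under that assumption the two failures are 12 segments apart, so the server kept writing WAL for a while between the first and the second fdatasync error.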



Re: [GENERAL] could not fdatasync log file: Input/output error

From
Michael Paquier
Date:
On Mon, Oct 16, 2017 at 11:47 PM, said assemlal <said.assemlal@gmail.com> wrote:
> Just before we restarted the server today, I found only this one line:
>
> PANIC:  could not fdatasync log file 000000010000017600000083: Input/output
> error
> the database system is in recovery mode

Ouch. I would not trust this host at this point; this looks like a
file system or disk issue. Before doing anything else, you should stop
the database and make a cold copy of the data folder that you can work
on if you don't have a live backup. This wiki page has good advice on
the matter:
http://wiki.postgresql.org/wiki/Corruption
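
A minimal sketch of the cold-copy step, under stated assumptions: the server has already been stopped, and `data_dir` and `dest_dir` are placeholder paths chosen by the reader. The function name `cold_copy` is hypothetical. The check for `postmaster.pid` is only a safety net, since that file can also be left behind after a crash.

```python
import shutil
from pathlib import Path


def cold_copy(data_dir, dest_dir):
    """Copy a stopped cluster's data directory to dest_dir and return the new path."""
    data_dir = Path(data_dir)
    dest_dir = Path(dest_dir)

    # postmaster.pid exists while the server is running. It may also be stale
    # after a crash, so its presence means "stop and check by hand".
    if (data_dir / "postmaster.pid").exists():
        raise RuntimeError(
            "postmaster.pid found in %s: stop the server first" % data_dir
        )
    if dest_dir.exists():
        raise FileExistsError("%s already exists" % dest_dir)

    # symlinks=True copies symbolic links as links, so tablespaces or a WAL
    # directory living outside data_dir are NOT included and need their own copy.
    shutil.copytree(data_dir, dest_dir, symlinks=True)
    return dest_dir
```

On a disk that is already returning I/O errors the copy itself may fail partway; `shutil.copytree` reports the files it could not read in the raised `shutil.Error`.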
-- 
Michael


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] could not fdatasync log file: Input/output error

From
said assemlal
Date:
Thanks for your response.

We are currently running postgresql-9.4.14.
I see there are some tools to check whether indexes/pages are corrupted, but is there a faster way to check that a PGDATA instance is clean?

Thanks.


Re: [GENERAL] could not fdatasync log file: Input/output error

From
Michael Paquier
Date:
On Wed, Oct 18, 2017 at 8:02 AM, said assemlal <said.assemlal@gmail.com> wrote:
> Thanks for your response.
>
> We are currently running postgresql-9.4.14.
> I see there are some tools to check whether indexes/pages are corrupted,
> but is there a faster way to check that a PGDATA instance is clean?

Yes, there is something you could try. Peter Geoghegan (actually a
colleague) has been working on amcheck, a tool for detecting corrupted
heap and btree pages, available at
https://github.com/petergeoghegan/amcheck. The master branch can
check only btree indexes (that part is integrated in PG 10), and
Peter has lately been experimenting with heap checks in the
heap-check branch. The utility can be built against Postgres 9.4 and
is non-intrusive. I have not tested it much myself yet, but you could
run amcheck on this instance, *after* of course taking a cold copy of
the data folder and starting it on a safer host that you think has
non-busted disks.
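
As a rough illustration of running the btree check over a whole database, the sketch below only builds the SQL text; it does not connect to anything. It assumes the amcheck extension is already installed in the target database and exposes `bt_index_check(regclass)` as in the PG 10 version, and the helper name `build_btree_check_query` is hypothetical. The catalog query is adapted to the 9.4 catalogs and should be verified before use.

```python
def build_btree_check_query(limit=None):
    """Return SQL that runs bt_index_check on every valid btree index.

    Largest indexes come first. Assumes amcheck is installed (not verified here).
    """
    query = (
        "SELECT c.oid::regclass AS index_name, bt_index_check(c.oid)\n"
        "FROM pg_index i\n"
        "JOIN pg_class c ON c.oid = i.indexrelid\n"
        "JOIN pg_am am ON am.oid = c.relam\n"
        "WHERE am.amname = 'btree'\n"
        "  -- skip temporary relations, which may belong to another session\n"
        "  AND c.relpersistence <> 't'\n"
        "  -- skip indexes that are still being built or are marked invalid\n"
        "  AND i.indisready AND i.indisvalid\n"
        "ORDER BY c.relpages DESC"
    )
    if limit is not None:
        query += "\nLIMIT {:d}".format(int(limit))
    return query + ";"


print(build_btree_check_query(limit=10))
```

The printed statement can be pasted into psql on the copied instance; an index that fails the check makes the statement stop with an error naming the problem.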

Note that Peter has also worked on providing Debian packages for the
utility down to 9.4 if I recall correctly, which is nice, but if you
want the heap checks you will need to compile things yourself. Work
is under way to get something improved into Postgres 11. I should
actually find some time to look more closely at the patch concepts.
-- 
Michael



Re: [GENERAL] could not fdatasync log file: Input/output error

From
Peter Geoghegan
Date:
On Tue, Oct 17, 2017 at 7:13 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Note that Peter has also worked on providing Debian packages for the
> utility down to 9.4 if I recall correctly, which is nice, but if you
> want the heap checks you will need to compile things yourself. Work
> is under way to get something improved into Postgres 11. I should
> actually find some time to look more closely at the patch concepts.

I'm probably going to commit the new version in the next couple of
days, and package the heap check into a new release. It might take a
few more days for the community apt and yum repos to get packages for
the new version (both have 1.0 versions as of right now, though).

-- 
Peter Geoghegan

