
Re: [GENERAL] Would like to below scenario is possible for getting page/block corruption - Mailing list pgsql-general

From	Sreekanth Palluru
Subject	Re: [GENERAL] Would like to below scenario is possible for getting page/block corruption
Date	December 9, 2016 09:21:32
Msg-id	CAP+fnpg1dPBTMppay7WMu9EEea_F4Ah2Z1JguOQXvFsHremCeQ@mail.gmail.com
In response to	Re: [GENERAL] Would like to below scenario is possible for getting page/block corruption (Sreekanth Palluru <sree4pg@gmail.com>)
List	pgsql-general


Correcting typos

Michael,

Thanks for your prompt reply.

In my environment both of those parameters are enabled. To give you a brief overview of the PG database environment:

Version 9.2.4.1

Windows 7 Professional SP1

fsync=on

full_page_writes=on

wal_sync_method=open_datasync

My customer builds cancer-related systems, and we ship Dell systems with our software image, which contains PG. A few of the customers, around 5%, are facing corruption issues.

We are in the process of reproducing the issue. Since many variables are involved (Dell hardware, software image versions, application versions, RAID/disk write-cache settings, RAID controllers with no battery backup, power failures, etc.), I am trying to understand whether PG can end up with corrupted blocks after a system crash even though we set these parameters.

a) As I understand it, fsync writes the block from memory to disk, so the block would have been written to disk just after step 4), assuming the disk cache did not lie.

b) Assume that full_page_writes=on has dumped the whole 8 kB block into WAL before the block is updated, i.e. after step 2) and before step 3).

c) If a crash happens after step 4), then since there is no PageHeader data, after the system restarts PG will complain that the block is corrupted or has an invalid header.

Please correct me if my understanding of how fsync and full_page_writes work together is wrong. If it is right, I see a possibility of corruption whenever PG extends a relation and a crash happens just after step 4).

I am not sure whether the same applies to an existing page (not a new page), and how PG handles it when the PageHeader is available as part of full_page_writes. Can the same corruption happen, or can PG recover the database? I am not sure whether the recovery process can update the PageHeader from the WAL record (recptr) it wrote as part of step 4).
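The torn-page scenario in a) to c) can be sketched as a toy model in Python. Everything in it is hypothetical: the page layout, `MAGIC`, the 512-byte atomic sector size and the helper names are assumptions for illustration, not PostgreSQL's code. It assumes, as in a), that the disk honors flushes, so the full-page image reached WAL before the data page was written. The toy header check accepts an all-zero page because PostgreSQL's page header validation also treats an all-zero page as a new, not-yet-initialized page.

```python
BLCKSZ = 8192            # PostgreSQL's default block size
SECTOR = 512             # toy assumption: the disk writes 512-byte sectors atomically
MAGIC = b"PGHEADER"      # hypothetical stand-in for a valid page header


def make_page(payload: bytes) -> bytes:
    """Build a toy initialized page: header, payload, then zero padding up to BLCKSZ."""
    body = MAGIC + payload
    return body + bytes(BLCKSZ - len(body))


def header_is_valid(page: bytes) -> bool:
    """Toy check: an initialized header passes, and so does an all-zero (new) page."""
    return page.startswith(MAGIC) or page == bytes(BLCKSZ)


def torn_write(old: bytes, new: bytes, sectors_written: set[int]) -> bytes:
    """Simulate a crash mid-write: only the listed sectors of `new` reach the disk."""
    out = bytearray(old)
    for s in sectors_written:
        out[s * SECTOR:(s + 1) * SECTOR] = new[s * SECTOR:(s + 1) * SECTOR]
    return bytes(out)


def redo(disk_page: bytes, full_page_images: list[bytes]) -> bytes:
    """Replay WAL: each full-page image overwrites the page without reading it first."""
    page = disk_page
    for image in full_page_images:
        page = image
    return page


old = bytes(BLCKSZ)                   # relation just extended: the new block is all zeros
new = make_page(b"x" * 1000)          # the block after its first insert
wal = [new]                           # full_page_writes=on: the image is in WAL, flushed first
on_disk = torn_write(old, new, {1})   # crash: sector 1 hit the disk, header sector 0 did not

print(header_is_valid(old))           # True  (all-zero page is accepted)
print(header_is_valid(on_disk))       # False (a restart without redo would see a bad header)
print(redo(on_disk, wal) == new)      # True  (replaying the image restores the whole page)
```

In this model the torn page is only a problem if the WAL holding the image never reached stable storage: `redo(on_disk, [])` leaves the torn page in place.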

-Sreekanth

On Fri, Dec 9, 2016 at 2:09 PM, Sreekanth Palluru <sree4pg@gmail.com> wrote:

Michael,
Thanks for your prompt reply.

In my environment both of those parameters are enabled. To give you a brief overview of the PG database environment:
Version 9.2.4.1
Windows 7 Professional SP1
fsync=on
full_page_writes=on
wal_sync_method=open_datasync

My customer builds cancer-related systems, and we ship Dell systems with our software image, which contains PG. A few of the customers, around 5%, are facing corruption issues.
We are in the process of reproducing the issue. Since many variables are involved (Dell hardware, software image versions, application versions, RAID/disk write-cache settings, RAID controllers with no backup, power failures, etc.), I am trying to understand whether PG can end up with corrupted blocks after a system crash.

1) As I understand it, fsync writes the block from memory to disk, so the block would have been written to disk just after step 4), assuming the disk cache did not lie.
2) Assume that full_page_writes=on has dumped the whole 8 kB block into WAL before the block is updated, i.e. after step 2) and before step 3).
3) If a crash happens after step 4), then since there is no PageHeader data, after the system restarts PG will complain that the block is corrupted or has an invalid header.

Please correct me if my understanding of how fsync and full_page_writes work together is wrong. If it is right, I see a possibility of corruption whenever PG extends a relation and a crash happens just after step 4).

I am not sure whether the same applies to an existing page (not a new page), and how PG handles it when the PageHeader is available as part of full_page_writes. Can the same corruption happen, or can PG recover the database? I am not sure whether the recovery process can update the PageHeader from the WAL record (recptr) it wrote as part of step 4).

-Sreekanth

On Fri, Dec 9, 2016 at 12:44 PM, Michael Paquier <michael.paquier@gmail.com> wrote:
(Please do not top-post, that's annoying.)

On Fri, Dec 9, 2016 at 10:28 AM, Sreekanth Palluru <sree4pg@gmail.com> wrote:
> Can I generalize that, if after step 4) the page (new or old) got
> written to disk from the buffer and a crash happens between steps 4) and 5),
> we always get block corruption issues with Postgres, which can only be
> recovered by setting zero_damaged_pages if we only have pg_dump backups and
> we are OK losing the data in the affected blocks?
>
> I am also looking at ways of reproducing the issue; I would appreciate your
> advice on it.

Postgres is designed to avoid such corruption problems if
full_page_writes and fsync are enabled; that's a cornerstone of its
reliability. If you can create a self-contained scenario able to
reproduce a failure, that could be treated as a Postgres bug, but you
are giving no evidence that this is the case.
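The reliability Michael refers to rests on the write-ahead logging rule: a data page may be written only after the WAL describing it, including any full-page image, has been flushed. The following Python toy model is a hypothetical sketch of that rule: `ToyWal`, `write_page`, `run` and the integer LSNs are illustrative assumptions, not PostgreSQL internals. Its `honest_flush=False` case models a volatile write cache with no battery backup that acknowledges a flush and then loses it at power failure, i.e. the "disk cache did not lie" assumption from a) failing.

```python
import random


class ToyWal:
    """Hypothetical WAL model: records stay in memory until a flush makes them durable."""

    def __init__(self, honest_flush: bool = True) -> None:
        self.records: list[str] = []
        self.acked_lsn = 0        # what the OS/disk reported as flushed
        self.durable_lsn = 0      # what actually survives a power failure
        self.honest_flush = honest_flush

    def insert(self, rec: str) -> int:
        self.records.append(rec)
        return len(self.records)  # toy LSN: position just past this record

    def flush(self, upto: int) -> None:
        self.acked_lsn = max(self.acked_lsn, upto)
        # Toy: a volatile cache with no battery backup acks but loses the data at power failure.
        if self.honest_flush:
            self.durable_lsn = self.acked_lsn


def write_page(disk: dict[int, int], wal: ToyWal, blkno: int, page_lsn: int) -> None:
    """WAL-before-data: flush WAL up to the page's LSN, then write the page."""
    wal.flush(page_lsn)
    disk[blkno] = page_lsn        # toy assumption: the data write itself does persist


def run(honest_flush: bool, seed: int = 0) -> bool:
    """Return True if every page on disk is covered by WAL that survived the crash."""
    rng = random.Random(seed)
    wal, disk = ToyWal(honest_flush), {}
    for i in range(100):
        lsn = wal.insert(f"change {i}")
        if rng.random() < 0.3:    # a dirty page is evicted at an arbitrary moment
            write_page(disk, wal, blkno=i % 4, page_lsn=lsn)
    surviving = wal.records[:wal.durable_lsn]  # crash: only durable WAL remains for redo
    return all(page_lsn <= len(surviving) for page_lsn in disk.values())


print(run(honest_flush=True))   # True
print(run(honest_flush=False))  # False
```

With an honest flush, every page that reached disk has its WAL available to redo. With the lying cache, pages on disk depend on WAL that no longer exists, which is one way corruption can appear even with fsync=on and full_page_writes=on.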
--
Michael

--
Regards
Sreekanth

