Re: Silent data loss in its pure form - Mailing list pgsql-general

From Alex Ignatov
Subject Re: Silent data loss in its pure form
Date
Msg-id 19518802B8BEC076.2BB12E9A-14C3-4825-84F0-E9035E9B3CC1@mail.outlook.com
In response to Re: Silent data loss in its pure form  (Scott Marlowe <scott.marlowe@gmail.com>)
Responses Re: Silent data loss in its pure form
List pgsql-general

_____________________________
From: Scott Marlowe <scott.marlowe@gmail.com>
Sent: Monday, May 30, 2016 20:14
Subject: Re: [GENERAL] Silent data loss in its pure form
To: Alex Ignatov <a.ignatov@postgrespro.ru>
Cc: <pgsql-general@postgresql.org>


On Mon, May 30, 2016 at 10:57 AM, Alex Ignatov <a.ignatov@postgrespro.ru> wrote:
> Following this bug report from Red Hat:
> https://bugzilla.redhat.com/show_bug.cgi?id=845233
>
> it raises a dangerous issue:
>
> If for any reason your data file is zeroed after a power loss (this was a
> well-known issue on XFS in the past), then when you run
> select count(*) from your_table you get zero if your table fit in a single
> 1GB (default) file, or otherwise some number != the count(*) from
> your_table before the power loss.
> No errors, nothing suspicious in the logs. No checksum errors. Nothing.
>
> Silent data loss in its pure form.
>
> And thank all the gods if you notice it before the backups that still
> contain good data are recycled.
> Keep this in mind while checking your "backups" in any form (pg_dump or the
> more dangerous and less verbose PITR file backup).
>
> Your data is always in danger under the "a zeroed data file is a normal
> file" paradigm.

That bug shows as having been fixed in 2012. Are there any modern,
supported distros that would still have it? It sounds really bad btw.


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

It is not about modern distros; it is about possible silent data loss on old distros. We have replication and some form of data checksumming, but we are powerless against this XFS bug simply because "a zeroed file is your good friend" in Postgres.
With the "zero file is a good file" paradigm and this XFS bug, PG as it is now is a "colossus with feet of clay". It can do many things, but it can't even tell us that something is wrong with our precious data.
There is no need for prevention or some AI magic once the zero doomsday has come.
What we need now is an error report about a suspicious zeroed file, so that we know for sure that something went wrong and that we have to do a recovery.
Today, PG "power loss" recovery combined with this XFS bug poisons our assurance that recovery went well. It "went well" even with a zeroed file. That is not healthy behavior. It is like walking through a minefield with your eyes closed.
I think it is a very dangerous approach to data to have data files without any header in them and without any file checks, at least at PG start.
With this known XFS bug, it can lead to undetected and unavoidable loss of data.
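
To illustrate the kind of report meant here, below is a rough sketch of an external check that flags a data file consisting only of zero bytes. It is not anything PG does today and not a proposed patch. It assumes the default 8192-byte page size, and the function names are made up for this example. Single all-zero pages can be legitimate in a healthy file, so the sketch only flags files in which every page is zero.

```python
BLOCK_SIZE = 8192  # assumption: PostgreSQL's default page size


def count_zero_pages(path, block_size=BLOCK_SIZE):
    """Return (total_pages, zero_pages) for one file.

    A trailing partial page is counted as a page and compared
    against zeros of the same length.
    """
    zero_page = bytes(block_size)
    total = 0
    zeros = 0
    with open(path, "rb") as f:
        while True:
            page = f.read(block_size)
            if not page:
                break
            total += 1
            if page == zero_page[:len(page)]:
                zeros += 1
    return total, zeros


def is_fully_zeroed(path, block_size=BLOCK_SIZE):
    """True if the file is non-empty and every page in it is all zeros."""
    total, zeros = count_zero_pages(path, block_size)
    return total > 0 and zeros == total
```

Run against the segment files of a table after a crash, a True result from is_fully_zeroed() would be exactly the "suspicious zeroed file" warning that is missing today. An empty file is deliberately not flagged, since an empty table legitimately has an empty file.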
