Thread: inconsistent state after crash recovery
Hi, I received a question about inconsistent state after crash recovery. When a table file is broken (or just lost), PostgreSQL can not recover a whole table, and does not show any notice while recoverying. I think it means "inconsistent" state. (1) create a table, and fill records. (2) process a checkpoint. (3) fill more records. (4) force a crash, and delete the table file. (5) run recovery on restarting. (6) only records after the checkpoint can be recoverd. For example, the attached log shows that PostgreSQL can recover only 1058 records in the table which contains 2000 records before the crash, and does not tell anything in the server log. ---------------------------------------------------------- insert into t1 values (trim(to_char(generate_series(1,1000), '00000000000000000000')) ); INSERT 0 1000 select count(*) from t1; count ------- 1000 (1 row) checkpoint; CHECKPOINT insert into t1 values (trim(to_char(generate_series(1001,2000), '00000000000000000000')) ); INSERT 0 1000 select count(*) from t1; count ------- 2000 (1 row) (delete the table file) (kill postgres) (restart postgres with recovery) select count(*) from t1; count ------- 1058 (1 row) ---------------------------------------------------------- Is this expected or acceptable? I think, at least, PostgreSQL should say something about this situation in the server log, because DBA can not recognize this situation if no server log exists. To reproduce it, please check the attached test script. Any comments? Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp
Attachment
Hi, On 2013-07-26 13:33:13 +0900, Satoshi Nagayasu wrote: > I received a question about inconsistent state after crash recovery. > > When a table file is broken (or just lost), PostgreSQL can not recover > a whole table, and does not show any notice while recoverying. > I think it means "inconsistent" state. > > (1) create a table, and fill records. > (2) process a checkpoint. > (3) fill more records. > (4) force a crash, and delete the table file. > (5) run recovery on restarting. > (6) only records after the checkpoint can be recoverd. > > For example, the attached log shows that PostgreSQL can recover > only 1058 records in the table which contains 2000 records > before the crash, and does not tell anything in the server log. > > Is this expected or acceptable? I'd say it's both. WAL replay doesn't have the knowledge to detect that in all too many cases. Nearly always a page's contents will be restored by a full page image the first time it gets changed so they will individually look completely normal. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2013-07-26 13:33:13 +0900, Satoshi Nagayasu wrote: >> Is this expected or acceptable? > I'd say it's both. Postgres is built on the assumption that the underlying filesystem is reliable, ie, once you've successfully fsync'd some data that data won't disappear. If the filesystem fails to honor that contract, it's a filesystem bug not a Postgres bug. Nor is it reasonable to expect Postgres to be able to detect every such violation. As an example, would you expect crash recovery to notice the disappearance of a file that was touched nowhere in the replayed actions? regards, tom lane
Hi Satoshi, I was wondering about this problem. Please tell us about your system enviroment which is postgresql version ,OS, raid card, and file system. Best regards, -- Mitsumasa KONDO NTT Open Source Software Center
On Fri, Jul 26, 2013 at 8:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@2ndquadrant.com> writes: >> On 2013-07-26 13:33:13 +0900, Satoshi Nagayasu wrote: >>> Is this expected or acceptable? > >> I'd say it's both. > > Postgres is built on the assumption that the underlying filesystem is > reliable, ie, once you've successfully fsync'd some data that data won't > disappear. If the filesystem fails to honor that contract, it's a > filesystem bug not a Postgres bug. Nor is it reasonable to expect > Postgres to be able to detect every such violation. As an example, > would you expect crash recovery to notice the disappearance of a file > that was touched nowhere in the replayed actions? Eh, maybe not. But should we try harder to detect the unexpected disappearance of one that is? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jul 26, 2013 at 8:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> would you expect crash recovery to notice the disappearance of a file >> that was touched nowhere in the replayed actions? > Eh, maybe not. But should we try harder to detect the unexpected > disappearance of one that is? We do, don't we? The replay stuff should complain unless it sees a drop or truncate covering any unaccounted-for pages. regards, tom lane
On Fri, Aug 2, 2013 at 8:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jul 26, 2013 at 8:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> would you expect crash recovery to notice the disappearance of a file >>> that was touched nowhere in the replayed actions? > >> Eh, maybe not. But should we try harder to detect the unexpected >> disappearance of one that is? > > We do, don't we? The replay stuff should complain unless it sees a drop > or truncate covering any unaccounted-for pages. Hmm. Yeah. But the OP seems to think it doesn't work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi,
I'm very new here on this mailing list, but I've been using PostgreSQL for a while, and it scares me a little, that it's a real pain to try to recover data from corrupted table.
Situations like file being lost following server crash (after fsck) or page corruption happens quite often.
Having corruption at the row level for example, system could mark the page as corrupted in the system catalog.
Giving that page knows last change to this page, can we use archived WAL-s to recover the page?
We can have a table inside pg_catalog like pg_corrupted_pages with information of page corruption detected by backend.
Similar to tables, in a case of lost file, once system notice that, we should have a column in pg_class called relneedrecovery to record that.
Will it be possible to recover this file from last hot backup and apply redo to it based on WAL records?
Similar functionality is provider by Oracle (media and block recovery).
Similar functionality is provider by Oracle (media and block recovery).
I assume we will need some extra DDL commands to handle file/block recovery.
Also, It would be nice to have a command to quickly cross check files between pg_class and filesystem - just simple open/close system call for each relation - it is always faster that to run vaccum to check that. Tomasz
On 2 August 2013 13:19, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 2, 2013 at 8:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Hmm. Yeah. But the OP seems to think it doesn't work.
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jul 26, 2013 at 8:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> would you expect crash recovery to notice the disappearance of a file
>>> that was touched nowhere in the replayed actions?
>
>> Eh, maybe not. But should we try harder to detect the unexpected
>> disappearance of one that is?
>
> We do, don't we? The replay stuff should complain unless it sees a drop
> or truncate covering any unaccounted-for pages.--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 2 August 2013 13:19, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Aug 2, 2013 at 8:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Hmm. Yeah. But the OP seems to think it doesn't work.
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Jul 26, 2013 at 8:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> would you expect crash recovery to notice the disappearance of a file
>>> that was touched nowhere in the replayed actions?
>
>> Eh, maybe not. But should we try harder to detect the unexpected
>> disappearance of one that is?
>
> We do, don't we? The replay stuff should complain unless it sees a drop
> or truncate covering any unaccounted-for pages.--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
(2013/08/02 21:19), Robert Haas wrote: > On Fri, Aug 2, 2013 at 8:17 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Fri, Jul 26, 2013 at 8:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> would you expect crash recovery to notice the disappearance of a file >>>> that was touched nowhere in the replayed actions? >> >>> Eh, maybe not. But should we try harder to detect the unexpected >>> disappearance of one that is? >> >> We do, don't we? The replay stuff should complain unless it sees a drop >> or truncate covering any unaccounted-for pages. > > Hmm. Yeah. But the OP seems to think it doesn't work. Yes. I'm afraid that. My attached script shows that crash recovery re-creates the lost table file implicitly, and fills some of those blocks (maybe lower ones) with zero without any notice. We can easily observe it by using pg_filedump. Thus, the table file can lose records, but DBA cannot recognize it because no message is left in the server log. I agree that this is not a PostgreSQL bug. However, DBA still needs to detect this table corruption, brought by several components which PostgreSQL relys on, to consider restoring from database backup. If PostgreSQL can detect and tell something about that, it would be really helpful for DBA to make some critical decision. I think PostgreSQL will be able to do that. Regards, -- Satoshi Nagayasu <snaga@uptime.jp> Uptime Technologies, LLC. http://www.uptime.jp