Truncation failure in autovacuum results in data corruption (duplicate keys) - Mailing list pgsql-hackers

From MauMau
Subject Truncation failure in autovacuum results in data corruption (duplicate keys)
Date
Msg-id 5BBC590AE8DF4ED1A170E4D48F1B53AC@tunaPC
Whole thread Raw
Responses Re: Truncation failure in autovacuum results in data corruption (duplicate keys)
List pgsql-hackers
Hello,

It seems to me that our customer might have hit an unresolved data
corruption issue which is already known in this ML, but I can't figure
out why this happened.  I'd appreciate if you could give me your
thoughts.  Depending on the severity of this issue and the customer's
request, I think I'll submit a patch to solve the unresolved issue.

The customer is using PostgreSQL 9.2 on Windows.  Autovacuum failed to
truncate the pages at the end of a table, which is probably due to
anti-virus software.  FYI, automatic checkpoint was in progress when
this error occurred.

ERROR:  could not truncate file "base/aaa/bbb" to 58 blocks:
Permission denied

After a while, an application got a unique key violation error.  The
customer says that there shouldn't be any duplicate keys.

ERROR:  duplicate key value violates unique constraint "pk_xxx"

The output of pg_dump on that table certainly includes multiple couple
of rows with the same primary key values... data corruption.



Another Japanese user, who is not our customer, hit the same problem
with 9.4, which was not solved (note: the mail is in Japanese).  He
said he repeatedly encountered the same error even after REINDEXing
with 9.4, but it doesn't happen with 9.1.  I wonder if there's
something introduced in 9.2 which causes the issue, such as index-only
scan stuff:

https://ml.postgresql.jp/pipermail/pgsql-jp/2016-October/016865.html


The problem with truncation failure was found in 2010.  The user
reported the problem as another phenomenon on 9.0.  The problem could
not be solved even by leading hackers here -- Tom, Robert, Alvaro,
Heikki, Greg Stark, etc.

TODO
https://wiki.postgresql.org/wiki/Todo
----------------------------------------
Restructure truncation logic to be more resistant to failure
This also involves not writing dirty buffers for a truncated or
dropped relation
http://archives.postgresql.org/pgsql-hackers/2010-08/msg01032.php
----------------------------------------


Tom's comments in above thread
----------------------------------------
Imagine that we have some rows at the end of a table, we delete them,
we vacuum before the next checkpoint.  Vacuum decides it can now
truncate away the last pages, but fails to do so.  The original page
state is still on disk, which means we have lost the fact of the
deletion --- the rows are now effectively live again, though their
index entries are probably gone.

...
Still, we have a live issue with heap truncation during plain VACUUM.
However, the scope of the problem seems a lot less than I was
thinking.
Maybe write-the-buffers-first is a sufficient longterm solution.

...
So it seems like the only case where there is really grounds for PANIC
on failure is the VACUUM case.  And there we might apply Heikki's idea
of trying to zero the untruncatable pages first.

...
I'm thinking that we need some sort of what-to-do-on-error flag passed
into RelationTruncate, plus at least order-of-operations fixes in
several other places, if not a wholesale refactoring of this whole
call
stack.  But I'm running out of steam and don't have a concrete
proposal
to make right now.  In any case, we've got more problems here than
just
the original one of forgetting dirty buffers too soon.
----------------------------------------


However, I have a question.  How does the truncation failure in
autovacuum lead to duplicate keys?  The failed-to-be-truncated pages
should only contain dead tuples, so pg_dump's table scan should ignore
dead tuples in those pages.

Regards
MauMau





pgsql-hackers by date:

Previous
From: Pavan Deolasee
Date:
Subject: Re: reloption to prevent VACUUM from truncating empty pages at theend of relation
Next
From: Tom Lane
Date:
Subject: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation