Re: Truncation failure in autovacuum results in data corruption (duplicate keys) - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Truncation failure in autovacuum results in data corruption (duplicate keys)
Msg-id 15690.1524084557@sss.pgh.pa.us
In response to Re: Truncation failure in autovacuum results in data corruption (duplicate keys)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> Relation truncation throws away the page image in memory without ever
> writing it to disk.  Then, if the subsequent file truncate step fails,
> we have a problem, because anyone who goes looking for that page will
> fetch it afresh from disk and see the tuples as live.

> There are WAL entries recording the row deletions, but that doesn't
> help unless we crash and replay the WAL.

> It's hard to see a way around this that isn't fairly catastrophic for
> performance :-(.
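
To make the hazard in the quoted text concrete, here is a toy model of it.
It is only a sketch: ToyRel, toy_read and toy_truncate are made-up names,
not PostgreSQL code, and each "page" holds a single liveness flag.  The
deletion exists only in the cached copy, truncation discards that copy
without writing it, and when the file truncate fails the next read fetches
the old image from disk.

```c
#include <stdbool.h>

enum { NPAGES = 2 };

/* One page, reduced to "is the tuple on it live?" */
typedef struct { bool tuple_live; } Page;

/* Hypothetical relation: on-disk images plus an in-memory cache. */
typedef struct {
    Page disk[NPAGES];    /* what is actually on disk */
    Page cache[NPAGES];   /* in-memory copies, possibly dirty */
    bool cached[NPAGES];  /* is cache[blk] valid? */
    int  disk_npages;     /* current file length in pages */
} ToyRel;

/* Return the cached page, loading it from disk on a cache miss. */
Page *toy_read(ToyRel *r, int blk)
{
    if (!r->cached[blk]) {
        r->cache[blk] = r->disk[blk];
        r->cached[blk] = true;
    }
    return &r->cache[blk];
}

/*
 * Truncate to new_npages: drop the cached images first (without writing
 * them), then shrink the file.  file_truncate_ok simulates whether the
 * file-level truncate succeeds.
 */
bool toy_truncate(ToyRel *r, int new_npages, bool file_truncate_ok)
{
    for (int blk = new_npages; blk < r->disk_npages; blk++)
        r->cached[blk] = false;   /* dirty page image is thrown away */

    if (!file_truncate_ok)
        return false;             /* file still has its old length */

    r->disk_npages = new_npages;
    return true;
}
```

In this model, deleting the tuple through toy_read(), then calling
toy_truncate() with file_truncate_ok = false, leaves the next toy_read()
reporting the tuple as live again.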

Just to throw out a possibly-crazy idea: maybe we could fix this by
PANIC'ing if truncation fails, so that we replay the row deletions from
WAL.  Obviously this would be intolerable if the case were frequent,
but we've had only two such complaints in the last nine years, so maybe
it's tolerable.  It seems more attractive than taking a large performance
hit on truncation speed in normal cases, anyway.

A gotcha to be concerned about is what happens if we replay from WAL,
come to the XLOG_SMGR_TRUNCATE WAL record, and get the same truncation
failure again, which seems quite likely.  PANIC'ing again will not
do.  I think we could probably handle that by having the replay code
path zero out all the pages it was unable to delete; as long as that
succeeds, we can call it good and move on.

Or maybe just do that in the mainline case too?  That is, if ftruncate
fails, handle it by zeroing the undeletable pages and pressing on?
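
A minimal sketch of that fallback at the plain POSIX file level, to show
the shape of the idea rather than how it would look in smgr/md.c.
Everything named here is an assumption for illustration: DEMO_BLCKSZ
(8192, matching the default block size), the truncate_fn hook,
failing_truncate (a stub that always fails, so the fallback path can be
exercised), and truncate_or_zero itself.  It ignores buffer management,
WAL and locking entirely.

```c
#define _XOPEN_SOURCE 700   /* for pwrite(), ftruncate(), fsync() */
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

#define DEMO_BLCKSZ 8192    /* assumed page size for this sketch */

/* Hypothetical hook so a failing truncate can be substituted in tests. */
typedef int (*truncate_fn)(int fd, off_t length);

/* Stand-in for an ftruncate() that always fails (illustration only). */
int failing_truncate(int fd, off_t length)
{
    (void) fd;
    (void) length;
    errno = EIO;
    return -1;
}

/*
 * Try to shrink the file to new_len bytes.  If that fails, overwrite
 * everything from new_len to the current end of file with zeros instead.
 *
 * Returns 0 if the file was truncated, 1 if the tail was zeroed instead,
 * and -1 if neither worked.
 */
int truncate_or_zero(int fd, off_t new_len, truncate_fn do_truncate)
{
    static const char zero_page[DEMO_BLCKSZ];   /* zero-initialized */
    struct stat st;
    off_t off;

    assert(new_len >= 0 && new_len % DEMO_BLCKSZ == 0);

    if (do_truncate(fd, new_len) == 0)
        return 0;

    if (fstat(fd, &st) != 0)
        return -1;

    for (off = new_len; off < st.st_size; off += DEMO_BLCKSZ)
    {
        /* Never write past the old end of file. */
        off_t  left = st.st_size - off;
        size_t n = (left < DEMO_BLCKSZ) ? (size_t) left : DEMO_BLCKSZ;

        if (pwrite(fd, zero_page, n, off) != (ssize_t) n)
            return -1;
    }

    /* Make the zeroed pages durable before reporting success. */
    if (fsync(fd) != 0)
        return -1;
    return 1;
}
```

Passing the real ftruncate as do_truncate gives the normal path; passing
failing_truncate leaves the file at its old length with the tail pages
zeroed, so a later read sees empty pages rather than stale tuples.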

> But in any case it's wrapped up in order-of-operations
> issues.  I've long since forgotten the details, but I seem to have thought
> that there were additional order-of-operations hazards besides this one.

It'd be a good idea to redo that investigation before concluding this
issue is fixed, too.  I was not thinking at the time that it'd be years
before anybody did anything, or I'd have made more notes.

            regards, tom lane

