Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY
Date
Msg-id CABOikdM1vd66_SS-2+6Kfj58uE0f=fzwfYfJhS7iJHa0Z_TiYg@mail.gmail.com
In response to Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] Index corruption with CREATE INDEX CONCURRENTLY  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers

On Sat, Feb 4, 2017 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Based on Pavan's comments, I think trying to force this into next week's
> releases would be extremely unwise.  If the bug went undetected this long,
> it can wait for a fix for another three months.

Yes, I think the bug has existed all along and simply went unnoticed. One reason could be that the race happens only when the new index turns HOT updates into non-HOT updates. Another could be that we don't have checks in place to catch this kind of corruption. Having said that, now that we have discovered the bug, many 2ndQuadrant customers have expressed worry and want to know whether the fix will be available in 9.6.2 and the other minor releases. Since the bug can lead to data corruption, the worry is justified. Until we fix it, there will be constant worry about using CIC.

If we can have some kind of band-aid fix to plug the hole, that might be enough as well. I tested my first patch (which will need some polishing) and it works well AFAIK. I was worried about prepared queries and the like, but that seems OK too. RelationGetIndexList() always gets called within ExecInitModifyTable. The fix seems quite unlikely to cause any other side effects.

Another possible band-aid is to throw another relcache invalidation into CIC, for example by adding a dummy index_set_state_flags() call within yet another transaction. It seems ugly, but it should fix the problem for now and have no impact on the relcache mechanism whatsoever.
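
To illustrate why an extra invalidation could help, here is a toy sketch in Python. It is not PostgreSQL code: the Backend class, its methods, and run() are all hypothetical names, and it assumes the problem boils down to a session holding a stale cached set of indexed columns when it decides whether an update can be HOT.

```python
# Toy model only: none of these names exist in PostgreSQL.
class Backend:
    """A session that caches the set of indexed columns for one table."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.cached_indexed_cols = None  # filled lazily, like a cache entry

    def invalidate(self):
        # Stand-in for processing a relcache invalidation message.
        self.cached_indexed_cols = None

    def indexed_cols(self):
        if self.cached_indexed_cols is None:
            # Take a snapshot of the catalog; it can go stale afterwards.
            self.cached_indexed_cols = set(self.catalog)
        return self.cached_indexed_cols

    def update_is_hot(self, changed_cols):
        # An update can be HOT only if no indexed column changes.
        return not (set(changed_cols) & self.indexed_cols())


def run(extra_invalidation):
    catalog = {"id"}              # columns covered by existing indexes
    backend = Backend(catalog)
    backend.indexed_cols()        # backend caches {"id"}
    catalog.add("email")          # concurrent index build covers a new column
    if extra_invalidation:
        backend.invalidate()      # the band-aid: force a cache rebuild
    return backend.update_is_hot(["email"])
```

In this model, run(False) treats the update of the newly indexed column as HOT, which is the dangerous case, while run(True) rebuilds the cache first and correctly treats it as non-HOT.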

> That seems better than
> risking new breakage when it's barely 48 hours to the release wrap
> deadline.  We do not have time to recover from any mistakes.

I'm not sure what the project policies are, but can we consider delaying the release by a week for issues like these? Or do you think it will be hard to come up with a proper fix for the issue and it will need some serious refactoring?

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
