Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR
Msg-id 40d0a4c8-de56-7710-5197-1304fa156aba@enterprisedb.com
Responses Re: pgsql: Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURR  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers

On 2/9/22 01:43, Andres Freund wrote:
> Hi,
> 
> On 2022-02-08 22:13:01 +0100, Tomas Vondra wrote:
>> On 10/24/21 03:40, Noah Misch wrote:
>>> Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
>>>
>>> CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
>>> no later than each backend's next transaction start.  That failed to
>>> hold when a backend absorbed a relevant invalidation in the middle of
>>> running RelationBuildDesc() on the CIC index.  Queries that use the
>>> resulting index can silently fail to find rows.  Fix this for future
>>> index builds by making RelationBuildDesc() loop until it finishes
>>> without accepting a relevant invalidation.  It may be necessary to
>>> reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
>>> Back-patch to 9.6 (all supported versions).
>>>
>>> Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
>>> Freund.
>>>
>>> Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
>>>
>>
>> Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds. Since
>> this commit, initdb never completes because it retries forever (on the
>> first RelationBuildDesc call).
> 
> Ugh. Do we need to do something about this WRT the next set of minor releases? Is
> there a chance of this occurring in "real" workloads?
> 

AFAICS this only affects builds with CLOBBER_CACHE_ALWAYS, and anyone 
running such a build in production clearly likes painful things anyway.

But really, for the infinite loop to happen, building a relation 
descriptor has to invalidate a cache. And I haven't found a way to do 
that without the CLOBBER_CACHE_ALWAYS thing.
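
To illustrate the failure mode, here is a rough toy model of the retry
logic. It is only a sketch and not the actual relcache code: the names
toy_inval_count, toy_catalog_lookup and toy_build_desc are made up, the
"clobber" flag merely stands in for a CLOBBER_CACHE_ALWAYS build, and the
max_attempts cap exists only so the demo terminates (I'm assuming the real
loop has no such cap, which is why initdb hangs).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter of invalidations absorbed so far (not a PostgreSQL symbol). */
unsigned long toy_inval_count = 0;

/*
 * Stand-in for a catalog lookup done while building a descriptor.
 * When "clobber" is true, every lookup absorbs an invalidation,
 * which is roughly what a CLOBBER_CACHE_ALWAYS build forces.
 */
void
toy_catalog_lookup(bool clobber)
{
    if (clobber)
        toy_inval_count++;
}

/*
 * Retry the build until one attempt completes without absorbing an
 * invalidation.  Returns the number of attempts used, or -1 if
 * max_attempts was exhausted (the cap is an addition for this demo).
 */
int
toy_build_desc(bool clobber, int max_attempts)
{
    assert(max_attempts > 0);

    for (int attempt = 1; attempt <= max_attempts; attempt++)
    {
        unsigned long before = toy_inval_count;

        toy_catalog_lookup(clobber);

        /* No invalidation arrived during this attempt: the result is usable. */
        if (toy_inval_count == before)
            return attempt;
    }

    /* Every attempt invalidated itself; without the cap this never ends. */
    return -1;
}
```

With clobber = false the first attempt succeeds; with clobber = true every
attempt invalidates its own result, so the loop only stops because of the
artificial cap.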

Also, all the November minor releases include this commit, and there 
were no reports about this (pretty obvious) issue. The buildfarm did not 
complain either (but an animal may be stuck for months and we would not 
know about it).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
