On 10/24/21 03:40, Noah Misch wrote:
> Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY.
>
> CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
> no later than each backend's next transaction start. That failed to
> hold when a backend absorbed a relevant invalidation in the middle of
> running RelationBuildDesc() on the CIC index. Queries that use the
> resulting index can silently fail to find rows. Fix this for future
> index builds by making RelationBuildDesc() loop until it finishes
> without accepting a relevant invalidation. It may be necessary to
> reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
> Back-patch to 9.6 (all supported versions).
>
> Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
> Freund.
>
> Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
>
Unfortunately, this seems to have broken CLOBBER_CACHE_ALWAYS builds.
Since this commit, initdb never completes due to infinite retrying over
and over (on the first RelationBuildDesc call).
We have a CLOBBER_CACHE_ALWAYS buildfarm machine "avocet", and that
currently looks like this (top):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+
COMMAND
2626 buildfa+ 20 0 202888 21416 20084 R 98.34 0.531 151507:16
/home/buildfarm/avocet/buildroot/REL9_6_STABLE/pgsql.build/tmp_install/home/buildfarm/avocet/buildroot/REL9_6_STABLE/inst/bin/postgres
--boot -x1 -F
Yep, that's 151507 minutes, i.e. 104 days in initdb :-/
I haven't looked at this very closely yet, but it seems the whole
problem is we do this at the very beginning:
in_progress_list[in_progress_offset].invalidated = false;
/*
* find the tuple in pg_class corresponding to the given relation id
*/
pg_class_tuple = ScanPgRelation(targetRelId, true, false);
which seems entirely self-defeating, because ScanPgRelation acquires a
lock (on pg_class), which accepts invalidations, which invalidates
system caches (in clobber_cache_always), which sets promptly sets
in_progress_list[in_progress_offset].invalidated = false;
guaranteeing an infinite loop.
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company