Re: race condition in pg_class - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: race condition in pg_class |
Date | |
Msg-id | 63c27c17-0d8f-4ede-a624-3ee41d28b539@iki.fi Whole thread Raw |
In response to | Re: race condition in pg_class (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
On 17/08/2024 07:07, Noah Misch wrote: > On Fri, Aug 16, 2024 at 12:26:28PM +0300, Heikki Linnakangas wrote: >> On 14/07/2024 20:48, Noah Misch wrote: >>> + * ... [any slow preparation not requiring oldtup] ... >>> + * heap_inplace_update_scan([...], &tup, &inplace_state); >>> + * if (!HeapTupleIsValid(tup)) >>> + * elog(ERROR, [...]); >>> + * ... [buffer is exclusive-locked; mutate "tup"] ... >>> + * if (dirty) >>> + * heap_inplace_update_finish(inplace_state, tup); >>> + * else >>> + * heap_inplace_update_cancel(inplace_state); >> >> I wonder if the functions should be called "systable_*" and placed in >> genam.c rather than in heapam.c. The interface looks more like the existing >> systable functions. It feels like a modularity violation for a function in >> heapam.c to take an argument like "indexId", and call back into systable_* >> functions. > > Yes, _scan() and _cancel() especially are wrappers around systable. Some API > options follow. Any preference or other ideas? > > ==== direct s/heap_/systable_/ rename [option 1] > > systable_inplace_update_scan([...], &tup, &inplace_state); > if (!HeapTupleIsValid(tup)) > elog(ERROR, [...]); > ... [buffer is exclusive-locked; mutate "tup"] ... > if (dirty) > systable_inplace_update_finish(inplace_state, tup); > else > systable_inplace_update_cancel(inplace_state); > > ==== make the first and last steps more systable-like [option 2] > > systable_inplace_update_begin([...], &tup, &inplace_state); > if (!HeapTupleIsValid(tup)) > elog(ERROR, [...]); > ... [buffer is exclusive-locked; mutate "tup"] ... > if (dirty) > systable_inplace_update(inplace_state, tup); > systable_inplace_update_end(inplace_state); > > ==== no systable_ wrapper for middle step, more like CatalogTupleUpdate [option 3] > > systable_inplace_update_begin([...], &tup, &inplace_state); > if (!HeapTupleIsValid(tup)) > elog(ERROR, [...]); > ... [buffer is exclusive-locked; mutate "tup"] ... > if (dirty) > heap_inplace_update(relation, > systable_inplace_old_tuple(inplace_state), > tup, > systable_inplace_buffer(inplace_state)); > systable_inplace_update_end(inplace_state); My order of preference is: 2, 1, 3. >> Could we just stipulate that you must always hold LOCKTAG_TUPLE when you >> call heap_update() on pg_class or pg_database? That'd make the rule simple. > > We could. That would change more code sites. Rough estimate: > > $ git grep -E CatalogTupleUpd'.*(class|relrelation|relationRelation)' | wc -l > 23 > > If the count were 2, I'd say let's simplify the rule like you're exploring. > (I originally had a complicated rule for pg_database, but I abandoned that > when it helped few code sites.) If it were 100, I'd say the complicated rule > is worth it. A count of 23 makes both choices fair. Ok. How many of those for RELKIND_INDEX vs tables? I'm thinking if we should always require a tuple lock on indexes, if that would make a difference. > Long-term, I hope relfrozenxid gets reimplemented with storage outside > pg_class, removing the need for inplace updates. So the additional 23 code > sites might change back at a future date. That shouldn't be a big > consideration, though. > > Another option here would be to preface that README section with a simplified > view, something like, "If a warning brought you here, take a tuple lock. The > rest of this section is just for people needing to understand the conditions > for --enable-casserts emitting that warning." How about that instead of > simplifying the rules? Works for me. Or perhaps the rules could just be explained more succinctly. Something like: ----- pg_class heap_inplace_update_scan() callers: before the call, acquire a lock on the relation in mode ShareUpdateExclusiveLock or stricter. If the update targets a row of RELKIND_INDEX (but not RELKIND_PARTITIONED_INDEX), that lock must be on the table, locking the index rel is not necessary. (This allows VACUUM to overwrite per-index pg_class while holding a lock on the table alone.) heap_inplace_update_scan() acquires and releases LOCKTAG_TUPLE in InplaceUpdateTupleLock, an alias for ExclusiveLock, on each tuple it overwrites. pg_class heap_update() callers: before copying the tuple to modify, take a lock on the tuple, or a ShareUpdateExclusiveLock or stricter on the relation. SearchSysCacheLocked1() is one convenient way to acquire the tuple lock. Most heap_update() callers already hold a suitable lock on the relation for other reasons, and can skip the tuple lock. If you do acquire the tuple lock, release it immediately after the update. pg_database: before copying the tuple to modify, all updaters of pg_database rows acquire LOCKTAG_TUPLE. (Few updaters acquire LOCKTAG_OBJECT on the database OID, so it wasn't worth extending that as a second option.) ----- -- Heikki Linnakangas Neon (https://neon.tech)
pgsql-hackers by date: