Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update) - Mailing list pgsql-hackers

From Greg Burd
Subject Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
Date
Msg-id 09509E9D-BB60-4C79-B2F4-A02727D3D2C2@greg.burd.me
Whole thread Raw
In response to Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Dec 2 2025, at 2:09 am, Michael Paquier <michael@paquier.xyz> wrote:

> On Fri, Nov 14, 2025 at 10:31:28AM -0500, Greg Burd wrote:
>> Catalog tuples have knowledge of what attributes are mutated implicitly,
>> but they don't preserve that information and so
>> HeapDetermineColumnsInfo() has to re-discover that later on (while the
>> page is locked).  While on the surface this isn't such a big deal, it's
>> been that way and worked well for years I have other motivations (see
>> [1] again) for improving this.
> 
> Yeah, I've heard that this has been also discussed at the New York
> pgconf, where designs were mentioned so as we do not execute
> expressions while locking pages.  One issue is that this could easily
> deadlock if an expression has the idea to re-read the same page we are
> locking.  So I have a summary of what you have been doing in mind, a
> very rough one I suspect.

Hi Michael,

Thanks for taking the time to review this chunky patch.  I've been
quietly reworking it a bit, I'll go over that in a bit because I'd
appreciate opinions before I'm too far into the changes.

Yes, the motivation for this patch started off as a secondary effect of
the other work I've mentioned.  It's morphing a bit given some feedback
off-list.  There are a few reasons to standardize catalog modification
code and move it away from direct Form/GETSTRUCT or
values/nulls/replaces-style access.  First, these current methods can be
error prone and require attention to a variety of details that can
easily be overlooked.  Second, at present while we know what fields are
modified we don't preserve that information and pass it on to the
heap_update() code.  Third, I understand that at some point we'll need
in-memory representations completely decoupled from the disk to support
upgradeable catalogs.  I started off just trying to make
HeapDetermineColumnsInfo() go away, these seem like good goals as well
and touch the same area of code.  I think, if possible, it would make
sense to ensure that a large-ish change like this doesn't need to be
undone/redone to accomplish these other goals.

>> That turned out to be a bit of a challenge as we have a lot of places
>> where catalog information is updated.  We also have two different
>> methods for updating said information.  And we intermix them.  At times
>> we hack our way to a solution and ignore convention.  I went down the
>> rabbit hole on this one, but I'm back up for a cup of tea because what
>> I've done seems materially better to me and I've accomplished the goal
>> (and can eventually make use of the result in [1]).  Is this patch
>> useful even without that other work?  I think so, just on the basis of
>> cleanup. Let me explain.
> 
> It's a mix of historical and individual commit style, so having things
> slightly different depending on the catalog code path is not a
> surprise, at least here.

Got it, I don't begrudge anyone's style or how it's done now.  I just
want to address the goals I've laid out and hopefully make things a bit
better along the way.

>> [... Two methods for heap tuple updates ... ]
>> 
>> tuple = heap_modify_tuple(tuple, desc, values, nulls, replaces);
>> CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple);
> 
> As far as I can see after applying 0002, 90%-ish of the existing
> callers of heap_modify_tuple() are updated to use heap_update_tuple().
> That's interesting to see.

Yes, that was intentional.

>> heap_update_tuple() is functionally equivalent to heap_modify_tuple(),
>> but takes a Bitmapset called "updated" rather than an array of bool
>> (generally) called "replaces" as a method for indicating what was
>> modified. Additionally, this new function tries to balance the
>> trade-offs of calling heap_getattr() versus heap_deform_tuple() based on
>> the ratio of attributes updated and their known runtime complexities. 
>> Both paths are functionally equivalent.


[NOTE: below I think you meant to say "heap_modify_tuple()", so I
correct it]
> I don't think that we can completely remove heap_modify_tuple() as
> code path used for the catalogs updates, but we can get very close to
> that.  Perhaps we should document at the top of heap_modify_tuple()
> that developers should not use this API for catalog changes, switching
> to the bitmap interface instead in priority?

I agree that it's hard to remove heap_modify_tuple() function straight
away which is why I kept it in and created a new function
heap_update_tuple().  I agree, comments are needed to explain why and
point out the preferred way.

>>  * Performance strategy:
>>  * - If updating many attributes (> 2*natts/3), use heap_getattr() to extract
>>  *   only the few non-updated attributes. This is O(k*n) where k is
>> the number
>>  *   of non-updated attributes, which is small when updating many.
>>  * - If updating few attributes (<= 2*natts/3), use
>> heap_deform_tuple() to
>>  *   extract all attributes at once (O(n)), then replace the updated ones.
>>  *   This avoids the O(n^2) cost of many heap_getattr() calls.
>>  *
>>  * The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple
>>  * against the variable O(k*n) cost of heap_getattr, where k = natts
>> - num_updated.
> 
> This change has been puzzling me quite a bit.  Why this choice of
> formula with 2/3 of arguments?  The maximum number of arguments is
> 1400 by design, so would one choice matter more than the other in most
> cases?  Catalogs don't have that many arguments either, so I am not
> sure if we need to care with this amount of tuning, TBH, but I am OK
> to be wrong.

Good points, I saw the comment about optimizing it and I took a swing. 
There are two categories for catalog updates, changing a few or setting
them all.  My hope was to choose the less computationally onerous path
based on the ratio of the amount updated vs the number in the relation.

>> indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
>> their "WithInfo" companions.  If that second part is controversial then
>> I don't mind reverting it, but IMO this is cleaner and might either
>> inspire more call sites to create and reuse CatalogIndexState or for
>> someone (me?) to address the comment that pre-dates my work:
> 
> I'm OK with the simplification on this one, but let's make that a
> separate patch extracted from 0001.  The changes with
> CatalogTupleUpdate() and CatalogTupleInsert() don't have to be
> included with the changes that introduce a bitmap integration to track
> the attributes that are updated.  This is pointing to the fact that
> there are not that many callers of the WithInfo() flavors in the tree,
> compared to the non-info ones (24 AFAIK, vs 160-ish for the Insert
> path it seems, for example).

That makes sense, but another approach is to do the work mentioned in
the comment and somehow cache the CatalogIndexState (which is a somewhat
redacted ResultRelInfo by another name) somewhere.

>> pg_aggregate.c: in AggregateCreate() we run into a "compromise" pattern
>> where the code code either update or insert.  In such cases hackers
>> should use the macros for update, not insert, everywhere.  This records
>> the additional information for the update path and doesn't burden the
>> insert path with much more than a Bitmapset to free.
> 
> Hmm.  Fine by me.  The bitmap is not used for an INSERT path, so it's
> a bit of a waste, but I can see why you have done so to satisfy the
> "replace" case, from CREATE OR REPLACE (never liked this kind of
> grammar as it can be impredictible implementation-wise with syscache
> lookups required, just a personal rant).

Right, this isn't ideal.  The "waste" of alloc/free of a Bitmapset is
real, and different people will have different views on that, but IMO
it's worth it.  That said, I did rethink this a bit more.

> Another thing that you could do here is enforce a check in a couple of
> places where CatalogTupleInsert() is called for the "updated" bitmap:
> all the bits in the map should be set, like in OperatorCreate().  That
> may catch bugs at runtime perhaps when adding fields when dealing with
> an insert instead of an update?

That's not a bad idea, catching bugs early is a good outcome.

>> pg_constraint.c: brings us to namestrcpy(), take a look at
>> RenameConstraintById().  When you run into namestrcpy() you've likely
>> just done the mutation to the form directly but you'll want to mark that
>> field as updated, as in:
>> 
>> alter.c: take a look at AlterObjectRename_internal() which uses
>> get_object_attnum_name() to get the AttrNumber that is to be altered. 
>> This is fine, but it's not something we can just plug into a macro and
>> get the proper name expansion.  So, we do it the manual way and add a comment:
> 
> Not surprising to have exceptions like that.  This is a global path
> used for the renames of a bunch of objects.  This makes me wonder if
> we really have to absolutely change all the code paths that currently
> use heap_modify_tuple().  Like this one in alter.c for renames, the
> change is not an improvement in readability.  Same for
> AlterObjectNamespace_internal().  These would live better if still
> based on heap_modify_tuple(), because of the fact that they can be fed
> several object types.

I'd agree except that the paths using heap_update_tuple() won't benefit
from HOT updates.

I think readability is debatable, but I am playing games with macros
which at times makes me feel a bit dirty.

>> analyze.c: uses the HeapTupleSetAllColumnsUpdated() at the start of the
>> update_attstats() function to match the replaced code.  Later there's
>> another interesting deviation from the normal pattern in that function
>> where we want to update the nth stakind/opt/coll this was frustrating at
>> first until I found that this worked:
>> 
>> HeapTupleSetValue(pg_statistic, stakind1 + k,
>> Int16GetDatum(stats->stakind[k]), values);
> 
> I'm getting concerned about HeapTupleSetAllColumnsUpdated() while
> reaching this part of your message, FWIW.

The reason for HeapTupleSetAllColumnsUpdated() was to mimic existing
logic and change as little as possible by simply adding a layer of
friendly macros into the mix.

>> So, we're taking the attribute number adding "k" then subtracting 1, and
>> voila "Bob's your uncle."  While a bit unconventional, I think the
>> resulting form is understandable, much more than before which was:
>> 
>> i = Anum_pg_statistic_stakind1 - 1;
>> for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
>>     values[i++] = Int16GetDatum(stats->stakind[k]); /* stakindN */
> 
> And more concerns.  This change makes me question how wise it is to
> have HeapTupleUpdateSetAllColumnsUpdated(), actually.  Couldn't this
> encourage bugs if somebody forgets to update some of the fields in the
> new tuple, still the bitmap has been updated with a range to mark all
> the attributes as updated. Having HeapTupleUpdateField() do an update
> of the bitmap makes sure that the bitmap and the fields updated are
> never out-of-sync.  I'm OK with marking the individual attributes as
> updated.  Having a more aggressive API that marks all of them as
> updated sounds dangerous to me.

Well, sure I suppose so but again this was simply a replacement for the
existing code which initialized the "replaces" array to all true rather
than the normal case of all false.

>> 0002 - Update the remainder of catalog updates using the new APIs
>> 
>> This is a lot of applying the pattern over and over until it all
>> works. 
>> That's it.  Nothing much more interesting here.
>> 
>> This is a lot of change, I get that, I think there's value in change
>> when the result is cleaner and more maintainable.  The one place I'll
>> agree up front where this might case problems is back-patching fixes. 
>> That'll have to have the new approach in some branches and the old
>> approach in others.  It's a 5-year commitment, I don't take that lightly.
> 
> Backpatches imply catalog tuple manipulations from time to time, so
> that would create noise, yes.
> 
>> If you made it this far, I owe you a
>> beer/coffee/token-of-appreciation. 
> 
> Sentence noted.

Excellent!

>> Without this patch my [1] will have an ugly wart
>> (HeapDetermineColumnsInfo()) for the CatalogTupleUpdate path only.  I
>> can live with that, I'd rather not.
> 
> It seems to me that this answer is better answered as "rather not",
> especially if this improves the code maintenance in the long-term.
> 
> Patch 0001 could be split into more pieces, particularly regarding the 
> argument it makes about the fact that there are few callers of
> CatalogTuplesMultiInsertWithInfo(), CatalogTupleInsertWithInfo() and
> CatalogTupleUpdateWithInfo(), where we can just plug NULL for the
> index state.  I'd suggest to make these routines leaner in a first
> patch.
> 
>             /* No, insert new tuple */
>             stup = heap_form_tuple(RelationGetDescr(sd), values, nulls);
> -            CatalogTupleInsertWithInfo(sd, stup, indstate);
> +            CatalogTupleInsert(sd, stup, NULL);
>         }
> 
> This one in update_attstats() looks wrong.  Why are you passing NULL
> instead of an opened index state?  We keep track of indstate for all
> the pg_statistic tuples updated.
> 
> HeapTupleSetField() is used nowhere.  Do we really need it?
> --
> Michael

As I hinted at earlier on, I've continue to develop this patch [1]. 
It's not ready to post as patches just yet.  I've paused making all the
changes as the time required is non-trivial.  Here's some of what I
thought through and a taste so that I can (hopefully) get feedback (pos
or neg) before investing the rest of a week into updating every case.


bool replaces[Natts...] was static, Bitmapset * is a heap alloc/free
====================================================================

Yes, and I tried a few ideas out to avoid this.  If you recall my
earlier work to test Bitmapset[2] was in part to see if I could extend
that datatype to allow for a static version.  Closest I could come was this

+#define CatalogUpdateContext(table_name, var, tuple) \
+    do { \
+        struct { \
+            bool            nulls[Natts_##table_name]; \
+            Datum           values[Natts_##table_name]; \
+            Bitmapset       updated; \
+            bitmapword      bits[BITMAPSET_SIZE(Natts_##table_name)]; \
+        } _##table_name##_##var##_ = {.nulls = {false}, .values = {0}, \
+                .updated.nwords = BITMAPSET_SIZE(Natts_##table_name), \
+                .updated.type = T_Bitmapset, .bits = {0} }; \
+        CatalogTupleContext *(var##_ctx) = \
+            (CatalogTupleContext *)&_##table_name##_##var##_; \
+        Form_##table_name (var##_form) = \
+            (Form_##table_name) GETSTRUCT(tuple); \
+    } while(0)

Which would create the right size statically allocated Bitmapset, but it
wouldn't be "valid" because the last word in the set is all zeros and so
calling into any function that calls bms_is_valid() would assert.  I
could simply embed the logic for bms_add_member() etc into the macros,
but then I'd be creating a mess for the future.  The only advantage is
that you'd be avoiding a palloc()/free() possibly a realloc().  There's
something to be said for that, but this felt a bit over the line.

When I thought about adding a patch that extended/changed Bitmapset to
allow for static allocation I felt that too would become the objection
that sinks this whole idea, so for now I'm not going to try that.  Maybe
some other time.

In the end, after trying a few things (including pre-sizing the
Bitmapset by creating it as a singleton setting the Natts+1 value at
initialization to avoid realloc) I went back to accepting the
palloc/free model as simple enough.


E_TOO_MANY_MACROS
=======================================================================

I'm not thrilled with the multitude of macros for somewhat similar
cases.  So I tried using _Generic(), but that's not supported on MSVC
and only allowed to be generic at compile time over a single type and
dispatch to a function, not another macro, so you couldn't simply
de-reference in the Form case.

+static inline void
+_CatalogTupleValueSet(CatalogTupleContext *ctx, int attnum, Datum
value, size_t off, size_t sz)
+{
+    ctx->values[attnum] = value;
+    ctx->nulls[attnum] = false;
+}
+
+static inline void
+_CatalogTupleFormSet(void *form, int attnum, Datum value,
+                     size_t off, size_t sz)
+{
+    memcpy((char *)form + off, &value, sz);
+}
+
+#define CatalogTupleSet(ctx_or_form, table_name, field, value) \
+    _Generic((ctx_or_form), \
+        CatalogTupleContext *: _CatalogTupleValueSet, \
+        Form_##table_name *  : _CatalogTupleFormSet \
+    )(ctx_or_form, \
+      Anum_##table_name##_##field - 1, \
+      value, \
+      offsetof(FormData_##table_name, field), \
+      sizeof(((FormData_##table_name *)0)->field))

Another method for macros was to depend on sizeof() at compile time to
adjust what was built.

+#define _CAT_CTX_SET(ctx, attnum, value) \
+    ((ctx)->values[(attnum)] = (value), \
+     (ctx)->nulls[(attnum)]  = false, \
+     (ctx)->updated = bms_add_member((ctx)->updated, (attnum)))
+
+#define _CAT_FORM_SET(form, table_name, field, value) \
+    (form)[Anum_##table_name##_##field - 1] = (value)
+
+#define CatalogTupleSet_turnary(ctx_or_form, table_name, field, value) \
+    (sizeof(*(ctx_or_form)) == sizeof(CatalogTupleContext) \
+     ? (_CAT_CTX_SET((CatalogTupleContext *)(ctx_or_form), \
+                     Anum_##table_name##_##field - 1, value), \
+        (void)0)                      /* force statement context */ \
+     : (_CAT_FORM_SET(((Anum_##table_name *)(ctx_or_form)), table_name,
field, value), \
+        (void)0))

This feels wrong as who knows if today or someday the size of a form
just happens to be the same as the CatalogTupleContext?  So I put that aside.

I looked into some wild ideas that included tuple arity to encode the
action type as Thomas mentioned would be innovative and cool something like:

HeapTupleSet((pg_type, values, nulls),
             (typname, NameGetDatum(&name)),
             (typnamespace, ObjectIdGetDatum(typeNamespace)), 
             (typowner, ObjectIdGetDatum(ownerId)),
             (typlen, Int16GetDatum(sizeof(int32),
             (typdefaultbi), /* list of 1 means null! */
             (typdefault),
             (some_attr, FooGetDatum(...), foo > bar), /* list of 3 =
conditional null! */
             ...);

But a) I couldn't ever get that to work as I'd hoped it would and b)
inventing a DSL in the C pre-processor isn't a goal of mine.


Everything needs you to pass in "values, nulls, updated"!
=====================================================================

I thought about this a bit too, the only way to avoid this was to own
the declaration as well as the manipulation of these things.  I don't
want to change the entirety of the way we manipulate catalog tuples for
insert/update, I just wanted to capture what attributes changed on
update so I could use that later to avoid HeapDetermineColumnsInfo().

But, I've taken a swing at it and here's where I've landed:

Old "Form/GETSTRUCT" model:
  Form_pg_index form = (Form_pg_index) GETSTRUCT(tuple);

  form->inisclustered = false;
  CatalogTupleUpdate(relation, &tuple->t_self, tuple);

New:
  CatalogUpdateFormContext(pg_index, ctx);
  CatalogSetForm(pg_index, ctx, tuple);

  CatalogTupleUpdateField(ctx, pg_index, indisclustered, false);
  ModifyCatalogTupleField(relation, tuple, ctx);


Old "values/nulls/replaces" model:
  bool    nulls[Natts_pg_type];
  bool    replaces[Natts_pg_type];
  Datum   values[Natts_pg_type];

  values[Anum_pg_type_typtype - 1] = CharGetDatum(typeType);
  nulls[Anum_pg_type_typdefaultbin - 1] = true;
  replaces[Anum_pg_type_oid - 1] = false;

  tup = heap_modify_tuple(tuple, desc, values, nulls, replaces);
  CatalogTupleUpdate(relation, &tuple->t_self, tuple);

New:
  CatalogUpdateValuesContext(pg_type, ctx);

  CatalogTupleUpdateValue(ctx, pg_type, typtype, CharGetDatum(typeType));
  ModifyCatalogTupleValues(relation, tuple, ctx);


I'm about a third of the way done converting to this new pattern and
it's better.  As I said earlier I'm concerned that after a lot of effort
the pattern would need to change again, so I'd rather shake out those
concerns/ideas now early on.  You can see the changes in [1], take a
look at htup.h, aclchk.c, pg_aggregate.c, and a few other places.

Now there are macros for: 1) declaration, 2) setting/mutating, 3)
modifying/inserting.  I guess I was starting to feel like I was digging
a hole no one would appreciate or agree was necessary so I'm asking for
early feedback because rule #1 when you find yourself digging a hole is
to stop digging.

best.

-greg

[1] https://github.com/gburd/postgres/pull/18
[2] https://www.postgresql.org/message-id/flat/7BD1ABDB-B03A-464A-9BA9-A73B55AD8A1F%40getmailspring.com




pgsql-hackers by date:

Previous
From: David Geier
Date:
Subject: Re: Consistently use palloc_object() and palloc_array()
Next
From: Álvaro Herrera
Date:
Subject: Re: Issues with ON CONFLICT UPDATE and REINDEX CONCURRENTLY