Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update) - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
Date
Msg-id aS-5pqkCihMvfoq_@paquier.xyz
Whole thread Raw
In response to Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)  (Greg Burd <greg@burd.me>)
Responses Re: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
List pgsql-hackers
On Tue, Dec 02, 2025 at 10:21:36AM -0500, Greg Burd wrote:
> On Dec 2 2025, at 2:09 am, Michael Paquier <michael@paquier.xyz> wrote:
> Thanks for taking the time to review this chunky patch.  I've been
> quietly reworking it a bit, I'll go over that in a bit because I'd
> appreciate opinions before I'm too far into the changes.

Noted.

> I started off just trying to make
> HeapDetermineColumnsInfo() go away, these seem like good goals as well
> and touch the same area of code.

This one is an interesting piece of argument.

>>> indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
>>> their "WithInfo" companions.  If that second part is controversial then
>>> I don't mind reverting it, but IMO this is cleaner and might either
>>> inspire more call sites to create and reuse CatalogIndexState or for
>>> someone (me?) to address the comment that pre-dates my work:
>>
>> I'm OK with the simplification on this one, but let's make that a
>> separate patch extracted from 0001.  The changes with
>> CatalogTupleUpdate() and CatalogTupleInsert() don't have to be
>> included with the changes that introduce a bitmap integration to track
>> the attributes that are updated.  This is pointing to the fact that
>> there are not that many callers of the WithInfo() flavors in the tree,
>> compared to the non-info ones (24 AFAIK, vs 160-ish for the Insert
>> path it seems, for example).
>
> That makes sense, but another approach is to do the work mentioned in
> the comment and somehow cache the CatalogIndexState (which is a somewhat
> redacted ResultRelInfo by another name) somewhere.

Yes, perhaps we should just do that.  That may simplify some of the
existing catalog paths, where we decide to open directly the indexes,
as you have already noticed while working on 0002.

> As I hinted at earlier on, I've continue to develop this patch [1].
> It's not ready to post as patches just yet.  I've paused making all the
> changes as the time required is non-trivial.  Here's some of what I
> thought through and a taste so that I can (hopefully) get feedback (pos
> or neg) before investing the rest of a week into updating every case.

Okay, noted.

> +#define CatalogUpdateContext(table_name, var, tuple) \
> +    do { \
> +        struct { \
> +            bool            nulls[Natts_##table_name]; \
> +            Datum           values[Natts_##table_name]; \
> +            Bitmapset       updated; \
> +            bitmapword      bits[BITMAPSET_SIZE(Natts_##table_name)]; \
> +        } _##table_name##_##var##_ = {.nulls = {false}, .values = {0}, \
> +                .updated.nwords = BITMAPSET_SIZE(Natts_##table_name), \
> +                .updated.type = T_Bitmapset, .bits = {0} }; \
> +        CatalogTupleContext *(var##_ctx) = \
> +            (CatalogTupleContext *)&_##table_name##_##var##_; \
> +        Form_##table_name (var##_form) = \
> +            (Form_##table_name) GETSTRUCT(tuple); \
> +    } while(0)
>
> Which would create the right size statically allocated Bitmapset, but it
> wouldn't be "valid" because the last word in the set is all zeros and so
> calling into any function that calls bms_is_valid() would assert.  I
> could simply embed the logic for bms_add_member() etc into the macros,
> but then I'd be creating a mess for the future.  The only advantage is
> that you'd be avoiding a palloc()/free() possibly a realloc().  There's
> something to be said for that, but this felt a bit over the line.

Hmm.  That feels like too much magic to me.  We don't have such
complex macros in the tree, currently.  And do we really need to
sacrifice readability over some extra palloc()/free() in what are
non-hot code paths?

> In the end, after trying a few things (including pre-sizing the
> Bitmapset by creating it as a singleton setting the Natts+1 value at
> initialization to avoid realloc) I went back to accepting the
> palloc/free model as simple enough.

Yeah, I'd agree that this is better.

> +#define CatalogTupleSet_turnary(ctx_or_form, table_name, field, value) \
> +    (sizeof(*(ctx_or_form)) == sizeof(CatalogTupleContext) \
> +     ? (_CAT_CTX_SET((CatalogTupleContext *)(ctx_or_form), \
> +                     Anum_##table_name##_##field - 1, value), \
> +        (void)0)                      /* force statement context */ \
> +     : (_CAT_FORM_SET(((Anum_##table_name *)(ctx_or_form)), table_name,
> field, value), \
> +        (void)0))
>
> This feels wrong as who knows if today or someday the size of a form
> just happens to be the same as the CatalogTupleContext?  So I put that aside.

I wouldn't do that either, yes.

> New:
>   CatalogUpdateValuesContext(pg_type, ctx);
>
>   CatalogTupleUpdateValue(ctx, pg_type, typtype, CharGetDatum(typeType));
>   ModifyCatalogTupleValues(relation, tuple, ctx);
>
> I'm about a third of the way done converting to this new pattern and
> it's better.  As I said earlier I'm concerned that after a lot of effort
> the pattern would need to change again, so I'd rather shake out those
> concerns/ideas now early on.  You can see the changes in [1], take a
> look at htup.h, aclchk.c, pg_aggregate.c, and a few other places.

One thing not mentioned here is CatalogUpdateValuesDecl(), macro that
hides the bitmap with the updated values, so you'd need one to ensure
that the rest works.  Perhaps all that should not be in htup.h.  We
may want to header reordering to show that there is a split with heap
tuples, also one of your goals.

> Now there are macros for: 1) declaration, 2) setting/mutating, 3)
> modifying/inserting.  I guess I was starting to feel like I was digging
> a hole no one would appreciate or agree was necessary so I'm asking for
> early feedback because rule #1 when you find yourself digging a hole is
> to stop digging.

I'm not completely sure about this one.  Some of the directions and
decisions that need to be taken here also depend on your other work
posted at [1].  Looking at v23 from this other thread, the code paths
touched there are different from what's touched here (heaptuple.c vs
heapam.c).  You have mentioned that both share some ground.  Is there
a specific part in the patch of the other thread I should look at to
get an idea of the bigger picture you have in mind?

[1]: https://www.postgresql.org/message-id/78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: shveta malik
Date:
Subject: Re: Improve pg_sync_replication_slots() to wait for primary to advance
Next
From: shveta malik
Date:
Subject: Re: Proposal: Conflict log history table for Logical Replication