Re: Expanding HOT updates for expression and partial indexes - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Expanding HOT updates for expression and partial indexes
Date
Msg-id CAEze2WgasQk4Nwod=YyqdmGT=zYWGf8=Mne7EN3e7ygmnj9oaA@mail.gmail.com
In response to Re: Expanding HOT updates for expression and partial indexes  ("Burd, Greg" <gregburd@amazon.com>)
List pgsql-hackers
On Thu, 6 Mar 2025 at 13:40, Burd, Greg <gregburd@amazon.com> wrote:
>
> > On Mar 5, 2025, at 6:39 PM, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> >
> > On Wed, 5 Mar 2025 at 18:21, Burd, Greg <gregburd@amazon.com> wrote:
> >> * augments IndexInfo only when needed for testing expressions and only once
> >
> > ExecExpressionIndexesUpdated seems to always loop over all indexes,
> > always calling AttributeIndexInfo which always updates the fields in
> > the IndexInfo when the index has only !byval attributes (e.g. text,
> > json, or other such varlena types). You say it happens only once, have
> > I missed something?
>
> There's a test that avoids doing it more than once, [...]

Is this that one?

+    if (indexInfo->ii_IndexAttrByVal)
+        return indexInfo;

I think that test doesn't work consistently: a Bitmapset * is NULL
when no bits are set, and for some indexes no attribute is byval. For
those indexes ii_IndexAttrByVal stays NULL even after processing, so
the early exit never triggers.
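
To illustrate the concern, here is a minimal standalone sketch, not the patch's code. All Toy* names are hypothetical stand-ins; the only property borrowed from PostgreSQL is that an empty set is represented by a NULL pointer. It compares an early exit keyed on the set pointer with one keyed on an explicit "already processed" flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a Bitmapset: the empty set is represented as NULL. */
typedef struct ToyBitmapset {
    unsigned long bits;
} ToyBitmapset;

ToyBitmapset *toy_bms_add_member(ToyBitmapset *set, int bit)
{
    assert(bit >= 0 && bit < (int) (sizeof(unsigned long) * 8));
    if (set == NULL) {
        set = calloc(1, sizeof(ToyBitmapset));
        if (set == NULL)
            abort();
    }
    set->bits |= 1UL << bit;
    return set;
}

/* Hypothetical, heavily simplified IndexInfo. */
typedef struct ToyIndexInfo {
    int           nattrs;
    const bool   *attbyval;        /* per-attribute "passed by value" flag */
    ToyBitmapset *byval_attrs;     /* stays NULL if no attribute is byval */
    bool          byval_computed;  /* explicit marker (assumed alternative) */
    int           times_processed; /* counts how often the work was done */
} ToyIndexInfo;

void toy_compute_byval(ToyIndexInfo *info)
{
    info->times_processed++;
    for (int i = 0; i < info->nattrs; i++)
        if (info->attbyval[i])
            info->byval_attrs = toy_bms_add_member(info->byval_attrs, i);
}

/* Early exit keyed on the set pointer: never fires for all-varlena indexes. */
void toy_augment_by_pointer(ToyIndexInfo *info)
{
    if (info->byval_attrs != NULL)
        return;
    toy_compute_byval(info);
}

/* Early exit keyed on a separate flag: fires regardless of set contents. */
void toy_augment_by_flag(ToyIndexInfo *info)
{
    if (info->byval_computed)
        return;
    toy_compute_byval(info);
    info->byval_computed = true;
}
```

With an index whose attributes are all !byval, calling toy_augment_by_pointer() twice does the work twice, while toy_augment_by_flag() does it once. Whether a separate flag is the right fix for the actual patch is an assumption here; it is only one way to make "processed, but empty" distinguishable from "not processed yet".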

Another small issue with this approach is that the call and the test
always happen inside EEIU(). We would quite likely do better if we
pre-processed _all_ indexes at once, so that we have a path that
doesn't repeatedly enter EEIU only to exit immediately. It'll
probably be hot enough to not matter much, but it's still cycles spent
on something that we can optimize away in code.
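
A rough sketch of what "pre-process all indexes at once" could look like, again with hypothetical Toy* names and no real PostgreSQL structures: one pass records which indexes need an expression check, and the per-update path only consults that precomputed list.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_INDEXES 32

/* Hypothetical per-index descriptor. */
typedef struct ToyIndex {
    bool needs_expr_check;   /* e.g. an expression or partial index */
} ToyIndex;

/* Hypothetical per-relation state shared by all updates. */
typedef struct ToyRelState {
    const ToyIndex *indexes;
    int             nindexes;
    bool            prepared;               /* set after the single pass */
    int             ncheck;                 /* number of entries in check[] */
    int             check[TOY_MAX_INDEXES]; /* positions of indexes to test */
    int             prepare_passes;         /* instrumentation for the demo */
} ToyRelState;

/* One pass over all indexes; run once per relation, not once per update. */
void toy_prepare_all(ToyRelState *st)
{
    assert(st->nindexes <= TOY_MAX_INDEXES);
    st->ncheck = 0;
    for (int i = 0; i < st->nindexes; i++)
        if (st->indexes[i].needs_expr_check)
            st->check[st->ncheck++] = i;
    st->prepared = true;
    st->prepare_passes++;
}

/* Per-update path: a single flag test, then only the relevant indexes. */
int toy_indexes_needing_check(ToyRelState *st)
{
    if (!st->prepared)
        toy_prepare_all(st);
    return st->ncheck;
}
```

When toy_indexes_needing_check() returns 0, the caller can skip the expression-checking path entirely instead of entering it once per index. Where such a pass would live in the real executor is left open here.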

> >> * retains existing summarized index HOT update logic
> >
> > Great, thanks!
> >
> > Kind regards,
> >
> > Matthias van de Meent
> > Neon (https://neon.tech)
>
> I might widen this patch a bit to include support for testing
> equality of index tuples using custom operators when they exist for
> the index.  In the use case I'm solving for we use a custom operator
> for equality that is not the same as a memcmp().  Do you have
> thoughts on that?

I don't think that's a great idea. From a certain point of view,
you can see HOT as "deduplicating multiple tuple versions behind a
single TID". Btree doesn't support deduplication for types that can
have more than one representation of the same value, so that e.g.
'0.0'::numeric and '0'::numeric are both displayed correctly, even
though they compare as equal according to certain equality operators.
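
A toy illustration of "equal according to the operator, but not the same representation". This is not PostgreSQL's numeric format; ToyNumeric and its functions are made up for the example and only handle small non-negative values:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy decimal: value = unscaled / 10^scale. Two uint32_t fields, no padding. */
typedef struct ToyNumeric {
    uint32_t unscaled;
    uint32_t scale;      /* number of digits after the decimal point, 0..9 */
} ToyNumeric;

uint64_t toy_pow10(uint32_t n)
{
    uint64_t p = 1;

    assert(n <= 9);
    while (n-- > 0)
        p *= 10;
    return p;
}

/* Operator-style equality: compares values after aligning the scales. */
bool toy_numeric_eq(ToyNumeric a, ToyNumeric b)
{
    return (uint64_t) a.unscaled * toy_pow10(b.scale) ==
           (uint64_t) b.unscaled * toy_pow10(a.scale);
}

/* memcmp()-style equality: compares the stored representation. */
bool toy_same_bytes(ToyNumeric a, ToyNumeric b)
{
    return memcmp(&a, &b, sizeof(ToyNumeric)) == 0;
}

/* Output function: the display scale is part of what the user sees. */
void toy_numeric_out(ToyNumeric v, char *buf, size_t len)
{
    uint64_t p = toy_pow10(v.scale);

    if (v.scale == 0)
        snprintf(buf, len, "%u", (unsigned) v.unscaled);
    else
        snprintf(buf, len, "%u.%0*u",
                 (unsigned) (v.unscaled / p),
                 (int) v.scale,
                 (unsigned) (v.unscaled % p));
}
```

For {0, 0} and {0, 1} (think '0' and '0.0'), toy_numeric_eq() reports equality while toy_same_bytes() does not, and the two values print differently. Collapsing them behind one stored entry would lose one of the two displayed forms.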

So, I don't think that's worth investing time into right now. Maybe in
the future if there are new discoveries about what we can and cannot
deduplicate, but I don't think it should be part of an MVP or 1.0.


Kind regards,

Matthias van de Meent


