Re: brininsert optimization opportunity - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: brininsert optimization opportunity
Date
Msg-id f412087a-6ef9-4b84-941a-e42f43150070@enterprisedb.com
Whole thread Raw
In response to Re: brininsert optimization opportunity  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: brininsert optimization opportunity
List pgsql-hackers
On 1/8/24 16:51, Alvaro Herrera wrote:
> On 2023-Dec-12, Tomas Vondra wrote:
> 
>> I propose we do a much simpler thing instead - allow the cache may be
>> initialized / cleaned up repeatedly, and make sure it gets reset at
>> convenient place (typically after index_insert calls that don't go
>> through execIndexing). That'd mean the cleanup does not need to happen
>> very far from the index_insert(), which makes the reasoning much easier.
>> 0002 does this.
> 
> I'm not in love with this 0002 patch; I think the layering after 0001 is
> correct in that the insert_cleanup call should remain in validate_index
> and called after the whole thing is done, but 0002 changes things so
> that now every table AM has to worry about doing this correctly; and a
> bug of omission will not be detected unless you have a BRIN index on
> such a table and happen to use CREATE INDEX CONCURRENTLY.  So a
> developer has essentially zero chance to do things correctly, which I
> think we'd rather avoid.
> 

True. If the AM code does not need to worry about this kind of stuff,
that would be good / less error prone.

One thing that is not very clear to me is that I don't think there's a
very good way to determine which places need the cleanup call. Because
it depends on (a) what kind of index is used and (b) what happens in the
code called earlier (which may easily do arbitrary stuff). Which means
we have to call the cleanup whenever the code *might* have done inserts
into the index. Maybe it's not such an issue in practice, though.

> So I think we should do 0001 and perhaps some further tweaks to the
> original brininsert optimization commit: I think the aminsertcleanup
> callback should receive the indexRelation as first argument; and also I
> think it's not index_insert_cleanup() job to worry about whether
> ii_AmCache is NULL or not, but instead the callback should be invoked
> always, and then it's aminsertcleanup job to do nothing if ii_AmCache is
> NULL.  That way, the index AM API doesn't have to worry about which
> parts of IndexInfo (or the indexRelation) is aminsertcleanup going to
> care about.  If we decide to change this, then the docs also need a bit
> of tweaking I think.
> 

Yeah, passing the indexRelation to the am callback seems reasonable.
It's more consistent what we do for other callbacks, and perhaps the
callback might need the indexRelation.

I don't quite see why we should invoke the callback with ii_AmCache=NULL
though. If there's nothing cached, why bother? It just means all cleanup
callbacks have to do this NULL check on their own.

> Lastly, I kinda disagree with the notion that only some of the callers
> of aminsert should call aminsertcleanup, even though btree doesn't have
> an aminsertcleanup and thus it can't affect TOAST or catalogs.  Maybe we
> can turn index_insert_cleanup into an inline function, which can quickly
> do nothing if aminsertcleanup isn't defined.  Then we no longer have the
> layering violation where we assume that btree doesn't care.  But the
> proposed change in this paragraph can be maybe handled separately to
> avoid confusing things with the bugfix in the two paragraphs above.
> 

After thinking about this a bit more I agree with you - we should call
the cleanup from each place calling aminsert, even if it's for nbtree
(or other index types that don't require cleanup at the moment).

I wonder if there's a nice way to check this in assert-enabled builds?
Could we tweak nbtree (or index AM in general) to check that all places
that called aminsert also called aminsertcleanup?

For example, I was thinking we might add a flag to IndexInfo (separate
from the ii_AmCache), tracking if aminsert() was called, and then later
check the aminsertcleanup() got called too. The problem however is
there's no obviously convenient place for this check, because IndexInfo
is not freed explicitly ...


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Where can I find the doxyfile?
Next
From: Julien Rouhaud
Date:
Subject: Re: Small fix on query_id_enabled