Re: brininsert optimization opportunity - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: brininsert optimization opportunity |
Date | |
Msg-id | f412087a-6ef9-4b84-941a-e42f43150070@enterprisedb.com Whole thread Raw |
In response to | Re: brininsert optimization opportunity (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses |
Re: brininsert optimization opportunity
|
List | pgsql-hackers |
On 1/8/24 16:51, Alvaro Herrera wrote: > On 2023-Dec-12, Tomas Vondra wrote: > >> I propose we do a much simpler thing instead - allow the cache may be >> initialized / cleaned up repeatedly, and make sure it gets reset at >> convenient place (typically after index_insert calls that don't go >> through execIndexing). That'd mean the cleanup does not need to happen >> very far from the index_insert(), which makes the reasoning much easier. >> 0002 does this. > > I'm not in love with this 0002 patch; I think the layering after 0001 is > correct in that the insert_cleanup call should remain in validate_index > and called after the whole thing is done, but 0002 changes things so > that now every table AM has to worry about doing this correctly; and a > bug of omission will not be detected unless you have a BRIN index on > such a table and happen to use CREATE INDEX CONCURRENTLY. So a > developer has essentially zero chance to do things correctly, which I > think we'd rather avoid. > True. If the AM code does not need to worry about this kind of stuff, that would be good / less error prone. One thing that is not very clear to me is that I don't think there's a very good way to determine which places need the cleanup call. Because it depends on (a) what kind of index is used and (b) what happens in the code called earlier (which may easily do arbitrary stuff). Which means we have to call the cleanup whenever the code *might* have done inserts into the index. Maybe it's not such an issue in practice, though. > So I think we should do 0001 and perhaps some further tweaks to the > original brininsert optimization commit: I think the aminsertcleanup > callback should receive the indexRelation as first argument; and also I > think it's not index_insert_cleanup() job to worry about whether > ii_AmCache is NULL or not, but instead the callback should be invoked > always, and then it's aminsertcleanup job to do nothing if ii_AmCache is > NULL. That way, the index AM API doesn't have to worry about which > parts of IndexInfo (or the indexRelation) is aminsertcleanup going to > care about. If we decide to change this, then the docs also need a bit > of tweaking I think. > Yeah, passing the indexRelation to the am callback seems reasonable. It's more consistent what we do for other callbacks, and perhaps the callback might need the indexRelation. I don't quite see why we should invoke the callback with ii_AmCache=NULL though. If there's nothing cached, why bother? It just means all cleanup callbacks have to do this NULL check on their own. > Lastly, I kinda disagree with the notion that only some of the callers > of aminsert should call aminsertcleanup, even though btree doesn't have > an aminsertcleanup and thus it can't affect TOAST or catalogs. Maybe we > can turn index_insert_cleanup into an inline function, which can quickly > do nothing if aminsertcleanup isn't defined. Then we no longer have the > layering violation where we assume that btree doesn't care. But the > proposed change in this paragraph can be maybe handled separately to > avoid confusing things with the bugfix in the two paragraphs above. > After thinking about this a bit more I agree with you - we should call the cleanup from each place calling aminsert, even if it's for nbtree (or other index types that don't require cleanup at the moment). I wonder if there's a nice way to check this in assert-enabled builds? Could we tweak nbtree (or index AM in general) to check that all places that called aminsert also called aminsertcleanup? For example, I was thinking we might add a flag to IndexInfo (separate from the ii_AmCache), tracking if aminsert() was called, and then later check the aminsertcleanup() got called too. The problem however is there's no obviously convenient place for this check, because IndexInfo is not freed explicitly ... regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: