Thread: Planned simplification of catalog index updates
I count about seventy occurrences of this code pattern: /* keep system catalog indices current */ if (RelationGetForm(pg_rewrite_desc)->relhasindex) { Relation idescs[Num_pg_rewrite_indices]; CatalogOpenIndices(Num_pg_rewrite_indices, Name_pg_rewrite_indices, idescs); CatalogIndexInsert(idescs,Num_pg_rewrite_indices, pg_rewrite_desc, ruletup); CatalogCloseIndices(Num_pg_rewrite_indices,idescs); } I believe this could be simplified to something like CatalogUpdateIndexes(Relation, HeapTuple, indexnamelist_constant, indexcount_constant); with essentially no speed penalty. An even more radical approach is to get rid of the hardwired index name lists in indexing.h, and instead expect CatalogOpenIndices to make use of the index OID lists that are maintained by the relcache (since 7.1 or so). Then the typical call would reduce to CatalogUpdateIndexes(Relation, HeapTuple); This would simplify development/maintenance at the cost of a small amount of CPU time building the index OID list whenever it wasn't already cached. (OTOH ... I'm unsure whether opening an index by OID is any faster than opening it by name, but it's certainly plausible that it might be --- so we could find we buy back the time spent building relcache index lists by making the actual index open step quicker.) Comments? I want to do the first step in any case, but I'm not sure about eliminating the index name lists. regards, tom lane
> An even more radical approach is to get rid of the hardwired index name > lists in indexing.h, and instead expect CatalogOpenIndices to make use > of the index OID lists that are maintained by the relcache (since 7.1 or > so). Then the typical call would reduce to > > CatalogUpdateIndexes(Relation, HeapTuple); This would be great. Anyway to take it one step further and make it transparent? Hide it in heap_insert / update?
Tom Lane wrote: > This would simplify development/maintenance at the cost of a small > amount of CPU time building the index OID list whenever it wasn't > already cached. (OTOH ... I'm unsure whether opening an index by OID > is any faster than opening it by name, but it's certainly plausible that > it might be --- so we could find we buy back the time spent building > relcache index lists by making the actual index open step quicker.) > > Comments? I want to do the first step in any case, but I'm not sure > about eliminating the index name lists. All your changes make sense to me. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Rod Taylor <rbt@zort.ca> writes: >> Then the typical call would reduce to >> >> CatalogUpdateIndexes(Relation, HeapTuple); > This would be great. Anyway to take it one step further and make it > transparent? Hide it in heap_insert / update? No, that would be quite inappropriate. The control paths that we're talking about here insert or update only one tuple per transaction. We do *not* want to do (a) open all indexes, (b) process one tuple, (c) close all indexes in the performance-critical paths where many tuples are processed per transaction. (Even in the paths that use CatalogOpenIndexes, you wouldn't reduce it to a single call in the routines that insert multiple tuples per call.) regards, tom lane
I said: > An even more radical approach is to get rid of the hardwired index name > lists in indexing.h, and instead expect CatalogOpenIndices to make use > of the index OID lists that are maintained by the relcache (since 7.1 or > so). Then the typical call would reduce to > CatalogUpdateIndexes(Relation, HeapTuple); > This would simplify development/maintenance at the cost of a small > amount of CPU time building the index OID list whenever it wasn't > already cached. (OTOH ... I'm unsure whether opening an index by OID > is any faster than opening it by name, but it's certainly plausible that > it might be --- so we could find we buy back the time spent building > relcache index lists by making the actual index open step quicker.) Indeed, it seems this *is* faster. I used the regression tests as a crude benchmark --- pg_bench wouldn't help since it doesn't do any catalog updates in the inner loop. Over ten runs of "make installcheck" on a RH 7.2 system, yesterday's CVS tip gave the following results for elapsed real time in seconds: min | max | avg | stddev -------+-------+------------------+-------------------26.18 | 28.12 | 27.3590909090909 | 0.767247737637082 With modifications to do all system catalog index updates as above, I instead got: min | max | avg | stddev ------+-------+--------+-------------------24.3 | 26.72 | 25.833 | 0.674372959784605 So it seems to be a fairly reliable 5% win on the regression tests, on top of being a lot simpler and more maintainable. I'm pretty pleased, and will be committing this as soon as CVS tip is back to a non-broken state ... regards, tom lane
Tom Lane wrote: > So it seems to be a fairly reliable 5% win on the regression tests, > on top of being a lot simpler and more maintainable. > > I'm pretty pleased, and will be committing this as soon as CVS tip > is back to a non-broken state ... > Nice work! Joe
Wow, that is a huge difference for such a small change; makes you wonder what other optimizations are sitting out there. --------------------------------------------------------------------------- Tom Lane wrote: > I said: > > An even more radical approach is to get rid of the hardwired index name > > lists in indexing.h, and instead expect CatalogOpenIndices to make use > > of the index OID lists that are maintained by the relcache (since 7.1 or > > so). Then the typical call would reduce to > > CatalogUpdateIndexes(Relation, HeapTuple); > > This would simplify development/maintenance at the cost of a small > > amount of CPU time building the index OID list whenever it wasn't > > already cached. (OTOH ... I'm unsure whether opening an index by OID > > is any faster than opening it by name, but it's certainly plausible that > > it might be --- so we could find we buy back the time spent building > > relcache index lists by making the actual index open step quicker.) > > Indeed, it seems this *is* faster. I used the regression tests as a > crude benchmark --- pg_bench wouldn't help since it doesn't do any > catalog updates in the inner loop. Over ten runs of "make installcheck" > on a RH 7.2 system, yesterday's CVS tip gave the following results for > elapsed real time in seconds: > > min | max | avg | stddev > -------+-------+------------------+------------------- > 26.18 | 28.12 | 27.3590909090909 | 0.767247737637082 > > With modifications to do all system catalog index updates as above, > I instead got: > > min | max | avg | stddev > ------+-------+--------+------------------- > 24.3 | 26.72 | 25.833 | 0.674372959784605 > > So it seems to be a fairly reliable 5% win on the regression tests, > on top of being a lot simpler and more maintainable. > > I'm pretty pleased, and will be committing this as soon as CVS tip > is back to a non-broken state ... > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026