Thread: Planned simplification of catalog index updates

Planned simplification of catalog index updates

From

Tom Lane

Date:

02 August 2002, 20:41:32

I count about seventy occurrences of this code pattern:
   /* keep system catalog indices current */   if (RelationGetForm(pg_rewrite_desc)->relhasindex)   {       Relation
idescs[Num_pg_rewrite_indices];
       CatalogOpenIndices(Num_pg_rewrite_indices, Name_pg_rewrite_indices,                          idescs);
CatalogIndexInsert(idescs,Num_pg_rewrite_indices, pg_rewrite_desc,                          ruletup);
CatalogCloseIndices(Num_pg_rewrite_indices,idescs);   }
 

I believe this could be simplified to something like
   CatalogUpdateIndexes(Relation, HeapTuple, indexnamelist_constant,         indexcount_constant);

with essentially no speed penalty.

An even more radical approach is to get rid of the hardwired index name
lists in indexing.h, and instead expect CatalogOpenIndices to make use
of the index OID lists that are maintained by the relcache (since 7.1 or
so).  Then the typical call would reduce to
   CatalogUpdateIndexes(Relation, HeapTuple);

This would simplify development/maintenance at the cost of a small
amount of CPU time building the index OID list whenever it wasn't
already cached.  (OTOH ... I'm unsure whether opening an index by OID
is any faster than opening it by name, but it's certainly plausible that
it might be --- so we could find we buy back the time spent building
relcache index lists by making the actual index open step quicker.)

Comments?  I want to do the first step in any case, but I'm not sure
about eliminating the index name lists.
        regards, tom lane

Re: Planned simplification of catalog index updates

From

Rod Taylor

Date:

02 August 2002, 21:15:04

> An even more radical approach is to get rid of the hardwired index name
> lists in indexing.h, and instead expect CatalogOpenIndices to make use
> of the index OID lists that are maintained by the relcache (since 7.1 or
> so).  Then the typical call would reduce to
> 
>     CatalogUpdateIndexes(Relation, HeapTuple);

This would be great.  Anyway to take it one step further and make it
transparent?  Hide it in heap_insert / update?

Re: Planned simplification of catalog index updates

From

Bruce Momjian

Date:

02 August 2002, 21:46:33

Tom Lane wrote:
> This would simplify development/maintenance at the cost of a small
> amount of CPU time building the index OID list whenever it wasn't
> already cached.  (OTOH ... I'm unsure whether opening an index by OID
> is any faster than opening it by name, but it's certainly plausible that
> it might be --- so we could find we buy back the time spent building
> relcache index lists by making the actual index open step quicker.)
> 
> Comments?  I want to do the first step in any case, but I'm not sure
> about eliminating the index name lists.

All your changes make sense to me.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: Planned simplification of catalog index updates

From

Tom Lane

Date:

03 August 2002, 00:09:59

Rod Taylor <rbt@zort.ca> writes:
>> Then the typical call would reduce to
>> 
>> CatalogUpdateIndexes(Relation, HeapTuple);

> This would be great.  Anyway to take it one step further and make it
> transparent?  Hide it in heap_insert / update?

No, that would be quite inappropriate.  The control paths that we're
talking about here insert or update only one tuple per transaction.
We do *not* want to do (a) open all indexes, (b) process one tuple,
(c) close all indexes in the performance-critical paths where many
tuples are processed per transaction.  (Even in the paths that use
CatalogOpenIndexes, you wouldn't reduce it to a single call in
the routines that insert multiple tuples per call.)
        regards, tom lane

Re: Planned simplification of catalog index updates

From

Tom Lane

Date:

04 August 2002, 18:34:15

I said:
> An even more radical approach is to get rid of the hardwired index name
> lists in indexing.h, and instead expect CatalogOpenIndices to make use
> of the index OID lists that are maintained by the relcache (since 7.1 or
> so).  Then the typical call would reduce to
>     CatalogUpdateIndexes(Relation, HeapTuple);
> This would simplify development/maintenance at the cost of a small
> amount of CPU time building the index OID list whenever it wasn't
> already cached.  (OTOH ... I'm unsure whether opening an index by OID
> is any faster than opening it by name, but it's certainly plausible that
> it might be --- so we could find we buy back the time spent building
> relcache index lists by making the actual index open step quicker.)

Indeed, it seems this *is* faster.  I used the regression tests as a
crude benchmark --- pg_bench wouldn't help since it doesn't do any
catalog updates in the inner loop.  Over ten runs of "make installcheck"
on a RH 7.2 system, yesterday's CVS tip gave the following results for
elapsed real time in seconds:
 min  |  max  |       avg        |      stddev       
-------+-------+------------------+-------------------26.18 | 28.12 | 27.3590909090909 | 0.767247737637082

With modifications to do all system catalog index updates as above,
I instead got:
min  |  max  |  avg   |      stddev       
------+-------+--------+-------------------24.3 | 26.72 | 25.833 | 0.674372959784605

So it seems to be a fairly reliable 5% win on the regression tests,
on top of being a lot simpler and more maintainable.

I'm pretty pleased, and will be committing this as soon as CVS tip
is back to a non-broken state ...
        regards, tom lane

Re: Planned simplification of catalog index updates

From

Joe Conway

Date:

04 August 2002, 19:29:42

Tom Lane wrote:
> So it seems to be a fairly reliable 5% win on the regression tests,
> on top of being a lot simpler and more maintainable.
> 
> I'm pretty pleased, and will be committing this as soon as CVS tip
> is back to a non-broken state ...
> 

Nice work!

Joe

Re: Planned simplification of catalog index updates

From

Bruce Momjian

Date:

04 August 2002, 21:13:08

Wow, that is a huge difference for such a small change; makes you
wonder what other optimizations are sitting out there.

---------------------------------------------------------------------------

Tom Lane wrote:
> I said:
> > An even more radical approach is to get rid of the hardwired index name
> > lists in indexing.h, and instead expect CatalogOpenIndices to make use
> > of the index OID lists that are maintained by the relcache (since 7.1 or
> > so).  Then the typical call would reduce to
> >     CatalogUpdateIndexes(Relation, HeapTuple);
> > This would simplify development/maintenance at the cost of a small
> > amount of CPU time building the index OID list whenever it wasn't
> > already cached.  (OTOH ... I'm unsure whether opening an index by OID
> > is any faster than opening it by name, but it's certainly plausible that
> > it might be --- so we could find we buy back the time spent building
> > relcache index lists by making the actual index open step quicker.)
> 
> Indeed, it seems this *is* faster.  I used the regression tests as a
> crude benchmark --- pg_bench wouldn't help since it doesn't do any
> catalog updates in the inner loop.  Over ten runs of "make installcheck"
> on a RH 7.2 system, yesterday's CVS tip gave the following results for
> elapsed real time in seconds:
> 
>   min  |  max  |       avg        |      stddev       
> -------+-------+------------------+-------------------
>  26.18 | 28.12 | 27.3590909090909 | 0.767247737637082
> 
> With modifications to do all system catalog index updates as above,
> I instead got:
> 
>  min  |  max  |  avg   |      stddev       
> ------+-------+--------+-------------------
>  24.3 | 26.72 | 25.833 | 0.674372959784605
> 
> So it seems to be a fairly reliable 5% win on the regression tests,
> on top of being a lot simpler and more maintainable.
> 
> I'm pretty pleased, and will be committing this as soon as CVS tip
> is back to a non-broken state ...
> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026