Re: Reducing planning time on tables with many indexes - Mailing list pgsql-hackers

From David Geier
Subject Re: Reducing planning time on tables with many indexes
Date
Msg-id CAPsAnrmz8qeSEGNyfePqY3hy8L9q15idqaaM6dAs4kbXmNggug@mail.gmail.com
Whole thread Raw
In response to Re: Reducing planning time on tables with many indexes  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Reducing planning time on tables with many indexes
List pgsql-hackers
Hi Tom,

On Wed, Jul 27, 2022 at 7:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
I wrote:
> Unfortunately, as things stand today, the planner needs more than the
> right to look at the indexes' schemas, because it makes physical accesses
> to btree indexes to find out their tree height (and I think there are some
> comparable behaviors in other AMs).  I've never particularly cared for
> that implementation, and would be glad to rip out that behavior if we can
> find another way.  Maybe we could persuade VACUUM or ANALYZE to store that
> info in the index's pg_index row, or some such, and then the planner
> could use it with no lock?
It seems like _bt_getrootheight() first checks if the height is cached and only if it isn't it accesses index meta pages.
If the index locks are only taken for the sake of _bt_getrootheight() accessing index meta pages in case they are not cached, maybe the index locks could be taken conditionally.
However, postponing the call to where it is really needed sounds even better.

A first step here could just be to postpone fetching _bt_getrootheight()
until we actually need it during cost estimation.  That would avoid the
need to do it at all for indexes that indxpath.c discards as irrelevant,
which is a decision made on considerably more information than the
proposed patch uses.

Having done that, you could look into revising plancat.c to fill the
IndexOptInfo structs from catcache entries instead of opening the
index per se.  (You'd have to also make sure that the appropriate
index locks are acquired eventually, for indexes the query does use
at runtime.  I think that's the case, but I'm not sure if anything
downstream has been optimized on the assumption the planner did it.)

I can give this a try.
That way we would get rid of the scalability issues.
However, what about the runtime savings observed with a single query stream?
In that case there is no contention, so it seems like having less indexes to look at further down the road, also yields substantial savings.
Any clue where exactly these savings might come from? Or is it actually the calls to _bt_getrootheight()? I can also do a few perf runs to track that down.
 
This'd probably get us a large part of the way there.  Further
optimization of acquisition of tree height etc could be an
optional follow-up.

                        regards, tom lane

pgsql-hackers by date:

Previous
From: "Euler Taveira"
Date:
Subject: Re: logical replication restrictions
Next
From: Zhang Mingli
Date:
Subject: [feature]COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns