Home > mailing lists

Re: Reducing planning time on tables with many indexes - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Reducing planning time on tables with many indexes
Date	July 27, 2022 16:39:02
Msg-id	3002655.1658939942@sss.pgh.pa.us Whole thread Raw
In response to	Reducing planning time on tables with many indexes (David Geier <geidav.pg@gmail.com>)
Responses	Re: Reducing planning time on tables with many indexes Re: Reducing planning time on tables with many indexes Re: Reducing planning time on tables with many indexes
List	pgsql-hackers

Tree view

David Geier <geidav.pg@gmail.com> writes:
> We tracked down the root cause of this slowdown to lock contention in
> 'get_relation_info()'. The index lock of every single index of every single
> table used in that query is acquired. We attempted a fix by pre-filtering
> out all indexes that anyways cannot be used with a certain query, without
> taking the index locks (credits to Luc Vlaming for idea and
> implementation). The patch does so by caching the columns present in every
> index, inside 'struct Relation', similarly to 'rd_indexlist'.

I wonder how much thought you gave to the costs imposed by that extra
cache space.  We have a lot of users who moan about relcache bloat
already.  But more to the point, I do not buy the assumption that
an index's set of columns is a good filter for which indexes are of
interest.  A trivial counterexample from the regression database is

regression=# explain select count(*) from tenk1;
                                         QUERY PLAN

--------------------------------------------------------------------------------
------------
 Aggregate  (cost=219.28..219.29 rows=1 width=8)
   ->  Index Only Scan using tenk1_hundred on tenk1  (cost=0.29..194.28 rows=100
00 width=0)
(2 rows)

It looks to me like the patch also makes unwarranted assumptions about
being able to discard all but the smallest index having a given set
of columns.  This would, for example, possibly lead to dropping the
index that has the most useful sort order, or that has the operator
class needed to support a specific WHERE clause.

In short, I'm not sure I buy this concept at all.  I think it might
be more useful to attack the locking overhead more directly.  I kind
of wonder why we need per-index locks at all during planning ---
I think that we already interpret AccessShareLock on the parent table
as being sufficient to block schema changes on existing indexes.

Unfortunately, as things stand today, the planner needs more than the
right to look at the indexes' schemas, because it makes physical accesses
to btree indexes to find out their tree height (and I think there are some
comparable behaviors in other AMs).  I've never particularly cared for
that implementation, and would be glad to rip out that behavior if we can
find another way.  Maybe we could persuade VACUUM or ANALYZE to store that
info in the index's pg_index row, or some such, and then the planner
could use it with no lock?

            regards, tom lane

pgsql-hackers by date:

From: Dilip Kumar
Date: 27 July 2022, 16:37:16
Subject: Re: making relfilenodes 56 bits

From: Tom Lane
Date: 27 July 2022, 17:15:02
Subject: Re: Reducing planning time on tables with many indexes

Re: Reducing planning time on tables with many indexes - Mailing list pgsql-hackers

Previous

Next