Re: Reducing planning time on tables with many indexes - Mailing list pgsql-hackers
From | Luc Vlaming Hummel |
---|---|
Subject | Re: Reducing planning time on tables with many indexes |
Date | |
Msg-id | 211D23CD-F7BF-482A-95A5-B0FE1C572B99@servicenow.com Whole thread Raw |
In response to | Re: Reducing planning time on tables with many indexes (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On 27.07.22, 18:39, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: [External Email] David Geier <geidav.pg@gmail.com> writes: > We tracked down the root cause of this slowdown to lock contention in > 'get_relation_info()'. The index lock of every single index of every single > table used in that query is acquired. We attempted a fix by pre-filtering > out all indexes that anyways cannot be used with a certain query, without > taking the index locks (credits to Luc Vlaming for idea and > implementation). The patch does so by caching the columns present in every > index, inside 'struct Relation', similarly to 'rd_indexlist'. I wonder how much thought you gave to the costs imposed by that extra cache space. We have a lot of users who moan about relcache bloat already. But more to the point, I do not buy the assumption that an index's set of columns is a good filter for which indexes are of interest. A trivial counterexample from the regression database is regression=# explain select count(*) from tenk1; QUERY PLAN -------------------------------------------------------------------------------- ------------ Aggregate (cost=219.28..219.29 rows=1 width=8) -> Index Only Scan using tenk1_hundred on tenk1 (cost=0.29..194.28 rows=100 00 width=0) (2 rows) It looks to me like the patch also makes unwarranted assumptions about being able to discard all but the smallest index having a given set of columns. This would, for example, possibly lead to dropping the index that has the most useful sort order, or that has the operator class needed to support a specific WHERE clause. Thanks for checking out the patch! Just to make sure we're on the same page: we're only making this assumption if you select no fields at all. If you select any fields at all it will check for column overlap, and if there's any overlap with any referenced field, then the index will not be filtered out. For producing a row count with no referenced fields it is true that it should select the truly cheapest index to produce the row count and there should be some Index-am callback introduced for that. For now it was just a quick-and-dirty solution. Wouldn't a callback that would estimate the amount of data read be good enough though? For sort orders the field to sort by should be listed and hence the index should not be filtered out, or what am I missing? Likely I've missed some fields that are referenced somehow (potentially indirectly), but that shouldn't disqualify the approach completely. In short, I'm not sure I buy this concept at all. I think it might be more useful to attack the locking overhead more directly. I kind of wonder why we need per-index locks at all during planning --- I think that we already interpret AccessShareLock on the parent table as being sufficient to block schema changes on existing indexes. Could you elaborate as to why this approach is not good enough? To me it seems that avoiding work ahead of time is generally useful. Or are you worried that we remove too much? Unfortunately, as things stand today, the planner needs more than the right to look at the indexes' schemas, because it makes physical accesses to btree indexes to find out their tree height (and I think there are some comparable behaviors in other AMs). I've never particularly cared for that implementation, and would be glad to rip out that behavior if we can find another way. Maybe we could persuade VACUUM or ANALYZE to store that info in the index's pg_index row, or some such, and then the planner could use it with no lock? regards, tom lane The thing you're touching on is specific for a btree. Not sure this generalizes to all index types that are out there though? I could see there being some property that allows you to be "no-lock", and then a callback that allows you to cache some generic data that can be transformed when the indexopt info structs are filled. Is that roughly what you have in mind? Best, Luc
pgsql-hackers by date: