Thread: Identifying optimizer usage of indexed expressions
Hello,
I have a Postgres database where many similar expression-based indexes have been added by a script. It is quite likely that some of the indexes are not actually used, and I would like to remove these unused indexes.
I can identify indexes that are used directly during query execution using pg_stat_user_indexes. However, it is possible that the optimizer might be using the statistics on the indexed expressions to guide query planning. If that is the case then I might remove an index which appears to be unused, only to then find that this results in the optimizer choosing poorer plans as it no longer has the additional statistics on the indexed expressions.
How can I identify indexes which are used by the optimizer when planning? Are there any relevant internal counters similar to pg_stat_user_indexes? Or an extension which provides this information?
If there is nothing available inside the database, I was thinking I would have to parse all of the queries executed on the database, extract the expressions, work out which columns of which tables are referenced by those expressions, and then check whether those expressions are indexed. Or (since that seems like a lot of work), I could remove all indexes which are reported as unused by pg_stat_user_indexes, but for each and every one of them create extended statistics on the relevant expression(s) using CREATE STATISTICS. That might result in a lot of unnecessary expression statistics - are there any significant costs associated with that? The docs say "providing benefits similar to an expression index without the overhead of index maintenance" [1], and don't mention any downside.
Lastly, please could someone point me at the part of the Postgres source code that handles index expressions in the optimizer?
Thank you
Tim
Tim
On Mon, 2024-02-19 at 23:08 +0000, Tim Palmer wrote: > I can identify indexes that are used directly during query execution using > pg_stat_user_indexes. However, it is possible that the optimizer might be > using the statistics on the indexed expressions to guide query planning. > If that is the case then I might remove an index which appears to be unused, > only to then find that this results in the optimizer choosing poorer plans > as it no longer has the additional statistics on the indexed expressions. > > How can I identify indexes which are used by the optimizer when planning? I don't think you can. > If there is nothing available inside the database, I was thinking I would > have to parse all of the queries executed on the database, extract the > expressions, work out which columns of which tables are referenced by > those expressions, and then check whether those expressions are indexed. Well, in an ideal world, you should know which indexes you created for which query. But then indexes might be used in ways you didn't intend. > Or (since that seems like a lot of work), I could remove all indexes which > are reported as unused by pg_stat_user_indexes, but for each and every one > of them create extended statistics on the relevant expression(s) using > CREATE STATISTICS. That might result in a lot of unnecessary expression > statistics - are there any significant costs associated with that? The > docs say "providing benefits similar to an expression index without the > overhead of index maintenance" [1], and don't mention any downside. The small downside is that ANALYZE will have more work, the big downside is that query planning on tables with advanced statistics takes slightly longer. That may or may not be a problem for you, but if you could pay the price of the extra indexes, you will certainly be able to pay the price of advanced statistics. You could also consider the option to drop indexes and see if somebody complains, or watch out for changes in pg_stat_statements. Yours, Laurenz Albe