Re: Index of a table is not used (in any case) - Mailing list pgsql-hackers
From | mlw |
---|---|
Subject | Re: Index of a table is not used (in any case) |
Date | |
Msg-id | 3BD6BB36.6B85376E@mohawksoft.com Whole thread Raw |
In response to | Re: Index of a table is not used (in any case) ("Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at>) |
Responses |
Re: Index of a table is not used (in any case)
Re: Index of a table is not used (in any case) |
List | pgsql-hackers |
Zeugswetter Andreas SB SD wrote: > > > > > *very* slow, due to seq scan on > > > > 20 million entries, which is a test setup up to now) > > > > > > Perennial first question: did you VACUUM ANALYZE? > > > > Can there, or could there, be a notion of "rule based" optimization of > > queries in PostgreSQL? The "not using index" problem is probably the > most > > common and most misunderstood problem. > > There is a (sort of) rule based behavior in PostgreSQL, > the down side of the current implementation is, that certain > other commands than ANALYZE (e.g. "create index") partly update > optimizer statistics. This is bad behavior, since then only part > of the statistics are accurate. Statistics always have to be seen > in context to other table's and other index'es statistics. > > Thus, currently the rule based optimizer only works if you create > the indexes on empty tables (before loading data), which obviously > has downsides. Else you have no choice but to ANALYZE frequently. > > I have tried hard to fight for this pseudo rule based behavior, > but was only partly successful in convincing core. My opinion is, > that (unless runtime statistics are kept) no other command than > ANALYZE should be allowed to touch optimizer relevant statistics > (maybe unless explicitly told to). Perhaps there could be an extension to ANALYZE, i.e. ANALYZE RULEBASED tablename that would restore or recalculate the state that a table would be if all indexes were created from scratch? The "not using index" was very frustrating to understand. The stock answer, "did you vacuum?" just isn't enough. There has to be some explanation (in the FAQ or something) about the indexed key distribution in your data. Postgres' statistics are pretty poor too, a relative few very populous entries in a table will make it virtually impossible for the cost based optimizer (CBO) to use an index. At my site we have lots of tables that have many duplicate items in an index. It is a music based site and has a huge amount of "Various Artists" entries. No matter what we do, there is NO way to get Postgres to use the index from the query alone. We have over 20 thousand artists, but 5 "Various Artists" or "Soundtrack" entries change the statistics so much that they exclude an index scan. We have to run the system with sequential scan disabled. Running with seq disabled eliminates the usefulness of the CBO because when it is a justified table scan, it does an index scan. I have approached this windmill before and a bit regretful at bringing it up again, but it is important, very important. There needs to be a way to direct the optimizer about how to optimize the query. Using "set foo=bar" prior to a query is not acceptable. Web sites use persistent connections to the databases and since "set" can not be restored, you override global settings for the session, or have to code, in the web page, the proper default setting. The result is either that different web processes will behave differently depending on the order in which they execute queries, or you have to have your DBA write web pages. A syntax like: select * from table where /* enable_seqscan = false */ key = 'value'; Would be great in that you could tune the optimizer as long as the settings were for the clause directly following the directive, without affecting the state of the session or transaction. For instance: select id from t1, t2 where /* enable_seqscan = false */ t1.key = 'value' and t2.key = 'test' and t1.id = t2.id; The where "t1.key = 'value'" condition would be prohibited from using a sequntial scan, while the "t2.key = 'test'" would use it if it made sense. Is this possible?
pgsql-hackers by date: