Re: Inconsistant use of index. - Mailing list pgsql-bugs
From | Ron Mayer |
---|---|
Subject | Re: Inconsistant use of index. |
Date | |
Msg-id | Pine.LNX.4.33.0204031108250.24107-100000@ron Whole thread Raw |
In response to | Re: Inconsistant use of index. (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Inconsistant use of index.
|
List | pgsql-bugs |
On Tue, 26 Mar 2002, Tom Lane wrote: > Ron Mayer <ron@intervideo.com> writes: > >> I'm particularly interested in the correlation estimate for the dat > >> column. (Would you happen to have an idea whether the data has been > >> inserted more-or-less in dat order?) > > > I beleve much of February was loaded first, then we back-filled January, > > and daily I've been adding March's results. I don't believe the index-usage > > stopped when we did the january fill... something happend a few days ago after > > a pretty routine daily load. > > The correlation estimate for dat is pretty low (0.086088), which I think > reflects the fact that on a macro level your data is not very orderly > (Feb/Jan/Mar). However, if it's been loaded on a daily basis then the > records for any single day will be together --- which is why the indexed > probe for a single day is so fast. I don't see any way that we can > expect the system to model this effect with only one ordering-correlation > number :-( ... so a proper fix will have to wait for some future release > when we can think about having more extensive stats about ordering. > > In the meantime, it would be interesting to see if re-establishing the > big-picture order correlation would persuade the planner to do the right > thing. Could you do something like this: > > CREATE TABLE foo AS SELECT * FROM fact ORDER BY dat; > TRUNCATE TABLE fact; > INSERT INTO fact SELECT * FROM foo; > DROP TABLE foo; > VACUUM ANALYZE fact; > > (this should leave you in a state where pg_stats shows correlation 1.0 > for fact.dat) and then see what you get from EXPLAIN? > > regards, tom lane I did quite a bit more playing with this, and no matter what the correlation was (1, -0.001), it never seemed to have any effect at all on the execution plan. Should it? With a high correlation the index scan is a much better choice. Ron --- --- create the table with a correlation of "1". --- logs2=# CREATE TABLE fact_by_dat AS SELECT * FROM fact ORDER BY dat; SELECT logs2=# CREATE INDEX fact_by_dat__dat ON fact_by_dat(dat); CREATE logs2=# vacuum analyze fact_by_dat; VACUUM logs2=# select correlation from pg_stats where tablename='fact_by_dat' and attname='dat'; correlation ------------- 1 (1 row) --- --- Still does the "Seq Scan" --- logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01'; NOTICE: QUERY PLAN: Aggregate (cost=380347.31..380347.31 rows=1 width=0) (actual time=77785.14..77785.14 rows=1 loops=1) -> Seq Scan on fact (cost=0.00..379816.25 rows=212423 width=0) (actual time=20486.16..77420.05 rows=180295 loops=1) Total runtime: 77785.28 msec EXPLAIN --- --- Disable Seq Scan... 30 times faster. --- logs2=# set enable_seqscan to off; SET VARIABLE logs2=# explain analyze select count(*) from fact_by_dat where dat='2002-03-01'; NOTICE: QUERY PLAN: Aggregate (cost=5502.57..5502.57 rows=1 width=0) (actual time=2671.20..2671.20 rows=1 loops=1) -> Index Scan using fact_by_dat__dat on fact_by_dat (cost=0.00..4974.99 rows=211036 width=0) (actual time=90.24..2339.64 rows=180295 loops=1) Total runtime: 2693.87 msec
pgsql-bugs by date: