Re: Poor index choice -- multiple indexes of the same - Mailing list pgsql-performance
From | Karl O. Pinc |
---|---|
Subject | Re: Poor index choice -- multiple indexes of the same |
Date | |
Msg-id | 1119926211l.11411l.2l@mofo Whole thread Raw |
In response to | Re: Poor index choice -- multiple indexes of the same columns (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: Poor index choice -- multiple indexes of the same
|
List | pgsql-performance |
On 06/27/2005 05:37:41 PM, Josh Berkus wrote: > Karl, > > > Seems to me that when there's a constant value in the query > > and an = comparision it will always be faster to use the (b-tree) > > index that's ordered first by the constant value, as then all > further > > blocks are guarenteed to have a higher relevant information > > density. At least when compared with another index that has the > > same columns in it. > > That really depends on the stats. Such a choice would *not* be > appropriate if the < comparison was expected to return 1- rows while > the = > condition applied to 15% of the table. We're talking internals here so I don't know what I'm talking about, but, when the = comparison returns 15% of the table you can find your way straight to the 1- (sic) relevent rows because that 15% is further sorted by the second column of the index. So that's one disk read and after that when you scan the rest of the blocks every datum is relevant/returned. So your scan will pass through fewer disk blocks. The only case that would make sense to consider using the other index is if the planner knew it could get the answer in 1 disk read, in which case it should be able to get the answer out of either index in one disk read as both indexes are on the same columns. > What is your STATISTICS_TARGET for the relevant columns set to? STATISTICS_TARGET is the default, which I read as 10 the docs. > When's > the last time you ran analyze? I'm doing this in a torture test script, loading data. Every fibnocci number of rows * 100 I VACCUM ANALYZE. So, 100, 200, 300, 500, 800, etc. Just for grins I've created the index I'd like it to use and run VACUUM ANALYZE and shown the EXPLAIN ANALYZE below. > If this is all updated, you want to > post > the pg_stats rows for the relevant columns? Pg_stats rows below. (I've tried to wrap the lines short so as not to mess up anybody's mailer.) # create index census_sname_date on census (sname, date); CREATE INDEX # vacuum analyze census; VACUUM # explain analyze select 1 from census where date < '1975-9-21' and sname = 'RAD' and status != 'A' ; QUERY PLAN --------------------------------------------------------------- --------------------------------------------------------------- ---- Index Scan using census_date_sname on census (cost=0.00..2169.51 rows=1437 width=0) (actual time=40.610..40.610 rows=0 loops=1) Index Cond: ((date < '1975-09-21'::date) AND (sname = 'RAD'::bpchar)) Filter: (status <> 'A'::bpchar) Total runtime: 40.652 ms (4 rows) Compare with: # drop index census_date_sname; DROP INDEX # explain analyze select date from census where sname = 'RAD' and date < '1975-9-21' and status != 'A' limit 1; QUERY PLAN ------------------------------------------------------------------- ------------------------------------------------------------------- Limit (cost=0.00..3.37 rows=1 width=4) (actual time=0.097..0.097 rows=0 loops=1) -> Index Scan using census_sname_date on census (cost=0.00..5203.95 rows=1544 width=4) (actual time=0.094..0.094 rows=0 loops=1) Index Cond: ((sname = 'RAD'::bpchar) AND (date < '1975-09-21'::date)) Filter: (status <> 'A'::bpchar) Total runtime: 0.133 ms (5 rows) # select * from pg_stats where tablename = 'census' and (attname = 'sname' or attname = 'date'); schemaname | tablename | attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation ------------+-----------+---------+-----------+-----------+----------- -+-------------------------------------------------------------------- ---------------------------------------------+------------------------ ---------------------------------------------------------------------- --------------+------------------------------------------------------- ---------------------------------------------------------------------+ ------------- babase | census | date | 0 | 4 | 4687 | {1979-02-01,1976-06-16,1977-03-23,1978-08-25,1979-09-20,1971-06-28 ,1972-04-28,1972-08-27,1974-04-06,1975-03-19} | {0.002,0.00166667,0.00166667,0.00166667,0.00166667,0.00133333 ,0.00133333,0.00133333,0.00133333,0.00133333} | {1959-07-15,1966-02-18,1969-02-22,1971-01-10,1972-07-26,1974-02-09 ,1975-05-27,1976-07-28,1977-08-19,1978-08-07,1979-10-02} | 1 babase | census | sname | 0 | 7 | 177 | {MAX,ALT,PRE,COW,EST,JAN,RIN,ZUM,DUT,LUL} | {0.0166667,0.015,0.015,0.0146667 ,0.0143333,0.014,0.0136667,0.0136667,0.0133333,0.0133333} | {ALI,BUN,FAN,IBI,LER,NDO,PET,RUS,SLM,TOT,XEN} | 0.0446897 (2 rows) Thanks. Karl <kop@meme.com> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein
pgsql-performance by date: