Re: index v. seqscan for certain values - Mailing list pgsql-performance
| From | Jeremy Dunn | 
|---|---|
| Subject | Re: index v. seqscan for certain values | 
| Date | |
| Msg-id | 001801c420b0$b8b79b30$4f01a8c0@jeremydunn Whole thread Raw | 
| In response to | Re: index v. seqscan for certain values (Bill Moran <wmoran@potentialtech.com>) | 
| List | pgsql-performance | 
Sorry I should have written that we do VACUUM VERBOSE ANALYZE every night. - Jeremy -----Original Message----- From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Bill Moran Sent: Monday, April 12, 2004 12:09 PM To: jdunn@autorevenue.com Cc: Postgresql Performance Subject: Re: [PERFORM] index v. seqscan for certain values Quick bit of input, since you didn't mention it. How often do you run ANALYZE? I found it interesting that a database I was doing tests on sped up by a factor of 20 after ANALYZE. If your data changes a lot, you should probably schedule ANALYZE to run with VACUUM. Jeremy Dunn wrote: > I've searched the archives and can't find an answer to this seemingly > simple question. Apologies if it's too common. > > The table in question has ~1.3M rows. It has 85 columns, 5 of which > have single-column indexes. > > The column in question (CID) has 183 distinct values. For these > values, > the largest has ~38,000 rows, and the smallest has 1 row. About 30 > values have < 100 rows, and about 10 values have > 20,000 rows. > > The database is 7.2.3 running on RedHat 7.1. (we are in process of > upgrading to PG 7.4.2) All of the query plan options are enabled, and > the cpu costs are set to the default values. ( cpu_tuple_cost is 0.01, > cpu_index_tuple_cost is 0.001). The database is VACUUM'd every night. > > The problem: > A simply query: > select count(*) from xxx where CID=<smalval> > where <smalval> is a CID value which has relatively few rows, returns > a > plan using the index on that column. > > explain analyze select count(*) from xxx where cid=869366; > Aggregate (cost=19136.33..19136.33 rows=1 width=0) (actual > time=78.49..78.49 rows=1 loops=1) > -> Index Scan using xxx_cid on emailrcpts (cost=0.00..19122.21 > rows=5648 width=0) (actual time=63.40..78.46 rows=1 loops=1) > Total runtime: 78.69 msec > > The same plan is true for values which have up to about 20,000 rows: > > explain analyze select count(*) from xxx where cid=6223341; > Aggregate (cost=74384.19..74384.19 rows=1 width=0) (actual > time=11614.89..11614.89 rows=1 loops=1) > -> Index Scan using xxx_cid on emailrcpts (cost=0.00..74329.26 > rows=21974 width=0) (actual time=35.75..11582.10 rows=20114 loops=1) > Total runtime: 11615.05 msec > However for the values that have > 20,000 rows, the plan changes to a > sequential scan, which is proportionately much slower. > > explain analyze select count(*) from xxx where cid=7191032; > Aggregate (cost=97357.61..97357.61 rows=1 width=0) (actual > time=46427.81..46427.82 rows=1 loops=1) > -> Seq Scan on xxx (cost=0.00..97230.62 rows=50792 width=0) > (actual time=9104.45..46370.27 rows=37765 loops=1) > Total runtime: 46428.00 msec > > > The question: why does the planner consider a sequential scan to be > better for these top 10 values? In terms of elapsed time it is more > than twice as slow, proportionate to an index scan for the same number > of rows. > > What I tried: > > A) alter table xxx alter column cid set statistics 500; > analyze xxx; > This does not affect the results. > > B) dropped/rebuilt the index, with no improvement. > > C) decreasing cpu_index_tuple_cost by a factor of up to 1000, with no > success > > D) force an index scan for the larger values by using a very high > value > for cpu_tuple_cost (e.g. .5) but this doesn't seem like a wise thing to do. > > Your thoughts appreciated in advance! > > - Jeremy > > 7+ years experience in Oracle performance-tuning > relatively new to postgresql -- Bill Moran Potential Technologies http://www.potentialtech.com ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster
pgsql-performance by date: