annoying query/planner choice - Mailing list pgsql-performance
From | Andrew Rawnsley |
---|---|
Subject | annoying query/planner choice |
Date | |
Msg-id | 22837C75-44AC-11D8-8262-000393A47FCC@ravensfield.com Whole thread Raw |
Responses |
Re: annoying query/planner choice
Re: annoying query/planner choice |
List | pgsql-performance |
I have a situation that is giving me small fits, and would like to see if anyone can shed any light on it. I have a modest table (@1.4 million rows, and growing), that has a variety of queries run against it. One is a very straightforward one - pull a set of distinct rows out based on two columns, with a simple where clause based on one of the indexed columns. For illustration here, I've removed the distinct and order-by clauses, as they are not the culprits. Before I go on - v7.4.1, currently on a test box, dual P3, 1G ram, 10K scsi, Slackware 9 or so. The table has been vacuumed and analyzed. Even offered pizza and beer. Production box will be a dual Xeon with 2G ram and RAID 5. When the query is run with a where clause that returns small number of rows, the query uses the index and is quite speedy: rav=# explain analyze select casno, parameter from hai.results where site_id = 9982; QUERY PLAN ------------------------------------------------------------------------ -------------------------------------------------------------- Index Scan using hai_res_siteid_ndx on results (cost=0.00..7720.87 rows=2394 width=30) (actual time=12.118..12.933 rows=50 loops=1) Index Cond: (site_id = 9982) Total runtime: 13.145 ms When a query is run that returns a much larger set, the index is not used, I assume because the planner thinks that a sequential scan would work just as well with a large result set: rav=# explain analyze select casno, parameter from hai.results where site_id = 18; QUERY PLAN ------------------------------------------------------------------------ ---------------------------------------------- Seq Scan on results (cost=0.00..73396.39 rows=211205 width=30) (actual time=619.020..15012.807 rows=186564 loops=1) Filter: (site_id = 18) Total runtime: 15279.789 ms (3 rows) Unfortunately, its way off: rav=# set enable_seqscan=off; SET rav=# explain analyze select casno, parameter from hai.results where site_id = 18; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------------------- Index Scan using hai_res_siteid_ndx on results (cost=0.00..678587.01 rows=211205 width=30) (actual time=9.575..3569.387 rows=186564 loops=1) Index Cond: (site_id = 18) Total runtime: 3872.292 ms (3 rows) I would like, of course, for it to use the index, given that it takes 20-25% of the time. Fiddling with CPU_TUPLE_COST doesn't do anything until I exceed 0.5, which strikes me as a bit high (though please correct me if I am assuming too much...). RANDOM_PAGE_COST seems to have no effect. I suppose I could cluster it, but it is constantly being added to, and would have to be re-done on a daily basis (if not more). Any suggestions? -------------------- Andrew Rawnsley President The Ravensfield Digital Resource Group, Ltd. (740) 587-0114 www.ravensfield.com
pgsql-performance by date: