Thread: Tuning random_page_cost
Hi!
I've got a query that has a where clause on a timestamp field:
select t.board_id
, t.thread_id
from public.board_thread t
where t.last_reply <= now()-'6 months'::interval
limit 1
, t.thread_id
from public.board_thread t
where t.last_reply <= now()-'6 months'::interval
limit 1
I've got random_page_cost set to 1.4 which is fine for most queries; yet here the planner prefers a (slower) sequential scan:
QUERY PLAN
Limit (cost=0.00..0.14 rows=1 width=8) (actual time=2.598..2.600 rows=1 loops=1)
-> Seq Scan on board_thread t (cost=0.00..4613.87 rows=33674 width=8) (actual time=2.592..2.592 rows=1 loops=1)
Filter: (last_reply <= (now() - '6 mons'::interval))
Total runtime: 2.711 ms
Limit (cost=0.00..0.14 rows=1 width=8) (actual time=2.598..2.600 rows=1 loops=1)
-> Seq Scan on board_thread t (cost=0.00..4613.87 rows=33674 width=8) (actual time=2.592..2.592 rows=1 loops=1)
Filter: (last_reply <= (now() - '6 mons'::interval))
Total runtime: 2.711 ms
When I set random_page_cost extremely low, the planner makes a better decision; the "breaking point" ist at set random_page_cost = 0.16938, which seems much to unrealistic to use as a general setting:
QUERY PLAN
Limit (cost=0.00..0.14 rows=1 width=8) (actual time=0.142..0.143 rows=1 loops=1)
-> Index Scan using idx_bt_last_reply on board_thread t (cost=0.00..4613.72 rows=33674 width=8) (actual time=0.137..0.137 rows=1 loops=1)
Index Cond: (last_reply <= (now() - '6 mons'::interval))
Total runtime: 0.281 ms
Limit (cost=0.00..0.14 rows=1 width=8) (actual time=0.142..0.143 rows=1 loops=1)
-> Index Scan using idx_bt_last_reply on board_thread t (cost=0.00..4613.72 rows=33674 width=8) (actual time=0.137..0.137 rows=1 loops=1)
Index Cond: (last_reply <= (now() - '6 mons'::interval))
Total runtime: 0.281 ms
When I change my query slightly so as to retrieve only rows with an "equals" match, the planner prefers the index-scan alright, even with the 1.4-setting:
select t.board_id
, t.thread_id
from public.board_thread t
where t.last_reply <= now()-'6 months'::interval
limit 1
, t.thread_id
from public.board_thread t
where t.last_reply <= now()-'6 months'::interval
limit 1
QUERY PLAN
Limit (cost=0.00..1.83 rows=1 width=8) (actual time=0.321..0.321 rows=0 loops=1)
-> Index Scan using idx_bt_last_reply on board_thread t (cost=0.00..3.66 rows=2 width=8) (actual time=0.315..0.315 rows=0 loops=1)
Index Cond: (last_reply = (now() - '6 mons'::interval))
Total runtime: 0.457 ms
Limit (cost=0.00..1.83 rows=1 width=8) (actual time=0.321..0.321 rows=0 loops=1)
-> Index Scan using idx_bt_last_reply on board_thread t (cost=0.00..3.66 rows=2 width=8) (actual time=0.315..0.315 rows=0 loops=1)
Index Cond: (last_reply = (now() - '6 mons'::interval))
Total runtime: 0.457 ms
Now here's my question: Is there some other screw to turn in order to tune general planner behaviour so it better matches this specific kind of query, too, or is it preferrable to leave general settings as they are and just use a set enable_seqscan = off; before this type of query?
Kind regards
Markus
On Tue, Jul 13, 2004 at 13:30:29 +0200, Markus Wollny <Markus.Wollny@computec.de> wrote: > Hi! > > I've got a query that has a where clause on a timestamp field: > > select t.board_id > , t.thread_id > from public.board_thread t > where t.last_reply <= now()-'6 months'::interval > limit 1 > > I've got random_page_cost set to 1.4 which is fine for most queries; yet > here the planner prefers a (slower) sequential scan: If you know that an index scan is better you can fudge the query to change the planner's estimate of the number of rows that will be returned. The normal fudge is to add a >= check for some timestamp that is earlier than any in the column, so that you will have a range condition, but not any change in the rows returned.
Thanks, that seems to have done the trick :) > -----Ursprüngliche Nachricht----- > Von: Bruno Wolff III [mailto:bruno@wolff.to] > Gesendet: Dienstag, 13. Juli 2004 14:44 > An: Markus Wollny > Cc: pgsql-general@postgresql.org > Betreff: Re: [GENERAL] Tuning random_page_cost > > On Tue, Jul 13, 2004 at 13:30:29 +0200, > Markus Wollny <Markus.Wollny@computec.de> wrote: > > Hi! > > > > I've got a query that has a where clause on a timestamp field: > > > > select t.board_id > > , t.thread_id > > from public.board_thread t > > where t.last_reply <= now()-'6 months'::interval > > limit 1 > > > > I've got random_page_cost set to 1.4 which is fine for most > queries; > > yet here the planner prefers a (slower) sequential scan: > > If you know that an index scan is better you can fudge the > query to change the planner's estimate of the number of rows > that will be returned. > The normal fudge is to add a >= check for some timestamp that > is earlier than any in the column, so that you will have a > range condition, but not any change in the rows returned. >
On Tue, 2004-07-13 at 05:30, Markus Wollny wrote: > Hi! > > I've got a query that has a where clause on a timestamp field: > > select t.board_id > , t.thread_id > from public.board_thread t > where t.last_reply <= now()-'6 months'::interval > limit 1 > > I've got random_page_cost set to 1.4 which is fine for most queries; > yet here the planner prefers a (slower) sequential scan: > > QUERY PLAN > Limit (cost=0.00..0.14 rows=1 width=8) (actual time=2.598..2.600 > rows=1 loops=1) > -> Seq Scan on board_thread t (cost=0.00..4613.87 rows=33674 > width=8) (actual time=2.592..2.592 rows=1 loops=1) > Filter: (last_reply <= (now() - '6 mons'::interval)) > Total runtime: 2.711 ms Did you happen to notice that the estimated number of rows is 33674 and the actual number is only 1? Are you analyzing this table, and if so, have you tried upping your target statistics on the column here? > When I set random_page_cost extremely low, the planner makes a better > decision; the "breaking point" ist at set random_page_cost = 0.16938, > which seems much to unrealistic to use as a general setting: Yes, but this is the wrong way. First, the planner needs the right numbers. Bonking it on the head with a sledge hammer is not how to get it to behave. Analyzing with a high enough stats target is. Note that these types of questions are generally better handled on the performance mailing list.