Thread: Planner constants for RAM-resident databases
I'm working with an application where the database is entirely resident in RAM (the server is a quad Opteron with 16 GB of memory). It's a web application and it handles a high volume of queries. The planner seems to be generating poor plans for some of our queries, which I can fix by raising cpu_tuple_cost. I have seen some comments in the archives saying that this is a bad idea, but is that necessarily the case when the database is entirely resident in RAM?

Emil
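(For reference, the kind of experiment described above can be tried per session before touching the server configuration. A minimal sketch; the table and query are hypothetical and not from the thread, cpu_tuple_cost defaults to 0.01, and 0.05 is only an illustrative value.)

    -- Plan with the default planner costs.
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- Raise cpu_tuple_cost for this session only and compare the plan.
    SET cpu_tuple_cost = 0.05;
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

    -- Return to the configured value.
    RESET cpu_tuple_cost;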
On Fri, Jul 01, 2005 at 09:59:38PM -0400, Emil Briggs wrote:
> I'm working with an application where the database is entirely resident in RAM
> (the server is a quad Opteron with 16 GB of memory). It's a web application
> and it handles a high volume of queries. The planner seems to be generating
> poor plans for some of our queries, which I can fix by raising
> cpu_tuple_cost. I have seen some comments in the archives saying that this is
> a bad idea, but is that necessarily the case when the database is entirely
> resident in RAM?

If I'm understanding correctly, that'll mostly increase the estimated cost of handling a row relative to a sequential page fetch, which sure sounds like it'll push plans in the right direction, but it doesn't sound like the right knob to twiddle. What do you have random_page_cost set to?

Cheers,
  Steve
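(Steve's question can be answered directly from psql; a quick sketch:)

    SHOW random_page_cost;

    -- or look at several planner cost constants at once:
    SELECT name, setting
    FROM pg_settings
    WHERE name IN ('random_page_cost', 'cpu_tuple_cost', 'effective_cache_size');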
Emil Briggs wrote:
> I'm working with an application where the database is entirely resident in RAM
> (the server is a quad Opteron with 16 GB of memory). It's a web application
> and it handles a high volume of queries. The planner seems to be generating
> poor plans for some of our queries, which I can fix by raising
> cpu_tuple_cost. I have seen some comments in the archives saying that this is
> a bad idea, but is that necessarily the case when the database is entirely
> resident in RAM?
>
> Emil

Generally, the key knob to twiddle when everything fits in RAM is random_page_cost. If you truly have everything in RAM you could set it almost to 1. A value of 1 means that it costs exactly the same to go through the data randomly as it does to go through it sequentially. I would guess that even in RAM it is faster to go sequentially (since you still have to page and deal with the L1/L2/L3 caches, etc.), but the default random_page_cost of 4 is probably too high for you.

John
=:->
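(A sketch of what John suggests, for a database that really is fully cached; 1.5 is only a starting point, not a value from the thread.)

    -- Try it at the session level first (the default is 4).
    SET random_page_cost = 1.5;
    -- Re-run the problem queries with EXPLAIN ANALYZE and compare the plans.

    -- If the plans improve, persist the setting in postgresql.conf:
    --   random_page_cost = 1.5
    -- and reload the server (e.g. pg_ctl reload).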
Emil Briggs wrote:
>> I just mentioned random_page_cost, but you should also tune
>> effective_cache_size, since that is effectively most of your RAM. It
>> depends what else is going on in the system, but setting it as high as
>> say 12-14GB is probably reasonable if it is a dedicated machine. With
>> random_page_cost 1.5-2, and higher effective_cache_size, you should be
>> doing pretty well.
>>
>> John
>> =:->
>
> I tried playing around with these and they had no effect. It seems the only
> thing that makes a difference is cpu_tuple_cost.

I'm surprised. I know cpu_tuple_cost can affect it as well, but usually the recommended way to get indexed scans is the two parameters above.

When you do "explain analyze" of a query that you have difficulties with, how are the planner's estimates? Are the estimated numbers of rows about equal to the actual numbers of rows? If the planner is mis-estimating, there is a whole different set of tuning to do to help it estimate correctly.

John
=:->

PS> Use reply-all so that your comments go to the list.
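(A rough sketch of the settings John describes; the values and the query are illustrative, not from the thread. effective_cache_size is only a planner hint, it does not allocate memory, and on older releases it is expressed as a count of 8 kB pages rather than in memory units.)

    -- Session-level experiment; persist in postgresql.conf once it looks right.
    SET effective_cache_size = '12GB';   -- older releases: a page count, e.g. 1572864
    SET random_page_cost = 1.5;

    -- Then compare estimated vs. actual row counts for the problem query:
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;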
> When you do "explain analyze" of a query that you have difficulties
> with, how are the planner's estimates? Are the estimated numbers of rows
> about equal to the actual numbers of rows?

Some of them are pretty far off. For example:

    ->  Merge Left Join  (cost=9707.71..13993.52 rows=1276 width=161) (actual time=164.423..361.477 rows=49 loops=1)

I tried setting enable_mergejoin to off and that made the query about three times faster. It's using a hash join instead.
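(The same experiment can be reproduced per session; a sketch, with the hypothetical query from above standing in for the real one.)

    -- Planner method switches like this are for diagnosis, not a permanent fix.
    SET enable_mergejoin = off;
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- planner can no longer pick a merge join
    RESET enable_mergejoin;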
Emil,

> ->  Merge Left Join  (cost=9707.71..13993.52 rows=1276 width=161)
>     (actual time=164.423..361.477 rows=49 loops=1)

That would indicate that you need to either increase your statistical sampling (SET STATISTICS) or your frequency of running ANALYZE, or both.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
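(A sketch of the two remedies Josh mentions; the table and column names are hypothetical, and 200 is just an example target. The per-column default comes from default_statistics_target.)

    -- Sample the join/filter column more heavily so the row estimate improves.
    ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 200;

    -- Rebuild the statistics so the planner actually sees the larger sample.
    ANALYZE orders;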