Re: Moving postgresql.conf tunables into 2003... - Mailing list pgsql-performance
From: Sean Chittenden
Subject: Re: Moving postgresql.conf tunables into 2003...
Date:
Msg-id: 20030703232351.GS72567@perrin.int.nxad.com
In response to: Re: Moving postgresql.conf tunables into 2003... ("scott.marlowe" <scott.marlowe@ihs.com>)
Responses: Re: Moving postgresql.conf tunables into 2003...
List: pgsql-performance
> > What are the odds of going through and revamping some of the
> > tunables in postgresql.conf for the 7.4 release?  I was just
> > working with someone on IRC and on their 7800 RPM IDE drives,
> > their random_page_cost was ideally suited to be 0.32: a far cry
> > from 4.  Doing so has been a win across the board and the problem
> > query went from about 40sec (seq scan) down to 0.25ms (using idx;
> > higher than 0.32 resulted in a query time jump to 2sec, and at
> > 0.4 it went back up to a full seq scan at 40sec).
>
> I'm the guy who advocates settings of 1 to 2, and that still sounds
> low to me. :-)  I'm wondering if effective_cache_size was set
> properly, as well as whether enough buffers were allocated.
>
> I generally set effective_cache_size to 100,000 pages (800 megs or
> so) on my box, which is where it sits most days.  With this setting
> I've found that values under 1 are not usually necessary to force
> the planner to take the path of righteousness (i.e. the fastest
> one :-).  1.2 to 1.4 are optimal to me.

This is a nightly report; cache sizes won't impact the performance of
the query at all.  The planner was consistently choosing a sequential
scan over using the index until random_page_cost was set to 0.32.
After the adjustment, the query just flies (0.25ms @ 0.32 vs. 0.35s @
0.33 vs. 40s @ >0.4).  Since the report is performed only once a day
and data is COPY'ed in once every few minutes, there's a huge amount
of data that's not cached, nor should it be.

> Since theoretically a random page cost of 1 means no penalty to
> move the heads around, and there's ALWAYS a penalty for moving the
> heads around, we have to assume:
>
> 1: That either the planner is making poor decisions on some other
> variable, and we can whack the planner in the head with a really
> low random page cost.

By all accounts of having played with this query and data, this is
the correct assumption from what I can tell.

> OR
>
> 2: The other settings are suboptimal (buffers, sort_mem,
> effective_cache_size, etc...) and lowering random page costs helps
> there.

None of those other than possibly sort_mem had any impact on the
query, and even then, lower sort_mem doesn't help until the data's
been picked out of the table.  Sorting ~16K rows is quicker with more
sort_mem, but higher sort_mem has zero impact on fetching ~16K rows
out of a table with 40M rows of data.  Getting the planner to use the
index to filter out data inserted in the last three days, instead of
doing a seq scan... well, I don't know how you could do that without
changing random_page_cost.  A good thump to the side of the head
would be welcome too if I'm wrong, just make sure it's a good thump
with the appropriate clue-bat.

> I've always wondered if most performance issues aren't a bit of
> both.

Eh, in my experience it's generally that random_page_cost needs to be
adjusted to match the hardware, and every year, with newer hardware,
that value seems to get lower.

> The answer, of course, is fixing the planner so that a
> random_page_cost of anything less than 1 would never be needed,
> since by design, anything under 1 represents a computer that likely
> doesn't exist (in theory of course.)  A 1 would be a machine that
> was using solid state hard drives and had the same cost in terms of
> OS paths to do random accesses as sequential.

Well, this could be a bug then, but I'm skeptical.  What's odd to me
is that changing the value between 0.32, 0.33, and 0.4 radically
changes the performance of the query.
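For what it's worth, this sort of experiment is easy to repeat per
session without touching postgresql.conf.  A minimal sketch (the
table and column names below are made up for illustration):

    -- Watch the plan at the stock cost...
    SET random_page_cost = 4.0;
    EXPLAIN ANALYZE
      SELECT * FROM report_data
       WHERE inserted_at > now() - '3 days'::interval;

    -- ...then at the value that won here; the plan should flip
    -- from a sequential scan to an index scan.
    SET random_page_cost = 0.32;
    EXPLAIN ANALYZE
      SELECT * FROM report_data
       WHERE inserted_at > now() - '3 days'::interval;

A plain SET only lasts for the session, so it's safe to binary-search
for the crossover point this way before committing anything to
postgresql.conf.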
> What constants in the planner, and/or formulas, would be the likely
> culprits I wonder?  I've wandered through that page and wasn't sure
> what to play with.

random_page_cost should be proportional to the seek time necessary
for the disk to find a page of data on its platters.  It makes sense
that, as time progresses and hardware gets faster, this value gets
smaller.
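A quick back-of-the-envelope under that model (the seek times here
are hypothetical round numbers, not measurements):

    new_cost ~= old_cost * (new_seek_time / old_seek_time)
             ~= 4.0 * (8 ms / 12 ms)
             ~= 2.7

i.e., if a drive's average seek drops from 12 ms to 8 ms, the same
proportionality argument justifies a setting roughly a third lower.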
-sc

--
Sean Chittenden