Re: Performance - Mailing list pgsql-performance

From Ogden
Subject Re: Performance
Date
Msg-id A0B9339E-9D58-4842-A206-050B773360B6@darkstatic.com
In response to Re: Performance  (Tomas Vondra <tv@fuzzy.cz>)
Responses Re: Performance
Re: Performance
List pgsql-performance
On Apr 12, 2011, at 5:36 PM, Tomas Vondra wrote:

> On 12.4.2011 23:19, Ogden wrote:
>>
>> On Apr 12, 2011, at 4:09 PM, Tomas Vondra wrote:
>>
>>> On 12.4.2011 20:28, Ogden wrote:
>>>>
>>>> On Apr 12, 2011, at 1:16 PM, Tomas Vondra wrote:
>>>>
>>>>> On 12.4.2011 19:23, Ogden wrote:
>>>>>>
>>>>>> On Apr 12, 2011, at 12:18 PM, Andreas Kretschmer wrote:
>>>>>>
>>>>>>> Ogden <lists@darkstatic.com> wrote:
>>>>>>>
>>>>>>>> I have been wrestling with the configuration of the dedicated Postgres 9.0.3
>>>>>>>> server at work and, granted, there's more activity on the production server, but
>>>>>>>> the same queries take twice as long on the beefier server as on my Mac at home.
>>>>>>>> I have pasted what I have changed in postgresql.conf - I am wondering if
>>>>>>>> there's any way one can help me change things around to be more efficient.
>>>>>>>>
>>>>>>>> Dedicated PostgreSQL 9.0.3 Server with 16GB Ram
>>>>>>>>
>>>>>>>> Heavy write and read (for reporting and calculations) server.
>>>>>>>>
>>>>>>>> max_connections = 350
>>>>>>>> shared_buffers = 4096MB
>>>>>>>> work_mem = 32MB
>>>>>>>> maintenance_work_mem = 512MB
>>>>>>>
>>>>>>> That's okay.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> seq_page_cost = 0.02                    # measured on an arbitrary scale
>>>>>>>> random_page_cost = 0.03
>>>>>>>
>>>>>>> Do you have super, Super, SUPER fast disks? I think these (seq_page_cost
>>>>>>> and random_page_cost) are completely wrong.
>>>>>>>
>>>>>>
>>>>>> No, I don't have super fast disks. Just the 15K SCSI over RAID. I
>>>>>> find that by raising them to:
>>>>>>
>>>>>> seq_page_cost = 1.0
>>>>>> random_page_cost = 3.0
>>>>>> cpu_tuple_cost = 0.3
>>>>>> #cpu_index_tuple_cost = 0.005           # same scale as above - 0.005
>>>>>> #cpu_operator_cost = 0.0025             # same scale as above
>>>>>> effective_cache_size = 8192MB
>>>>>>
>>>>>> some queries run much faster. Is this better?
>>>>>
>>>>> I guess it is. What really matters with those cost variables is the
>>>>> relative scale - the original values
>>>>>
>>>>> seq_page_cost = 0.02
>>>>> random_page_cost = 0.03
>>>>> cpu_tuple_cost = 0.02
>>>>>
>>>>> suggest that the random reads are almost as expensive as sequential
>>>>> reads (which usually is not true - the random reads are significantly
>>>>> more expensive), and that processing each row is about as expensive as
>>>>> reading the page from disk (again, reading data from disk is much more
>>>>> expensive than processing them).
>>>>>
>>>>> So yes, the current values are much more likely to give good results.
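(For illustration, with made-up numbers: a sequential scan of a 10,000-page table holding 1,000,000 rows is costed
roughly as pages * seq_page_cost + rows * cpu_tuple_cost. With the original values that is
10,000 * 0.02 + 1,000,000 * 0.02 = 200 + 20,000, so the per-row CPU term completely dominates the I/O term. With the
defaults, seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, the same scan is 10,000 + 10,000, keeping I/O and CPU on a
comparable footing.)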
>>>>>
>>>>> You've mentioned those values were recommended on this list - can you
>>>>> point out the actual discussion?
>>>>>
>>>>>
>>>>
>>>> Thank you for your reply.
>>>>
>>>> http://archives.postgresql.org/pgsql-performance/2010-09/msg00169.php is how I first played with those values...
>>>>
>>>
>>> OK, what JD said there generally makes sense, although those values are
>>> a bit extreme - in most cases it's recommended to leave seq_page_cost=1
>>> and decrease the random_page_cost (to 2; the default value is 4). That
>>> usually pushes the planner towards index scans.
>>>
>>> I'm not saying those small values (0.02 etc.) are bad, but I guess the
>>> effect is about the same and it changes the impact of the other cost
>>> variables (cpu_tuple_cost, etc.).
>>>
>>> I see there is 16GB of RAM but shared_buffers are just 4GB. So there's
>>> nothing else running and the rest of the RAM is used for pagecache? I've
>>> noticed the previous discussion mentions there are 8GB of RAM and the DB
>>> size is 7GB (so it might fit into memory). Is this still the case?
>>>
>>> regards
>>> Tomas
>>
>>
>> Tomas,
>>
>> By decreasing random_page_cost to 2 (instead of 4), there is a slight performance decrease as opposed to leaving it
>> just at 4. For example, if I set it to 3 (or 4), a query may take 0.057 seconds. The same query takes 0.144 seconds
>> when I set random_page_cost to 2. Should I keep it at 3 (or 4) as I have done now?
>>
>> Yes there is 16GB of RAM but the database is much bigger than that. Should I increase shared_buffers?
>
> OK, that's very important information and it kinda explains all the
> problems you had. When the planner decides what execution plan to use,
> it computes a 'virtual cost' for different plans and then chooses the
> cheapest one.
>
> Decreasing 'random_page_cost' decreases the expected cost of plans
> involving index scans, so that at a certain point it seems cheaper than
> a plan using sequential scans etc.
>
> You can see this when using EXPLAIN - do it with the original cost
> values, then change the values (for that session only) and run the
> EXPLAIN again. You'll see how the execution plan suddenly changes and
> starts to use index scans.
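A minimal sketch of that check (the table and query are hypothetical placeholders):

    -- run EXPLAIN with the current cost settings first
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- then lower random_page_cost for this session only and compare the plans
    SET random_page_cost = 2.0;
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- undo the per-session change
    RESET random_page_cost;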
>
> The problem with random I/O is that it's usually much more expensive
> than sequential I/O as the drives need to seek etc. The only case when
> random I/O is just as cheap as sequential I/O is when all the data is
> cached in memory, because within RAM there's no difference between
> random and sequential access (right, that's why it's called Random
> Access Memory).
>
> So in the previous post setting both random_page_cost and seq_page_cost
> to the same value makes sense, because when the whole database fits into
> memory, there's no difference and index scans are favorable.
>
> In this case (the database is much bigger than the available RAM) this
> no longer holds - index scans hit the drives, resulting in a lot of
> seeks etc. So it's a serious performance killer ...
>
> Not sure about increasing the shared_buffers - if the block is not found
> in shared buffers, it still might be found in the pagecache (without the need
> to do a physical read). There are ways to check if the current size of
> shared buffers is enough or not - I usually use pg_stat views (bgwriter
> and database).
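For reference, a rough version of the pg_stat check mentioned above (the interpretation is situational, not a hard rule):

    -- buffer cache hit ratio per database
    SELECT datname, blks_hit, blks_read,
           round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 3) AS hit_ratio
    FROM pg_stat_database;

    -- checkpoint and buffer-writing activity; a high buffers_backend relative
    -- to buffers_checkpoint and buffers_clean can hint that shared_buffers is
    -- under pressure
    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_clean, buffers_backend
    FROM pg_stat_bgwriter;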


Tomas,

Thank you for your very detailed and well-written description. In conclusion, should I keep my random_page_cost (3.0)
at a value higher than seq_page_cost (1.0)? Is this bad practice, or will this suffice for my setup (where the
database is much bigger than the RAM in the system)? Or is this not what you are suggesting at all?

Thank you

Ogden


