Re: Volunteer to build a configuration tool - Mailing list pgsql-performance
From | Campbell, Lance
---|---
Subject | Re: Volunteer to build a configuration tool
Date |
Msg-id | A3AC4FA47DC0B1458C3E5396E685E63302395E6C@SAB-DC1.sab.uiuc.edu
In response to | Re: Volunteer to build a configuration tool (Greg Smith <gsmith@gregsmith.com>)
Responses | Re: Volunteer to build a configuration tool; Re: Volunteer to build a configuration tool
List | pgsql-performance
It is amazing how many times you can read something before it actually sinks in. There seem to be two possible approaches to optimizing PostgreSQL 8.2:

File caching approach: This approach is based on the fact that the OS will cache the necessary PostgreSQL files. The key here is to set effective_cache_size as high as you think the OS has memory available to cache the files. This approach needs the value of shared_buffers to be relatively low; otherwise you are, in a sense, storing the data twice. One would also have to make sure that work_mem is not too high. Since the files would be cached by the OS, work_mem could be relatively low. This is an ideal approach if you have a dedicated server, since there would be no other software using memory or accessing files that the OS would try to cache.

Memory driven approach: In this approach you create a large value for shared_buffers and rely on shared_buffers to hold the most commonly accessed disk blocks. The value for effective_cache_size would be relatively small, since you are not relying on the OS to cache files. This seems like it would be the ideal situation if you have other applications running on the box: by setting shared_buffers to a high value you are guaranteeing memory available to PostgreSQL (this assumes the other applications did not use up so much memory that your OS resorts to virtual memory). This also seems more like how Oracle approaches things.

Do I understand the possible optimization paths correctly? The only question I have is: if I use the "memory driven approach," since effective_cache_size would be small, I assume I would need to fiddle with random_page_cost, since there would be no way for PostgreSQL to know I have a well-configured system.

If everything I said is correct, then I agree: "Why have effective_cache_size?" Why not just go down the approach Oracle has taken and require people to rely more on shared_buffers and the general memory driven approach? Why rely on the disk caching of the OS? Memory is only getting cheaper.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
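[Editor's note: to make the two approaches above concrete, here is a rough sketch of what each might look like in postgresql.conf on PostgreSQL 8.2, which accepts memory units such as MB/GB. The machine size (4 GB of RAM) and every value below are illustrative assumptions, not recommendations from this thread; the two sections are mutually exclusive alternatives, with the second left commented out.]

```
# --- File caching approach (dedicated server; assumed 4 GB RAM) ---
# Let the OS page cache do the heavy lifting.
shared_buffers = 128MB          # kept low to avoid, in effect, storing data twice
effective_cache_size = 3GB      # roughly the memory the OS can devote to file caching
work_mem = 4MB                  # modest, since reads are largely served from OS cache

# --- Memory driven approach (shared box; assumed 4 GB RAM) ---
# Rely on PostgreSQL's own buffer cache instead of the OS cache.
#shared_buffers = 2GB           # large, to hold the most commonly accessed blocks
#effective_cache_size = 512MB   # small, since the OS cache is not counted on
#random_page_cost = 2.0         # illustrative tweak: with a hot buffer cache the
                                # planner otherwise has no way to know the system
                                # is well configured
```

[Whether lowering random_page_cost is the right compensation is exactly the open question Lance raises below.]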
-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Greg Smith
Sent: Wednesday, June 20, 2007 10:21 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Volunteer to build a configuration tool

On Wed, 20 Jun 2007, PFC wrote:

> Except planner hints (argh) I see no way to give this information to the
> machine... since it's mostly in the mind of the DBA.

And the mind of the DBA has a funny way of being completely wrong some days about what's really happening under the hood.

> Maybe a per-table "cache temperature" param (hot, warm, cold), but what
> about the log table, the end of which is cached, but not the old records?

It's messy. One of the things that was surprising to me when I started looking at the organization of the PostgreSQL buffer cache is how little gross information about its contents is available. I kept expecting to find a summary section where you could answer questions like "how much of the cache currently has information about index/table X?" used as an input to the optimizer.

I understand that the design model expects much of this to be unknowable due to the interaction with the OS cache, and in earlier versions you couldn't make shared_buffers big enough for its contents to be all that interesting, so until recently this wasn't worth collecting. But in the current era, where it's feasible to have multi-GB caches efficiently managed by PG and processor time is relatively cheap, it seems to me that one way to give the optimizer a major boost would be to add some overhead to buffer cache management so that it collects such information.

When I was trying to do a complete overhaul of the background writer, the #1 problem was that I had to assemble my own statistics on what was inside the buffer cache as it was scanned, because a direct inspection of every buffer is the only way to know things like what percentage of the cache is currently dirty.

I can't figure out if I'm relieved or really worried to discover that Tom isn't completely sure what to do with effective_cache_size either.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
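[Editor's note: some of the gross cache-contents information Greg describes is obtainable by direct inspection through the contrib/pg_buffercache module, though only as a one-off scan rather than the cheap, continuously maintained summary he has in mind. A rough sketch, assuming pg_buffercache is installed in the current database; the join on relfilenode follows the module's documented usage:]

```sql
-- Summarize shared_buffers by relation: how many buffers each
-- table/index occupies, and how many of those are currently dirty.
SELECT c.relname,
       count(*)                                   AS buffers,
       sum(CASE WHEN b.isdirty THEN 1 ELSE 0 END) AS dirty_buffers
FROM   pg_buffercache b
       JOIN pg_class c ON c.relfilenode = b.relfilenode
WHERE  b.reldatabase = (SELECT oid FROM pg_database
                        WHERE datname = current_database())
GROUP  BY c.relname
ORDER  BY buffers DESC
LIMIT  10;
```

[Running this query scans every buffer header on each pass, which is precisely the kind of overhead Greg notes a maintained summary would trade against.]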