Re: Volunteer to build a configuration tool - Mailing list pgsql-performance

From Campbell, Lance
Subject Re: Volunteer to build a configuration tool
Date
Msg-id A3AC4FA47DC0B1458C3E5396E685E63302395E6C@SAB-DC1.sab.uiuc.edu
Whole thread Raw
In response to Re: Volunteer to build a configuration tool  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: Volunteer to build a configuration tool  (Heikki Linnakangas <heikki@enterprisedb.com>)
Re: Volunteer to build a configuration tool  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-performance
It is amazing how many times you can read something before it actually
sinks in.

There seems to be two possible approaches to optimizing PostgreSQL 8.2:

File caching approach:
This approach is based on the fact that the OS will cache the necessary
PostgreSQL files.  The key here is to set the size of
effective_cache_size value as high as you think the OS has memory to
cache the files.  This approach would need the value of shared_buffers
to be relatively low.  Otherwise you are in a cense storing the data
twice.  One would also have to make sure that work_mem is not too high.
Since the files would be cached by the OS, work_mem could be relatively
low.  This is an ideal approach if you have a dedicated server since
there would be no other software using memory or accessing files that
the OS would try to cache.

Memory driven approach:
In this approach you want to create a large value for shared_buffers.
You are relying on shared_buffers to hold the most commonly accessed
disk blocks.  The value for effective_cache_size would be relatively
small since you are not relying on the OS to cache files.  This seems
like it would be the ideal situation if you have other applications
running on the box.  By setting shared_buffers to a high value you are
guaranteeing memory available to PostgreSQL (this assumes the other
applications did not suck up to much memory to make your OS use virtual
memory).   This also seems more like how Oracle approaches things.

Do I understand the possible optimization paths correctly?  The only
question I have about this approach is: if I use the "memory driven
approach" since effective_cache_size would be small I would assume I
would need to fiddle with random_page_cost since there would be know way
for PostgreSQL to know I have a well configured system.

If everything I said is correct then I agree "Why have
effective_cache_size?"  Why not just go down the approach that Oracle
has taken and require people to rely more on shared_buffers and the
general memory driven approach?  Why rely on the disk caching of the OS?
Memory is only getting cheaper.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu


-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Greg Smith
Sent: Wednesday, June 20, 2007 10:21 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Volunteer to build a configuration tool

On Wed, 20 Jun 2007, PFC wrote:

> Except planner hints (argh) I see no way to give this information to
the
> machine... since it's mostly in the mind of the DBA.

And the mind of the DBA has a funny way of being completely wrong some
days about what's really happening under the hood.

> Maybe a per-table "cache temperature" param (hot, warm, cold), but
what
> about the log table, the end of which is cached, but not the old
records
> ? It's messy.

One of the things that was surprising to me when I started looking at
the
organization of the PostgreSQL buffer cache is how little gross
information about its contents is available.  I kept expecting to find a

summary section where you could answer questions like "how much of the
cache currently has information about index/table X?" used as an input
to
the optimizer.  I understand that the design model expects much of this
is
unknowable due to the interaction with the OS cache, and in earlier
versions you couldn't make shared_buffers big enough for its contents to

be all that interesting, so until recently this wasn't worth collecting.

But in the current era, where it's feasible to have multi-GB caches
efficiently managed by PG and one can expect processor time is
relatively
cheap, it seems to me one way to give a major boost to the optimizer is
to
add some overhead to buffer cache management so it collects such
information.  When I was trying to do a complete overhaul on the
background writer, the #1 problem was that I had to assemble my own
statistics on what was inside the buffer cache as it was scanned,
because
a direct inspection of every buffer is the only way to know things like
what percentage of the cache is currently dirty.

I can't figure out if I'm relieved or really worried to discover that
Tom
isn't completely sure what to do with effective_cache_size either.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Slow indexscan
Next
From: "Mikko Partio"
Date:
Subject: Re: Slow indexscan