Re: Now I am back, next thing. Final PGS tuning. - Mailing list pgsql-general

From Scott Marlowe
Subject Re: Now I am back, next thing. Final PGS tuning.
Date
Msg-id dcc563d10904081053v2428965ew3fd5939bac05fd13@mail.gmail.com
Whole thread Raw
In response to Now I am back, next thing. Final PGS tuning.  (Jennifer Trey <jennifer.trey@gmail.com>)
List pgsql-general
On Wed, Apr 8, 2009 at 8:01 AM, Jennifer Trey <jennifer.trey@gmail.com> wrote:

> I would like to further tune the tuning wizards recommendations though. I
> think it put itself on the lower scale.

OK, instead of blindly guessing at better values, and making a lot of
concurrent changes, you need to set up some kind of simple yet
realistic benchmark for your database.  It doesn't have to be perfect,
but it should realistically reflect the number of clients you'll have
connecting at once and the types of queries they're likely to run.
Write a simple pgbench script and use it to test your changes.

> I have 8GB memory, Intel Quad Core 2.4Ghz with 8MB L2 cache. I am running
> Windows Web Server 2008 x64 and will be running a Java (64 bit version)
> application.

Note that the very first thing you could do to performance tune your
server would be to run it on something other than windows.  This is
not to bash windows, it's a simple fact of postgresql's architecture
not being a great match for windows under heavy load.

> I want to give the java app room for working on 2-3GB. The operating system
> is currently consuming around 1GB but lets give it a little more room. Lets
> give it a total of 2GB.
>
> That leaves 8-3-2 = 3 GB for PostgreSQL alone. Sounds good?

Note that PostgreSQL relies on the OS caching as much as its own, and
this tends to be even more true in windows environments.

> Here is my config file :
>
> max_connections = 100
>
> What does this do? Should I be looking at this as max similtaneous queries ?
> is 100 really enough? I think I want to max this more.

Max connections is the maximum number of clients that can connect at
the same time.  Each connection uses up a few megs of memory and can
start a query independent of other connections.  Generally a hundred
or so is a reasonable place to start.  But capacity planning will tell
you how many you really need.  If you find yourself going over 100 by
much, start looking at connection pooling.

> I am looking for a worst scenario around like 50-100 similitaneous user
> clicks (per second?). But the querying might be around like 200 queries per
> seocond, not really, but I want to be prepared. :)

OK, the real issue here will be whether or not you have persistent
pooled connections.  Creating PostgreSQL connections is expensive,
especially so on Windows.  Definitely look at pooling

> I would appreciate if could have a discussion on these topics. On whats
> important and whats not.
>
> Here is some other settings I am thinking about :
>
> effective_cache_size = 449697
>
> is this kilo bytes ? Is this a good value?

In 8.3 you can put the actual unit after, so 400MB would be a nicer
way to put that number in.  Effective cache size just tells the
planner about how much cache there is in the OS, and postgresql.  It's
not a very fine grained control, so just guestimate it at say 3000MB
or something for now.

> maintenance_work_mem = 16384

Again, it's a good idea to put units after in 8.3.  Since
maintenance_work_mem is only used by things like vacuum, you can set
it pretty high (256MB is common) and not worry too much.

> work_mem = 1024  # I think this is kb. Way to low, right? What is a better
> value?

Again, throw a unit on the end.  default is kb.  1M is fine for now.
Again, implement some kind of benchmark, increase it when it provably
makes a difference for most of your queries.  If there's a lone query
that can use a lot more, then set work_mem higher in that session or
for a special user so it's not higher for everybody.

work_mem is PER SORT type op / PER SESSION.  So, if you have 100 users
doing 2 sorts each you can theoretically use up 100x2xwork_mem memory.
 A machine that's running fine one moment can collapse under load as
the number of processes increase and memory gets allocated out of
control.

That said, on my servers, with 100 to 200 connections, it's set to 8
meg.  That machine has 32Gig of ram, so 800 to 1600 Meg of ram
theoretically getting used won't cause some kind of swap storm.

> shared_buffers = 1024 # min 128kB or max_connections*16kB  ## Also to low.
> Right? I've got 3GB to work with!

But, you're on Windows, and the shared_buffer implementation there
doesn't scale as well as it does on linux or other flavors of unix.
So, while setting it a bit higher is good, don't set it any higher
than it needs to be to hold the current working set of all queries,
which is usually in the hundreds of megabytes, not the gigabyte range.
 Again, benchmark and test, but a good starting point is likely in the
128MB to 512MB range for windows.

> Please give your thoughts. I was also wondering about the Vacuum, force
> reindex and stuff. Are those things good to run once in a while? Force
> sounds a little brutal though!

It's much better to monitor your db for such things and vacuum full /
reindex only when / if needed, and do what you can to head those
things off.

> Something else I should consider?

If your dataset can fit in memory, consider lowering random_page_cost
to something in the range of but above 1.0.

pgsql-general by date:

Previous
From: Radcon Entec
Date:
Subject: Re: Table has 22 million records, but backup doesn't see them
Next
From: Greg Smith
Date:
Subject: Re: Now I am back, next thing. Final PGS tuning.