Re: Help how to tune-up my Database - Mailing list pgsql-performance

From scott.marlowe
Subject Re: Help how to tune-up my Database
Date
Msg-id Pine.LNX.4.33.0405070939020.2340-100000@css120.ihs.com
In response to Help how to tune-up my Database  ("Michael Ryan S. Puncia" <mpuncia@census.gov.ph>)
List pgsql-performance
On Fri, 7 May 2004, Michael Ryan S. Puncia wrote:

> Hi,
>
>             I am a newbie here and just starting to use postgresql. My
> problem is how to tune up my server because it is too slow.

First, read this:

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

> These are my PC specs: P4, 512MB RAM, Linux 9

Get more RAM.

Hard drives: what interface, how many drives, any RAID???

For a mostly-read database, IDE drives are pretty good.  Having multiple
drives in a RAID-5 or RAID 1+0 works well on a mostly-read database too.
Keep the stripe size small when setting up a RAID array for a database.

> Because I am working in a statistical organization we have a very large data
> volume
>
> These are my data:
>
>
>
> Table 1 with 60 million rows but only 10 fields
>
> Table 2 with 30 million rows with 15 fields
>
> Table 3 with 30 million rows with 10 fields

That's not really that big, but it's big enough that you have to make sure
your server is tuned properly.

> I will only use this server for querying ... I already read and applied
> those articles found in the archives section, but the performance is
> still not good.
>
> I am planning to add another 512MB of RAM. Another question is how to
> calculate the shared_buffers size.

I'm assuming you've recently vacuumed full and analyzed your database...

Shared buffers should probably be between 1000 and 10000 on about 98% of
all installations.  Setting it higher than 25% of memory is usually a bad
idea.  Since they're in 8k blocks (unless you compiled with a custom
block size; you'd know if you did, it's not something you can accidentally
do by leaning on the wrong switch...) you probably want about 10000 blocks
or so to start, which will give you about 80 megs of shared buffers.
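
In postgresql.conf that would look something like this (10000 is only a
starting point, tune from there):

# 10000 buffers * 8k per buffer =~ 80 megs
shared_buffers = 10000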

PostgreSQL doesn't really cache as well as the kernel, so it's better to
leave more memory available for kernel cache than you allocate to buffer
cache.  On a machine with only 512Meg, I'm guessing you'll get about 128
to 200 megs of kernel cache if you're only running postgresql and you have
it set to 10000 buffers.

The important things to check / set are things like effective_cache_size.
It too is measured in 8k blocks, and reflects the approximate amount of
kernel cache being dedicated to postgresql.  Assuming a single-service,
postgresql-only box, that will be roughly the "cached" number that top
shows on a server that's been up for a while, like so:

  9:50am  up 12:16,  4 users,  load average: 0.00, 0.00, 0.00
104 processes: 102 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:  0.7% user,  0.3% system,  0.0% nice,  1.7% idle
Mem:   512924K av,  499248K used,   13676K free,       0K shrd,   54856K buff
Swap: 2048248K av,    5860K used, 2042388K free                  229572K cached

The 229572K cached entry shows about 230 megs.  Divided by 8k per block,
we get about 28000 blocks.
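
So on the box above you'd put something like this in postgresql.conf
(work the number out from your own top output, not mine):

# 229572K cached / 8K per block =~ 28000
effective_cache_size = 28000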

sort_mem might do with a small bump, especially if you're only handling a
few connections at a time.  Be careful: it's allocated per sort and
measured in kilobytes, so it's easy for folks to set it too high and make
their machine start flushing too much kernel cache, which will slow down
the other backends that have to go to disk for data.

A good starting point for testing is anywhere from 8192 to 32768.  32768
is 32 megs, which can starve a machine as small as yours if there are a
couple of queries each running a couple of sorts on large sets at the same
time.
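
You can bump it for just your session first to see if it helps (16384
here is just a middle-of-the-road test value):

-- per session only; other backends keep the postgresql.conf default
set sort_mem = 16384;   -- 16384 kbytes = 16 megs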

Lastly, using explain analyze <your query here> you can see if postgresql
is making a bad plan choice.  Compare estimated rows to actual rows.
Look for things like nested loops being run on what the planner thinks
will be 80 rows but is, in fact, 8000 rows.
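
The table names and numbers below are invented, just to show what to look
at; compare the first rows= (the estimate) against the actual rows=:

explain analyze select * from t1 join t2 on t2.t1_id = t1.id
  where t1.region = 7;

Nested Loop  (cost=0.00..1021.45 rows=80 width=48)
             (actual time=0.21..412.50 rows=8000 loops=1)
  ...

Here the planner estimated 80 rows but got 8000, which is exactly the
kind of mismatch that leads to a bad nested loop.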

You can change random_page_cost to change the tendency of the server to
favor seq scans over index scans.  Lower = greater tendency towards index
scans.  The default is 4, but most production servers with enough memory
to cache most of their data will run well on a setting of 1.2 to 1.4.  My
dual 2800 with 2 gigs of RAM runs quite well at 1.3 to 1.4.
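
Try it per session before putting it in postgresql.conf:

-- session-level experiment; move it to postgresql.conf once you're happy
set random_page_cost = 1.4;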

Like random_page_cost, the planner options below are run-time settings,
not environment variables; you can turn them off per session:

enable_hashagg
enable_hashjoin
enable_indexscan
enable_mergejoin
enable_nestloop
enable_seqscan
enable_sort
enable_tidscan

They are all on by default, and for the most part shouldn't be turned off
globally.  But for an individual session, to figure out whether the query
planner is making the right plan, you can set them to off and see if
another plan works better.

So, if you've got a nested loop running over 80000 rows that the planner
thought was going to be 80 rows, you can force it to stop using the nested
loop for your session with:

set enable_nestloop=off;

and use explain analyze to see if it runs faster.
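
Once you're done testing, put it back for your session:

reset enable_nestloop;   -- or: set enable_nestloop=on;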

You can set effective_cache_size and sort_mem on the fly for a single
connection, or set them in postgresql.conf and restart or reload to make a
change in the default.
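
For example, per connection (these are the numbers worked out above;
substitute your own):

set effective_cache_size = 28000;
set sort_mem = 16384;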

shared_buffers is set on postgresql startup, and can't be changed without
restarting the database.  Reloading won't do it.
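
So after changing shared_buffers in postgresql.conf you need a full
restart, something like this (your data directory path will likely
differ):

pg_ctl restart -D /var/lib/pgsql/data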



