Thread: postgres optimization
Hi,

I am optimizing a postgres database and need some help with it.
I currently have a server with 2 GB of RAM.

I have set /proc/sys/kernel/shmall and /proc/sys/kernel/shmmax
to 536870912, i.e. 512 MB.
Should I change them to 2 GB? How would that affect performance, would
there be any downsides, or is it better to keep them lower?
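(For reference, a minimal sketch of how these limits can be inspected and
set at runtime on Linux. One caveat, which is an assumption worth checking
on your kernel: shmmax is counted in bytes, but shmall is usually counted
in pages, so writing the same number to both makes shmall far larger than
intended, though harmlessly so.)

    # show current limits
    cat /proc/sys/kernel/shmmax              # max size of one segment, bytes
    cat /proc/sys/kernel/shmall              # total shared memory, in pages

    # set 512 MB at runtime (assumes 4 kB pages; check `getconf PAGE_SIZE`)
    echo 536870912 > /proc/sys/kernel/shmmax
    echo $((536870912 / 4096)) > /proc/sys/kernel/shmall

    # to persist across reboots, in /etc/sysctl.conf:
    #   kernel.shmmax = 536870912
    #   kernel.shmall = 131072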
I then changed shared_buffers in the config file according to this
expression specified in the postgres manual:

max_connections = 128
250 kB + 8.2 kB * shared_buffers + 14.2 kB * max_connections

shared_buffers = 65277
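(Plugging those numbers into the manual's expression as a quick sanity
check -- the constants are approximate and version-dependent, so treat
this as a rough estimate only:)

    echo "250 + 8.2*65277 + 14.2*128" | bc   # => 537339.0 kB, about 525 MB
    echo "536870912 / 1024" | bc             # SHMMAX => 524288 kB, i.e. 512 MB

Taken at face value, that estimate (~525 MB) lands slightly above the
512 MB SHMMAX, so either shared_buffers should come down a little or
SHMMAX should go up.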
What value should I specify for effective_cache_size? I came across this
article, but it is slightly confusing:
"Say there is 1.5GB RAM in your machine, shared buffers are set to 32MB and effective cache size is set to 800MB. So if a query needs 700MB of data set, PostgreSQL would estimate that all the data required should be available in memory and would opt for more aggressive plan in terms of optimization, involving heavier index usage and merge joins. But if effective cache is set to only 200MB, the query planner is liable to opt for the more I/O efficient sequential scan."
Thanks,
Kailash
On Fri, 2005-07-29 at 08:24, Kailash Vyas wrote:
> Hi,
>
> I am optimizing a postgres database and need some help with it.
> I currently have a server with 2 GB of RAM.
>
> I have set /proc/sys/kernel/shmall and /proc/sys/kernel/shmmax
> to 536870912, i.e. 512 MB.
> Should I change them to 2 GB? How would that affect performance, would
> there be any downsides, or is it better to keep them lower?

Generally speaking, you're better off letting your kernel do the
majority of CACHING, and letting postgresql do BUFFERING.

On a linux box, the kernel will use all spare memory to cache disk
accesses automagically. So, assuming postgresql uses up a few hundred
megs, after the machine reaches a steady state, the kernel should be
using the 1.5 gig or so left over for caching.

While the latest versions of postgresql have much improved caching
algorithms for its buffers, it still pretty much dumps the buffers when
the last backend looking at them disconnects. I.e. it's not a
persistent caching system like the kernel cache is.

> I then changed shared_buffers in the config file according to this
> expression specified in the postgres manual:
>
> max_connections = 128
> 250 kB + 8.2 kB * shared_buffers + 14.2 kB * max_connections
>
> shared_buffers = 65277

That's large, but not unreasonable for a machine handling a large
dataset. I assume you do have a decent sized dataset (a gig or more)
to handle.

> What value should I specify for effective_cache_size? I came across
> this article, but it is slightly confusing.

effective_cache_size just tells the query planner about how much memory
the OS is using to cache your dataset.

Bring the machine up, run lots of queries, and check the cache and
buffers with top, and there's your amount. Divide by 8k to get the
setting for effective_cache_size.
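(A quick sketch of that recipe on Linux; `free` reports the same cached
figure as top, and this assumes the thread-era convention that
effective_cache_size is given in 8 kB pages:)

    # after the box has been up and under load for a while:
    free                      # look at the "cached" column, in kB
    # say it shows ~721380 kB cached; convert kB to 8 kB pages:
    echo $((721380 / 8))      # => 90172
    # then in postgresql.conf:
    #   effective_cache_size = 90172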
Hello,

I followed this advice for picking a good effective_cache_size value
(below) from Scott Marlowe, and ran into a bit of trouble:

I looked at the `top' output and saw "721380k cached". So I calculated
the effective cache size using Scott's formula:

721380 / 8 = 90172

Then I changed my effective_cache_size from the previous 10000 to
90172:

effective_cache_size = 90172    # typically 8KB each

I restarted PG and thought I'd see some performance improvements, but I
actually saw a degradation in performance. All of a sudden a query that
took a second started taking a few seconds, and consumed more CPU than
before.

Can anyone explain this drop in performance and increase in CPU usage,
and tell me what I did wrong?

I also noticed that the "721380k cached" number in top dropped to about
300000k (about half). Maybe that was simply due to the PG restart? If
so, does that indicate the kernel had about 400,000k worth of PG data
cached?

Thanks,
Otis

--- Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> effective_cache_size just tells the query planner about how much
> memory the OS is using to cache your dataset.
>
> Bring the machine up, run lots of queries, and check the cache and
> buffers with top, and there's your amount. Divide by 8k to get the
> setting for effective_cache_size.
On Fri, 2005-08-12 at 17:48, ogjunk-pgjedan@yahoo.com wrote:
> Hello,
>
> I followed this advice for picking a good effective_cache_size value
> (below) from Scott Marlowe, and ran into a bit of trouble:
>
> I looked at the `top' output and saw "721380k cached". So I
> calculated the effective cache size using Scott's formula:
>
> 721380 / 8 = 90172
>
> Then I changed my effective_cache_size from the previous 10000 to
> 90172:
>
> effective_cache_size = 90172    # typically 8KB each
>
> I restarted PG and thought I'd see some performance improvements, but
> I actually saw a degradation in performance. All of a sudden a query
> that took a second started taking a few seconds, and consumed more
> CPU than before.

There are a lot of reasons why this may be so, but the most likely is
that when restarting postgresql, the kernel cache got old and was
dumped out, so that it wasn't holding all the data. Basically, a larger
effective_cache_size favors indexes over a seq scan, since the data are
more likely to be in memory when you ask for them.

> Can anyone explain this drop in performance and increase in CPU
> usage, and tell me what I did wrong?
>
> I also noticed that the "721380k cached" number in top dropped to
> about 300000k (about half). Maybe that was simply due to the PG
> restart? If so, does that indicate the kernel had about 400,000k
> worth of PG data cached?

That's pretty normal, over time, with postgresql being idle. Don't
worry about how the query runs the first time you run it so much as
how it runs consistently over time. If the numbers come back to where
they were, then you've got the right setting.

Like I said, effective_cache_size is kind of a lead hammer, used to
nudge the planner one way or another, but it doesn't need to be exact
to be useful.

Take a look at the plan being chosen; it is likely to be doing random
access (i.e. using an index) right now rather than a seq scan. The seq
scan would win on a machine with nothing in the kernel cache, but the
index scan will usually win if everything is already cached. Again,
these are broad strokes of the brush. It all depends on the actual % of
a table returned and a few other things...
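(To see which plan is actually being chosen, EXPLAIN ANALYZE is the
tool; the database, table, and query below are hypothetical stand-ins
for your own slow query:)

    # show the chosen plan along with real execution times
    psql -d mydb -c "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;"

    # per-session planner toggles make the seq-scan vs. index-scan
    # tradeoff visible for comparison:
    psql -d mydb -c "SET enable_seqscan = off; EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;"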