Thread: basic question (shared buffers vs. effective cache size)

basic question (shared buffers vs. effective cache size)

From

"Sally Sally"

Date:

10 May 2004, 12:45:28

I have a very basic question on the two parameters shared buffers and
effective cache size. I have read articles on what each is about etc. But I
still think I don't quite grasp what these settings mean (especially in
relation to each other). Since these two settings seem crucial for
performance can somebody explain to me the relationship/difference between
these two settings and how they deal with shared memory.
Thanks much
Sally

_________________________________________________________________
Is your PC infected? Get a FREE online computer virus scan from McAfee�
Security. http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963

Re: basic question (shared buffers vs. effective cache

From

"scott.marlowe"

Date:

10 May 2004, 13:29:15

On Mon, 10 May 2004, Sally Sally wrote:

> I have a very basic question on the two parameters shared buffers and
> effective cache size. I have read articles on what each is about etc. But I
> still think I don't quite grasp what these settings mean (especially in
> relation to each other). Since these two settings seem crucial for
> performance can somebody explain to me the relationship/difference between
> these two settings and how they deal with shared memory.

shared_buffers is the amount of space postgresql can use as temp memory
space to put together result sets.  It is not intended as a cache, and
once the last backend holding open a buffer space shuts down, the
information in that buffer is lost.  If you're working on several large
data sets in a row, the buffer currently operates FIFO when dumping old
references to make room for the incoming data.

Contrast this to the linux or BSD kernels, which cache everything they can
in the "spare" memory of the computer.  This cache is maintained until
some other process requests enough memory to make the kernel give up some
of the otherwise unused memory, or something new pushes out something old.
A lot of tuning has gone into this cache to make it fast when handling
large amounts of data, and it caches, of course, more than just
postgresql's data, it caches all the data for everything hitting the hard
drives.  If you're on a machine that is mostly a postgresql box, then it
is likely that most of this memory is being used for postgresql, but on a
box running apache / ldap / postgresql / etc... the percentage used for
postgresql will be lower, maybe 75% or so.

The important point here is that caching is the job of the kernel,
buffering is the job of the database.  I.e. holding onto data that got
accessed 30 minutes ago is the kernel's job, holding onto data that we're
processing RIGHT NOW is postgresql's job.

Because of this splitting of the jobs as it were, it is usually best to
have postgresql's buffers be a fraction of the size of the kernel caches
on the machine, otherwise it is quite likely that all calls for data not
in postgresql's buffers will result in a disk read, not a kernel cache
hit, since ramping up postgresql's buffers to be as large or larger than
the kernel cache will result in the data you need almost being guaranteed
to be flushed out of the kernel by the time it's been flushed out of
postgresql.  Since Postgresql's buffer access methods are inherently
slower than those of the kernel, and they don't seem to scale real well,
allocating too much shared_buffers is a "bad thing".

Now, effective_cache_size sets nothing other than itself.  I.e. it
allocates nothing in memory.  It is pretty much a big course setting knob
that tells the planner about how much memory the kernel is using to cache
its data, and therefore lets the planner make a rough guesstimate of how
likely an access is to hit memory cache versus having to hit the hard
drives.  Since random accesses in memory are only slightly more expensive
than seq scans in memory, higher effective_cache_size favors random
accesses.

Re: basic question (shared buffers vs. effective cache

From

Jack Orenstein

Date:

10 May 2004, 14:29:01

scott.marlowe wrote:
>
> shared_buffers is the amount of space postgresql can use as temp memory
> space to put together result sets.  It is not intended as a cache, and
> once the last backend holding open a buffer space shuts down, the
> information in that buffer is lost.  If you're working on several large
> data sets in a row, the buffer currently operates FIFO when dumping old
> references to make room for the incoming data.
>
> Contrast this to the linux or BSD kernels, which cache everything they can
> in the "spare" memory of the computer.  This cache is maintained until
> some other process requests enough memory to make the kernel give up some
> of the otherwise unused memory, or something new pushes out something old.

Do checkpoints operate on the Postgres-managed buffer, or the kernel-managed
cache?

Jack Orenstein

Re: basic question (shared buffers vs. effective cache size)

From

"Sally Sally"

Date:

10 May 2004, 15:12:36

Thanks much Scott, makes sense now.

You said

"Now, effective_cache_size sets nothing other than itself.  I.e. it
allocates nothing in memory.  It is pretty much a big course setting knob
that tells the planner about how much memory the kernel is using to cache
its data"

So how can you know how much memory the kernel is actually using to cache
(Solaris)? and specifically is it something you can set/change and also
watch as it is happening with some command like top (Linux shows how much is
cached but I don't see that in Solaris).
Thanks
Sally

_________________________________________________________________
Stop worrying about overloading your inbox - get MSN Hotmail Extra Storage!
http://join.msn.com/?pgmarket=en-us&page=hotmail/es2&ST=1/go/onm00200362ave/direct/01/

Re: basic question (shared buffers vs. effective cache

From

"scott.marlowe"

Date:

10 May 2004, 16:58:14

On Mon, 10 May 2004, Jack Orenstein wrote:

> scott.marlowe wrote:
> >
> > shared_buffers is the amount of space postgresql can use as temp memory
> > space to put together result sets.  It is not intended as a cache, and
> > once the last backend holding open a buffer space shuts down, the
> > information in that buffer is lost.  If you're working on several large
> > data sets in a row, the buffer currently operates FIFO when dumping old
> > references to make room for the incoming data.
> >
> > Contrast this to the linux or BSD kernels, which cache everything they can
> > in the "spare" memory of the computer.  This cache is maintained until
> > some other process requests enough memory to make the kernel give up some
> > of the otherwise unused memory, or something new pushes out something old.
>
> Do checkpoints operate on the Postgres-managed buffer, or the kernel-managed
> cache?

Checkpoints consist of writing the postgres managed data in the buffers to
the drive, which is cached by the kernel, then issuing an fsync to tell
the kernel to write it out to disk, so it affects both.

Re: basic question (shared buffers vs. effective cache

From

"scott.marlowe"

Date:

10 May 2004, 17:02:40

On Mon, 10 May 2004, Sally Sally wrote:

> Thanks much Scott, makes sense now.
>
> You said
>
> "Now, effective_cache_size sets nothing other than itself.  I.e. it
> allocates nothing in memory.  It is pretty much a big course setting knob
> that tells the planner about how much memory the kernel is using to cache
> its data"
>
> So how can you know how much memory the kernel is actually using to cache
> (Solaris)? and specifically is it something you can set/change and also
> watch as it is happening with some command like top (Linux shows how much is
> cached but I don't see that in Solaris).

I'm not sure.  I think vmstat can tell you that.