Thread: Kernel cache vs shared_buffers

Kernel cache vs shared_buffers

From
"Michael van Rooyen"
Date:
We're in the process of upgrading our db server's memory from 2GB to 8GB to
improve access performance.  This is a dedicated dual Xeon db server not
running any significant non-db processes.  Our database size on disk is
~11GB, although we expect it to grow to ~20GB.  Much of this data is
inactive and seldom used or accessed.  Read access is our primary concern as
our data is normalized and retrieveing a complete product requires many
reads to associated tables and indexes to put it all together.

Our larger tables have 10-20 rows per disk block, I assume that most blocks
will have a mix of frequently accessed and inactive rows.  Of course, we
wouldn't want to double-cache, so my inclination would be to either give
most of the memory to shared_buffers, or to leave that small and let the
kernel (Linux 2.6x) do the buffering.

I have no idea regarding the inner working of the pg's shared cache, but
what I would like to find out is whether it is table-row-based, or
disk-block-based.  In the case of it being disk-block based, my inclination
would be to let the kernel do the buffering.  In the case of the cache being
table-row-based, I would expect it to be much more space-efficient and I
would be inclined to give the memory to the pg.  In that case, is it
feasible to set shared_buffers to something like 500000 x 8k blocks?  We
make extensive use of indexes on the larger tables and would seldom, if
ever, do sequential scans.

Any comments or advice would be great!

Michael.


Re: Kernel cache vs shared_buffers

From
Heikki Linnakangas
Date:
Michael van Rooyen wrote:
> I have no idea regarding the inner working of the pg's shared cache, but
> what I would like to find out is whether it is table-row-based, or
> disk-block-based.

It's block based.

> In the case of it being disk-block based, my
> inclination would be to let the kernel do the buffering.  In the case of
> the cache being table-row-based, I would expect it to be much more
> space-efficient and I would be inclined to give the memory to the pg.
> In that case, is it feasible to set shared_buffers to something like
> 500000 x 8k blocks?  We make extensive use of indexes on the larger
> tables and would seldom, if ever, do sequential scans.

A common rule of thumb people quote here is to set shared_buffers to 1/4
of available RAM, and leave the rest for OS cache. That's probably a
good configuration to start with.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Kernel cache vs shared_buffers

From
"Jim C. Nasby"
Date:
On Sat, May 12, 2007 at 03:28:45PM +0100, Heikki Linnakangas wrote:
> >In the case of it being disk-block based, my
> >inclination would be to let the kernel do the buffering.  In the case of
> >the cache being table-row-based, I would expect it to be much more
> >space-efficient and I would be inclined to give the memory to the pg.
> >In that case, is it feasible to set shared_buffers to something like
> >500000 x 8k blocks?  We make extensive use of indexes on the larger
> >tables and would seldom, if ever, do sequential scans.
>
> A common rule of thumb people quote here is to set shared_buffers to 1/4
> of available RAM, and leave the rest for OS cache. That's probably a
> good configuration to start with.

If you really care about performance it would be a good idea to start
with that and do your own benchmarking. Much of the consensus about
shared_buffers was built up before 8.0, and the shared buffer management
we have today looks nothing like what was in 7.4. You might find that
shared_buffers = 50% of memory or even higher might perform better for
your workload.

If you do find results like that, please share them. :)
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: Kernel cache vs shared_buffers

From
"Harald Armin Massa"
Date:
> A common rule of thumb people quote here is to set shared_buffers to 1/4
> of available RAM, and leave the rest for OS cache. That's probably a
> good configuration to start with.
>

And just for the record: This rule of thumb does NOT apply to
PostgreSQL on Windows. My current rule of thumb on Windows: set
shared_buffers to minimum * 2
Adjust effective_cache_size to the number given as "system cache"
within the task manager.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.

Re: Kernel cache vs shared_buffers

From
Heikki Linnakangas
Date:
Harald Armin Massa wrote:
>> A common rule of thumb people quote here is to set shared_buffers to 1/4
>> of available RAM, and leave the rest for OS cache. That's probably a
>> good configuration to start with.
>
> And just for the record: This rule of thumb does NOT apply to
> PostgreSQL on Windows. My current rule of thumb on Windows: set
> shared_buffers to minimum * 2
> Adjust effective_cache_size to the number given as "system cache"
> within the task manager.

Why?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Kernel cache vs shared_buffers

From
"Harald Armin Massa"
Date:
Heikki,


> > PostgreSQL on Windows. My current rule of thumb on Windows: set
> > shared_buffers to minimum * 2
> > Adjust effective_cache_size to the number given as "system cache"
> > within the task manager.
>
> Why?

I tried with shared_buffers = 50% of available memory, and with 30% of
available memory, and the thoughput on complex queries stalled or got
worse.

I lowered shared_buffers to minimum, and started raising
effective_cache_size, and performance on real world queries improved.
pg_bench did not fully agree when simulating large numbers concurrent
queries.

So I tried setting shared_buffers between minimum and 2.5*minimum, and
pg_bench speeds recovered and real world queries did similiar.

My understanding is that shared_buffers are realised as memory mapped
file in win32; and that they are only usually kept in memory. Maybe I
understood that wrong.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
fx 01212-5-13695179
-
Python: the only language with more web frameworks than keywords.

Re: Kernel cache vs shared_buffers

From
Magnus Hagander
Date:
Harald Armin Massa wrote:
> Heikki,
>
>
>> > PostgreSQL on Windows. My current rule of thumb on Windows: set
>> > shared_buffers to minimum * 2
>> > Adjust effective_cache_size to the number given as "system cache"
>> > within the task manager.
>>
>> Why?
>
> I tried with shared_buffers = 50% of available memory, and with 30% of
> available memory, and the thoughput on complex queries stalled or got
> worse.
>
> I lowered shared_buffers to minimum, and started raising
> effective_cache_size, and performance on real world queries improved.
> pg_bench did not fully agree when simulating large numbers concurrent
> queries.
>
> So I tried setting shared_buffers between minimum and 2.5*minimum, and
> pg_bench speeds recovered and real world queries did similiar.
>
> My understanding is that shared_buffers are realised as memory mapped
> file in win32; and that they are only usually kept in memory. Maybe I
> understood that wrong.

Almost. It's a memory mapped region backed by the system pagefile.

That said, it would be good to try to figure out *why* this is
happening. It's been on my list of things to do to run checks with the
profiler (now that we can ;-) with the msvc stuff) and try to figure out
where it's slowing down. It could be as simple as that there's much more
overhead trying to access the shared memory from different processes
than we're used to on other platforms.

//Magnus