Thread: shared_buffers performance

shared_buffers performance

From
Gaetano Mendola
Date:
Hi all,
I have started doing some performance tests (using pgbench) in order to
estimate the impact of DRBD on our servers; my plan is to run some
benchmarks without DRBD and then compare them with the same benchmarks
run on DRBD.
I haven't performed the DRBD benchmark yet, and I'm already facing
something I cannot explain (so far I have run only read tests).

I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.

I'm using pgbench with scaling factors in the range [1:500]; my server
has 4 cores, so I'm testing with 16 clients and 4000 transactions per
client: pgbench -t 4000 -c 16 -S db_perf. I ran 3 sessions using 3 different
values of shared_buffers: 64MB, 256MB, and 512MB; my server has 2GB of RAM.
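
Roughly, the run for each shared_buffers setting looks like this (the
scale values below are just illustrative points from the [1:500] sweep):

for s in 1 10 50 100 200 350 500; do
    pgbench -i -s $s db_perf          # rebuild the database at this scaling factor
    pgbench -t 4000 -c 16 -S db_perf  # read-only: 16 clients x 4000 transactions
done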

The following graph reports the results:

http://img84.imageshack.us/my.php?image=totalid7.png

as you can see, using 64MB for shared_buffers I'm obtaining better
results. Is this expected, or am I looking in the wrong direction?
I'm going to perform the same tests without the -S option in pgbench,
but since that is a time-expensive operation I would like to hear your
opinions first.

Regards
Gaetano Mendola





Re: shared_buffers performance

From
Gregory Stark
Date:
"Gaetano Mendola" <mendola@gmail.com> writes:

> The following graph reports the results:
>
> http://img84.imageshack.us/my.php?image=totalid7.png

That's a *fascinating* graph.

It seems there are basically three domains.

The small domain where the database fits in shared buffers -- though actually
this domain seems to hold until the accounts table is about 1G so maybe it's
more that the *indexes* fit in memory. Here larger shared buffers do clearly
win.

The transition domain where performance drops dramatically as the database
starts to not fit in shared buffers but does still fit in filesystem cache.
Here every megabyte stolen from the filesystem cache makes a *huge*
difference. At a scale factor of 120 or so you're talking about a factor of 4
between each of the shared buffer sizes.

The large domain where the database doesn't fit in filesystem cache. Here
it doesn't make a large difference, but the more buffers are duplicated
between postgres and the filesystem cache, the lower the overall cache
effectiveness.

If we used something like either mmap or directio to avoid the double
buffering we would be able to squeeze these into a single curve, as well as
push the dropoff slightly to the right. In theory.

In practice it would depend on the OS's ability to handle page faults
efficiently in the mmap case, and our ability to do read-ahead and cache
management in the directio case. And it would be a huge increase in complexity
for Postgres and a push into a direction which isn't our "core competency". We
might find that while in theory it should perform better our code just can't
keep up with Linux's and it doesn't.

I'm curious about the total database size for each of the scaling factors,
as well as the total of the index sizes. And how much memory Linux says is
being used for filesystem buffers.
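
(For reference, something like this should pull those numbers on 8.2 --
the relation names here assume the standard pgbench schema, and "free -m"
on Linux shows the filesystem buffer usage:)

SELECT pg_size_pretty(pg_database_size(current_database()));  -- whole database
SELECT pg_size_pretty(pg_total_relation_size('accounts'));    -- table plus its indexes
SELECT pg_size_pretty(pg_relation_size('accounts_pkey'));     -- primary key alone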

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's PostGIS support!

Re: shared_buffers performance

From
Richard Huxton
Date:
Gregory Stark wrote:
> "Gaetano Mendola" <mendola@gmail.com> writes:
>
>> The following graph reports the results:
>>
>> http://img84.imageshack.us/my.php?image=totalid7.png
>
> That's a *fascinating* graph.

It is, isn't it? Thanks Gaetano.

> It seems there are basically three domains.
>
> The small domain where the database fits in shared buffers -- though actually
> this domain seems to hold until the accounts table is about 1G so maybe it's
> more that the *indexes* fit in memory. Here larger shared buffers do clearly
> win.

I think this is actually in two parts - you can see it clearly on the
red trace (64MB), less so on the green (256MB) and not at all on the
blue (512MB). Presumably the left-hand steeper straight-line decline
starts with the working-set in shared-buffers, and the "knee" is where
we're down to just indexes in shared-buffers.

With the blue I guess you just get the first part, because by the time
you're overflowing shared-buffers, you've not got enough disk-cache to
take up the slack for you.

I wonder what difference 8.3 makes to this?

--
   Richard Huxton
   Archonet Ltd

Re: shared_buffers performance

From
Greg Smith
Date:
On Mon, 14 Apr 2008, Gaetano Mendola wrote:

> I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.

8.2.3 has a performance bug that impacts how accurate pgbench results are;
you really should be using a later version.

> http://img84.imageshack.us/my.php?image=totalid7.png
> as you can see, using 64MB for shared_buffers I'm obtaining
> better results.

I'm assuming you've read my scaling article at
http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm
since you're using the graph template I suggest there.

If you look carefully at your results, you are getting better throughput
with higher shared_buffers values in the cases where performance is
memory bound (the lower scale numbers).  Things only reverse, with more
buffers giving worse performance, when your scale goes above 100.  I
wouldn't conclude too much from that.  The pgbench select test performs a
low-level operation that doesn't benefit from having more memory
available to PostgreSQL instead of the OS as much as a real-world
workload would.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: shared_buffers performance

From
Greg Smith
Date:
On Mon, 14 Apr 2008, Gregory Stark wrote:

> I'm curious about the total database size for each of the scaling factors,
> as well as the total of the index sizes.

That's all in a table at
http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: shared_buffers performance

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> The transition domain where performance drops dramatically as the database
> starts to not fit in shared buffers but does still fit in filesystem cache.

It looks to me like the knee comes where the DB no longer fits in
filesystem cache.  What's interesting is that there seems to be no
synergy at all between shared_buffers and the filesystem cache.
Ideally, very hot pages would stay in shared buffers and drop out of the
kernel cache, allowing you to use a database approximating all-of-RAM
before you hit the performance wall.  It's clear that in this example
that's not happening, or at least that only a small part of shared
buffers isn't getting duplicated in filesystem cache.

Of course, that's because pgbench reads a randomly-chosen row of
"accounts" in each transaction, so that there's exactly zero locality
of access.  A more realistic workload would probably have a Zipfian
distribution of account number touches, and might look a little better
on this type of test.

            regards, tom lane

Re: shared_buffers performance

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> The transition domain where performance drops dramatically as the database
>> starts to not fit in shared buffers but does still fit in filesystem cache.
>
> It looks to me like the knee comes where the DB no longer fits in
> filesystem cache.

That does seem to make a lot more sense. I think I misread the units of
the size of the accounts table. Reading it again, it seems to be in the
1.5G-2G range at the transition, which with indexes and other tables might
be starting to stress the filesystem cache -- though it still seems a
little low.

I think if I squint I can see another dropoff at the very small scaling
numbers. That must be the point where the database is comparable to the
shared buffers size. Except then I would expect the green and blue curves
to be pushed to the right a bit rather than just having a shallower slope.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: shared_buffers performance

From
Greg Smith
Date:
On Mon, 14 Apr 2008, Tom Lane wrote:

> Ideally, very hot pages would stay in shared buffers and drop out of the
> kernel cache, allowing you to use a database approximating all-of-RAM
> before you hit the performance wall.

With "pgbench -S", the main hot pages that get elevated usage counts and
cling persistantly to shared buffers are those holding data from the
primary key on the accounts table.

Here's an example of what the buffer cache actually has after running
"pgbench -S -c 8 -t 10000 pgbench" on a system with shared_buffers=256MB
and a total of 2GB of RAM.  Database scale is 100, so there's
approximately 1.5GB worth of database, mainly a 1.3GB accounts table and
171MB of primary key on accounts:

relname       | buffered | buffers % | % of rel
accounts      | 306 MB   | 65.3      | 24.7
accounts pkey | 160 MB   | 34.1      | 93.2

relname       | buffers | usage
accounts      | 10223   | 0
accounts      | 25910   | 1
accounts      | 2825    | 2
accounts      | 214     | 3
accounts      | 14      | 4
accounts pkey | 2173    | 0
accounts pkey | 5392    | 1
accounts pkey | 5086    | 2
accounts pkey | 3747    | 3
accounts pkey | 2296    | 4
accounts pkey | 1756    | 5

This example and the queries to produce that summary are all from the
"Inside the PostgreSQL Buffer Cache" talk on my web page.

For this simple workload, if you can fit the main primary key in shared
buffers that helps, but making shared_buffers too large takes away memory
that could be more usefully given to the OS to manage.  The fact that you
can start to suffer from double-buffering (where the data sits in both the
OS filesystem cache and shared_buffers) when making shared_buffers too
large on a benchmark workload is interesting.  But I'd suggest considering
the real application, rather than drawing a conclusion about
shared_buffers sizing based just on that phenomenon.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: shared_buffers performance

From
Gaetano Mendola
Date:
Greg Smith wrote:
> On Mon, 14 Apr 2008, Gaetano Mendola wrote:
>
>> I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.
>
> 8.2.3 has a performance bug that impacts how accurate pgbench results
> are; you really should be using a later version.

Thank you, I will give it a shot and perform some tests to see if the
results change much; if so, I will repeat the entire benchmark suite.

Regards
Gaetano Mendola

Re: shared_buffers performance

From
Gaetano Mendola
Date:
Greg Smith wrote:
> On Mon, 14 Apr 2008, Gaetano Mendola wrote:
>
>> I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.
>
> 8.2.3 has a performance bug that impacts how accurate pgbench results
> are; you really should be using a later version.
>
>> http://img84.imageshack.us/my.php?image=totalid7.png
>> as you can see, using 64MB for shared_buffers I'm obtaining
>> better results.
>
> I'm assuming you've read my scaling article at
> http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm
> since you're using the graph template I suggest there.
>

Yes, I was basically inspired by that page; my true goal is not to study
the effect of shared_buffers (that was a side effect) but to study the
performance loss when using DRBD on our server. I'm producing similar
graphs using pgbench without -S, and I will post them as soon as they
are ready.

Regards
Gaetano Mendola

Re: shared_buffers performance

From
Gaetano Mendola
Date:
Gaetano Mendola wrote:
> Hi all,
> I have started doing some performance tests (using pgbench) in order to
> estimate the impact of DRBD on our servers; my plan is to run some
> benchmarks without DRBD and then compare them with the same benchmarks
> run on DRBD.
> I haven't performed the DRBD benchmark yet, and I'm already facing
> something I cannot explain (so far I have run only read tests).
>
> I'm using postgres 8.2.3 on Red Hat compiled with GCC 3.4.6.
>
> I'm using pgbench with scaling factors in the range [1:500]; my server
> has 4 cores, so I'm testing with 16 clients and 4000 transactions per
> client: pgbench -t 4000 -c 16 -S db_perf. I ran 3 sessions using 3 different
> values of shared_buffers: 64MB, 256MB, and 512MB; my server has 2GB of RAM.
>
> The following graph reports the results:
>
> http://img84.imageshack.us/my.php?image=totalid7.png
>
> as you can see, using 64MB for shared_buffers I'm obtaining better
> results. Is this expected, or am I looking in the wrong direction?
> I'm going to perform the same tests without the -S option in pgbench,
> but since that is a time-expensive operation I would like to hear your
> opinions first.

Today I completed the other benchmarks, using pgbench in write mode as
well, and the following graph summarizes the results:

http://img440.imageshack.us/my.php?image=totalwbn0.png

What I can say here is that the trend is the opposite of the one seen in
read-only mode: increasing shared_buffers increases the TPS.

I still haven't upgraded to 8.2.7 as suggested by Greg Smith, because I
would like to compare the results obtained so far with the new ones
(simulations are running as I write) using postgres on a "DRBD partition";
as soon as the current tests finish I will upgrade postgres.

If you have any suggestions on what you would like to see/know, just let me know.

Regards
Gaetano Mendola



Vacuum settings

From
dforums
Date:
Hello,

I need two pieces of advice on vacuum settings.

I have a quad core X5355 @ 2.66GHz with 8 GB of memory.

1Q) Why does autovacuum not work? I have set the value to on in
postgresql.conf, but when the server starts it's still off!

2Q) Here are my settings for vacuum; could you help me optimise them? At
the moment the VACUUM ANALYZE issued every night takes around 18 hours to
run, which slows down server performance.

# - Cost-Based Vacuum Delay -

vacuum_cost_delay = 5                   # 0-1000 milliseconds
vacuum_cost_page_hit = 1000             # 0-10000 credits
vacuum_cost_page_miss = 1000            # 0-10000 credits
vacuum_cost_page_dirty = 120            # 0-10000 credits
vacuum_cost_limit = 20          # 0-10000 credits

# - Background writer -

bgwriter_delay = 50                     # 10-10000 milliseconds between rounds
bgwriter_lru_percent = 1.0              # 0-100% of LRU buffers scanned/round
bgwriter_lru_maxpages = 25              # 0-1000 buffers max written/round
bgwriter_all_percent = 0.333            # 0-100% of all buffers scanned/round
bgwriter_all_maxpages = 50              # 0-1000 buffers max written/round



Thanks in advance for your help.

Regards

David

Re: Vacuum settings

From
Alvaro Herrera
Date:
dforums wrote:
> Hello,
>
> I need two pieces of advice on vacuum settings.
>
> I have a quad core X5355 @ 2.66GHz with 8 GB of memory.
>
> 1Q) Why does autovacuum not work? I have set the value to on in
> postgresql.conf, but when the server starts it's still off!

You need to turn stats_row_level on too.
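
In postgresql.conf (8.2) that means roughly:

stats_start_collector = on      # on by default
stats_row_level = on            # autovacuum needs row-level stats
autovacuum = on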

> # - Cost-Based Vacuum Delay -
>
> vacuum_cost_delay = 5                   # 0-1000 milliseconds
> vacuum_cost_page_hit = 1000             # 0-10000 credits
> vacuum_cost_page_miss = 1000            # 0-10000 credits
> vacuum_cost_page_dirty = 120            # 0-10000 credits
> vacuum_cost_limit = 20          # 0-10000 credits

The costs are all too high and the limit is too low.  I suggest resetting
to the default values and figuring out a reasonable delay setting (your
current 5ms value seems a bit too low, but I think in most cases 10ms is
the practical limit due to sleep granularity in the kernel.  In any case,
since the other values are all wrong, I suggest just setting it to 10ms
and seeing what happens).
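
A sketch of that reset (these are the 8.2 defaults, with the delay bumped
to 10ms):

vacuum_cost_delay = 10                  # was 5; ~10ms is the practical floor
vacuum_cost_page_hit = 1                # default
vacuum_cost_page_miss = 10              # default
vacuum_cost_page_dirty = 20             # default
vacuum_cost_limit = 200                 # default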

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Vacuum settings

From
Guillaume Cottenceau
Date:
dforums <dforums 'at' vieonet.com> writes:

> 2Q) Here are my settings for vacuum; could you help me optimise them? At
> the moment the VACUUM ANALYZE issued every night takes around 18 hours
> to run, which slows down server performance.

That's a lot of time for a daily job (and it is useful to vacuum hot
tables more often than daily). With typical settings, it's probable
that autovacuum will run forever (i.e. by the end of one run, another
run will already be needed). You should first verify you don't have
bloat in your tables (a lot of dead rows) - bloat can be created by
too-infrequent vacuuming and too-low FSM settings[1]. To fix the bloat,
you can dump and restore your DB if you can afford interrupting your
application, or use VACUUM FULL if you can afford blocking your
application (disclaimer: many posters here passionately detest VACUUM
FULL and keep suggesting the use of CLUSTER instead).
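
For example (pre-8.3 CLUSTER syntax; the table and index names are
illustrative):

    VACUUM FULL interesting_table;
    -- or, the alternative usually suggested here (rewrites the table
    -- in index order; also takes an exclusive lock):
    CLUSTER interesting_table_pkey ON interesting_table;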

Ref:
[1] to tell whether you have bloat, you can use
    contrib/pgstattuple (you can easily add it to a running
    PostgreSQL); see the query sketch at the end of this note. If
    the free_percent reported for interesting tables is large, and
    free_space is large compared to 8K, then you have bloat;

    another way is to dump your database, restore it onto another
    database, issue VACUUM VERBOSE on a given table on both
    databases (in live, and on the restore) and compare the
    reported number of pages needed. The difference is the
    bloat.

      live=# VACUUM VERBOSE interesting_table;
          [...]
      INFO:  "interesting_table": found 408 removable, 64994 nonremovable row versions in 4395 pages

      restored=# VACUUM VERBOSE interesting_table;
          [...]
      INFO:  "interesting_table": found 0 removable, 64977 nonremovable row versions in 628 pages

    => (4395-628)*8/1024.0 ≈ 29.4 MB of bloat

    (IIRC, this VACUUM output is for 7.4, it has changed a bit
    since then)
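
    For the pgstattuple route, the check looks something like this
    (assuming the contrib module is installed):

      SELECT free_space, free_percent, dead_tuple_percent
        FROM pgstattuple('interesting_table');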

--
Guillaume Cottenceau

Re: shared_buffers performance

From
Decibel!
Date:
On Apr 14, 2008, at 3:31 PM, Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> The transition domain where performance drops dramatically as the
>> database starts to not fit in shared buffers but does still fit in
>> filesystem cache.
>
> It looks to me like the knee comes where the DB no longer fits in
> filesystem cache.  What's interesting is that there seems to be no
> synergy at all between shared_buffers and the filesystem cache.
> Ideally, very hot pages would stay in shared buffers and drop out of the
> kernel cache, allowing you to use a database approximating all-of-RAM
> before you hit the performance wall.  It's clear that in this example
> that's not happening, or at least that only a small part of shared
> buffers isn't getting duplicated in filesystem cache.

I suspect that we're getting double-buffering on everything because
every time we dirty a buffer and write it out, the OS counts that as an
access and keeps that data in its cache. It would be interesting to try
to overcome that and see how it impacts things. With our improvement in
checkpoint handling, we might be able to just write via direct I/O... if
not, maybe there's some way to tell the OS to buffer the write for us
but target that data for removal from cache as soon as it's written.

> Of course, that's because pgbench reads a randomly-chosen row of
> "accounts" in each transaction, so that there's exactly zero locality
> of access.  A more realistic workload would probably have a Zipfian
> distribution of account number touches, and might look a little better
> on this type of test.

--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828


