Thread: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Howard Oblowitz
Date:


> -----Original Message-----
> From:    Lending, Rune [SMTP:rune.lending@inpoc.com]
> Sent:    05 June 2003 10:11
> To:    'pgsql-admin@postgresql.org'
> Subject:    [ADMIN] Shared_buffers and kernel parameters, tuning
>
> After days of searching and testing I have come up with this way of
> configuring our postgresql 7.2 db.
> I have not yet increased my shared_buffer as high as suggested below on
> our
> prod machine (24-7 high traffic), but after testing on our dev machines
> this allows at least the databse to start up. It is very difficult to test
> the actual performance since there is a hugh difference in traffic on dev
> and prod.
> Here is what we have:
>
> We have a high traffic system with a database described as followed:
>
> 4 pentium 3 633 cpu's
> 3753456 kB RAM (3.5 Gb)
> Red Hat Linux 7.2
> postgresql 7.2
>
>
> What I like to do is:
>
> /proc/sys/kernel/sem=250 32000 100 500 (after advise from forum/docs)
> /proc/sys/kernel/shmmax=1921769472 (RAM / 2 * 1024 - this
> piece of math I got from some of oracle's support pages (ooopps) actually
> )
>
> /proc/sys/kernel/shmall=1921769472 (RAM / 2 * 1024)
>
> in postgresql.conf:
>
> shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this
> forum.
>
>
> Does this sound right or am I totally out of bounds here? I have, as said
> before done this on our dev macine ( a lot smaller machine ), but it would
> be nice with some feedback ..
>
> Thanx in advance for response.
>
> /rune
>
>

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
"Shridhar Daithankar"
Date:
On 9 Jun 2003 at 11:18, Howard Oblowitz wrote:
> > in postgresql.conf:
> >
> > shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this
> > forum.
> >
> >
> > Does this sound right or am I totally out of bounds here? I have, as said
> > before done this on our dev macine ( a lot smaller machine ), but it would
> > be nice with some feedback ..

With that kind of RAM and that kind of shared buffers setting, you must set
effective OS cache size so that postgresql can calculate when to flush buffers.

While tuning database, it always help to pin down the target first and then try
to reach it. If you could let us know what performance you are expecting out of
this machine and for what kind of load in terms of concurrent users, database
size and usage pattern etc., that would help.

HTH

Bye
 Shridhar

--
QOTD:    "I'm just a boy named 'su'..."


Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Josh Berkus
Date:
Rune,

> > shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this
> > forum.
> > Does this sound right or am I totally out of bounds here? I have, as said

Out of bounds, through no fault of your own .... I'm still working on
documentation for this.  However, let me qoute the upcoming supplimentary
docs:

SHARED_BUFFERS
Sets the size of Postgres' memory buffer where queries are held before being
fed into the Kernel buffer of the host system.  It's very important to
remember that this is only a holding area, and not the total memory available
for the server.  As such, resist the urge to set this number to a large
portion of your RAM, as this will actually degrade performance on many OSes.
Members of the pgsql-performance mailing list have found useful values in the
range of 1000-6000, depending on available RAM, database size, and number of
concurrent queries. No one has yet reported positive results for any number
over 6000.

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
"Shridhar Daithankar"
Date:
On 10 Jun 2003 at 8:46, Josh Berkus wrote:

> SHARED_BUFFERS
> Sets the size of Postgres' memory buffer where queries are held before being
> fed into the Kernel buffer of the host system.  It's very important to
> remember that this is only a holding area, and not the total memory available
> for the server.  As such, resist the urge to set this number to a large
> portion of your RAM, as this will actually degrade performance on many OSes.
> Members of the pgsql-performance mailing list have found useful values in the
> range of 1000-6000, depending on available RAM, database size, and number of
> concurrent queries. No one has yet reported positive results for any number
> over 6000.

I was planning to document postgresql.conf with little hints, enough to get one
started, drawing inspiration from lilo.conf of debian, which is beautiful to
say the least..

I haven't find enough time to do that. But I will do it.. But I don't know all
the parameters enough. Of course I will post a starter but any input would be
welcome.

Point is we should be able to say RTFC rather than RTFA as that would get a DBA
single place to look at.  I agree that no amount of simplicity is enough but
still..:-)


Bye
 Shridhar

--
Brooke's Law:    Whenever a system becomes completely defined, some damn fool
discovers something which either abolishes the system or    expands it beyond
recognition.


Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Josh Berkus
Date:
Shridhar,

> I was planning to document postgresql.conf with little hints, enough to get
> one started, drawing inspiration from lilo.conf of debian, which is
> beautiful to say the least..

This week, I am:

1) Submiting a patch to re-organize postgresql.conf.sample and "Run-Time
Configuration" docs in a more logical order.

2) Finishing up a massive OpenOffice.org spreadsheet full information on each
postgresql.conf option, including anecdotal advice from this list.

Next week, I will try to turn the spreadsheet into a series of HTML pages for
Techdocs.

I would be thrilled to have your help on:
a) editing + augmenting the spreadsheet contents
b) transforming it into HTML pages.


--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Bruno Wolff III
Date:
On Tue, Jun 10, 2003 at 21:26:49 +0530,
  Shridhar Daithankar <shridhar_daithankar@persistent.co.in> wrote:
>
> Point is we should be able to say RTFC rather than RTFA as that would get a DBA
> single place to look at.  I agree that no amount of simplicity is enough but
> still..:-)

I believe there was discussion a couple of months ago that came to a
different conclusion. There was concern about having documenation that
wasn't in the documentation and have to versions of essentially the
same information that both need to be maintained.

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Vincent van Leeuwen
Date:
On 2003-06-10 08:46:21 -0700, Josh Berkus wrote:
> Rune,
>
> > > shared_buffers = 117248 (shmmax / 2 / 1024 / 8 ) This I got from this
> > > forum.
> > > Does this sound right or am I totally out of bounds here? I have, as said
>
> Out of bounds, through no fault of your own .... I'm still working on
> documentation for this.  However, let me qoute the upcoming supplimentary
> docs:
>
> SHARED_BUFFERS
> Sets the size of Postgres' memory buffer where queries are held before being
> fed into the Kernel buffer of the host system.  It's very important to
> remember that this is only a holding area, and not the total memory available
> for the server.  As such, resist the urge to set this number to a large
> portion of your RAM, as this will actually degrade performance on many OSes.
> Members of the pgsql-performance mailing list have found useful values in the
> range of 1000-6000, depending on available RAM, database size, and number of
> concurrent queries. No one has yet reported positive results for any number
> over 6000.
>

We run a dual P3 1GHz server, running Debian Linux (stable), kernel 2.4.20,
with a 5-disk (10K rpm) RAID 5 array (ICP Vortex controller) and 4GB RAM, most
of which is used for filesystem cache. This server runs Postgresql 7.3.2
exclusively, with a database of roughly 7GB. This database is used for a very
busy community website, running an enormous amount of small and simple
select/update/insert queries and a number of complex select queries, to search
through all kinds of data.

This server isn't running postgres that long, and we're still trying to figure
out the best configuration parameters for the highest possible performance.
Shared_buffers was one of the first things we looked at. We've tested with
shared_buffers at 1024, 8192, 32768 and 131072. So far, performance with
shared_buffers set at 32768 was the best we could attain. 8192 and 131072 came
out roughly equal. 1024 was miserable.

(yay, 3 lines in a row starting with the word 'shared_buffers'! ;))

Also, there was a very strong relation between the shared_buffers setting and
the amount of cpu time spent in kernelland. Currently, the server spends
roughly 20% of it's time in kernelspace (according to vmstat). When
shared_buffers was 8192, this went up to about 30%.

I don't have any hard performance statistics, we just threw the site live with
different settings and watched the load on all servers, and the amount of
requests/second our webservers could generate (the bottleneck is the
postgresql server, not the webservers).


I'm really eager for any useful tips regarding the various cost settings. I've
been following this list for months and read through a large portion of the
archives, but noone has been able to do more than handwaving around certain
numbers, which are close to the defaults anyway.

Currently, we have the following settings:
shared_buffers = 32768
max_fsm_relations = 100
max_fsm_pages = 100000
sort_mem = 16384
vacuum_mem = 131072
effective_cache_size = 327680
random_page_cost = 1.5
cpu_tuple_cost = 0.005
#cpu_index_tuple_cost = 0.001 (default)
#cpu_operator_cost = 0.0025 (default)

Halving the cpu_tuple_cost has given a very impressive performance boost
(performance roughly doubled). I'm not sure why, because the plans of the
large queries I was checking haven't changed as far as I can see, but maybe
some smaller queries I didn't bother to check are using a different plan now.
Although I was quite sure those smaller queries were all using the correct
indexes etc before the change anyway.

Just to be absolutely sure: all *_cost parameters only influence the chosen
plan, right? There is absolutely nothing else influenced which doesn't show up
in an EXPLAIN ANALYZE, right?


Regards,

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Josh Berkus
Date:
Vincent,

> This server isn't running postgres that long, and we're still trying to
figure
> out the best configuration parameters for the highest possible performance.
> Shared_buffers was one of the first things we looked at. We've tested with
> shared_buffers at 1024, 8192, 32768 and 131072. So far, performance with
> shared_buffers set at 32768 was the best we could attain. 8192 and 131072
came
> out roughly equal. 1024 was miserable.

Cool!   This is the first report we've had of a successful higher setting for
shared_buffers.  I'll need to revise the text.  What do people think of:

SHARED_BUFFERS
Sets the size of Postgres' memory buffer where queries are held before being
fed into the Kernel buffer of the host system.  It's very important to
remember that this is only a holding area, and not the total memory available
for the server.  As such, resist the urge to set this number to a large
portion of your RAM, as this will actually degrade performance on many OSes.
Members of the pgsql-performance mailing list have mostly found useful values
in the range of 1000-6000, depending on available RAM, database size, and
number of concurrent queries.
This can go up slightly for servers with a great deal of RAM; the useful
maximum on Linux seems to be 6% to 10% of available RAM, with performance
degrading at higher settings.  Information on other OSes is not yet posted.
On multi-purpose servers, of course, the setting should be lowered.

> Also, there was a very strong relation between the shared_buffers setting
and
> the amount of cpu time spent in kernelland. Currently, the server spends
> roughly 20% of it's time in kernelspace (according to vmstat). When
> shared_buffers was 8192, this went up to about 30%.

This makes perfect sense ... less shared_buffers = more kernel_buffers, and
vice-versa.

> Currently, we have the following settings:
> shared_buffers = 32768
> max_fsm_relations = 100

You might wanna increase this; current recommended is 300 just to make sure
that you have one for every table.

> Just to be absolutely sure: all *_cost parameters only influence the chosen
> plan, right? There is absolutely nothing else influenced which doesn't show
up
> in an EXPLAIN ANALYZE, right?

Yes, AFAIK.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Bruce Momjian
Date:
Josh Berkus wrote:
> Vincent,
>
> > This server isn't running postgres that long, and we're still trying to
> figure
> > out the best configuration parameters for the highest possible performance.
> > Shared_buffers was one of the first things we looked at. We've tested with
> > shared_buffers at 1024, 8192, 32768 and 131072. So far, performance with
> > shared_buffers set at 32768 was the best we could attain. 8192 and 131072
> came
> > out roughly equal. 1024 was miserable.
>
> Cool!   This is the first report we've had of a successful higher setting for
> shared_buffers.  I'll need to revise the text.  What do people think of:

I have been thinking about shared_buffers, and it seems it is the
age-old issue of working set.

Traditionally Unix doesn't use working set (though a few do).  It just
allocates memory proportionally among all processes, with unreferenced
pages being paged out first.

For PostgreSQL, if your working set is X, if you set your shared buffers
to X, you will get optimal performance (assuming there is no memory
pressure).  If set allocate X/2, you will probably get worse
performance.  If you allocate X*2, you will also probably get slightly
worse performance.

Now, let's suppose you can't allocate X shared buffers, because of
memory pressure.  Suppose you can allocate X/2 shared buffers, and that
will leave X/2 kernel buffers.  It would be better to allocate X/4
shared buffers, and leave X*3/4 kernel buffers.  If you can only
allocate X/5 shared buffers, you might be better with X/10 shared
buffers because you are going to be doing a lot of I/O, and you need
lots of kernel buffers for that.

I think that is what people are seeing when modifying shared buffers:

    X shared buffers is best
    >X shared buffers is too much overhead and starves kernel
    <X might be better by not maximizing shared buffers and have
       more kernel buffers

Add to this that it is very hard to estimate working set.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Tom Lane
Date:
Vincent van Leeuwen <pgsql.spam@vinz.nl> writes:
> Halving the cpu_tuple_cost has given a very impressive performance boost
> (performance roughly doubled). I'm not sure why, because the plans of the
> large queries I was checking haven't changed as far as I can see, but maybe
> some smaller queries I didn't bother to check are using a different plan now.

That's very curious; I'd expect that parameter to have only marginal
effect in the first place (unless you make huge changes in it, of course).
It must have changed some plan that you didn't take note of.  If you can
find it I'd be interested to know.

> Just to be absolutely sure: all *_cost parameters only influence the
> chosen plan, right? There is absolutely nothing else influenced which
> doesn't show up in an EXPLAIN ANALYZE, right?

AFAIR, the only one of these parameters that the executor pays any
attention to is SORT_MEM; that will determine how soon the runtime code
starts to spill tuples to disk in sorts, hash tables, etc.

            regards, tom lane

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Vincent van Leeuwen
Date:
On 2003-06-10 15:08:22 -0400, Bruce Momjian wrote:
> For PostgreSQL, if your working set is X, if you set your shared buffers
> to X, you will get optimal performance (assuming there is no memory
> pressure).  If set allocate X/2, you will probably get worse
> performance.  If you allocate X*2, you will also probably get slightly
> worse performance.
>
> Now, let's suppose you can't allocate X shared buffers, because of
> memory pressure.  Suppose you can allocate X/2 shared buffers, and that
> will leave X/2 kernel buffers.  It would be better to allocate X/4
> shared buffers, and leave X*3/4 kernel buffers.  If you can only
> allocate X/5 shared buffers, you might be better with X/10 shared
> buffers because you are going to be doing a lot of I/O, and you need
> lots of kernel buffers for that.
>
> I think that is what people are seeing when modifying shared buffers:
>
>     X shared buffers is best
>     >X shared buffers is too much overhead and starves kernel
>     <X might be better by not maximizing shared buffers and have
>        more kernel buffers
>
> Add to this that it is very hard to estimate working set.
>

Makes a lot of sense to me. We're doing a lot of I/O on a small part of that
7GB, and the rest is accessed in a more or less random fashion, so 256MB of
shared buffers sounds about right. I'll play more with this in the future to
see at what setting it performs best.

Is there any information available in the system tables or statistics
collector that can help determine X? Could PostgreSQL be easily modified to
provide more information in this area?

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Vincent van Leeuwen
Date:
On 2003-06-10 15:43:47 -0400, Tom Lane wrote:
> Vincent van Leeuwen <pgsql.spam@vinz.nl> writes:
> > Halving the cpu_tuple_cost has given a very impressive performance boost
> > (performance roughly doubled). I'm not sure why, because the plans of the
> > large queries I was checking haven't changed as far as I can see, but maybe
> > some smaller queries I didn't bother to check are using a different plan now.
>
> That's very curious; I'd expect that parameter to have only marginal
> effect in the first place (unless you make huge changes in it, of course).
> It must have changed some plan that you didn't take note of.  If you can
> find it I'd be interested to know.
>

Unfortunately, we're not exactly in the best position to test a lot of things.
Our website has been running on MySQL and PHP for the last 3 years, and I've
been wanting to switch to PostgreSQL for about the last 2 years. A lot of
preparation went in to the change, but once we switched our live site to use
PostgreSQL as it's main database we were utterly dissapointed in our own
preparations. I knew our website was somewhat optimized for MySQL usage, but
looking back I am totally amazed that we were able to squeeze so much
performance out of a database that locks entire tables for every update (yes,
we used the MyISAM table format). One of the most surprising things we learned
was that MySQL was totally bottlenecking on I/O, with a large chunk of CPU
unused, and with PostgreSQL it's the other way around.

The last couple of weeks have been a nice collection of whacky antics and
performance tuning all over the place. The first week everything performed
abysmal, and another week later we're close to our original performance again.
Ofcourse, the goal is to exceed MySQL's performance by a comfortable margin,
but we're not there yet :)

So, basically, this server is pushed far harder than it should be. Average
system load is at about 4, and there are always 50-200 postgresql threads
running during daytime. A new server that will replace this one and which is
roughly 2-3 times as fast will be put live in a few weeks, and until that's
here this box will have to bear the burden on it's own.

> > Just to be absolutely sure: all *_cost parameters only influence the
> > chosen plan, right? There is absolutely nothing else influenced which
> > doesn't show up in an EXPLAIN ANALYZE, right?
>
> AFAIR, the only one of these parameters that the executor pays any
> attention to is SORT_MEM; that will determine how soon the runtime code
> starts to spill tuples to disk in sorts, hash tables, etc.
>

Current sort_mem setting is based on monitoring the pgsql_tmp directory and
concluding that sort_mem needed to be doubled to avoid swapping to disk. It's
not as if this box doesn't have enough RAM :)

But this means I'll have to look more closely at my query plans, more things
are changing than I'm noticing when I tweak various settings.

One of the hardest parts is that some queries which should use sequential
scans are using indexes and some queries which should use indexes are using
sequential scans :) We're currently using some ugly 'set enable_seqscan to
off;' hacks in a few places, until everything is tweaked right, but I hope we
can remove those as soon as possible.


Regards,

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

Re: FW: [ADMIN] Shared_buffers and kernel parameters, tuning

From
Bruce Momjian
Date:
Vincent van Leeuwen wrote:
> On 2003-06-10 15:08:22 -0400, Bruce Momjian wrote:
> > For PostgreSQL, if your working set is X, if you set your shared buffers
> > to X, you will get optimal performance (assuming there is no memory
> > pressure).  If set allocate X/2, you will probably get worse
> > performance.  If you allocate X*2, you will also probably get slightly
> > worse performance.
> >
> > Now, let's suppose you can't allocate X shared buffers, because of
> > memory pressure.  Suppose you can allocate X/2 shared buffers, and that
> > will leave X/2 kernel buffers.  It would be better to allocate X/4
> > shared buffers, and leave X*3/4 kernel buffers.  If you can only
> > allocate X/5 shared buffers, you might be better with X/10 shared
> > buffers because you are going to be doing a lot of I/O, and you need
> > lots of kernel buffers for that.
> >
> > I think that is what people are seeing when modifying shared buffers:
> >
> >     X shared buffers is best
> >     >X shared buffers is too much overhead and starves kernel
> >     <X might be better by not maximizing shared buffers and have
> >        more kernel buffers
> >
> > Add to this that it is very hard to estimate working set.
> >
>
> Makes a lot of sense to me. We're doing a lot of I/O on a small part of that
> 7GB, and the rest is accessed in a more or less random fashion, so 256MB of
> shared buffers sounds about right. I'll play more with this in the future to
> see at what setting it performs best.
>
> Is there any information available in the system tables or statistics
> collector that can help determine X? Could PostgreSQL be easily modified to
> provide more information in this area?

Estimatinge working set is an old problem.  You can look at pgsql_tmp
under each database directory for sort mem, but for shared buffers, I am
not sure how to know the proper size.

Anyone else have an idea?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073