Thread: My own performance/tuning q&a

My own performance/tuning q&a

From: Allen Landsidel
Asked and answered on the list probably a thousand times, but what else is
there to discuss on the performance list? :)

I recently built a rather powerful machine to be used in a heavily accessed
database.. the machine is a dual AthlonMP 2800+, 2GB of PC2100 ECC, and a
4x18GB RAID-0 using 15k rpm fujitsu MAS drives on a 4ch u160 ICP-Vortex
card with 256MB of cache.

The box runs FreeBSD, tracking RELENG_4 (-STABLE) and PostgreSQL 7.3.4 from
ports (7.3.4_1).

There are a few databases running on the machine, but for now, the one that
is the most performance sensitive is also arguably the worst designed.  The
access pattern on a day to day basis looks basically like this:

1. ~75k rows in aggregate are inserted into two different tables, split
roughly 70/30.  The 70% goes to the smaller table (containing just two
integers) and the 30% goes into a larger table containing a rather largish
(~4KB) text field and more integer types; no searching of any kind is done
on this text field, it appears in no WHERE clauses, and is not indexed.

2. As these rows are inserted, other processes see them and, for each row:
   a. A new row containing just one field is inserted; that field is an FK
into the 30% table mentioned above.
   b. A row in a third table is updated; this table never gets deleted from
and rarely sees inserts, it's just a status table, but it has nearly a
million rows.  The updated field is an integer.
   c. The 30% table itself is updated.

3. When these processes finish their processing, the rows in both the 70/30
tables and the table from 2a are deleted; the 2b table has a row updated again.

There is only one process that does all the inserting, from a web
backend.  Steps 2 and 3 are done by several other backend processes on
different machines, "fighting" to pick up the newly inserted rows and
process them.  Not the most efficient design, but modifying the current
code isn't an option; rest assured that this is being redesigned and new
code is being written, but the developer who wrote the original left us
with his spaghetti-python mess and no longer works for us.
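
Very roughly, the layout looks something like this (the database, table,
and column names here are invented for illustration; only the shape matches
the description above):

psql -d mydb <<'SQL'
CREATE TABLE jobs_small (     -- the "70%" table: just two integers
    job_id  integer,
    status  integer
);
CREATE TABLE jobs_large (     -- the "30%" table: integers plus ~4KB of text
    job_id  integer PRIMARY KEY,
    status  integer,
    payload text              -- never searched, never in a WHERE clause, not indexed
);
CREATE TABLE job_claims (     -- step 2a: a single FK column into the 30% table
    job_id  integer REFERENCES jobs_large (job_id)
);
CREATE TABLE job_status (     -- step 2b: ~1M rows, updated in place
    item_id integer PRIMARY KEY,
    counter integer
);
SQL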

I run a 'vacuum analyze verbose' on the database in question every hour,
and a reindex on every table in the database every six hours; 'vacuum full'
is run manually as required, perhaps anywhere from once a week to once a
month.  I realize the analyze may not be running often enough and the
reindex more often than need be, but I don't think these are adversely
affecting performance very much; degradation over time does not appear to
be an issue.

So on with the question.  Given the above machine with the above database
and access pattern, I've configured the system with the following
options.  I'm just wondering what some of you more experienced pg tuners
have to say.  I can provide more information such as ipcs, vmstat, iostat,
etc. output on request, but I figure this message is getting long enough
already..

Thanks for any input.  Kernel and postgres information follows.

Related kernel configuration options:

...
cpu I686_CPU
maxusers 256
...
options         MAXDSIZ="(1024UL*1024*1024)"
options         MAXSSIZ="(512UL*1024*1024)"
options         DFLDSIZ="(512UL*1024*1024)"
...
options         SYSVSHM                 #SYSV-style shared memory
options         SYSVMSG                 #SYSV-style message queues
options         SYSVSEM                 #SYSV-style semaphores
options         SHMMAXPGS=65536
options         SHMMAX="(SHMMAXPGS*PAGE_SIZE+1)"
options         SHMSEG=256
options         SEMMNI=384
options         SEMMNS=768
options         SEMMNU=384
options         SEMMAP=384
...

relevant postgresql.conf options:

max_connections = 128
shared_buffers = 20000
max_fsm_relations = 10000
max_fsm_pages = 2000000
max_locks_per_transaction = 64
wal_buffers = 128
sort_mem = 262144 # we have some large queries running at times
vacuum_mem = 131072
checkpoint_segments = 16
checkpoint_timeout = 300
commit_delay = 1000
commit_siblings = 32
fsync = true
wal_sync_method = fsync
effective_cache_size = 49152 # 384MB, this could probably be higher
random_page_cost = 1.7
cpu_tuple_cost = 0.005
cpu_index_tuple_cost = 0.0005
cpu_operator_cost = 0.0012
geqo_threshold = 20
stats_start_collector = true
stats_reset_on_server_start = off
stats_command_string = true
stats_row_level = true
stats_block_level = true


Re: My own performance/tuning q&a

From: Vivek Khera
>>>>> "AL" == Allen Landsidel <all@biosys.net> writes:

AL> I recently built a rather powerful machine to be used in a heavily
AL> accessed database.. the machine is a dual AthlonMP 2800+, 2GB of
AL> PC2100 ECC, and a 4x18GB RAID-0 using 15k rpm fujitsu MAS drives on a
AL> 4ch u160 ICP-Vortex card with 256MB of cache.

The only recommendation I'd make is to switch from RAID0 to RAID10,
unless you can afford the downtime (and loss of data) when one of your
drives takes a vacation.

Also, is your RAID card cache battery backed up?  If no, then you lose
the ability to use write-back and this costs *dearly* in performance.


AL> The box runs FreeBSD, tracking RELENG_4 (-STABLE) and PostgreSQL 7.3.4
AL> from ports (7.3.4_1)

An excellent choice. :-)

[[ ... ]]

AL> I run a 'vacuum analyze verbose' on the database in question every
AL> hour, and a reindex on every table in the database every six hours,
AL> 'vacuum full' is run manually as required perhaps anywhere from once a
AL> week to once a month.  I realize the analyze may not be running often
AL> enough and the reindex more often than need be, but I don't think
AL> these are adversely affecting performance very much; degradation over
AL> time does not appear to be an issue.

Personally, I don't think you need to reindex that much.  And I don't
think you need to vacuum full *ever* if you vacuum often like you do.
Perhaps reducing the vacuum frequency may let you reach a steady state
of disk usage?

Depending on how many concurrent actions you process, perhaps you can
use a temporary table for each, so you don't have to delete many rows
when you're done.


On my busy tables, I vacuum every 6 hours.  The vacuum analyze is run
on the entire DB nightly.  I reindex every month or so my most often
updated tables that show index bloat.  Watch for bloat by monitoring
the size of your indexes:

SELECT relname,relpages FROM pg_class WHERE relname LIKE 'some_table%' ORDER BY relname;

AL> Related kernel configuration options:

AL> ...
AL> cpu I686_CPU
AL> maxusers 256

let the system autoconfigure maxusers...

AL> ...
AL> options         MAXDSIZ="(1024UL*1024*1024)"
AL> options         MAXSSIZ="(512UL*1024*1024)"
AL> options         DFLDSIZ="(512UL*1024*1024)"

above are ok at defaults.

AL> options         SHMMAXPGS=65536

perhaps bump this and increase your shared buffers.  I find that if
you do lots of writes, having a few more shared buffers helps.

AL> options         SHMMAX="(SHMMAXPGS*PAGE_SIZE+1)"

you don't need to explicitly set this... it is automatically set based
on the above setting.


AL> relevant postgresql.conf options:

AL> max_fsm_pages = 2000000

this may be overkill.  I currently run with 1000000

AL> effective_cache_size = 49152 # 384MB, this could probably be higher

the current recommendation for freebsd is to set this to:

`sysctl -n vfs.hibufspace` / 8192

where 8192 is the blocksize used by postgres.
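
In other words, something like the following (assuming the stock 8k
PostgreSQL block size) prints the value to drop into postgresql.conf:

hibuf=`sysctl -n vfs.hibufspace`
echo "effective_cache_size = $(( hibuf / 8192 ))"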

You may also want to increase the max buffer space used by FreeBSD,
which apparently is capped at 200M (I think) by default.   I'll have
to look up how to bump that, as most likely you have plenty of RAM
sitting around unused.  What does "top" say about that when you're
busy?

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: My own performance/tuning q&a

From: Allen Landsidel
At 17:14 10/23/2003, Vivek Khera wrote:
> >>>>> "AL" == Allen Landsidel <all@biosys.net> writes:
>
>AL> I recently built a rather powerful machine to be used in a heavily
>AL> accessed database.. the machine is a dual AthlonMP 2800+, 2GB of
>AL> PC2100 ECC, and a 4x18GB RAID-0 using 15k rpm fujitsu MAS drives on a
>AL> 4ch u160 ICP-Vortex card with 256MB of cache.
>
>The only recommendation I'd make is to switch from RAID0 to RAID10,
>unless you can afford the downtime (and loss of data) when one of your
>drives takes a vacation.
>
>Also, is your RAID card cache battery backed up?  If no, then you lose
>the ability to use write-back and this costs *dearly* in performance.

I'm planning to move it to -10 or -5 (or even -50) once we have more money
to spend on drives.  As it is right now though, I couldn't spare the
space.. The box this was moved from was a 2x1000 P3 with a single u160
drive.. Battery backup is something I really should have gotten on the
memory, but I spaced out when placing the order; it'll come in the future.

I'm kind of "living on the edge" here with regard to no BBU on the RAID and
using RAID-0, I know.. but it's for a short time, and I don't think in the
scheme of things this is any more failure-prone than the crummy setup it
was on before.  Back up, and back up often; I know that mantra very well and
live by it. :)


>AL> The box runs FreeBSD, tracking RELENG_4 (-STABLE) and PostgreSQL 7.3.4
>AL> from ports (7.3.4_1)
>
>An excellent choice. :-)

I recognize you from those lists.. didn't notice the Ph.D. before though..
but yes, I'm a huge FreeBSD fan.. I didn't need anyone to talk me into that
particular choice. ;)

>AL> I run a 'vacuum analyze verbose' on the database in question every
>AL> hour, and a reindex on every table in the database every six hours,
>AL> 'vacuum full' is run manually as required perhaps anywhere from once a
>AL> week to once a month.  I realize the analyze may not be running often
>AL> enough and the reindex more often than need be, but I don't think
>AL> these are adversely affecting performance very much; degradation over
>AL> time does not appear to be an issue.
>
>Personally, I don't think you need to reindex that much.  And I don't
>think you need to vacuum full *ever* if you vacuum often like you do.
>Perhaps reducing the vacuum frequency may let you reach a steady state
>of disk usage?

Well I had the vacuums running every 15 minutes for a while.. via a simple
cron script I wrote just to make sure no more than one vacuum ran at once,
and to 'nice' the job.. but performance on the db does suffer a bit during
vacuums, or so it seems.  The performance doesn't degrade noticeably after
only an hour without a vacuum though, so I'd like the degraded performance
to be an occasional, periodic thing rather than the general rule during
24/7 operation.

I'll monkey around more with running the vacuum more often and see if the
performance hit was more imagined than real.
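
The wrapper itself is nothing fancy, roughly this (lock, log, and database
names are placeholders, and a stale lock has to be cleaned up by hand):

#!/bin/sh
# bail out if the previous vacuum is still running
mkdir /var/run/vacuum-mydb.lock 2>/dev/null || exit 0
nice vacuumdb --analyze --verbose mydb > /var/log/vacuum-mydb.log 2>&1
rmdir /var/run/vacuum-mydb.lock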


>Depending on how many concurrent actions you process, perhaps you can
>use a temporary table for each, so you don't have to delete many rows
>when you're done.

I'd love to but unfortunately the daemons that use the database are a mess,
more or less 'unsupported' at this point.. thankfully they're being
replaced along with a lot of performance-hurting SQL.


>On my busy tables, I vacuum every 6 hours.  The vacuum analyze is run
>on the entire DB nightly.  I reindex every month or so my most often
>updated tables that show index bloat.  Watch for bloat by monitoring
>the size of your indexes:
>
>SELECT relname,relpages FROM pg_class WHERE relname LIKE 'some_table%'
>ORDER BY relname;

Thanks for that tidbit.. maybe I'll cron something else to grab the values
once a day or so and archive them in another table for history.. make my
life easier. ;)
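
Probably nothing more complicated than this out of cron (the history table
is hypothetical; assume it already exists with snapped, relname, and
relpages columns):

psql -q -d mydb -c "
    INSERT INTO relpages_history (snapped, relname, relpages)
    SELECT now(), relname, relpages
      FROM pg_class
     WHERE relname LIKE 'some_table%'"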


>AL> Related kernel configuration options:
>
>AL> ...
>AL> cpu I686_CPU
>AL> maxusers 256
>
>let the system autoconfigure maxusers...

Are you sure about this?  I have always understood that explicitly setting
this value was the best thing to do if you knew the maximum number of users
you would encounter, as the kernel doesn't have to 'guess' at structure
sizes and the like, or grow them later..


>AL> ...
>AL> options         MAXDSIZ="(1024UL*1024*1024)"
>AL> options         MAXSSIZ="(512UL*1024*1024)"
>AL> options         DFLDSIZ="(512UL*1024*1024)"
>
>above are ok at defaults.

These are related to something else.. a Linux developer on the system is
used to the way it'll always allow you access to all the memory on a
machine and just kill a random process to give you memory if you allocated
more than was free.. ;)

He didn't know processes were getting killed, but the defaults turned out
to be not high enough.  This will get turned back down to default once he's
done migrating everything into the new database and his app no longer needs
to run there.  I just mentioned them in case they could adversely affect
performance as-is.


>AL> options         SHMMAXPGS=65536
>
>perhaps bump this and increase your shared buffers.  I find that if
>you do lots of writes, having a few more shared buffers helps.

Any ideas how much of a bump, or does that depend entirely on me and I
should just play with it?  Would doubling it be too much of a bump?


>AL> options         SHMMAX="(SHMMAXPGS*PAGE_SIZE+1)"
>
>you don't need to explicitly set this... it is automatically set based
>on the above setting.

I'm an explicit kind of guy. ;)


>AL> relevant postgresql.conf options:
>
>AL> max_fsm_pages = 2000000
>
>this may be overkill.  I currently run with 1000000

At only 6 bytes each I thought 12M wasn't too much to spare for the sake of
making sure there is enough room there for everything.. I am watching my
file sizes and vacuum numbers to try and tune this value but it's an
arduous process.


>AL> effective_cache_size = 49152 # 384MB, this could probably be higher
>
>the current recommendation for freebsd is to set this to:
>
>`sysctl -n vfs.hibufspace` / 8192
>
>where 8192 is the blocksize used by postgres.

That comes out as 25520.. I have it at 384MB because I wanted to take the
256MB on the RAID controller into account as well.

I'm not entirely certain how much of that 256MB is available, and for what
kind of cache.. I know the i960 based controllers all need to set aside at
least 16MB for their "OS" and it isn't used for cache, not sure about ARM
based cards like the ICP.. but I don't think assuming 128MB is too much of
a stretch, or even 192MB.



>You may also want to increase the max buffer space used by FreeBSD,
>which apparently is capped at 200M (I think) by dafault.   I'll have
>to look up how to bump that, as most likely you have plenty of RAM
>sitting around unused.  What does "top" say about that when you're
>busy?

Yes that hibufspace value comes out to 200MB.. (199.375 really, odd)

top usually shows me running with that same value.. 199MB.. and most of the
time, with maybe 1.2GB free in the Inact area..
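
The arithmetic behind that odd number, assuming 8k blocks, is just the
25520 pages from above:

echo "scale=3; 25520 * 8192 / 1024 / 1024" | bc    # -> 199.375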

I'll see if sysctl lets me write this value, or if it's a kernel config
option I missed, unless you have remembered between then and now.  I'd
really like to have this higher, say around 512MB.. more if I can spare it
after watching for a bit.

Given this and the above about the controller's onboard cache (not to
mention the per-drive cache) do you think I'll still need to lower
effective_cache_size?

Thanks..

-Allen


Re: My own performance/tuning q&a

From: Vivek Khera
>>>>> "AL" == Allen Landsidel <all@biosys.net> writes:

AL> maxusers 256
>>
>> let the system autoconfigure maxusers...

AL> Are you sure about this?  I have always understood that explicitly
AL> setting this value was the best thing to do if you knew the maximum

Yes, recent freebsd kernels autosize various tables and limits based
on existing RAM.  It does pretty well.


AL> These are related to something else.. a linux developer on the system
AL> used to the way it'll always allow you access to all the memory on a

Ahhh... I guess we don't believe in multi-user systems ;-)

AL> options         SHMMAXPGS=65536
>>
>> perhaps bump this and increase your shared buffers.  I find that if
>> you do lots of writes, having a few more shared buffers helps.

AL> Any ideas how much of a bump, or does that depend entirely on me and I
AL> should just play with it?  Would doubling it be too much of a bump?

I use 262144 for SHMMAXPGS and SHMALL.  I also use about 30000 shared
buffers.

AL> I'll see if sysctl lets me write this value, or if it's a kernel
AL> config option I missed, unless you have remembered between then and

you need to bump some header file constant and rebuild the kernel.  it
also increases the granularity of how the buffer cache is used, so I'm
not sure how it affects overall system.  nothing like an experiment...

AL> Given this and the above about the controller's onboard cache (not to
AL> mention the per-drive cache) do you think I'll still need to lower
AL> effective_cache_size?

It is hard to say.  If you tell PG you have more than you do, I don't
know what kind of decisions it will make incorrectly.  I'd rather be
conservative and limit it to the RAM that the system says it will
use.  The RAM in the controller is not additive to this -- it is
redundant to it, since all data goes thru that cache into the main
memory.


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: My own performance/tuning q&a

From: Vivek Khera
>>>>> "AL" == Allen Landsidel <all@biosys.net> writes:

AL> Well I had the vacuums running every 15 minutes for a while.. via a
AL> simple cron script I wrote just to make sure no more than one vacuum
AL> ran at once, and to 'nice' the job.. but performance on the db does

"nice"-ing the client does nothing for the backend server that does
the actual work.  You need to track down the PID of the backend server
running the vacuum and renice *it* to get any effect.
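
Something along these lines should do it (this assumes the backends update
their ps title with the running command, which they do by default):

for pid in `ps axww | grep 'postgres.*VACUUM' | grep -v grep | awk '{print $1}'`; do
    renice 10 -p $pid
done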

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: My own performance/tuning q&a

From: "scott.marlowe"
On Fri, 24 Oct 2003, Vivek Khera wrote:

> >>>>> "AL" == Allen Landsidel <all@biosys.net> writes:
>
> AL> Well I had the vacuums running every 15 minutes for a while.. via a
> AL> simple cron script I wrote just to make sure no more than one vacuum
> AL> ran at once, and to 'nice' the job.. but performance on the db does
>
> "nice"-ing the client does nothing for the backend server that does
> the actual work.  You need to track down the PID of the backend server
> running the vacuum and renice *it* to get any effect.

Note that Tom has mentioned problems with possible deadlocks when nicing
individual backends before, so proceed with caution here.


Re: My own performance/tuning q&a

From: Vivek Khera
>>>>> "sm" == scott marlowe <scott.marlowe> writes:


sm> Note that Tom has mentioned problems with possible deadlocks when nicing
sm> individual backends before, so proceed with caution here.

I can see possible starvation, but if scheduling changes cause
deadlocks, then there's something wrong with the design.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: My own performance/tuning q&a

From: Bruce Momjian
Vivek Khera wrote:
> >>>>> "sm" == scott marlowe <scott.marlowe> writes:
>
>
> sm> Note that Tom has mentioned problems with possible deadlocks when nicing
> sm> individual backends before, so proceed with caution here.
>
> I can see possible starvation, but if scheduling changes cause
> deadlocks, then there's something wrong with the design.

Yes, I think Tom's concern was priority inversion, where a low priority
process holds a lock while a higher one waits for it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: My own performance/tuning q&a

From: Allen Landsidel
Pardon this for looking somewhat "weird" but it seems I'm not getting all
the messages to the list.. I've noticed the past few days a lot of them are
coming out of order as well..

So, this was copy/pasted from the web archive of the list..

Vivek Khera wrote:
> >>>>> "AL" == Allen Landsidel <all ( at ) biosys ( dot ) net> writes:
>
>AL> maxusers 256
> >> let the system autoconfigure maxusers...
>
>AL> Are you sure about this?  I have always understood that explicitly
>
>Yes, recent freebsd kernels autosize various tables and limits based
>on existing RAM.  It does pretty well.

I'll disable it then and see how it goes.

>AL> These are related to something else.. a linux developer on the system
>AL> used to the way it'll always allow you access to all the memory on a
>
>Ahhh... I guess we don't believe in multi-user systems ;-)

No, that's a foreign concept to a lot of people it seems.  As a matter of
trivia, I first suggested we run this on another server instead and hit the
db remotely, as it's only going to be a "run once" type of thing that
converts the old system to the new one, but was rebuffed.  Yesterday during
a test run the thing ran over the 1GB limit, failed on some new() or other,
and dumped core.  I couldn't bring the db down at that time to update the
kernel, so we ran it on another box that has MAXDSIZ set to 1.5GB and it
ran ok, but took about six hours.. so I'll be upping that value yet again
for this one special run this weekend when we do the *real* switch over,
then putting it back down once we're all done.

I can deal with it since it's not going to be "normal" but simply a one-off
type thing.

FWIW the same kind of thing has happened to me with this postgres install;
occasionally large atomic queries like DELETE will fail for the same reason
(out of memory) if there are a lot of rows to be removed, and TRUNCATE
isn't an option since there are FKs on the table in question.  This is only
a minor annoyance, but I'd be interested to hear how other people work
around it.
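
The obvious workaround is to chunk the deletes from a shell script so no
single DELETE has to track millions of pending FK checks at once; something
like this (database, table, and column names are made up):

#!/bin/sh
while : ; do
    left=`psql -At -d mydb -c "SELECT count(*) FROM big_log WHERE processed"`
    [ "${left:-0}" -eq 0 ] && break
    psql -q -d mydb -c "DELETE FROM big_log
        WHERE id IN (SELECT id FROM big_log WHERE processed LIMIT 5000)"
done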

>I use 262144 for SHMMAXPGS and SHMALL.  I also use about 30000 shared
>buffers.

I believe I had it fairly high once before and didn't notice much of an
improvement but I'll fool with numbers around where you suggest.

>AL> I'll see if sysctl lets me write this value, or if it's a kernel
>AL> config option I missed, unless you have remembered between then and
>
>you need to bump some header file constant and rebuild the kernel.  it
>also increases the granularity of how the buffer cache is used, so I'm
>not sure how it affects overall system.  nothing like an experiment...

So far I've found a whole lot of questions about this, but nothing about
the constant.  The sysctl (vfs.hibufspace I believe is the one) is read
only, although I should be able to work around that via /boot/loader.conf
if I can't find the kernel option.

>AL> Given this and the above about the controller's onboard cache (not to
>AL> mention the per-drive cache) do you think I'll still need to lower
>AL> effective_cache_size?
>
>It is hard to say.  If you tell PG you have more than you do, I don't
>know what kind of decisions it will make incorrectly.  I'd rather be
>conservative and limit it to the RAM that the system says it will
>use.  The RAM in the controller is not additive to this -- it is
>redundant to it, since all data goes thru that cache into the main
>memory.

A very good point; I don't know why I thought they might hold different
data.  I think it could differ somewhat, but probably most everything in the
controller cache will be duplicated in the OS cache, provided the OS cache
is at least as large.

A separate reply concatenated here to a message I actually did get
delivered via email:

At 16:50 10/24/2003, Bruce Momjian wrote:
>Vivek Khera wrote:
> > >>>>> "sm" == scott marlowe <scott.marlowe> writes:
> >
> >
> > sm> Note that Tom has mentioned problems with possible deadlocks when
> nicing
> > sm> individual backends before, so proceed with caution here.
> >
> > I can see possible starvation, but if scheduling changes cause
> > deadlocks, then there's something wrong with the design.
>
>Yes, I think Tom's concern was priority inversion, where a low priority
>process holds a lock while a higher one waits for it.

1. Vivek, you were absolutely right about the backend process not being
lowered in priority by nice'ing the psql.  Yet another thing that "just
didn't occur" when I wrote the script.

2. Vivek and Bruce (and even Tom), "VACUUM ANALYZE (VERBOSE)" isn't
supposed to lock anything though, right?  I can see this being a possible
problem for other queries that do lock things, but between Vivek pointing
out that the nice isn't *really* affecting the vacuum (as I just run one
query db-wide) and the fact that the vacuum doesn't lock, I don't think
it's hurting (or helping) in this case.

However, I do the same thing with the reindex, so I'll definitely be taking
it out there, as that one does lock.. although I would think the worst this
would do would be making the index unavailable and forcing a seq scan..
is that not the case?


Re: My own performance/tuning q&a

From: Vivek Khera
>>>>> "AL" == Allen Landsidel <all@biosys.net> writes:

AL> However, I do the same thing with the reindex, so I'll definitely be
AL> taking it out there, as that one does lock.. although I would think
AL> the worst this would do would be making the index unavailable and
AL> forcing a seq scan.. is that not the case?

Nope.  *All* access to the table is locked out.





--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: My own performance/tuning q&a

From: Vivek Khera
>>>>> "AL" == Allen Landsidel <all@biosys.net> writes:

>> you need to bump some header file constant and rebuild the kernel.  it
>> also increases the granularity of how the buffer cache is used, so I'm
>> not sure how it affects overall system.  nothing like an experiment...

AL> So far I've found a whole lot of questions about this, but nothing
AL> about the constant.  The sysctl (vfs.hibufspace I believe is the one)
AL> is read only, although I should be able to work around that via
AL> /boot/loader.conf if I can't find the kernel option.

Here's what I have in my personal archive.  I have not tried it yet.
BKVASIZE is in a system header file, so is not a regular "tunable" for
a kernel.  That is, you must muck with the source files to change it,
which makes for maintenance headaches.



From: Sean Chittenden <sean@chittenden.org>
Subject: Re: go for a script! / ex:  PostgreSQL vs. MySQL
Newsgroups: ml.postgres.performance
To: Vivek Khera <khera@kcilink.com>
Cc: pgsql-performance@postgresql.org
Date: Mon, 13 Oct 2003 12:04:46 -0700
Organization: none

> >> echo "effective_cache_size = $((`sysctl -n vfs.hibufspace` / 8192))"
> >>
> >> I've used it for my dedicated servers.  Is this calculation correct?
>
> SC> Yes, or it's real close at least.  vfs.hibufspace is the amount of
> SC> kernel space that's used for caching IO operations (minus the
>
> I'm just curious if anyone has a tip to increase the amount of
> memory FreeBSD will use for the cache?

Recompile your kernel with BKVASIZE set to 4 times its current value
and double your nbuf size.  According to Bruce Evans:

"Actually there is a way: the vfs_maxbufspace gives the amount of
space reserved for buffer kva (= nbuf * BKVASIZE).  nbuf is easy to
recover from this, and the buffer kva space may be what is wanted
anyway."
[snip]
"I've never found setting nbuf useful, however.  I want most
parametrized sizes including nbuf to scale with resource sizes, and
it's only with RAM sizes of similar sizes to the total virtual address
size that its hard to get things to fit.  I haven't hit this problem
myself since my largest machine has only 1GB.  I use an nbuf of
something like twice the default one, and a BKVASIZE of 4 times the
default.  vfs.maxbufspace ends up at 445MB on the machine with 1GB, so
it is maxed out now."

YMMV.

-sc


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: My own performance/tuning q&a

From: Neil Conway
On Fri, 2003-10-24 at 20:11, Allen Landsidel wrote:
> However, I do the same thing with the reindex, so I'll definitely be taking
> it out there, as that one does lock.. although I would think the worst this
> would do would be making the index unavailable and forcing a seq scan..
> is that not the case?

No, it exclusively locks the table. It has been mentioned before that we
should probably be able to fall back to a seqscan while the REINDEX is
going on, but that's not currently done.

-Neil