Thread: Help with performance problems
I need some help. I have 5 db servers running our database servers, and they all are having various degrees of performance problems. The problems we are experiencing are:

1. General slowness
2. High loads

All of our db's are running on Dell PowerEdge 2650s with 2 P4 Xeons (2.8 - 3.06 GHz) and 8 to 12 GB of memory. The databases are on attached Dell PowerVault 220S arrays running RAID 5. The databases were created and taken into production before I started working here and are very flat. Most of the major tables have a combined primary key using an int field and a single char field. There are some additional indexes on some tables. Most queries I see in the logs are running at less than .01 seconds, with many significantly slower.

We are trying to narrow down the performance problem to either the db or the hardware. As the dba, I need to get these db's tuned as well as possible given their current state. We are at the beginning of a complete db redesign and application re-write, but the completion and deployment of the new db and app are quite a ways off.

Anyway, we are running the following:

PE 2650 w/ 2 cpus (2.8-3.06 GHz) - HT on
8-12 GB memory
OS on RAID 0
DB's on PowerVault 220S using RAID 5 (over 6 disks)
Each PostgreSQL cluster has from 2 dbs up to almost 170 dbs (a project to level out the number of db's per cluster is being started)
DB's are no bigger than a few GB in size (largest is about 11GB according to a du -h)
Running RH ES 2.1

Here is the postgresql.conf from the server with the 11GB db:

max_connections = 64
shared_buffers = 32768           # 256MB=32768(buffs)*8192(bytes/buff)
max_fsm_relations = 1000         # min 10, fsm is free space map, ~40 bytes
max_fsm_pages = 10000            # min 1000, fsm is free space map, ~6 bytes
sort_mem = 4096                  # 256MB=4096(bytes/proc)*64(procs or conns)
checkpoint_segments = 16         # in logfile segments, min 1, 16MB each
checkpoint_timeout = 30          # range 30-3600, in seconds
effective_cache_size = 131072    # typically 8KB each
log_connections = true
log_pid = true
log_statement = true
log_duration = true
log_timestamp = true
stats_start_collector = true
stats_reset_on_server_start = true
stats_command_string = true
stats_row_level = true
stats_block_level = true
LC_MESSAGES = 'en_US'
LC_MONETARY = 'en_US'
LC_NUMERIC = 'en_US'
LC_TIME = 'en_US'

Here is top (server running pretty good right now):

 9:28am up 25 days, 16:02, 2 users, load average: 0.54, 0.33, 0.22
94 processes: 91 sleeping, 3 running, 0 zombie, 0 stopped
CPU0 states: 64.0% user, 0.1% system, 0.0% nice, 34.0% idle
CPU1 states: 29.0% user, 9.0% system, 0.0% nice, 60.0% idle
CPU2 states: 2.0% user, 0.1% system, 0.0% nice, 96.0% idle
CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle
Mem: 7720072K av, 7711648K used, 8424K free, 265980K shrd, 749888K buff
Swap: 2096440K av, 22288K used, 2074152K free, 6379304K cached

Here is top from another server (with the most db's):

 9:31am up 25 days, 16:05, 5 users, load average: 2.34, 3.39, 4.28
147 processes: 145 sleeping, 2 running, 0 zombie, 0 stopped
CPU0 states: 6.0% user, 1.0% system, 0.0% nice, 91.0% idle
CPU1 states: 9.0% user, 4.0% system, 0.0% nice, 85.0% idle
CPU2 states: 9.0% user, 3.0% system, 0.0% nice, 86.0% idle
CPU3 states: 9.0% user, 4.0% system, 0.0% nice, 85.0% idle
Mem: 7721096K av, 7708040K used, 13056K free, 266132K shrd, 3151336K buff
Swap: 2096440K av, 24208K used, 2072232K free, 3746596K cached

Thanks for any help/advice,

Chris
Your second server shows queuing (the load averages are highish), only 2 processes running, and almost all CPU cycles idle. You need to track down your bottleneck. Have you looked at iostat/vmstat? I think it would be useful to post these, ideally both before and after a full vacuum analyze.

/Aaron
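For reference, something along these lines would capture the numbers Aaron is asking for, assuming the sysstat package is installed. The interval, sample count, and output file names are arbitrary choices for the example, not anything specified in the thread; the idea is simply to collect one set during a normal busy period and another during/after the vacuum run.

    # Sample disk activity and memory/CPU pressure every 5 seconds for 10 minutes
    iostat -x 5 120 > iostat_busy.txt &
    vmstat 5 120 > vmstat_busy.txt &
    wait
    # Repeat with different file names while the nightly vacuum is running

High iowait/blocked-process numbers in vmstat, or saturated %util on the RAID device in iostat, would point at the disk subsystem rather than the database settings.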
Chris,

> I need some help. I have 5 db servers running our database servers, and
> they all are having various degrees of performance problems. The problems
> we are experiencing are:

I'm confused. You're saying "general slowness" but say that most queries run in under .01 seconds. And you say "high loads" but the top snapshots you provide show servers with 2 CPUs idle.

Are you sure you actually *have* a performance issue?

--
Josh Berkus
Aglio Database Solutions
San Francisco
I know the numbers look ok, but we are definitely suffering. Also, if I try to run any sort of vacuum or other db activity during normal business hours, the load goes through the roof. I have seen loads of over 10 when trying to vacuum the larger cluster and have had to kill the vacuums due to complaints.

I think this is probably related to the hardware configuration, but I want to make sure that there are no configuration changes I could make to the db that might lessen the problem.

I especially want to make sure that I have the memory parameters set to good numbers for my db's so that I can minimize thrashing between the postgres memory pools and the hard drive. I am thinking that this may be a big issue here?

Thanks for any help,

Chris

On Friday 23 April 2004 12:42, Josh Berkus wrote:
> Chris,
>
> > I need some help. I have 5 db servers running our database servers, and
> > they all are having various degrees of performance problems. The
> > problems we are experiencing are:
>
> I'm confused. You're saying "general slowness" but say that most queries
> run in under .01 seconds. And you say "high loads" but the top snapshots
> you provide show servers with 2 CPUs idle.
>
> Are you sure you actually *have* a performance issue?
On Fri, 23 Apr 2004, Chris Hoover wrote:

> DB's on Powervaults 220S using raid 5 (over 6 disks)

What controller is this, the Adaptec? We've found it to be slower than the LSI megaraid based controller, but YMMV.

> Running RH ES 2.1

Are you running the latest kernel for ES 2.1? Early 2.4 kernels are pretty pokey and have some odd behaviour under load that later 2.4 kernels seem to fix.

> Here is the postgresql.conf from the server with the 11GB db:
>
> max_connections = 64
> shared_buffers = 32768           # 256MB=32768(buffs)*8192(bytes/buff)
> max_fsm_relations = 1000         # min 10, fsm is free space map, ~40 bytes
> max_fsm_pages = 10000            # min 1000, fsm is free space map, ~6 bytes

If you're doing lots of updates and such, you might want these higher. Have you vacuumed full the databases since taking over?

> sort_mem = 4096                  # 256MB=4096(bytes/proc)*64(procs or conns)

Sorry, that comment's wrong. sort_mem is measured in kbytes, i.e. 8192 means 8 megs of sort_mem per sort. Try setting it a bit higher (you've got LOTS of ram in these boxes), to something like 16 or 32 meg.

> checkpoint_segments = 16         # in logfile segments, min 1, 16MB each
> checkpoint_timeout = 30          # range 30-3600, in seconds
> effective_cache_size = 131072    # typically 8KB each

effective_cache_size still looks low. On one machine you're showing kernel cache of about .7 gig, on the other it's 6 gig. 6 gigs of kernel cache would be a setting of 800000. It's more of a nudge factor than an exact science, so don't worry too much.

If you've got fast I/O, look at lowering random_page_cost to something between 1 and 2. We use 1.3 to 1.4 on most of our machines with fast drives under them.

I'd use vmstat to see if you're I/O bound.

Also, look for index bloat. Before 7.4 it was a serious problem. With 7.4, regular vacuuming should reclaim most lost space, but there are corner cases where you still might need to re-index.
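Pulling Scott's numbers together, a revised postgresql.conf for one of these boxes might look roughly like the excerpt below. These are illustrative values rather than measured recommendations: effective_cache_size should come from the kernel cache actually observed in top, and the fsm values are placeholders pending the VACUUM VERBOSE output discussed later in the thread.

    shared_buffers = 32768           # keep ~256MB; let the kernel do the bulk of the caching
    sort_mem = 16384                 # sort_mem is in KB, so this is 16MB per sort
    max_fsm_relations = 10000
    max_fsm_pages = 1000000          # placeholder; size from VACUUM VERBOSE output
    effective_cache_size = 800000    # ~6GB of kernel cache / 8KB pages
    random_page_cost = 1.4           # only if the I/O subsystem is genuinely fast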
Sorry for the confusion here. I can't run any sort of vacuum during the day due to performance hits. However, I have run vacuums at night. Several nights a week I run a vacuumdb -f -z on all of the clusters. It can take several hours to complete, but it does complete.

During the day, I have tried to run a vacuumdb -v and a vacuumdb -z -v, since I read it is supposed to help performance, but as I said, it causes too much stress on the system. I did change the vacuumdb script to set vacuum_mem to 512MB while vacuuming to try and help the situation (from the script:

${PATHNAME}psql $PSQLOPT $ECHOOPT -c "SET vacuum_mem=524288;SET autocommit TO 'on';VACUUM $full $verbose $analyze $table" -d $db

), and I reset it to 8192 at the end.

Anyway, thank you for the ideas so far, and any additional ones will be greatly appreciated.

Chris

On Friday 23 April 2004 13:44, Kevin Barnard wrote:
> Chris Hoover wrote:
> > I know the numbers look ok, but we are definitely suffering. Also, if I
> > try to run any sort of vacuum or other db activity during normal business
> > hours, load goes through the roof. I have seen loads of over 10 when
> > trying to vacuum the larger cluster and would have to kill the vacuums
> > due to complaints.
>
> This is your problem then. You have to regularly vacuum the DB. You
> might want to dump and reload or schedule a vacuum full. If you don't,
> it doesn't matter what you do, you will never get decent performance.
> Make sure you vacuum as a superuser; this way you get system tables as well.
>
> Killing a vacuum is bad; it tends to make the situation worse. If you
> need to, vacuum one table at a time.
>
> > I think this is probably related to the hardware configuration, but I want
> > to make sure that there are no changes I could make configuration wise to
> > the db that might lighten the problem.
> >
> > I especially want to make sure that I have the memory parameters set to
> > good numbers for my db's so that I can minimize thrashing between the
> > postgres memory pools and the hard drive. I am thinking that this may be
> > a big issue here?
>
> Get the vacuum done and don't worry about the hardware or the settings
> until afterwards.
On Friday 23 April 2004 13:21, scott.marlowe wrote:
> On Fri, 23 Apr 2004, Chris Hoover wrote:
> > DB's on Powervaults 220S using raid 5 (over 6 disks)
>
> What controller is this, the Adaptec? We've found it to be slower than
> the LSI megaraid based controller, but YMMV.

We are using the PERC3/Di controller. I believe it is using the megaraid driver.

> > Running RH ES 2.1
>
> Are you running the latest kernel for ES 2.1? Early 2.4 kernels are
> pretty pokey and have some odd behaviour under load that later 2.4
> kernels seem to fix.

I'm not sure we are at the latest and greatest for 2.1, but I am trying to get there. Management won't let me do the upgrade without first testing/proving it will not cause any more issues. Due to all of the current issues, and the criticality of these systems to our bottom line, they are being very careful with any change that may impact our users further. We are waiting on our datacenter to plug in our test server and powervault so that we can test the upgrade to the latest RH 2.1 kernel.

> > checkpoint_segments = 16         # in logfile segments, min 1, 16MB each
> > checkpoint_timeout = 30          # range 30-3600, in seconds
> > effective_cache_size = 131072    # typically 8KB each
>
> effective_cache_size still looks low. On one machine you're showing kernel
> cache of about .7 gig, on the other it's 6 gig. 6 gigs of kernel cache would
> be a setting of 800000. It's more of a nudge factor than an exact science,
> so don't worry too much.

I believe changing this requires a restart of the cluster (correct?). If so, I'll try bumping up the effective_cache_size over the weekend.

Also, with all of the memory available to these machines, should I be running with larger shared_buffers? It seems like 256M is a bit small.

> If you've got fast I/O, look at lowering random_page_cost to something
> between 1 and 2. We use 1.3 to 1.4 on most of our machines with fast
> drives under them.
>
> I'd use vmstat to see if you're I/O bound.

If we end up being I/O bound, should the random page cost be set higher?

> Also, look for index bloat. Before 7.4 it was a serious problem. With
> 7.4, regular vacuuming should reclaim most lost space, but there are corner
> cases where you still might need to re-index.

Thanks for the help,

Chris
Chris,

> Sorry for the confusion here. I can't run any sort of vacuum during the day
> due to performance hits. However, I have run vacuums at night. Several
> nights a week I run a vacuumdb -f -z on all of the clusters. It can take
> several hours to complete, but it does complete.

Well, here's your first problem: since your max_fsm_pages is low, and you're only vacuuming once a day, you've got to have some serious table and index bloat. So you're going to need to do VACUUM FULL on all of your databases, and then REINDEX all of your indexes.

After that, raise your max_fsm_pages to something useful, like 1,000,000. Of course, data on your real rate of updates would help more.

If you're getting severe disk choke when you vacuum, you probably are I/O bound. You may want to try something which allows you to vacuum one table at a time, either pg_autovacuum or a custom script.

--
Josh Berkus
Aglio Database Solutions
San Francisco
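As a rough illustration of the "custom script" approach Josh mentions, something along these lines would spread the load out by vacuuming one table at a time with a pause in between. The database name and the pause length are placeholders, not part of Josh's advice, and the sketch deliberately uses plain VACUUM ANALYZE rather than VACUUM FULL so it can run outside the nightly window.

    #!/bin/sh
    # Vacuum one table at a time, sleeping between tables so the I/O load
    # never builds up the way a whole-database "vacuumdb -f" does.
    DB="mydb"                      # hypothetical database name
    TABLES=`psql -t -A -d "$DB" -c \
      "SELECT tablename FROM pg_tables WHERE schemaname = 'public';"`
    for t in $TABLES; do
        echo "vacuuming $t"
        psql -d "$DB" -c "VACUUM ANALYZE $t;"
        sleep 60                   # pause to let other queries breathe
    done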
Josh Berkus wrote:

> Chris,
>
> > Sorry for the confusion here. I can't run any sort of vacuum during the day
> > due to performance hits. However, I have run vacuums at night. Several
> > nights a week I run a vacuumdb -f -z on all of the clusters. It can take
> > several hours to complete, but it does complete.
>
> Well, here's your first problem: since your max_fsm_pages is low, and you're
> only vacuuming once a day, you've got to have some serious table and index
> bloat. So you're going to need to do VACUUM FULL on all of your databases,
> and then REINDEX all of your indexes.
>
> After that, raise your max_fsm_pages to something useful, like 1,000,000. Of
> course, data on your real rate of updates would help more.
>
> If you're getting severe disk choke when you vacuum, you probably are I/O
> bound. You may want to try something which allows you to vacuum one table
> at a time, either pg_autovacuum or a custom script.

Tom and Josh recently gave me some help about setting the fsm settings which was quite useful. The full message is at http://archives.postgresql.org/pgsql-performance/2004-04/msg00229.php and the most interesting portion was:

    Actually, since he's running 7.4, there's an even better way. Do a
    "VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
    ANALYZE or not). At the end of the very voluminous output, you'll see
    something like

    INFO: free space map: 240 relations, 490 pages stored; 4080 total pages needed
    DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

    Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
    exactly cover the present freespace needs of my system. I concur with
    the suggestion to bump that up a good deal, of course, but that gives
    you a real number to start from.

    The DETAIL part of the message shows my current settings (which are the
    defaults) and what the FSM is costing me in shared memory space.

Good luck

Ron
On Friday 23 April 2004 14:57, Ron St-Pierre wrote:

Does this apply to 7.3.4 also?

> Actually, since he's running 7.4, there's an even better way. Do a
> "VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
> ANALYZE or not). At the end of the very voluminous output, you'll see
> something like
>
> INFO: free space map: 240 relations, 490 pages stored; 4080 total pages needed
> DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
>
> Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
> exactly cover the present freespace needs of my system. I concur with
> the suggestion to bump that up a good deal, of course, but that gives
> you a real number to start from.
>
> The DETAIL part of the message shows my current settings (which are the
> defaults) and what the FSM is costing me in shared memory space.
>
> Good luck
> Ron
On Fri, 23 Apr 2004, Chris Hoover wrote:

> On Friday 23 April 2004 13:21, scott.marlowe wrote:
> > On Fri, 23 Apr 2004, Chris Hoover wrote:
> > > DB's on Powervaults 220S using raid 5 (over 6 disks)
> >
> > What controller is this, the Adaptec? We've found it to be slower than
> > the LSI megaraid based controller, but YMMV.
>
> We are using the PERC3/Di controller. I believe it is using the megaraid
> driver.

No, that's the Adaptec; the PERC3/DC is the LSI megaraid. See if there are newer drivers for the RAID card. In terms of performance, the Adaptec and LSI drivers have improved considerably in later versions. In terms of stability they've largely gotten better, with a few in-between releases on the megaraid getting poor grades. The latest/greatest from Dell is pretty up to date.

> > > Running RH ES 2.1
> >
> > Are you running the latest kernel for ES 2.1? Early 2.4 kernels are
> > pretty pokey and have some odd behaviour under load that later 2.4
> > kernels seem to fix.
>
> I'm not sure we are at the latest and greatest for 2.1, but I am trying to
> get there. Management won't let me do the upgrade without first
> testing/proving it will not cause any more issues. Due to all of the
> current issues, and the criticality of these systems to our bottom line,
> they are being very careful with any change that may impact our users
> further.

Understood. It's why my production box is still running a 2.4 kernel on rh 7.2 with pg 7.2. They just work, and for us stability AND performance are both good with our load.

You can install a new kernel and set up the machine to still boot off of the old one, and test on the weekend to see how it behaves under simulated load. Mining the logs for slow queries is a good way to build one.

While we don't upgrade our production servers' applications to the latest and greatest all the time (i.e. php or postgresql or openldap), we always run the latest security patches, and I think the latest kernels had security fixes for ES 2.1, so NOT upgrading is dangerous. Late model linux kernels (the 2.0.x and 2.2.x where x>20) tend to be VERY stable and very conservatively backported and upgraded, so running a new one isn't usually a big risk.

> > > effective_cache_size = 131072    # typically 8KB each
> >
> > effective_cache_size still looks low. On one machine you're showing kernel
> > cache of about .7 gig, on the other it's 6 gig. 6 gigs of kernel cache
> > would be a setting of 800000. It's more of a nudge factor than an exact
> > science, so don't worry too much.
>
> I believe changing this requires a restart of the cluster (correct?). If so,
> I'll try bumping up the effective_cache_size over the weekend.
> Also, with all of the memory available to these machines, should I be running
> with larger shared_buffers? It seems like 256M is a bit small.

No, you probably shouldn't. PostgreSQL doesn't "cache" in the classical sense. If all backends close, the stuff they had in their buffers disappears in a flash. So it's generally considered better to let the kernel do the bulk of the caching, and to have the buffer area be large enough to hold a large portion, if not all, of your working set of data. But the buffer cache management is dirt simple and, while it works, it seems to have performance issues with large numbers of buffers, and all the memory in it disappears when the last backend using it exits, so very large shared_buffers settings usually don't pay off.

For instance, in doing the following seq scan select:

explain analyze select * from test;

where test is a ~10 megabyte table, the first time I ran it it took 5 seconds to run. The second time it took 2.5, the third 1.9, and it levelled out around there. Starting up another backend and running the same query got a 1.9 second response also. Shutting down both connections and running the query again, with only the kernel doing the caching, I got 1.9. That's on a 2.4.2[2-4] kernel.

> > If you've got fast I/O, look at lowering random_page_cost to something
> > between 1 and 2. We use 1.3 to 1.4 on most of our machines with fast
> > drives under them.
> >
> > I'd use vmstat to see if you're I/O bound.
>
> If we end up being I/O bound, should the random page cost be set higher?

Not necessarily. Often, on a machine with a lot of memory, you are better off using index scans where disk seek time would be expensive; with the indexes in RAM, the cost of a random page compared to a sequential page is almost 1, with only a slight overhead. So lowering the random page cost favors indexes, generally.

If your I/O subsystem is doing a lot of seq scans when only part of the data set is ever really being worked on, this tends to flush out the kernel cache, and we wind up going back to disk over and over. On the other hand, if your data is normally going to be sequentially accessed, then you'll have to invest in better RAID hardware / more drives etc... But with 12 gigs on one box, and an already reasonably fast I/O subsystem in place, I'd think a lower random page cost would help, not hurt, performance.

Have you explain analyzed your slower queries?
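To make Scott's last suggestion concrete, a session like the following could be used to compare plans and timings under different random_page_cost settings. The database name and the query against clmhdr are placeholders loosely based on the vacuum output later in the thread, not known slow queries, and the SET is session-local, so nothing in postgresql.conf changes.

    psql -d mydb <<'EOF'
    -- Current plan and actual timing for a suspect query
    EXPLAIN ANALYZE SELECT * FROM clmhdr WHERE hdr_user_id = 'someuser';

    -- Same query with a lower random_page_cost, to see whether the planner
    -- switches to an index scan and whether that is actually faster.
    SET random_page_cost = 1.4;
    EXPLAIN ANALYZE SELECT * FROM clmhdr WHERE hdr_user_id = 'someuser';
    EOF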
Chris Hoover wrote:

> On Friday 23 April 2004 14:57, Ron St-Pierre wrote:
> Does this apply to 7.3.4 also?

No it doesn't. I didn't look back through the thread far enough to see what you were running. I tried it on 7.3.4 and none of the summary info listed below was returned.

FWIW, one of our DBs was slowing down considerably on an update (30+ minutes), and after I changed max_fsm_pages from the 7.4 default of 20,000 to 50,000, it completed in about eight minutes.

Ron

> > Actually, since he's running 7.4, there's an even better way. Do a
> > "VACUUM VERBOSE" (full-database vacuum --- doesn't matter whether you
> > ANALYZE or not). At the end of the very voluminous output, you'll see
> > something like
> >
> > INFO: free space map: 240 relations, 490 pages stored; 4080 total pages needed
> > DETAIL: Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.
> >
> > Here, I would need max_fsm_relations = 240 and max_fsm_pages = 4080 to
> > exactly cover the present freespace needs of my system. I concur with
> > the suggestion to bump that up a good deal, of course, but that gives
> > you a real number to start from.
> >
> > The DETAIL part of the message shows my current settings (which are the
> > defaults) and what the FSM is costing me in shared memory space.
"Chris Hoover" <revoohc@sermonaudio.com> writes: > Here is the postgresql.conf from the server with the 11GB db: > max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes It's unlikely that that's enough for an 11Gb database, especially if you're only vacuuming a few times a week. You should make your next run be a "vacuum verbose" and look at the output to get an idea of what sort of table bloat you're seeing, but I'll bet it's bad ... regards, tom lane
scott.marlowe wrote:

> On Fri, 23 Apr 2004, Chris Hoover wrote:
>
> > DB's on Powervaults 220S using raid 5 (over 6 disks)
>
> What controller is this, the Adaptec? We've found it to be slower than
> the LSI megaraid based controller, but YMMV.

Wow, really? Have you got any more details of the chipset, mobo and kernel driver?

I've been driven to my wits' end wrestling with an LSI MegaRAID 320-1 controller on a Supermicro board all weekend. I just couldn't get anything more than 10MB/sec out of it with megaraid driver v1 OR v2 in Linux 2.4.26, nor with the version in 2.6.6-rc2. After 2 days of humming the Adaptec mantra I gave in and switched the array straight onto the onboard Adaptec 160 controller (same cable and everything). Software RAID 5 gets me over 40MB/sec for a nominal cpu hit - more than 4 times what I could get out of the MegaRAID controller :( Even the 2nd SCSI-2 channel gets 40MB/sec max (pg_xlog :)

And HOW LONG does it take to detect drives during POST... ohhhh never mind ... I really just wanna rant :) There should be a free counseling service for enraged sysops.

--
Rob Fielding
rob@dsvr.net
www.dsvr.co.uk
Development Designer Servers Ltd
Ok, I was able to run a vacuumdb -f -v on my largest db over the weekend. However, I am having trouble reading the results of the table portion. Here are a couple of tables; what should I be looking at? The first table is the key table in the db, and the second is the largest table in the db.

Thanks

Chris

INFO: --Relation public.clmhdr--
INFO: Pages 32191: Changed 0, reaped 5357, Empty 0, New 0; Tup 339351: Vac 48358, Keep/VTL 0/0, UnUsed 129, MinLen 560, MaxLen 696; Re-using: Free/Avail. Space 42011004/32546120; EndEmpty/Avail. Pages 0/5310. CPU 0.53s/0.09u sec elapsed 0.61 sec.
INFO: Index clmhdr_pkey: Pages 1429; Tuples 339351: Deleted 48358. CPU 0.06s/0.28u sec elapsed 4.54 sec.
INFO: Index clmhdr_hdr_user_id_idx: Pages 1711; Tuples 339351: Deleted 48358. CPU 0.09s/0.31u sec elapsed 2.40 sec.
INFO: Index clmhdr_hdr_clm_status_idx: Pages 1237; Tuples 339351: Deleted 48358. CPU 0.03s/0.26u sec elapsed 1.66 sec.
INFO: Index clmhdr_hdr_create_dt_idx: Pages 1475; Tuples 339351: Deleted 48358. CPU 0.05s/0.24u sec elapsed 1.96 sec.
INFO: Index clmhdr_inv_idx: Pages 1429; Tuples 339351: Deleted 48358. CPU 0.08s/0.22u sec elapsed 1.20 sec.
INFO: Index clmhdr_userid_status_idx: Pages 2161; Tuples 339351: Deleted 48358. CPU 0.05s/0.18u sec elapsed 3.02 sec.
INFO: Rel clmhdr: Pages: 32191 --> 28247; Tuple(s) moved: 8257. CPU 0.37s/1.81u sec elapsed 16.24 sec.
INFO: Index clmhdr_pkey: Pages 1429; Tuples 339351: Deleted 8257. CPU 0.00s/0.03u sec elapsed 0.03 sec.
INFO: Index clmhdr_hdr_user_id_idx: Pages 1743; Tuples 339351: Deleted 8257. CPU 0.00s/0.05u sec elapsed 0.04 sec.
INFO: Index clmhdr_hdr_clm_status_idx: Pages 1265; Tuples 339351: Deleted 8257. CPU 0.00s/0.04u sec elapsed 0.03 sec.
INFO: Index clmhdr_hdr_create_dt_idx: Pages 1503; Tuples 339351: Deleted 8257. CPU 0.00s/0.04u sec elapsed 0.12 sec.
INFO: Index clmhdr_inv_idx: Pages 1429; Tuples 339351: Deleted 8257. CPU 0.00s/0.04u sec elapsed 0.03 sec.
INFO: Index clmhdr_userid_status_idx: Pages 2203; Tuples 339351: Deleted 8257. CPU 0.01s/0.03u sec elapsed 0.04 sec.

INFO: --Relation public.sent837--
INFO: Pages 463552: Changed 0, reaped 6690, Empty 0, New 0; Tup 27431539: Vac 204348, Keep/VTL 0/0, UnUsed 2801, MinLen 107, MaxLen 347; Re-using: Free/Avail. Space 54541468/34925860; EndEmpty/Avail. Pages 0/70583. CPU 10.68s/2.18u sec elapsed 188.32 sec.
INFO: Index sent837_pkey: Pages 124424; Tuples 27431539: Deleted 204348. CPU 4.24s/3.45u sec elapsed 144.79 sec.
INFO: Rel sent837: Pages: 463552 --> 459954; Tuple(s) moved: 91775. CPU 1.12s/9.36u sec elapsed 20.13 sec.
INFO: Index sent837_pkey: Pages 124424; Tuples 27431539: Deleted 91775. CPU 3.51s/2.03u sec elapsed 6.13 sec.
On Mon, 26 Apr 2004, Rob Fielding wrote:

> scott.marlowe wrote:
> > On Fri, 23 Apr 2004, Chris Hoover wrote:
> >
> > > DB's on Powervaults 220S using raid 5 (over 6 disks)
> >
> > What controller is this, the Adaptec? We've found it to be slower than
> > the LSI megaraid based controller, but YMMV.
>
> Wow, really? Have you got any more details of the chipset, mobo and kernel
> driver?

We're running on a Dell 2650; the controller is the U320 LSI megaraid 2 channel (they only make the one that I know of right now). I don't know my mobo chipset offhand, but I might be able to find out what one Dell includes on the 2650. The kernel driver is the latest megaraid2 driver as of about Feb this year.

> I've been driven to my wits' end wrestling with an LSI MegaRAID 320-1
> controller on a Supermicro board all weekend. I just couldn't get
> anything more than 10MB/sec out of it with megaraid driver v1 OR v2 in
> Linux 2.4.26, nor with the version in 2.6.6-rc2. After 2 days of humming
> the Adaptec mantra I gave in and switched the array straight onto the
> onboard Adaptec 160 controller (same cable and everything). Software
> RAID 5 gets me over 40MB/sec for a nominal cpu hit - more than 4 times
> what I could get out of the MegaRAID controller :( Even the 2nd SCSI-2
> channel gets 40MB/sec max (pg_xlog :)
>
> And HOW LONG does it take to detect drives during POST... ohhhh never
> mind ... I really just wanna rant :) There should be a free counseling
> service for enraged sysops.

I wonder if your controller is broken or something? Or maybe it's in a PCI slot that has to share IRQs or something. I've had great luck with SuperMicro mobos in the past (we're talking dual PPro 200 mobos, so seriously, IN THE PAST here...). Hell, my dual PPro 200 with an old MegaRAID 428 got an 18 megs a second transfer rate, no problem.

Have you tried that LSI card in another machine / mobo combo? Can you disable the onboard Adaptec? We have on our Dell 2650s; the only active controllers are the onboard IDE and the add-in LSI-320-2 controller. We're running ours with 128 Meg cache (I think it could be 64) set to write back.

I think our throughput on a RAID-1 pair was somewhere around 40+ megs a second for reads with bonnie++. With RAID-5 it was not really much faster at reads (something like 60 megs a second) but was much more scalable under heavy parallel read/write access for PostgreSQL.

Have you updated the BIOS on the mobo to see if that helps? I'm just throwing darts at the wall here.
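For anyone wanting to reproduce the kind of raw-throughput comparison Scott and Rob are trading here, a crude sequential read/write check with dd, sized well past RAM so the kernel cache doesn't flatter the numbers, is a reasonable first pass before reaching for bonnie++. The mount point and file size below are placeholders for the example.

    # Sequential write: ~16 GB so the result isn't just the kernel cache
    dd if=/dev/zero of=/db/ddtest bs=8k count=2000000
    sync

    # Sequential read of the same file
    dd if=/db/ddtest of=/dev/null bs=8k

    rm /db/ddtest

Numbers far below what the drives should deliver (as in Rob's 10MB/sec case) point at the controller, driver, or slot rather than PostgreSQL.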