Thread: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
Kenji Morishige
Date:
About a year ago we decided to migrate our central database that powers various
intranet tools from MySQL to PostgreSQL. We have about 130 tables and about
10GB of data that stores various status information for a variety of services
for our intranet.  We generally have somewhere between 150-200 connections to
the database at any given time and probably anywhere between 5-10 new
connections being made every second and about 100 queries per second. Most
of the queries and transactions are very small because the tools were
designed to work around the limited functionality of MySQL 3.23.
Our company primarily uses FreeBSD and we are stuck on the FreeBSD 4.x series
due to IT support issues, but I believe I may be able to get more performance
out of our server by reconfiguring and tuning the postgresql.conf file.
The performance is not as good as I was hoping at the moment, and it seems
as if the database is not making use of the available RAM.

snapshot of active server:
last pid:  5788;  load averages:  0.32,  0.31,  0.28    up 127+15:16:08  13:59:24
169 processes: 1 running, 168 sleeping
CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
Swap: 4096M Total, 216K Used, 4096M Free

  PID USERNAME      PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
14501 pgsql           2   0   254M   242M select 2  76:26  1.95%  1.95% postgre
 5720 root           28   0  2164K  1360K CPU0   0   0:00  1.84%  0.88% top
 5785 pgsql           2   0   255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
 5782 pgsql           2   0   255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
 5772 pgsql           2   0   255M 11708K sbwait 2   0:00  1.54%  0.15% postgre


Here is my current configuration:

Dual Xeon 3.06Ghz 4GB RAM
Adaptec 2200S 48MB cache & 4 disks configured in RAID5
FreeBSD 4.11 w/kernel options:
options         SHMMAXPGS=65536
options         SEMMNI=256
options         SEMMNS=512
options         SEMUME=256
options         SEMMNU=256
options         SMP                     # Symmetric MultiProcessor Kernel
options         APIC_IO                 # Symmetric (APIC) I/O
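For reference, SHMMAXPGS is counted in 4 KB pages, so the setting above caps a
System V shared memory segment at 256 MB - enough for the shared_buffers =
30000 mentioned later in the thread (30000 x 8 KB is roughly 234 MB), but with
little headroom. A quick sanity check of the arithmetic:

```shell
# SHMMAXPGS is in 4 KB pages: 65536 pages -> max SysV shm segment size in MB
echo $((65536 * 4096 / 1024 / 1024))   # prints 256
```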

The OS is installed on a single local disk and the postgres data directory
is on the RAID5 partition.  Maybe Adaptec 2200S RAID5 performance is not as
good as the vendor claimed.  It was my impression that RAID controllers
these days are optimized for RAID5 and that going to RAID10 would not benefit me much.

Also, I may be overlooking a postgresql.conf setting.  I have attached the
config file.

In summary, my questions:

1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

2. Should I change SCSI controller config to use RAID 10 instead of 5?

3. Why isn't postgres using all 4GB of ram, at least for caching tables for reads?

4. Are there any other settings in the conf file I could try to tweak?

Attachment

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Scott Marlowe
Date:
On Fri, 2006-03-17 at 16:11, Kenji Morishige wrote:
> About a year ago we decided to migrate our central database that powers various
> intranet tools from MySQL to PostgreSQL. We have about 130 tables and about
> 10GB of data that stores various status information for a variety of services
> for our intranet.  We generally have somewhere between 150-200 connections to
> the database at any given time and probably anywhere between 5-10 new
> connections being made every second and about 100 queries per second. Most
> of the queries and transactions are very small due to the fact that the tools
> were designed to work around the small functionality of MySQL 3.23 DB.
> Our company primarily uses FreeBSD and we are stuck on FreeBSD 4.X series due
> to IT support issues,

There were a LOT of performance enhancements to FreeBSD with the 5.x
series release.  I'd recommend fast-tracking the database server to the
5.x branch.  4-STABLE was released six years ago; 5-STABLE was released
two years ago.

> but I believe I may be able to get more performance out
> of our server by reconfiguring and setting up the postgresql.conf file up
> better.

Can't hurt.  But if your OS isn't doing the job, postgresql.conf can
only do so much, no?

>   The performance is not as good as I was hoping at the moment and
> it seems as if the database is not making use of the available ram.
> snapshot of active server:
> last pid:  5788;  load averages:  0.32,  0.31,  0.28    up 127+15:16:08  13:59:24
> 169 processes: 1 running, 168 sleeping
> CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
> Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> Swap: 4096M Total, 216K Used, 4096M Free
>
>   PID USERNAME      PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 14501 pgsql           2   0   254M   242M select 2  76:26  1.95%  1.95% postgre
>  5720 root           28   0  2164K  1360K CPU0   0   0:00  1.84%  0.88% top
>  5785 pgsql           2   0   255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
>  5782 pgsql           2   0   255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
>  5772 pgsql           2   0   255M 11708K sbwait 2   0:00  1.54%  0.15% postgre

That doesn't look good.  Is this machine freshly rebooted, or has it
been running postgres for a while?  179M cache and 199M buffers with 2.6
gig inactive is horrible for a machine running a 10 gig database.

For comparison, here's what my production linux boxes show in top:
 16:42:27  up 272 days, 14:49,  1 user,  load average: 1.02, 1.04, 1.00
162 processes: 161 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.2%    0.0%    0.4%   0.0%     0.0%    0.4%   98.7%
           cpu00    0.4%    0.0%    0.4%   0.0%     0.0%    0.0%   99.0%
           cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.9%   98.5%
Mem: 6096912k av, 4529208k used, 1567704k free, 0k shrd,  306884k buff
                  2398948k actv, 1772072k in_d,   78060k in_c
Swap: 4192880k av,  157480k used, 4035400k free        3939332k cached

PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
24000 postgres  15 0  752  524  456 S   0.0  0.0   0:00   1 rotatelogs
24012 postgres  15 0 1536 1420 1324 S   0.0  0.0   7:11   0 postmaster
24015 postgres  15 0 2196 2032  996 S   0.0  0.0  56:07   0 postmaster
24016 postgres  15 0 1496 1352 1004 S   0.0  0.0 233:46   1 postmaster

Note that the kernel here is caching ~3.9 gigs of data, so PostgreSQL
doesn't have to.  Also, the disk buffers are sitting at > 300 megs.

If FreeBSD 4.x can't or won't cache more than that, there's an OS issue
here, either endemic to FreeBSD 4.x, or your configuration of it.


> Dual Xeon 3.06Ghz 4GB RAM

Make sure hyperthreading is disabled, it's generally a performance loss
for pgsql.

> Adaptec 2200S 48MB cache & 4 disks configured in RAID5

I'm not a huge fan of adaptec RAID controllers, and 48 Megs ain't much.
But for what you're doing, I'd expect it to run well enough.  Have you
tested this array with bonnie++ to see what kind of performance it gets
in general?  There could be some kind of hardware issue going on you're
not seeing in the logs.

Is that memory cache set to write back not through, and does it have
battery backup (the cache, not the machine)?

> The OS is installed on the local single disk and postgres data directory
> is on the RAID5 partition.  Maybe Adaptec 2200S RAID5 performance is not as
> good as the vendor claimed.  It was my impression that the raid controller
> these days are optimized for RAID5 and going RAID10 would not benefit me much.

You have to be careful about RAID 10, since many controllers serialize
access through multiple levels of RAID, and therefore wind up being
slower in RAID 10 or 50 than in RAID 1 or 5.

> Also, I may be overlooking a postgresql.conf setting.  I have attached the
> config file.

If you're doing a lot of small transactions you might see some gain from
increasing commit_delay to 100-1000 and commit_siblings to 25-100.
It won't set the world on fire, but it's given me a 25% boost on
certain loads with lots of small transactions.
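Expressed as postgresql.conf settings, with illustrative values picked from
the middle of those ranges (not tested figures - tune against your own load):

```
commit_delay = 500        # microseconds to wait for other nearly-ready commits
commit_siblings = 50      # only delay when at least this many xacts are open
```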

>
> In summary, my questions:
>
> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

It most probably would.  I'd at least test it out.

> 2. Should I change SCSI controller config to use RAID 10 instead of 5?

Maybe.  With that controller, and many others in its league, you may be
slowing things down doing that.  You may be better off with a simple
RAID 1 instead as well.  Also, if you've got a problem with the
controller serializing multiple raid levels, you might see the best
performance with one raid level on the controller and the other handled
by the kernel.  BSD does do kernel level RAID, right?

> 3. Why isn't postgres using all 4GB of ram for at least caching table for reads?

Because that's your Operating System's job.

> 4. Are there any other settings in the conf file I could try to tweak?

With the later versions of PostgreSQL, it's gotten better at doing the
OS job of caching, IF you set it to use enough memory.  You might try
cranking up shared memory / shared_buffers to something large like 75%
of the machine memory and see if that does help.  With 7.4 and before,
it's generally a really bad idea.   Looking at your postgresql.conf it
appears you're running a post-7.4 version, so you might be able to get
away with handing over all the ram to the database.

Now that the tuning stuff is out of the way: have you been using the
logging to look for individual slow queries and running EXPLAIN ANALYZE on
them? Are you analyzing and vacuuming your database too?

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
"Claus Guttesen"
Date:
> Here is my current configuration:
>
> Dual Xeon 3.06Ghz 4GB RAM
> Adaptec 2200S 48MB cache & 4 disks configured in RAID5
> FreeBSD 4.11 w/kernel options:
> options         SHMMAXPGS=65536
> options         SEMMNI=256
> options         SEMMNS=512
> options         SEMUME=256
> options         SEMMNU=256
> options         SMP                     # Symmetric MultiProcessor Kernel
> options         APIC_IO                 # Symmetric (APIC) I/O
>
> The OS is installed on the local single disk and postgres data directory
> is on the RAID5 partition.  Maybe Adaptec 2200S RAID5 performance is not as
> good as the vendor claimed.  It was my impression that the raid controller
> these days are optimized for RAID5 and going RAID10 would not benefit me much.

I don't know whether 'systat -vmstat' is available on 4.x; if so, try
issuing 'systat -vmstat 1' for 1-second updates. This will (amongst much
other info) show how much disk transfer you have.

> Also, I may be overlooking a postgresql.conf setting.  I have attached the
> config file.

You could try to lower shared_buffers from 30000 to 16384. Setting
this value too high can in some cases be counterproductive, according
to docs I have read.

Also try to lower work_mem from 16384 to 8192 or 4096. This setting is
per sort, so it becomes expensive in terms of memory when many sorts
are carried out. It depends on the complexity of your sorts, of course.
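In postgresql.conf terms, those two suggestions together would look like this
(illustrative values from the ranges above):

```
shared_buffers = 16384    # 8 KB pages, so roughly 128 MB; down from 30000
work_mem = 8192           # KB per sort; multiplied across concurrent sorts
```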

Try to do a vacuum analyse in your crontab. If your aliases-file is
set up correctly mails generated by crontab will be forwarded to a
human being. I have the following in my (root) crontab (and mail to
root forwarded to me):

time /usr/local/bin/psql -d dbname -h dbhost -U username -c "vacuum analyse verbose;"
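A complete crontab entry for that command could look like this (the 03:00
schedule is arbitrary; dbname, dbhost and username are placeholders as above):

```
0 3 * * * time /usr/local/bin/psql -d dbname -h dbhost -U username -c "vacuum analyse verbose;"
```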

> In summary, my questions:
>
> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

Going to 6.x would probably increase overall performance, but you have
to try it out first. Many people report increased performance just by
upgrading, some report that it grinds to a halt. But SMP-wise 6.x is a
more mature release than 4.x is. Changes to the kernel from being
giant-locked in 4.x to be "fine-grained locked" started in 5.x and
have improved in 6.x. The disk- and network-layer should behave
better.

Linux, don't know. If your expertise is in FreeBSD try this first and
then move to Linux (or Solaris 10) if 6.x does not meet your
expectations.

> 3. Why isn't postgres using all 4GB of ram for at least caching table for reads?

I guess it's related to the use of the i386 architecture in general.
If the Xeons are the newer Noconas you can try the amd64 port
instead, which can make use of more memory (without going through PAE).

> 4. Are there any other settings in the conf file I could try to tweak?

max_fsm_pages and max_fsm_relations. You can look at the bottom of the
VACUUM ANALYZE VERBOSE output and increase the values accordingly:

INFO:  free space map: 153 relations, 43445 pages stored; 45328 total pages needed

Raise max_fsm_pages so it meets or exceeds 'total pages needed', and
max_fsm_relations so it meets or exceeds the relation count.
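Using the INFO line above as input, settings with some headroom might be
(illustrative - plug in your own VACUUM VERBOSE numbers):

```
max_fsm_pages = 60000       # >= the 45328 "total pages needed", plus headroom
max_fsm_relations = 300     # >= the 153 relations reported
```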

This is fine-tuning, though. It's more important to set work_mem and
maintenance_work_mem correctly.

hth
Claus

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
Tom Lane
Date:
Kenji Morishige <kenjim@juniper.net> writes:
> ...  We generally have somewhere between 150-200 connections to
> the database at any given time and probably anywhere between 5-10 new
> connections being made every second and about 100 queries per second. Most
> of the queries and transactions are very small due to the fact that the tools
> were designed to work around the small functionality of MySQL 3.23 DB.

You should think seriously about putting in some sort of
connection-pooling facility.  Postgres backends aren't especially
lightweight things; the overhead involved in forking a process and then
getting its internal caches populated etc. is significant.  You don't
want to be doing that for one small query, at least not if you're doing
so many times a second.

> it seems as if the database is not making use of the available ram.

Postgres generally relies on the kernel to do the bulk of the disk
caching.  Your shared_buffers setting of 30000 seems quite reasonable to
me; I don't think you want to bump it up (not much anyway).  I'm not too
familiar with FreeBSD and so I'm not clear on what "Inact" is:

> Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> Swap: 4096M Total, 216K Used, 4096M Free

If "Inact" covers disk pages cached by the kernel then this is looking
reasonably good.  If it's something else then you got a problem, but
fixing it is a kernel issue not a database issue.

> #max_fsm_pages = 20000        # min max_fsm_relations*16, 6 bytes each

You almost certainly need to bump this way up.  20000 is enough to cover
dirty pages in about 200MB of database, which is only a fiftieth of
what you say your disk footprint is.  Unless most of your data is
static, you're going to be suffering severe table bloat over time due
to inability to recycle free space properly.

            regards, tom lane

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Scott Marlowe
Date:
On Fri, 2006-03-17 at 17:03, Claus Guttesen wrote:
> > Here is my current configuration:

> > Also, I may be overlooking a postgresql.conf setting.  I have attached the
> > config file.
>
> You could try to lower shared_buffers from 30000 to 16384. Setting
> this value too high can in some cases be counterproductive according
> to doc's I read.

FYI, that was very true before 8.0, but since the introduction of better
cache management algorithms, you can have pretty big shared_buffers
settings.

> Also try to lower work_mem from 16384 to 8192 or 4096. This setting is
> for each sort, so it does become expensive in terms of memory when
> many sorts are being carried out. It does depend on the complexity of
> your sorts of course.

But looking at the RAM usage on his box, it doesn't look like a problem at
the time that snapshot was taken.  Assuming the box was busy then, he's
OK.  Otherwise, he'd show some swap usage, which he doesn't.

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
"Claus Guttesen"
Date:
> 4. Are there any other settings in the conf file I could try to tweak?

One more thing :-)

I stumbled over this setting, this made the db (PG 7.4.9) make use of
the index rather than doing a sequential scan and it reduced a query
from several minutes to some 20 seconds.

random_page_cost = 2 (original value was 4).

Another thing you ought to do is to get the four or five most-used
queries and run EXPLAIN ANALYZE on them. Since our website wasn't
prepared for this type of statistics I simply did a tcpdump, grep'ed
out all the selects, sorted them, and ran them through uniq so I could
see which queries were used most.
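That capture-and-count approach can be sketched as a single pipeline
(hypothetical: the port and the grep pattern depend on your setup, and
tcpdump needs root):

```shell
# capture traffic to the PostgreSQL port, pull out the SELECT statements,
# and rank them by frequency (let it run a while, then Ctrl-C)
tcpdump -l -A -s 0 port 5432 \
  | grep -io 'select [^;]*' \
  | sort | uniq -c | sort -rn | head -5
```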

regards
Claus

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Mark Kirkwood
Date:
Scott Marlowe wrote:
> On Fri, 2006-03-17 at 16:11, Kenji Morishige wrote:
>
>>About a year ago we decided to migrate our central database that powers various
>>intranet tools from MySQL to PostgreSQL. We have about 130 tables and about
>>10GB of data that stores various status information for a variety of services
>>for our intranet.  We generally have somewhere between 150-200 connections to
>>the database at any given time and probably anywhere between 5-10 new
>>connections being made every second and about 100 queries per second. Most
>>of the queries and transactions are very small due to the fact that the tools
>>were designed to work around the small functionality of MySQL 3.23 DB.
>>Our company primarily uses FreeBSD and we are stuck on FreeBSD 4.X series due
>>to IT support issues,
>
>
> There were a LOT of performance enhancements to FreeBSD with the 5.x
> series release.  I'd recommend fast tracking the database server to the
> 5.x branch.  4-stable was release 6 years ago.  5-stable was released
> two years ago.
>
>

I would recommend skipping 5.x and using 6.0 - as it performs measurably
better than 5.x. In particular the vfs layer is no longer under the
GIANT lock, so you will get considerably improved concurrent filesystem
access on your dual Xeon.

Regards

Mark

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
Kenji Morishige
Date:
Thanks guys, I'm studying each of your responses and am going to start to
experiment. Unfortunately, I don't have another box with similar specs to
do a perfect experiment with, but I think I'm going to go ahead and open a
service window to upgrade the box to FreeBSD 6.0 and apply some other changes.
It also gives me the chance to go from 8.0.1 to the 8.1 series, which I've
been wanting to do as well.  Thanks guys, and I will see if any of your
suggestions make a noticeable difference.  I have also been looking at the
log of slow queries and making the necessary indexes to make those go faster.

-Kenji

On Sat, Mar 18, 2006 at 12:29:17AM +0100, Claus Guttesen wrote:
> > 4. Are there any other settings in the conf file I could try to tweak?
>
> One more thing :-)
>
> I stumbled over this setting, this made the db (PG 7.4.9) make use of
> the index rather than doing a sequential scan and it reduced a query
> from several minutes to some 20 seconds.
>
> random_page_cost = 2 (original value was 4).
>
> Another thing you ought to do is to to get the four-five most used
> queries and do an explain analyze in these. Since our website wasn't
> prepared for this type of statistics I simply did a tcpdump, grep'ed
> all select's, sorted them and sorted them unique so I could see which
> queries were used most.
>
> regards
> Claus

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Luke Lonergan"
Date:
Kenji,


On 3/17/06 4:08 PM, "Kenji Morishige" <kenjim@juniper.net> wrote:

> Thanks guys, I'm studying each of your responses and am going to start to
> experiement.

I notice that no one asked you about your disk bandwidth - the Adaptec 2200S
is a "known bad" controller - the bandwidth to/from it in RAID5 is about 1/2
to 1/3 of a single disk drive, which is far too slow for a 10GB database, and
IMO should disqualify a RAID adapter from being used at all.

Without fixing this, I'd suggest that all of the other tuning described here
will have little value, provided your working set is larger than your RAM.

You should test the I/O bandwidth using these simple tests:
  time bash -c "dd if=/dev/zero of=bigfile bs=8k count=1000000 && sync"

then:
  time dd if=bigfile of=/dev/null bs=8k

You should get on the order of 150MB/s on four disk drives in RAID5.

And before people jump in about "random I/O", etc, the sequential scan test
will show whether the controller is just plain bad very quickly.  If it
can't do sequential fast, it won't do seeks fast either.
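A scaled-down version of the same test (80 MB instead of ~8 GB, so it
finishes in seconds; for a real measurement, scale count back up so the file
comfortably exceeds both RAM and the controller cache):

```shell
# write 10000 x 8 kB = ~80 MB, force it to disk, then read it back
time bash -c "dd if=/dev/zero of=bigfile bs=8k count=10000 && sync"
time dd if=bigfile of=/dev/null bs=8k
rm bigfile
```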

- Luke



Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Jim C. Nasby"
Date:
On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:
> > last pid:  5788;  load averages:  0.32,  0.31,  0.28    up 127+15:16:08  13:59:24
> > 169 processes: 1 running, 168 sleeping
> > CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
> > Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> > Swap: 4096M Total, 216K Used, 4096M Free
> >
> >   PID USERNAME      PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> > 14501 pgsql           2   0   254M   242M select 2  76:26  1.95%  1.95% postgre
> >  5720 root           28   0  2164K  1360K CPU0   0   0:00  1.84%  0.88% top
> >  5785 pgsql           2   0   255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
> >  5782 pgsql           2   0   255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
> >  5772 pgsql           2   0   255M 11708K sbwait 2   0:00  1.54%  0.15% postgre
>
> That doesn't look good.  Is this machine freshly rebooted, or has it
> been running postgres for a while?  179M cache and 199M buffer with 2.6
> gig inactive is horrible for a machine running a 10gig databases.

No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
Free. It's the same as 'active' memory except that it's pages that
haven't been accessed in X amount of time (between 100 and 200 ms, I
think). When free memory starts getting low, FBSD will start moving
pages from the inactive queue to the free queue (possibly resulting in
writes to disk along the way).

IIRC, Cache is the directory cache, and Buf is disk buffers, which is
somewhat akin to shared_buffers in PostgreSQL.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Scott Marlowe
Date:
On Mon, 2006-03-20 at 08:45, Jim C. Nasby wrote:
> On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:
> > > last pid:  5788;  load averages:  0.32,  0.31,  0.28    up 127+15:16:08  13:59:24
> > > 169 processes: 1 running, 168 sleeping
> > > CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
> > > Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> > > Swap: 4096M Total, 216K Used, 4096M Free
> > >
> > >   PID USERNAME      PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> > > 14501 pgsql           2   0   254M   242M select 2  76:26  1.95%  1.95% postgre
> > >  5720 root           28   0  2164K  1360K CPU0   0   0:00  1.84%  0.88% top
> > >  5785 pgsql           2   0   255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
> > >  5782 pgsql           2   0   255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
> > >  5772 pgsql           2   0   255M 11708K sbwait 2   0:00  1.54%  0.15% postgre
> >
> > That doesn't look good.  Is this machine freshly rebooted, or has it
> > been running postgres for a while?  179M cache and 199M buffer with 2.6
> > gig inactive is horrible for a machine running a 10gig databases.
>
> No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
> Free. It's the same as 'active' memory except that it's pages that
> haven't been accessed in X amount of time (between 100 and 200 ms, I
> think). When free memory starts getting low, FBSD will start moving
> pages from the inactive queue to the free queue (possibly resulting in
> writes to disk along the way).
>
> IIRC, Cache is the directory cache, and Buf is disk buffers, which is
> somewhat akin to shared_buffers in PostgreSQL.

So, then, the inact is pretty much the same as kernel buffers in linux?

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
Vivek Khera
Date:
On Mar 17, 2006, at 5:11 PM, Kenji Morishige wrote:

> In summary, my questions:
>
> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve
> performance?

FreeBSD 6.x will definitely get you improvements.  Many speed
improvements have been made to both the generic disk layer and the
specific drivers.  However, the current best-of-breed RAID controller
is the LSI 320-x (I use the 320-2X).   I have one box into which this
card will not fit (thanks, Sun, for making a box with only low-profile
slots!) so I use an Adaptec 2230SLP card in it.  Testing shows it is
about 80% of the speed of an LSI 320-2X on a sequential workload (load DB,
run some queries, rebuild indexes, etc.)

If you do put FreeBSD 6 on, I'd love to see the output of "diskinfo -v -t"
on your RAID volume(s).

>
> 2. Should I change SCSI controller config to use RAID 10 instead of 5?

I use RAID10.

>
> 3. Why isn't postgres using all 4GB of ram for at least caching
> table for reads?

I think FreeBSD has a hard upper limit on the total ram it will use
for disk cache.  I haven't been able to get reliable, irrefutable,
answers about it, though.

>
> 4. Are there any other settings in the conf file I could try to tweak?

I like to bump up the checkpoint segments to 256.
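In postgresql.conf terms (checkpoint_segments is the knob for this in the
8.x era; each WAL segment is 16 MB, so 256 segments can tie up several
gigabytes of disk for WAL - make sure pg_xlog has the room):

```
checkpoint_segments = 256   # fewer, larger checkpoints; needs WAL disk space
```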


Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Alex Hayward
Date:
On Mon, 20 Mar 2006, Jim C. Nasby wrote:

> No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
> Free. It's the same as 'active' memory except that it's pages that
> haven't been accessed in X amount of time (between 100 and 200 ms, I
> think). When free memory starts getting low, FBSD will start moving
> pages from the inactive queue to the free queue (possibly resulting in
> writes to disk along the way).
>
> IIRC, Cache is the directory cache, and Buf is disk buffers, which is
> somewhat akin to shared_buffers in PostgreSQL.

I don't believe that's true. I'm not an expert in FreeBSD's VM internals,
but this is how I believe it works:

Active pages are pages currently mapped in to a process's address space.

Inactive pages are pages which are marked dirty (must be written to
backing store before they can be freed) and which are not mapped in to a
process's address space. They're still associated with a VM object of some
kind - like part of a process's virtual address space or part of the cache
for a file on disk. If a page is still part of a process's virtual address
space and is accessed, a fault is generated and the page is then put back
into the address mappings.

Cached pages are like inactive pages but aren't dirty. They can be either
re-mapped or freed immediately.

Free pages are properly free. Wired pages are unswappable. Buf I'm not
sure about. It doesn't represent the amount of memory used to cache files
on disk, I'm sure of that. The sysctl -d description is 'KVA memory used
for bufs', so I suspect that it's the amount of kernel virtual address
space mapped to pages in the 'active', 'inactive' and 'cache' queues.

--
  Alex Hayward
  Seatbooker


Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Miguel
Date:
Vivek Khera wrote:

>
> On Mar 17, 2006, at 5:11 PM, Kenji Morishige wrote:
>
>> In summary, my questions:
>>
>> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve  performance?
>
>
> FreeBSD 6.x will definitely get you improvements.  Many speedup
> improvements have been made to both the generic disk layer and the
> specific drivers.  However, the current best of breed RAID controller
> is the LSI 320-x (I use 320-2X).   I have one box into which this
> card will not fit (Thanks Sun, for making a box with only low-profile
> slots!) so I use an Adaptec 2230SLP card in it.  Testing shows it is
> about 80% speed of a LSI 320-2x on sequential workload (load DB, run
> some queries, rebuild indexes, etc.)
>
> If you do put on FreeBSD 6, I'd love to see the output of "diskinfo -v -t"
> on your RAID volume(s).
>
Not directly related ...
I have an HP DL380 G3 with a Smart Array 5i controller (RAID 1+0); these are my results:

shiva2# /usr/sbin/diskinfo -v -t /dev/da2s1d
/dev/da2s1d
        512             # sectorsize
        218513555456    # mediasize in bytes (204G)
        426784288       # mediasize in sectors
        52301           # Cylinders according to firmware.
        255             # Heads according to firmware.
        32              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   1.138232 sec =    4.553 msec
        Half stroke:      250 iter in   1.084474 sec =    4.338 msec
        Quarter stroke:   500 iter in   1.690313 sec =    3.381 msec
        Short forward:    400 iter in   0.752646 sec =    1.882 msec
        Short backward:   400 iter in   1.306270 sec =    3.266 msec
        Seq outer:       2048 iter in   0.766676 sec =    0.374 msec
        Seq inner:       2048 iter in   0.803759 sec =    0.392 msec
Transfer rates:
        outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
        middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
        inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec


is this good enough?

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Luke Lonergan"
Date:
Miguel,

On 3/20/06 12:52 PM, "Miguel" <mmiranda@123.com.sv> wrote:

> i have a HP dl380 g3 with array 5i controlled (1+0), these are my results

Another "known bad" RAID controller.  The Smartarray 5i is horrible on Linux
- this is the first BSD result I've seen.

> Seek times:
>         Full stroke:      250 iter in   1.138232 sec =    4.553 msec
>         Half stroke:      250 iter in   1.084474 sec =    4.338 msec

These seem OK - are they "access times" or are they actually "seek times"?
Seems like with RAID 10, you should get better by maybe double.

> Transfer rates:
>         outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>         middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>         inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>
>
> is this good enough?

It's pretty slow.  How many disk drives do you have?

- Luke



Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Miguel
Date:
Luke Lonergan wrote:

>Miguel,
>
>On 3/20/06 12:52 PM, "Miguel" <mmiranda@123.com.sv> wrote:
>
>
>
>>i have a HP dl380 g3 with array 5i controlled (1+0), these are my results
>>
>>
>
>Another "known bad" RAID controller.  The Smartarray 5i is horrible on Linux
>- this is the first BSD result I've seen.
>
>
>
>>Seek times:
>>        Full stroke:      250 iter in   1.138232 sec =    4.553 msec
>>        Half stroke:      250 iter in   1.084474 sec =    4.338 msec
>>
>>
>
>These seem OK - are they "access times" or are they actually "seek times"?
>
>
i dont know, how can i check?

>Transfer rates:
>        outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>        middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>        inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>
>
>is this good enough?
>It's pretty slow.  How many disk drives do you have?
>
>
>
>
I have 6 Ultra320 72G 10K disks

---
Miguel

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Luke Lonergan"
Date:
Miguel,

On 3/20/06 1:12 PM, "Miguel" <mmiranda@123.com.sv> wrote:

> i dont know, how can i check?

No matter - it's the benchmark that would tell you; it's probably "access
time" that's being measured even though the text says "seek time".  The
difference is that seek time represents only the head motion, whereas access
time is the whole access, including the seek and rotational latency.  Access
times of 4.5ms are typical of a single 10K RPM SCSI disk drive like the
Seagate Barracuda.
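As a back-of-envelope check of that distinction (editor's sketch, not from the thread): average access time is roughly the seek plus half a platter rotation, so a measured 4.5ms access on a 10K RPM drive implies only about 1.5ms of actual head motion.

```python
def rotational_latency_ms(rpm):
    # Average rotational latency = time for half a revolution, in milliseconds.
    return 60_000.0 / rpm / 2.0

def implied_seek_ms(measured_access_ms, rpm):
    # The seek component left over once rotational latency is subtracted.
    return measured_access_ms - rotational_latency_ms(rpm)

print(rotational_latency_ms(10_000))   # 3.0 ms for a 10K RPM drive
print(implied_seek_ms(4.5, 10_000))    # 1.5 ms of actual head motion
```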

>> Transfer rates:
>>        outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>>        middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>>        inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>>
> I have 6 ultra a320 72G 10k discs

Yah - ouch.  With 6 drives in a RAID10, you should expect 3 drives worth of
sequential scan performance, or anywhere from 100MB/s to 180MB/s.  You're
getting from half to 1/3 of the performance you'd get with a decent RAID
controller.
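The arithmetic behind that estimate can be sketched as follows (editor's illustration; the 35-60 MB/s per-drive range is an assumed figure for 10K SCSI drives of that era, not from the thread):

```python
def raid10_seq_mb_s(n_drives, per_drive_mb_s):
    # An N-drive RAID10 streams a sequential scan from N/2 stripes
    # (one drive per mirror pair), hence "3 drives worth" for 6 drives.
    return (n_drives // 2) * per_drive_mb_s

print(raid10_seq_mb_s(6, 35))   # 105 MB/s - low end of the 100-180 MB/s range
print(raid10_seq_mb_s(6, 60))   # 180 MB/s - high end
```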

If you add a simple SCSI adapter like the common LSI U320 adapter to your
DL380G3 and then run software RAID, you will get more than 150MB/s with less
CPU consumption.  I'd also expect you'd get down to about 2ms access times.

This might not be easy for you to do, and you might prefer hardware RAID
adapters, but I don't have a recommendation for you there.  I'd stay away
from the HP line.

- Luke



Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Miguel
Date:
Luke Lonergan wrote:

>>>Transfer rates:
>>>       outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>>>       middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>>>       inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>>>
>>>
>>>
>>I have 6 ultra a320 72G 10k discs
>>
>>
>
>Yah - ouch.  With 6 drives in a RAID10, you should expect 3 drives worth of
>sequential scan performance, or anywhere from 100MB/s to 180MB/s.  You're
>getting from half to 1/3 of the performance you'd get with a decent raid
>controller.
>
>If you add a simple SCSI adapter like the common LSI U320 adapter to your
>DL380G3 and then run software RAID, you will get more than 150MB/s with less
>CPU consumption.  I'd also expect you'd get down to about 2ms access times.
>
>This might not be easy for you to do, and you might prefer hardware RAID
>adapters, but I don't have a recommendation for you there.  I'd stay away
>from the HP line.
>
>
>
This is my new PostgreSQL 8.1.3 server, so I have many options (in fact,
any option) to choose from, and I want maximum performance. If I understood
you well, do you mean using something like vinum?
I forgot to mention that the 6 disks are in an MSA500 G2 external
storage enclosure; additionally I have two 36G U320 10K disks in RAID 10
for the OS, installed in the server slots.
---
Miguel


Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Luke Lonergan"
Date:
Miguel,


On 3/20/06 1:51 PM, "Miguel" <mmiranda@123.com.sv> wrote:

> i forgot to mention that the 6 discs are in a MSA500 G2 external
> storadge, additionally  i have two 36G a320 10k in raid 10 for the os
> installed in the server slots.

I just checked online and I think the MSA500 G2 has its own SCSI RAID
controllers, so you are actually just using the 5i as a SCSI attach, which
it's not good at (no reordering/command queueing, etc.).  So, just using a
simple SCSI adapter to connect to the MSA might be a big win.

- Luke



Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Miguel
Date:
Luke Lonergan wrote:

>Miguel,
>
>
>On 3/20/06 1:51 PM, "Miguel" <mmiranda@123.com.sv> wrote:
>
>
>
>>i forgot to mention that the 6 discs are in a MSA500 G2 external
>>storadge, additionally  i have two 36G a320 10k in raid 10 for the os
>>installed in the server slots.
>>
>>
>
>I just checked online and I think the MSA500 G2 has its own SCSI RAID
>controllers,
>
Yes, it has its own redundant controller,

> so you are actually just using the 5i as a SCSI attach, which
>it's not good at (no reordering/command queueing, etc).  So, just using a
>simple SCSI adapter to connect to the MSA might be a big win.
>
>

I will try an LSI U320 adapter and will let you know if I get any
performance gain. Thanks for your advice.

---
Miguel


Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
Vivek Khera
Date:
>> If you do put on FreeBSD 6, I'd love to see the output of
>> "diskinfo -v -t" on your RAID volume(s).
>>
> Not directly related ...
> i have a HP dl380 g3 with array 5i controlled (1+0), these are my
> results
> [...]
> is this good enough?

Is that on a loaded box or a mostly quiet box?  Those numbers seem
rather low for my tastes.  For comparison, here are numbers from a
Dell 1850 with a built-in PERC 4e/Si RAID in a two disk mirror.  All
numbers below are on mostly or totally quiet disk systems.

amrd0
         512             # sectorsize
         73274490880     # mediasize in bytes (68G)
         143114240       # mediasize in sectors
         8908            # Cylinders according to firmware.
         255             # Heads according to firmware.
         63              # Sectors according to firmware.

Seek times:
         Full stroke:      250 iter in   0.756718 sec =    3.027 msec
         Half stroke:      250 iter in   0.717824 sec =    2.871 msec
         Quarter stroke:   500 iter in   1.972368 sec =    3.945 msec
         Short forward:    400 iter in   1.193179 sec =    2.983 msec
         Short backward:   400 iter in   1.322440 sec =    3.306 msec
         Seq outer:       2048 iter in   0.271402 sec =    0.133 msec
         Seq inner:       2048 iter in   0.271151 sec =    0.132 msec
Transfer rates:
         outside:       102400 kbytes in   1.080339 sec =    94785 kbytes/sec
         middle:        102400 kbytes in   1.166021 sec =    87820 kbytes/sec
         inside:        102400 kbytes in   1.461498 sec =    70065 kbytes/sec


And for the *real* disks....  In the following two cases, I used a
Dell 1425SC with 1GB RAM and connected the controllers to the same
Dell PowerVault 14 disk U320 array (one controller at a time,
obviously).  For each controller each pair of the mirror was on the
opposite channel of the controller for optimal speed.  disk 0 is a
RAID1 of two drives, and disk 1 is a RAID10 of the remaining 12
drives.  All running FreeBSD 6.0 RELEASE.  First I tested the Adaptec
2230SLP and got these:

aacd0
         512             # sectorsize
         36385456128     # mediasize in bytes (34G)
         71065344        # mediasize in sectors
         4423            # Cylinders according to firmware.
         255             # Heads according to firmware.
         63              # Sectors according to firmware.

Seek times:
         Full stroke:      250 iter in   2.288389 sec =    9.154 msec
         Half stroke:      250 iter in   1.657302 sec =    6.629 msec
         Quarter stroke:   500 iter in   2.756597 sec =    5.513 msec
         Short forward:    400 iter in   1.205275 sec =    3.013 msec
         Short backward:   400 iter in   1.249310 sec =    3.123 msec
         Seq outer:       2048 iter in   0.412770 sec =    0.202 msec
         Seq inner:       2048 iter in   0.428585 sec =    0.209 msec
Transfer rates:
         outside:       102400 kbytes in   1.204412 sec =    85021 kbytes/sec
         middle:        102400 kbytes in   1.347325 sec =    76002 kbytes/sec
         inside:        102400 kbytes in   2.036832 sec =    50274 kbytes/sec


aacd1
         512             # sectorsize
         218307231744    # mediasize in bytes (203G)
         426381312       # mediasize in sectors
         26541           # Cylinders according to firmware.
         255             # Heads according to firmware.
         63              # Sectors according to firmware.

Seek times:
         Full stroke:      250 iter in   0.856699 sec =    3.427 msec
         Half stroke:      250 iter in   1.475651 sec =    5.903 msec
         Quarter stroke:   500 iter in   2.693270 sec =    5.387 msec
         Short forward:    400 iter in   1.127831 sec =    2.820 msec
         Short backward:   400 iter in   1.216876 sec =    3.042 msec
         Seq outer:       2048 iter in   0.416340 sec =    0.203 msec
         Seq inner:       2048 iter in   0.436471 sec =    0.213 msec
Transfer rates:
         outside:       102400 kbytes in   1.245798 sec =    82196 kbytes/sec
         middle:        102400 kbytes in   1.169033 sec =    87594 kbytes/sec
         inside:        102400 kbytes in   1.390840 sec =    73625 kbytes/sec


And the LSI 320-2X card:

amrd0
         512             # sectorsize
         35999711232     # mediasize in bytes (34G)
         70311936        # mediasize in sectors
         4376            # Cylinders according to firmware.
         255             # Heads according to firmware.
         63              # Sectors according to firmware.

Seek times:
         Full stroke:      250 iter in   0.737130 sec =    2.949 msec
         Half stroke:      250 iter in   0.694498 sec =    2.778 msec
         Quarter stroke:   500 iter in   2.040667 sec =    4.081 msec
         Short forward:    400 iter in   1.418592 sec =    3.546 msec
         Short backward:   400 iter in   0.896076 sec =    2.240 msec
         Seq outer:       2048 iter in   0.292390 sec =    0.143 msec
         Seq inner:       2048 iter in   0.300836 sec =    0.147 msec
Transfer rates:
         outside:       102400 kbytes in   1.102025 sec =    92920 kbytes/sec
         middle:        102400 kbytes in   1.247608 sec =    82077 kbytes/sec
         inside:        102400 kbytes in   1.905603 sec =    53736 kbytes/sec


amrd1
         512             # sectorsize
         215998267392    # mediasize in bytes (201G)
         421871616       # mediasize in sectors
         26260           # Cylinders according to firmware.
         255             # Heads according to firmware.
         63              # Sectors according to firmware.

Seek times:
         Full stroke:      250 iter in   0.741648 sec =    2.967 msec
         Half stroke:      250 iter in   1.021720 sec =    4.087 msec
         Quarter stroke:   500 iter in   2.220321 sec =    4.441 msec
         Short forward:    400 iter in   0.945948 sec =    2.365 msec
         Short backward:   400 iter in   1.036555 sec =    2.591 msec
         Seq outer:       2048 iter in   0.378911 sec =    0.185 msec
         Seq inner:       2048 iter in   0.457275 sec =    0.223 msec
Transfer rates:
         outside:       102400 kbytes in   0.986572 sec =   103794 kbytes/sec
         middle:        102400 kbytes in   0.998528 sec =   102551 kbytes/sec
         inside:        102400 kbytes in   0.857322 sec =   119442 kbytes/sec



Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Miguel
Date:
Vivek Khera wrote:

>>> If you do put on FreeBSD 6, I'd love to see the output of "diskinfo
>>> -v -t" on your RAID volume(s).
>>>
>> Not directly related ...
>> i have a HP dl380 g3 with array 5i controlled (1+0), these are my
>> results
>> [...]
>> is this good enough?
>
>
> Is that on a loaded box or a mostly quiet box?  Those number seem
> rather low for my tastes.  For comparison, here are numbers from a
> Dell 1850 with a built-in PERC 4e/Si RAID in a two disk mirror.  All
> numbers below are on mostly or totally quiet disk systems.

My numbers are on a totally quiet box; I've just installed it.

>
> amrd0
>         512             # sectorsize
>         73274490880     # mediasize in bytes (68G)
>         143114240       # mediasize in sectors
>         8908            # Cylinders according to firmware.
>         255             # Heads according to firmware.
>         63              # Sectors according to firmware.
>
> Seek times:
>         Full stroke:      250 iter in   0.756718 sec =    3.027 msec
>         Half stroke:      250 iter in   0.717824 sec =    2.871 msec
>         Quarter stroke:   500 iter in   1.972368 sec =    3.945 msec
>         Short forward:    400 iter in   1.193179 sec =    2.983 msec
>         Short backward:   400 iter in   1.322440 sec =    3.306 msec
>         Seq outer:       2048 iter in   0.271402 sec =    0.133 msec
>         Seq inner:       2048 iter in   0.271151 sec =    0.132 msec
> Transfer rates:
>         outside:       102400 kbytes in   1.080339 sec =    94785 kbytes/sec
>         middle:        102400 kbytes in   1.166021 sec =    87820 kbytes/sec
>         inside:        102400 kbytes in   1.461498 sec =    70065 kbytes/sec
>
>
Umm, in my box I see better seek times but worse transfer rates - does that
make sense?
I think I have something wrong; the question I can't answer is what
tuning am I missing?

---
Miguel






Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
Vivek Khera
Date:
On Mar 20, 2006, at 6:04 PM, Miguel wrote:

> Umm, in my box i see better seektimes but worst transfer rates,
> does it make sense?
> i think i have something wrong, the question i cant answer is what
> tunning  am i missing?

Well, I forgot to mention I have 15k RPM disks, so the transfers
should be faster.

I did no tuning to the disk configurations.  I think your controller
is either just not supported well in FreeBSD, or is bad in general...

I *really* wish LSI would make a low-profile card that would fit in a
Sun X4100...  as it stands, the only choice for dual-channel cards is
the Adaptec 2230SLP...


Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Miguel
Date:
Vivek Khera wrote:

>
> On Mar 20, 2006, at 6:04 PM, Miguel wrote:
>
>> Umm, in my box i see better seektimes but worst transfer rates,  does
>> it make sense?
>> i think i have something wrong, the question i cant answer is what
>> tunning  am i missing?
>
>
> Well, I forgot to mention I have 15k RPM disks, so the transfers
> should be faster.
>
> I did no tuning to the disk configurations.  I think your controller
> is either just not supported well in FreeBSD, or is bad in general...

:-(

I guess you are right; I made a really bad choice. I'd better look at Dell
next time.
thanks

---
Miguel

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
PFC
Date:
    This is a 2-Disk Linux software RAID1 with 2 7200RPM IDE Drives, 1 PATA
and 1 SATA :

apollo13 ~ # hdparm -t /dev/md0

/dev/md0:
  Timing buffered disk reads:  156 MB in  3.02 seconds =  51.58 MB/sec
apollo13 ~ # hdparm -t /dev/md0

/dev/md0:
  Timing buffered disk reads:  168 MB in  3.06 seconds =  54.87 MB/sec

    This is a 5-Disk Linux software RAID5 with 4 7200RPM IDE Drives and 1
5400RPM, 3 SATA and 2 PATA:

apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
  Timing buffered disk reads:  348 MB in  3.17 seconds = 109.66 MB/sec

apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
  Timing buffered disk reads:  424 MB in  3.00 seconds = 141.21 MB/sec

apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
  Timing buffered disk reads:  426 MB in  3.00 seconds = 141.88 MB/sec

apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
  Timing buffered disk reads:  426 MB in  3.01 seconds = 141.64 MB/sec


    The machine is a desktop Athlon 64 3000+, buggy nforce3 chipset, 1G
DDR400, Gentoo Linux 2.6.15-ck4 running in 64 bit mode.
    The bottleneck is the PCI bus.

    Expensive SCSI hardware RAID cards with expensive 10Krpm hard disks should
not get humiliated by such a simple (and cheap) setup. (I'm referring to
the 12-drive RAID10 mentioned before, not the other one, which was a simple
2-disk mirror.) Tom's Hardware benchmarked some hardware RAIDs and got
humongous transfer rates... hm?
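A rough model of why the bus, not the drives, caps this setup (editor's sketch; the ~55 MB/s per-drive figure and the 133 MB/s theoretical limit of 32-bit/33 MHz PCI are assumptions, and real buses often do a bit better, which would fit the ~141 MB/s observed):

```python
def raid5_seq_mb_s(n_drives, per_drive_mb_s, bus_limit_mb_s):
    # Sequential reads from RAID5 scale with the N-1 data drives,
    # but can never exceed what the host bus can carry.
    return min((n_drives - 1) * per_drive_mb_s, bus_limit_mb_s)

print(raid5_seq_mb_s(5, 55, 133))   # 133 -> bus-bound, not drive-bound
```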

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Mark Kirkwood
Date:
Scott Marlowe wrote:
> On Mon, 2006-03-20 at 08:45, Jim C. Nasby wrote:
>
>>On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:
>>
>>>>last pid:  5788;  load averages:  0.32,  0.31,  0.28    up 127+15:16:08  13:59:24
>>>>169 processes: 1 running, 168 sleeping
>>>>CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
>>>>Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
>>>>Swap: 4096M Total, 216K Used, 4096M Free
>>>>
>>>>  PID USERNAME      PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
>>>>14501 pgsql           2   0   254M   242M select 2  76:26  1.95%  1.95% postgre
>>>> 5720 root           28   0  2164K  1360K CPU0   0   0:00  1.84%  0.88% top
>>>> 5785 pgsql           2   0   255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
>>>> 5782 pgsql           2   0   255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
>>>> 5772 pgsql           2   0   255M 11708K sbwait 2   0:00  1.54%  0.15% postgre
>>>
>>>That doesn't look good.  Is this machine freshly rebooted, or has it
>>>been running postgres for a while?  179M cache and 199M buffer with 2.6
>>>gig inactive is horrible for a machine running a 10gig databases.
>>
>>No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
>>Free. It's the same as 'active' memory except that it's pages that
>>haven't been accessed in X amount of time (between 100 and 200 ms, I
>>think). When free memory starts getting low, FBSD will start moving
>>pages from the inactive queue to the free queue (possibly resulting in
>>writes to disk along the way).
>>
>>IIRC, Cache is the directory cache, and Buf is disk buffers, which is
>>somewhat akin to shared_buffers in PostgreSQL.
>
>
> So, then, the inact is pretty much the same as kernel buffers in linux?
>

I think Freebsd 'Inactive' corresponds pretty closely to Linux's
'Inactive Dirty'|'Inactive Laundered'|'Inactive Free'.

 From what I can see, 'Buf' is a bit misleading - e.g. read a 1G file
randomly and you increase 'Inactive' by about 1G, while 'Buf' might get to
200M. However, read the file again and you'll see zero i/o in vmstat or
gstat. From reading the FreeBSD architecture docs, I think 'Buf'
consists of those pages from 'Inactive' or 'Active' that were last kvm
mapped for read/write operations. However, 'Buf' is restricted to a
fairly small size (various sysctls), so it really only provides a lower
bound on the file buffer cache activity.

Sorry to not really answer your question Scott - how are Linux kernel
buffers actually defined?

Cheers

Mark

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Mark Kirkwood
Date:
Mark Kirkwood wrote:
>
> I think Freebsd 'Inactive' corresponds pretty closely to Linux's
> 'Inactive Dirty'|'Inactive Laundered'|'Inactive Free'.
>

Hmmm - on second thoughts I think I've got that wrong :-(, since in
Linux all the file buffer pages appear in 'Cached', don't they...

(I also notice that 'Inactive Laundered' does not seem to be mentioned
in vanilla - read: non-Redhat - 2.6 kernels.)

So I think it's more correct to say FreeBSD 'Inactive' is similar to
Linux 'Inactive' + some|most of Linux 'Cached'.

A good discussion of how the FreeBSD VM works is here:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/vm.html

In particular:

"FreeBSD reserves a limited amount of KVM to hold mappings from struct
bufs, but it should be made clear that this KVM is used solely to hold
mappings and does not limit the ability to cache data."

Cheers

Mark

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Jim C. Nasby"
Date:
On Tue, Mar 21, 2006 at 03:51:35PM +1200, Mark Kirkwood wrote:
> Mark Kirkwood wrote:
> >
> >I think Freebsd 'Inactive' corresponds pretty closely to Linux's
> >'Inactive Dirty'|'Inactive Laundered'|'Inactive Free'.
> >
>
> Hmmm - on second thoughts I think I've got that wrong :-(, since in
> Linux all the file buffer pages appear in 'Cached' don't they...
>
> (I also notice that 'Inactive Laundered' does not seem to be mentioned
> in vanilla - read non-Redhat - 2.6 kernels)
>
> So I think its more correct to say Freebsd 'Inactive' is similar to
> Linux 'Inactive' + some|most of Linux 'Cached'.
>
> A good discussion of how the Freebsd vm works is here:
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/vm.html
>
> In particular:
>
> "FreeBSD reserves a limited amount of KVM to hold mappings from struct
> bufs, but it should be made clear that this KVM is used solely to hold
> mappings and does not limit the ability to cache data."

It's worth noting that starting in either 2.4 or 2.6, Linux pretty much
adopted the FreeBSD VM system (or so I've been told).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Jim C. Nasby"
Date:
On Mon, Mar 20, 2006 at 07:46:13PM +0000, Alex Hayward wrote:
> On Mon, 20 Mar 2006, Jim C. Nasby wrote:
>
> > No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
> > Free. It's the same as 'active' memory except that it's pages that
> > haven't been accessed in X amount of time (between 100 and 200 ms, I
> > think). When free memory starts getting low, FBSD will start moving
> > pages from the inactive queue to the free queue (possibly resulting in
> > writes to disk along the way).
> >
> > IIRC, Cache is the directory cache, and Buf is disk buffers, which is
> > somewhat akin to shared_buffers in PostgreSQL.
>
> I don't believe that's true. I'm not an expert in FreeBSD's VM internals,
> but this is how I believe it works:
>
> Active pages are pages currently mapped in to a process's address space.
>
> Inactive pages are pages which are marked dirty (must be written to
> backing store before they can be freed) and which are not mapped in to a
> process's address. They're still associated with a VM object of some kind

Actually, a page that is in the inactive queue *may* be dirty. In fact,
if you start with a freshly booted system (or one that's been recently
starved of memory) and read in a large file, you'll see the inactive
queue grow even though the pages haven't been dirtied.

> - like part of a process's virtual address space or a as part of the cache
> for a file on disk. If it's still part of a process's virtual address
> space and is accessed a fault is generated. The page is then put back in
> to the address mappings.
>
> Cached pages are like inactive pages but aren't dirty. Then can be either
> re-mapped or freed immediately.
>
> Free pages are properly free. Wired pages are unswappable. Buf I'm not
> sure about. It doesn't represent that amount of memory used to cache files
> on disk, I'm sure of that. The sysctl -d description is 'KVA memory used
> for bufs', so I suspect that it's the amount of kernel virtual address
> space mapped to pages in the 'active', 'inactive' and 'cache' queues.
>
> --
>   Alex Hayward
>   Seatbooker
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S

From
"Jim C. Nasby"
Date:
On Mon, Mar 20, 2006 at 02:15:22PM -0500, Vivek Khera wrote:
> I think FreeBSD has a hard upper limit on the total ram it will use
> for disk cache.  I haven't been able to get reliable, irrefutable,
> answers about it, though.

It does not. Any memory in the inactive queue is effectively your 'disk
cache'. Pages start out in the active queue, and if they aren't used
fairly frequently they will move into the inactive queue. From there
they will be moved to the cache queue, but only if the cache queue falls
below a certain threshold, because in order to go into the cache queue
the page must be marked clean, possibly incurring a write to disk. AFAIK
pages only go into the free queue if they have been completely released
by all objects that were referencing them, so it's theoretically
possible for that queue to go to 0.
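The queue transitions described above can be sketched as a toy state machine (editor's illustration only - the names and structure are invented for clarity, not FreeBSD kernel code):

```python
class Page:
    # Toy model of one page's life cycle through the FreeBSD queues.
    def __init__(self):
        self.queue = "active"
        self.dirty = True
        self.referenced = True

def age(page):
    # Pages that go unreferenced drift from the active to the inactive queue.
    if page.queue == "active" and not page.referenced:
        page.queue = "inactive"

def replenish_cache(page, cache_below_threshold):
    # Inactive pages move to the cache queue only under pressure,
    # and must be cleaned first (possibly incurring a disk write).
    if page.queue == "inactive" and cache_below_threshold:
        page.dirty = False
        page.queue = "cache"

p = Page()
p.referenced = False
age(p)
print(p.queue)                 # inactive - still acting as disk cache
replenish_cache(p, cache_below_threshold=True)
print(p.queue, p.dirty)        # cache False - cleaned and ready to re-map
```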
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Mark Kirkwood
Date:
Jim C. Nasby wrote:
> On Mon, Mar 20, 2006 at 02:15:22PM -0500, Vivek Khera wrote:
>
>>I think FreeBSD has a hard upper limit on the total ram it will use
>>for disk cache.  I haven't been able to get reliable, irrefutable,
>>answers about it, though.
>
>
> It does not. Any memory in the inactive queue is effectively your 'disk
> cache'. Pages start out in the active queue, and if they aren't used
> fairly frequently they will move into the inactive queue. From there
> they will be moved to the cache queue, but only if the cache queue falls
> below a certain threshold, because in order to go into the cache queue
> the page must be marked clean, possibly incurring a write to disk. AFAIK
> pages only go into the free queue if they have been completely released
> by all objects that were referencing them, so it's theoretically
> possible for that queue to go to 0.

Exactly.

The so-called limit (controllable via various sysctls) is on the amount
of memory used for kvm mapped pages, not cached pages - i.e., it's a
subset of the cached pages that are set up for immediate access (the
others merely need to be shifted from the 'Inactive' queue to this
one before they can be operated on - a relatively cheap operation).

So it's really all about accounting, in a sense - whether pages end up in
the 'Buf' or 'Inactive' queue, they are still cached!

Cheers

Mark

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
"Jim C. Nasby"
Date:
On Tue, Mar 21, 2006 at 11:03:26PM +1200, Mark Kirkwood wrote:
> Jim C. Nasby wrote:
> >On Mon, Mar 20, 2006 at 02:15:22PM -0500, Vivek Khera wrote:
> >
> >>I think FreeBSD has a hard upper limit on the total ram it will use
> >>for disk cache.  I haven't been able to get reliable, irrefutable,
> >>answers about it, though.
> >
> >
> >It does not. Any memory in the inactive queue is effectively your 'disk
> >cache'. Pages start out in the active queue, and if they aren't used
> >fairly frequently they will move into the inactive queue. From there
> >they will be moved to the cache queue, but only if the cache queue falls
> >below a certain threshold, because in order to go into the cache queue
> >the page must be marked clean, possibly incurring a write to disk. AFAIK
> >pages only go into the free queue if they have been completely released
> >by all objects that were referencing them, so it's theoretically
> >possible for that queue to go to 0.
>
> Exactly.
>
> The so-called limit (controllable via various sysctl's) is on the amount
> of memory used for kvm mapped pages, not cached pages, i.e - its a
> subset of the cached pages that are set up for immediate access (the
> others require merely to be shifted from the 'Inactive' queue to this
> one before they can be operated on - a relatively cheap operation).
>
> So its really all about accounting, in a sense - whether pages end up in
> the 'Buf' or 'Inactive' queue, they are still cached!

So what's the difference between Buf and Active then? Just that active
means it's a code page, or that it's been directly mapped into a
process's memory (perhaps via mmap)?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Jim C. Nasby"
Date:
On Mon, Mar 20, 2006 at 01:27:56PM -0800, Luke Lonergan wrote:
> >> Transfer rates:
> >>        outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
> >>        middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
> >>        inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
> >>
> > I have 6 ultra a320 72G 10k discs
>
> Yah - ouch.  With 6 drives in a RAID10, you should expect 3 drives worth of
> sequential scan performance, or anywhere from 100MB/s to 180MB/s.  You're
> getting from half to 1/3 of the performance you'd get with a decent raid
> controller.
>
> If you add a simple SCSI adapter like the common LSI U320 adapter to your
> DL380G3 and then run software RAID, you will get more than 150MB/s with less
> CPU consumption.  I'd also expect you'd get down to about 2ms access times.

FWIW, here's my dirt-simple workstation, with 2 Seagate SATA drives set up
as a mirror using software RAID (first the mirror, then one of the raw
drives):

decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0
Password:
/dev/mirror/gm0
        512             # sectorsize
        300069051904    # mediasize in bytes (279G)
        586072367       # mediasize in sectors

Seek times:
        Full stroke:      250 iter in   1.416409 sec =    5.666 msec
        Half stroke:      250 iter in   1.404503 sec =    5.618 msec
        Quarter stroke:   500 iter in   2.887344 sec =    5.775 msec
        Short forward:    400 iter in   2.101949 sec =    5.255 msec
        Short backward:   400 iter in   2.373578 sec =    5.934 msec
        Seq outer:       2048 iter in   0.209539 sec =    0.102 msec
        Seq inner:       2048 iter in   0.347499 sec =    0.170 msec
Transfer rates:
        outside:       102400 kbytes in   3.183924 sec =    32162 kbytes/sec
        middle:        102400 kbytes in   3.216232 sec =    31838 kbytes/sec
        inside:        102400 kbytes in   4.242779 sec =    24135 kbytes/sec

decibel@noel.2[5:43]~:16>sudo diskinfo -vt /dev/ad4
/dev/ad4
        512             # sectorsize
        300069052416    # mediasize in bytes (279G)
        586072368       # mediasize in sectors
        581421          # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   5.835744 sec =   23.343 msec
        Half stroke:      250 iter in   4.364424 sec =   17.458 msec
        Quarter stroke:   500 iter in   6.981597 sec =   13.963 msec
        Short forward:    400 iter in   2.157210 sec =    5.393 msec
        Short backward:   400 iter in   2.330445 sec =    5.826 msec
        Seq outer:       2048 iter in   0.181176 sec =    0.088 msec
        Seq inner:       2048 iter in   0.198974 sec =    0.097 msec
Transfer rates:
        outside:       102400 kbytes in   1.715810 sec =    59680 kbytes/sec
        middle:        102400 kbytes in   1.937027 sec =    52865 kbytes/sec
        inside:        102400 kbytes in   3.260515 sec =    31406 kbytes/sec

No, I don't know why the transfer rates for the mirror are 1/2 those of the
raw device. :(
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Alex Hayward
Date:
On Tue, 21 Mar 2006, Jim C. Nasby wrote:

> On Tue, Mar 21, 2006 at 11:03:26PM +1200, Mark Kirkwood wrote:
> >
> > So its really all about accounting, in a sense - whether pages end up in
> > the 'Buf' or 'Inactive' queue, they are still cached!
>
> So what's the difference between Buf and Active then? Just that active
> means it's a code page, or that it's been directly mapped into a
> processes memory (perhaps via mmap)?

I don't think that Buf and Active are mutually exclusive. Try adding up
Active, Inactive, Cache, Wired, Buf and Free - it'll come to more than
your physical memory.

Active gives an amount of physical memory. Buf gives an amount of
kernel-space virtual memory which provides the kernel with a window onto
pages in the other categories. In fact, I don't think that 'Buf' really
belongs in the list as it doesn't represent a 'type' of page at all.

--
  Alex Hayward
  Seatbooker

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
"Jim C. Nasby"
Date:
On Tue, Mar 21, 2006 at 12:22:31PM +0000, Alex Hayward wrote:
> On Tue, 21 Mar 2006, Jim C. Nasby wrote:
>
> > On Tue, Mar 21, 2006 at 11:03:26PM +1200, Mark Kirkwood wrote:
> > >
> > > So its really all about accounting, in a sense - whether pages end up in
> > > the 'Buf' or 'Inactive' queue, they are still cached!
> >
> > So what's the difference between Buf and Active then? Just that active
> > means it's a code page, or that it's been directly mapped into a
> > processes memory (perhaps via mmap)?
>
> I don't think that Buf and Active are mutually exclusive. Try adding up
> Active, Inactive, Cache, Wired, Buf and Free - it'll come to more than
> your physical memory.
>
> Active gives an amount of physical memory. Buf gives an amount of
> kernel-space virtual memory which provide the kernel with a window on to
> pages in the other categories. In fact, I don't think that 'Buf' really
> belongs in the list as it doesn't represent a 'type' of page at all.

Ahhh, I get it... a KVM (what's that stand for anyway?) is required any
time the kernel wants to access a page that doesn't belong to it, right?

And actually, I just checked 4 machines and adding all the queues plus
buf together didn't add up to total memory except on one of them (there
adding just the queues came close; 1507.6MB on a 1.5GB machine).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Luke Lonergan"
Date:
Jim,

On 3/21/06 3:49 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> No, I don't know why the transfer rates for the mirror are 1/2 that as the raw
> device. :(

Well - lessee.  Would those drives be attached to a Silicon Image (SII) SATA
controller?  A Highpoint?

I found in testing about 2 years ago that under Linux (looks like you're on
BSD), most SATA controllers other than the Intel PIIX are horribly broken
from a performance standpoint, probably due to bad drivers, but I'm not sure.

Now I think whatever is commonly used by Nforce 4 implementations seems to
work ok, but we don't count on them for RAID configurations yet.

- Luke



Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Vivek Khera
Date:
On Mar 20, 2006, at 6:27 PM, PFC wrote:

>     Expensive SCSI hardware RAID cards with expensive 10Krpm harddisks
> should not get humiliated by such a simple (and cheap) setup. (I'm
> referring to the 12-drive RAID10 mentioned before, not the other
> one which was a simple 2-disk mirror). Toms hardware benchmarked
> some hardware RAIDs and got humongous transfer rates... hm ?
>

I'll put my "slow" 12 disk SCSI array up against your IDE array on
a large parallel load any day.


Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Vivek Khera
Date:
On Mar 21, 2006, at 6:03 AM, Mark Kirkwood wrote:

> The so-called limit (controllable via various sysctl's) is on the
> amount of memory used for kvm mapped pages, not cached pages, i.e -
> its a subset of the cached pages that are set up for immediate
> access (the

Thanks... now that makes sense to me.



Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Adam Witney
Date:

> decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0

Can anyone point me to where I can find diskinfo, or an equivalent to run on
my Debian system? I have been googling for the last hour but can't find it!
I would like to analyse my own disk setup for comparison.

Thanks for any help

Adam




Re: Best OS & Configuration for Dual Xeon w/4GB &

From
"Jim C. Nasby"
Date:
On Tue, Mar 21, 2006 at 07:25:07AM -0800, Luke Lonergan wrote:
> Jim,
>
> On 3/21/06 3:49 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:
>
> > No, I don't know why the transfer rates for the mirror are 1/2 that as the raw
> > device. :(
>
> Well - lessee.  Would those drives be attached to a Silicon Image (SII) SATA
> controller?  A Highpoint?
>
> I found in testing about 2 years ago that under Linux (looks like you're
> BSD), most SATA controllers other than the Intel PIIX are horribly broken
> from a performance standpoint, probably due to bad drivers but I'm not sure.
>
> Now I think whatever is commonly used by Nforce 4 implementations seems to
> work ok, but we don't count on them for RAID configurations yet.

atapci1: <nVidia nForce4 SATA150 controller>

And note that this is using FreeBSD gmirror, not the built-in raid
controller.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
PFC
Date:

>>     Expensive SCSI hardware RAID cards with expensive 10Krpm harddisks
>> should not get humiliated by such a simple (and cheap) setup. (I'm
>> referring to the 12-drive RAID10 mentioned before, not the other one
>> which was a simple 2-disk mirror). Toms hardware benchmarked some
>> hardware RAIDs and got humongous transfer rates... hm ?
>>
>
> I'll put up my "slow" 12 disk SCSI array up against your IDE array on a
> large parallel load any day.

    Sure, and I have no doubt that yours will be immensely faster on parallel
loads than mine, but it should also be faster on sequential
scans... especially since I have desktop PCI and the original poster has a
real server with PCI-X, I think.

Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Mark Kirkwood
Date:
Adam Witney wrote:
>
>
>>decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0
>
>
> Can anyone point me to where I can find diskinfo or an equivalent to run on
> my debian system, I have been googling for the last hour but can't find it!
> I would like to analyse my own disk setup for comparison
>

I guess you could use hdparm (-t or -T flags do a simple benchmark).

Though iozone or bonnie++ are probably better.
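In the absence of diskinfo, a rough equivalent of its transfer-rate test can be sketched in a few lines of Python. This is a hypothetical helper, not any of the tools mentioned above, and without O_DIRECT the OS page cache will inflate the numbers on repeated runs:

```python
import os
import time

def read_throughput(path, block_size=1024 * 1024, max_bytes=64 * 1024 * 1024):
    """Measure rough sequential-read rate in kbytes/sec (hypothetical helper).

    Reads up to max_bytes from `path` in block_size chunks with buffering
    disabled, and returns bytes-read / elapsed time. For raw-device rates
    you would run this as root against e.g. /dev/sda, ideally with O_DIRECT
    to bypass the page cache.
    """
    total = 0
    start = time.monotonic()
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:          # end of file/device
                break
            total += len(chunk)
    elapsed = time.monotonic() - start
    return (total / 1024.0) / elapsed if elapsed > 0 else float("inf")
```

For comparable numbers to the diskinfo output in this thread, hdparm -t (which drops caches first) or bonnie++ remain the better-calibrated choices.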


Cheers

Mark


Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Jeff Frost
Date:
On Wed, 22 Mar 2006, Mark Kirkwood wrote:

> Adam Witney wrote:
>>
>>> decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0
>>
>> Can anyone point me to where I can find diskinfo or an equivalent to run on
>> my debian system, I have been googling for the last hour but can't find it!
>> I would like to analyse my own disk setup for comparison
>
> I guess you could use hdparm (-t or -T flags do a simple benchmark).
>
> Though iozone or bonnie++ are probably better.

You might also have a look at lmdd for sequential read/write performance from
the lmbench suite: http://sourceforge.net/projects/lmbench

Numbers from lmdd are posted to this list frequently.

--
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954

Re: Best OS & Configuration for Dual Xeon w/4GB & Adaptec

From
Vivek Khera
Date:
On Mar 21, 2006, at 2:04 PM, PFC wrote:

> especially since I have desktop PCI and the original poster has a
> real server with PCI-X I think.

that was me :-)

but yeah, I never seem to get full line speed for some reason.  i
don't know if it is because of inadequate measurement tools or what...


Re: Best OS & Configuration for Dual Xeon w/4GB &

From
Vivek Khera
Date:
On Mar 21, 2006, at 12:59 PM, Jim C. Nasby wrote:

> atapci1: <nVidia nForce4 SATA150 controller>
>
> And note that this is using FreeBSD gmirror, not the built-in raid
> controller.

I get similar counter-intuitive slowdown with gmirror SATA disks on
an IBM e326m I'm evaluating.  If/when I buy one I'll get the onboard
SCSI RAID instead.

The IBM uses a ServerWorks chipset, which shows up in FreeBSD 6.0 as
"generic ATA" and only does UDMA33 transfers.