Thread: Best OS & Configuration for Dual Xeon w/4GB & Adaptec RAID 2200S
About a year ago we decided to migrate our central database that powers various intranet tools from MySQL to PostgreSQL. We have about 130 tables and about 10GB of data that stores various status information for a variety of services for our intranet. We generally have somewhere between 150-200 connections to the database at any given time and probably anywhere between 5-10 new connections being made every second and about 100 queries per second. Most of the queries and transactions are very small due to the fact that the tools were designed to work around the limited functionality of MySQL 3.23.

Our company primarily uses FreeBSD and we are stuck on the FreeBSD 4.x series due to IT support issues, but I believe I may be able to get more performance out of our server by reconfiguring and setting up the postgresql.conf file better. The performance is not as good as I was hoping at the moment, and it seems as if the database is not making use of the available RAM.

snapshot of active server:

last pid:  5788;  load averages:  0.32,  0.31,  0.28    up 127+15:16:08  13:59:24
169 processes: 1 running, 168 sleeping
CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
Swap: 4096M Total, 216K Used, 4096M Free

  PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
14501 pgsql      2    0  254M   242M select 2  76:26  1.95%  1.95% postgre
 5720 root      28    0 2164K  1360K CPU0   0   0:00  1.84%  0.88% top
 5785 pgsql      2    0  255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
 5782 pgsql      2    0  255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
 5772 pgsql      2    0  255M 11708K sbwait 2   0:00  1.54%  0.15% postgre

Here is my current configuration:

Dual Xeon 3.06GHz, 4GB RAM
Adaptec 2200S with 48MB cache & 4 disks configured in RAID5
FreeBSD 4.11 w/kernel options:
  options         SHMMAXPGS=65536
  options         SEMMNI=256
  options         SEMMNS=512
  options         SEMUME=256
  options         SEMMNU=256
  options         SMP                     # Symmetric MultiProcessor Kernel
  options         APIC_IO                 # Symmetric (APIC) I/O

The OS is installed on the local single disk and the postgres data directory is on the RAID5 partition. Maybe Adaptec 2200S RAID5 performance is not as good as the vendor claimed. It was my impression that RAID controllers these days are optimized for RAID5 and that going RAID10 would not benefit me much. Also, I may be overlooking a postgresql.conf setting. I have attached the config file.

In summary, my questions:

1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?
2. Should I change SCSI controller config to use RAID 10 instead of 5?
3. Why isn't postgres using all 4GB of RAM for at least caching tables for reads?
4. Are there any other settings in the conf file I could try to tweak?
Attachment
On Fri, 2006-03-17 at 16:11, Kenji Morishige wrote:
> About a year ago we decided to migrate our central database that powers
> various intranet tools from MySQL to PostgreSQL. We have about 130 tables
> and about 10GB of data that stores various status information for a variety
> of services for our intranet. We generally have somewhere between 150-200
> connections to the database at any given time and probably anywhere between
> 5-10 new connections being made every second and about 100 queries per
> second. Most of the queries and transactions are very small due to the fact
> that the tools were designed to work around the limited functionality of
> MySQL 3.23.
> Our company primarily uses FreeBSD and we are stuck on the FreeBSD 4.x
> series due to IT support issues,

There were a LOT of performance enhancements to FreeBSD with the 5.x series release. I'd recommend fast tracking the database server to the 5.x branch. 4-stable was released 6 years ago; 5-stable was released two years ago.

> but I believe I may be able to get more performance out of our server by
> reconfiguring and setting up the postgresql.conf file better.

Can't hurt. But if your OS isn't doing the job, postgresql.conf can only do so much, nee?

> The performance is not as good as I was hoping at the moment, and it seems
> as if the database is not making use of the available RAM.
> snapshot of active server:
>
> last pid:  5788;  load averages:  0.32,  0.31,  0.28   up 127+15:16:08  13:59:24
> 169 processes: 1 running, 168 sleeping
> CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
> Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> Swap: 4096M Total, 216K Used, 4096M Free
>
>   PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> 14501 pgsql      2    0  254M   242M select 2  76:26  1.95%  1.95% postgre
>  5720 root      28    0 2164K  1360K CPU0   0   0:00  1.84%  0.88% top
>  5785 pgsql      2    0  255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
>  5782 pgsql      2    0  255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
>  5772 pgsql      2    0  255M 11708K sbwait 2   0:00  1.54%  0.15% postgre

That doesn't look good. Is this machine freshly rebooted, or has it been running postgres for a while? 179M cache and 199M buffer with 2.6 gig inactive is horrible for a machine running a 10 gig database.

For comparison, here's what my production linux boxes show in top:

 16:42:27  up 272 days, 14:49,  1 user,  load average: 1.02, 1.04, 1.00
162 processes: 161 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    0.2%    0.0%    0.4%   0.0%     0.0%    0.4%   98.7%
           cpu00    0.4%    0.0%    0.4%   0.0%     0.0%    0.0%   99.0%
           cpu01    0.0%    0.0%    0.4%   0.0%     0.0%    0.9%   98.5%
Mem:  6096912k av, 4529208k used, 1567704k free,       0k shrd,  306884k buff
      2398948k actv, 1772072k in_d,   78060k in_c
Swap: 4192880k av,  157480k used, 4035400k free                 3939332k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
24000 postgres  15   0   752  524   456 S     0.0  0.0   0:00   1 rotatelogs
24012 postgres  15   0  1536 1420  1324 S     0.0  0.0   7:11   0 postmaster
24015 postgres  15   0  2196 2032   996 S     0.0  0.0  56:07   0 postmaster
24016 postgres  15   0  1496 1352  1004 S     0.0  0.0 233:46   1 postmaster

Note that the kernel here is caching ~3.9 gigs of data, so postgresql doesn't have to. Also, the disk buffers are sitting at > 300 megs. If FreeBSD 4.x can't or won't cache more than that, there's an OS issue here, either endemic to FreeBSD 4.x or in your configuration of it.

> Dual Xeon 3.06Ghz 4GB RAM

Make sure hyperthreading is disabled; it's generally a performance loss for pgsql.
> Adaptec 2200S 48MB cache & 4 disks configured in RAID5

I'm not a huge fan of Adaptec RAID controllers, and 48 megs ain't much. But for what you're doing, I'd expect it to run well enough. Have you tested this array with bonnie++ to see what kind of performance it gets in general? There could be some kind of hardware issue going on that you're not seeing in the logs. Is that memory cache set to write-back, not write-through, and does it have battery backup (the cache, not the machine)?

> The OS is installed on the local single disk and the postgres data directory
> is on the RAID5 partition. Maybe Adaptec 2200S RAID5 performance is not as
> good as the vendor claimed. It was my impression that RAID controllers
> these days are optimized for RAID5 and that going RAID10 would not benefit
> me much.

You have to be careful about RAID 10, since many controllers serialize access through multiple levels of RAID, and therefore wind up being slower in RAID 10 or 50 than in RAID 1 or 5.

> Also, I may be overlooking a postgresql.conf setting. I have attached the
> config file.

If you're doing a lot of small transactions you might see some gain from increasing commit_delay to somewhere between 100 and 1000 and commit_siblings to between 25 and 100. It won't set the world on fire, but it's given me a 25% boost on certain loads with lots of small transactions.

> In summary, my questions:
>
> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

It most probably would. I'd at least test it out.

> 2. Should I change SCSI controller config to use RAID 10 instead of 5?

Maybe. With that controller, and many others in its league, you may be slowing things down doing that. You may be better off with a simple RAID 1 instead as well. Also, if you've got a problem with the controller serializing multiple RAID levels, you might see the best performance with one RAID level on the controller and the other handled by the kernel. BSD does do kernel level RAID, right?

> 3. Why isn't postgres using all 4GB of RAM for at least caching tables for reads?

Because that's your operating system's job.

> 4. Are there any other settings in the conf file I could try to tweak?

With the later versions of PostgreSQL, it's gotten better at doing the OS job of caching, IF you set it to use enough memory. You might try cranking up shared memory / shared_buffers to something large, like 75% of the machine memory, and see if that helps. With 7.4 and before it's generally a really bad idea. Looking at your postgresql.conf it appears you're running a post-7.4 version, so you might be able to get away with handing over all the RAM to the database.

Now that the tuning stuff is out of the way: have you been using the logging to look for individual slow queries and run explain analyze on them? Are you analyzing your database and vacuuming it too?
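[For reference, a minimal sketch of the two suggestions above. The benchmark directory, size, user, and the exact settings values are illustrative assumptions, not figures from the thread; tune them to the box.]

  # benchmark the RAID5 volume; -s is the test file size in MB and should be
  # at least 2x RAM (8192 MB on a 4GB box) so the OS cache can't hide the disks
  bonnie++ -d /usr/local/pgsql/bench -s 8192 -u pgsql

  # postgresql.conf: group-commit settings for many small transactions
  commit_delay = 500        # microseconds to wait for other commits to piggyback
  commit_siblings = 25      # only delay when at least this many xacts are active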
> Here is my current configuration:
>
> Dual Xeon 3.06GHz, 4GB RAM
> Adaptec 2200S with 48MB cache & 4 disks configured in RAID5
> FreeBSD 4.11 w/kernel options:
>   options         SHMMAXPGS=65536
>   options         SEMMNI=256
>   options         SEMMNS=512
>   options         SEMUME=256
>   options         SEMMNU=256
>   options         SMP                     # Symmetric MultiProcessor Kernel
>   options         APIC_IO                 # Symmetric (APIC) I/O
>
> The OS is installed on the local single disk and the postgres data directory
> is on the RAID5 partition. Maybe Adaptec 2200S RAID5 performance is not as
> good as the vendor claimed. It was my impression that RAID controllers
> these days are optimized for RAID5 and that going RAID10 would not benefit
> me much.

I don't know whether 'systat -vmstat' is available on 4.x; if so, try issuing 'systat -vmstat 1' for 1 sec. updates. This will (amongst much other info) show how much disk transfer you have.

> Also, I may be overlooking a postgresql.conf setting. I have attached the
> config file.

You could try to lower shared_buffers from 30000 to 16384. Setting this value too high can in some cases be counterproductive according to docs I read.

Also try to lower work_mem from 16384 to 8192 or 4096. This setting applies to each sort, so it does become expensive in terms of memory when many sorts are being carried out. It does depend on the complexity of your sorts, of course.

Try to do a vacuum analyse in your crontab. If your aliases file is set up correctly, mails generated by crontab will be forwarded to a human being. I have the following in my (root) crontab (and mail to root forwarded to me):

time /usr/local/bin/psql -d dbname -h dbhost -U username -c "vacuum analyse verbose;"

> In summary, my questions:
>
> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

Going to 6.x would probably increase overall performance, but you have to try it out first. Many people report increased performance just by upgrading; some report that it grinds to a halt. But SMP-wise 6.x is a more mature release than 4.x. Changes to the kernel from being giant-locked in 4.x to being "fine-grained locked" started in 5.x and have improved in 6.x. The disk and network layers should behave better. Linux, I don't know. If your expertise is in FreeBSD, try this first and then move to Linux (or Solaris 10) if 6.x does not meet your expectations.

> 3. Why isn't postgres using all 4GB of RAM for at least caching tables for reads?

I guess it's related to the use of the i386 architecture in general. If the Xeons are the newer Noconas you can try the amd64 port instead. This can utilize more memory (without going through PAE).

> 4. Are there any other settings in the conf file I could try to tweak?

max_fsm_pages and max_fsm_relations. You can look at the bottom of the vacuum analyze output and increase the values:

INFO:  free space map: 153 relations, 43445 pages stored; 45328 total pages needed

Raise max_fsm_pages so it meets or exceeds 'total pages needed', and max_fsm_relations so it covers the number of relations. This is fine-tuning though; it's more important to set work_mem and maintenance_work_mem correctly.

hth
Claus
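[As an illustration only, the psql command above could be scheduled from root's crontab like this; the time of day, database name, host, and user are placeholders, not values from the thread.]

  # crontab -e (as root): nightly vacuum analyse at 03:30, output mailed to root
  30 3 * * * time /usr/local/bin/psql -d dbname -h dbhost -U username -c "vacuum analyse verbose;"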
Kenji Morishige <kenjim@juniper.net> writes:
> ... We generally have somewhere between 150-200 connections to
> the database at any given time and probably anywhere between 5-10 new
> connections being made every second and about 100 queries per second. Most
> of the queries and transactions are very small due to the fact that the tools
> were designed to work around the limited functionality of MySQL 3.23.

You should think seriously about putting in some sort of connection-pooling facility. Postgres backends aren't especially lightweight things; the overhead involved in forking a process and then getting its internal caches populated etc. is significant. You don't want to be doing that for one small query, at least not if you're doing it that many times a second.

> it seems as if the database is not making use of the available RAM.

Postgres generally relies on the kernel to do the bulk of the disk caching. Your shared_buffers setting of 30000 seems quite reasonable to me; I don't think you want to bump it up (not much anyway). I'm not too familiar with FreeBSD and so I'm not clear on what "Inact" is:

> Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> Swap: 4096M Total, 216K Used, 4096M Free

If "Inact" covers disk pages cached by the kernel then this is looking reasonably good. If it's something else then you've got a problem, but fixing it is a kernel issue not a database issue.

> #max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each

You almost certainly need to bump this way up. 20000 is enough to cover dirty pages in about 200MB of database, which is only a fiftieth of what you say your disk footprint is. Unless most of your data is static, you're going to be suffering severe table bloat over time due to inability to recycle free space properly.

			regards, tom lane
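[To put a rough number on that last point: a back-of-the-envelope sizing, assuming the default 8KB page size. The actual value should be taken from the 'total pages needed' line of a VACUUM VERBOSE run; the figures below are only a sketch.]

  # 10GB of data at 8KB per page:
  #   10 * 1024 * 1024 KB / 8 KB  =~ 1,310,720 pages
  # so a postgresql.conf setting on this order would cover the whole footprint:
  max_fsm_pages = 1500000       # 6 bytes each, roughly 9MB of shared memory
  max_fsm_relations = 1000      # must cover the number of tables and indexes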
On Fri, 2006-03-17 at 17:03, Claus Guttesen wrote:
> > Here is my current configuration:
> > Also, I may be overlooking a postgresql.conf setting. I have attached the
> > config file.
>
> You could try to lower shared_buffers from 30000 to 16384. Setting
> this value too high can in some cases be counterproductive according
> to docs I read.

FYI, that was very true before 8.0, but since the introduction of better cache management algorithms, you can have pretty big shared_buffers settings.

> Also try to lower work_mem from 16384 to 8192 or 4096. This setting
> applies to each sort, so it does become expensive in terms of memory when
> many sorts are being carried out. It does depend on the complexity of
> your sorts, of course.

But looking at the RAM usage on his box, it doesn't look like that was a problem at the time the snapshot was taken. Assuming the box was busy then, he's OK; otherwise he'd be showing some swap usage, which he isn't.
> 4. Are there any other settings in the conf file I could try to tweak?

One more thing :-)

I stumbled over this setting; it made the db (PG 7.4.9) make use of the index rather than doing a sequential scan, and it reduced a query from several minutes to some 20 seconds:

random_page_cost = 2 (original value was 4).

Another thing you ought to do is get the four or five most used queries and do an explain analyze on them. Since our website wasn't prepared for this type of statistics I simply did a tcpdump, grep'ed all selects, sorted them and ran them through uniq so I could see which queries were used most.

regards
Claus
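[A rough sketch of that capture-and-count approach, plus a log-based alternative. The interface name, port, grep pattern, and threshold are assumptions; adjust to taste.]

  # grab postgres traffic, pull out the SELECTs, and count the most frequent ones
  tcpdump -i em0 -s 0 -w - port 5432 | strings | grep -i 'select' \
      | sort | uniq -c | sort -rn | head -20

  # or, in postgresql.conf, log any statement slower than 200 ms and read the log:
  log_min_duration_statement = 200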
Scott Marlowe wrote:
> On Fri, 2006-03-17 at 16:11, Kenji Morishige wrote:
>
>> About a year ago we decided to migrate our central database that powers
>> various intranet tools from MySQL to PostgreSQL. We have about 130 tables
>> and about 10GB of data that stores various status information for a variety
>> of services for our intranet. We generally have somewhere between 150-200
>> connections to the database at any given time and probably anywhere between
>> 5-10 new connections being made every second and about 100 queries per
>> second. Most of the queries and transactions are very small due to the fact
>> that the tools were designed to work around the limited functionality of
>> MySQL 3.23.
>> Our company primarily uses FreeBSD and we are stuck on the FreeBSD 4.x
>> series due to IT support issues,
>
> There were a LOT of performance enhancements to FreeBSD with the 5.x
> series release. I'd recommend fast tracking the database server to the
> 5.x branch. 4-stable was released 6 years ago; 5-stable was released
> two years ago.

I would recommend skipping 5.x and using 6.0, as it performs measurably better than 5.x. In particular the vfs layer is no longer under the GIANT lock, so you will get considerably improved concurrent filesystem access on your dual Xeon.

Regards

Mark
Thanks guys, I'm studying each of your responses and am going to start to experiment. Unfortunately, I don't have another box with similar specs to do a perfect experiment with, but I think I'm going to go ahead and open a service window to upgrade the box to FreeBSD 6.0 and apply some other changes. It also gives me the chance to go from 8.0.1 to the 8.1 series, which I've been wanting to do as well. Thanks guys, and I will see if any of your suggestions make a noticeable difference. I also have been looking at the log of slow queries and making the necessary indexes to make those go faster.

-Kenji

On Sat, Mar 18, 2006 at 12:29:17AM +0100, Claus Guttesen wrote:
> > 4. Are there any other settings in the conf file I could try to tweak?
>
> One more thing :-)
>
> I stumbled over this setting; it made the db (PG 7.4.9) make use of
> the index rather than doing a sequential scan, and it reduced a query
> from several minutes to some 20 seconds:
>
> random_page_cost = 2 (original value was 4).
>
> Another thing you ought to do is get the four or five most used
> queries and do an explain analyze on them. Since our website wasn't
> prepared for this type of statistics I simply did a tcpdump, grep'ed
> all selects, sorted them and ran them through uniq so I could see which
> queries were used most.
>
> regards
> Claus
Kenji,

On 3/17/06 4:08 PM, "Kenji Morishige" <kenjim@juniper.net> wrote:

> Thanks guys, I'm studying each of your responses and am going to start to
> experiment.

I notice that no one asked you about your disk bandwidth - the Adaptec 2200S is a "known bad" controller - the bandwidth to/from it in RAID5 is about 1/2 to 1/3 of a single disk drive, which is far too slow for a 10GB database, and IMO should disqualify a RAID adapter from being used at all.

Without fixing this, I'd suggest that all of the other tuning described here will have little value, provided your working set is larger than your RAM.

You should test the I/O bandwidth using these simple tests:

  time bash -c "dd if=/dev/zero of=bigfile bs=8k count=1000000 && sync"

then:

  time dd if=bigfile of=/dev/null bs=8k

You should get on the order of 150MB/s on four disk drives in RAID5.

And before people jump in about "random I/O", etc., the sequential scan test will show whether the controller is just plain bad very quickly. If it can't do sequential fast, it won't do seeks fast either.

- Luke
On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:
> > last pid:  5788;  load averages:  0.32,  0.31,  0.28   up 127+15:16:08  13:59:24
> > 169 processes: 1 running, 168 sleeping
> > CPU states:  5.4% user,  0.0% nice,  9.9% system,  0.0% interrupt, 84.7% idle
> > Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> > Swap: 4096M Total, 216K Used, 4096M Free
> >
> >   PID USERNAME PRI NICE  SIZE    RES STATE  C   TIME   WCPU    CPU COMMAND
> > 14501 pgsql      2    0  254M   242M select 2  76:26  1.95%  1.95% postgre
> >  5720 root      28    0 2164K  1360K CPU0   0   0:00  1.84%  0.88% top
> >  5785 pgsql      2    0  255M 29296K sbwait 0   0:00  3.00%  0.15% postgre
> >  5782 pgsql      2    0  255M 11900K sbwait 0   0:00  3.00%  0.15% postgre
> >  5772 pgsql      2    0  255M 11708K sbwait 2   0:00  1.54%  0.15% postgre
>
> That doesn't look good. Is this machine freshly rebooted, or has it
> been running postgres for a while? 179M cache and 199M buffer with 2.6
> gig inactive is horrible for a machine running a 10 gig database.

No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as Free. It's the same as 'active' memory except that it's pages that haven't been accessed in X amount of time (between 100 and 200 ms, I think). When free memory starts getting low, FBSD will start moving pages from the inactive queue to the free queue (possibly resulting in writes to disk along the way).

IIRC, Cache is the directory cache, and Buf is disk buffers, which is somewhat akin to shared_buffers in PostgreSQL.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mon, 2006-03-20 at 08:45, Jim C. Nasby wrote:
> On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:
> > > Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> > > Swap: 4096M Total, 216K Used, 4096M Free
> >
> > That doesn't look good. Is this machine freshly rebooted, or has it
> > been running postgres for a while? 179M cache and 199M buffer with 2.6
> > gig inactive is horrible for a machine running a 10 gig database.
>
> No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
> Free. It's the same as 'active' memory except that it's pages that
> haven't been accessed in X amount of time (between 100 and 200 ms, I
> think). When free memory starts getting low, FBSD will start moving
> pages from the inactive queue to the free queue (possibly resulting in
> writes to disk along the way).
>
> IIRC, Cache is the directory cache, and Buf is disk buffers, which is
> somewhat akin to shared_buffers in PostgreSQL.

So, then, the inact is pretty much the same as kernel buffers in Linux?
On Mar 17, 2006, at 5:11 PM, Kenji Morishige wrote:

> In summary, my questions:
>
> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

FreeBSD 6.x will definitely get you improvements. Many speedup improvements have been made to both the generic disk layer and the specific drivers. However, the current best-of-breed RAID controller is the LSI 320-x (I use the 320-2X). I have one box into which this card will not fit (thanks Sun, for making a box with only low-profile slots!) so I use an Adaptec 2230SLP card in it. Testing shows it is about 80% the speed of an LSI 320-2X on a sequential workload (load DB, run some queries, rebuild indexes, etc.)

If you do put on FreeBSD 6, I'd love to see the output of "diskinfo -v -t" on your RAID volume(s).

> 2. Should I change SCSI controller config to use RAID 10 instead of 5?

I use RAID10.

> 3. Why isn't postgres using all 4GB of RAM for at least caching tables for reads?

I think FreeBSD has a hard upper limit on the total RAM it will use for disk cache. I haven't been able to get reliable, irrefutable answers about it, though.

> 4. Are there any other settings in the conf file I could try to tweak?

I like to bump up checkpoint_segments to 256.
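[For illustration, the checkpoint section of postgresql.conf along those lines might look like the sketch below. Only checkpoint_segments = 256 comes from the advice above; the other values are simply the usual defaults, shown for context.]

  checkpoint_segments = 256     # each segment is 16MB of WAL, so budget disk space accordingly
  checkpoint_timeout = 300      # seconds between forced checkpoints (default)
  #checkpoint_warning = 30      # warn if checkpoints come closer together than this (seconds)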
On Mon, 20 Mar 2006, Jim C. Nasby wrote:

> No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
> Free. It's the same as 'active' memory except that it's pages that
> haven't been accessed in X amount of time (between 100 and 200 ms, I
> think). When free memory starts getting low, FBSD will start moving
> pages from the inactive queue to the free queue (possibly resulting in
> writes to disk along the way).
>
> IIRC, Cache is the directory cache, and Buf is disk buffers, which is
> somewhat akin to shared_buffers in PostgreSQL.

I don't believe that's true. I'm not an expert in FreeBSD's VM internals, but this is how I believe it works:

Active pages are pages currently mapped in to a process's address space.

Inactive pages are pages which are marked dirty (must be written to backing store before they can be freed) and which are not mapped in to a process's address space. They're still associated with a VM object of some kind - like part of a process's virtual address space or as part of the cache for a file on disk. If it's still part of a process's virtual address space and is accessed, a fault is generated. The page is then put back in to the address mappings.

Cached pages are like inactive pages but aren't dirty. They can be either re-mapped or freed immediately.

Free pages are properly free. Wired pages are unswappable. Buf I'm not sure about. It doesn't represent the amount of memory used to cache files on disk, I'm sure of that. The sysctl -d description is 'KVA memory used for bufs', so I suspect that it's the amount of kernel virtual address space mapped to pages in the 'active', 'inactive' and 'cache' queues.

--
Alex Hayward
Seatbooker
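[For anyone who wants to watch these queues directly, the per-queue page counters are exposed via sysctl. This is only a sketch; the counter names are from memory and may differ slightly between FreeBSD versions.]

  # page counts for each queue (multiply by the page size, usually 4KB)
  sysctl vm.stats.vm.v_active_count vm.stats.vm.v_inactive_count \
         vm.stats.vm.v_cache_count vm.stats.vm.v_wire_count \
         vm.stats.vm.v_free_count
  sysctl hw.pagesize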
Vivek Khera wrote:
>
> On Mar 17, 2006, at 5:11 PM, Kenji Morishige wrote:
>
>> In summary, my questions:
>>
>> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?
>
> FreeBSD 6.x will definitely get you improvements. Many speedup
> improvements have been made to both the generic disk layer and the
> specific drivers. However, the current best-of-breed RAID controller
> is the LSI 320-x (I use the 320-2X). I have one box into which this
> card will not fit (thanks Sun, for making a box with only low-profile
> slots!) so I use an Adaptec 2230SLP card in it. Testing shows it is
> about 80% the speed of an LSI 320-2X on a sequential workload (load DB,
> run some queries, rebuild indexes, etc.)
>
> If you do put on FreeBSD 6, I'd love to see the output of "diskinfo -v -t"
> on your RAID volume(s).

Not directly related ... I have an HP DL380 G3 with a Smart Array 5i controller (1+0); these are my results:

shiva2# /usr/sbin/diskinfo -v -t /dev/da2s1d
/dev/da2s1d
        512             # sectorsize
        218513555456    # mediasize in bytes (204G)
        426784288       # mediasize in sectors
        52301           # Cylinders according to firmware.
        255             # Heads according to firmware.
        32              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   1.138232 sec =    4.553 msec
        Half stroke:      250 iter in   1.084474 sec =    4.338 msec
        Quarter stroke:   500 iter in   1.690313 sec =    3.381 msec
        Short forward:    400 iter in   0.752646 sec =    1.882 msec
        Short backward:   400 iter in   1.306270 sec =    3.266 msec
        Seq outer:       2048 iter in   0.766676 sec =    0.374 msec
        Seq inner:       2048 iter in   0.803759 sec =    0.392 msec
Transfer rates:
        outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
        middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
        inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec

Is this good enough?
Miguel,

On 3/20/06 12:52 PM, "Miguel" <mmiranda@123.com.sv> wrote:

> I have an HP DL380 G3 with a Smart Array 5i controller (1+0); these are
> my results

Another "known bad" RAID controller. The Smart Array 5i is horrible on Linux - this is the first BSD result I've seen.

> Seek times:
>         Full stroke:      250 iter in   1.138232 sec =    4.553 msec
>         Half stroke:      250 iter in   1.084474 sec =    4.338 msec

These seem OK - are they "access times" or are they actually "seek times"? Seems like with RAID 10, you should get better by maybe double.

> Transfer rates:
>         outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>         middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>         inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>
> is this good enough?

It's pretty slow. How many disk drives do you have?

- Luke
Luke Lonergan wrote:

> Miguel,
>
> On 3/20/06 12:52 PM, "Miguel" <mmiranda@123.com.sv> wrote:
>
>> I have an HP DL380 G3 with a Smart Array 5i controller (1+0); these are
>> my results
>
> Another "known bad" RAID controller. The Smart Array 5i is horrible on
> Linux - this is the first BSD result I've seen.
>
>> Seek times:
>>         Full stroke:      250 iter in   1.138232 sec =    4.553 msec
>>         Half stroke:      250 iter in   1.084474 sec =    4.338 msec
>
> These seem OK - are they "access times" or are they actually "seek times"?

I don't know, how can I check?

>> Transfer rates:
>>         outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>>         middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>>         inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>>
>> is this good enough?
>
> It's pretty slow. How many disk drives do you have?

I have 6 Ultra320 72G 10k disks.

---
Miguel
Miguel,

On 3/20/06 1:12 PM, "Miguel" <mmiranda@123.com.sv> wrote:

> I don't know, how can I check?

No matter - it's the benchmark that would tell you. It's probably "access time" that's being measured even though the text says "seek time". The difference is that seek time represents only the head motion, where access time is the whole access including the seek. Access times of 4.5ms are typical of a single 10K RPM SCSI disk drive like the Seagate Barracuda.

>> Transfer rates:
>>         outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>>         middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>>         inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>
> I have 6 Ultra320 72G 10k disks

Yah - ouch. With 6 drives in a RAID10, you should expect 3 drives worth of sequential scan performance, or anywhere from 100MB/s to 180MB/s. You're getting from half to 1/3 of the performance you'd get with a decent RAID controller.

If you add a simple SCSI adapter like the common LSI U320 adapter to your DL380 G3 and then run software RAID, you will get more than 150MB/s with less CPU consumption. I'd also expect you'd get down to about 2ms access times.

This might not be easy for you to do, and you might prefer hardware RAID adapters, but I don't have a recommendation for you there. I'd stay away from the HP line.

- Luke
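[Since the box in question runs FreeBSD, one way to build a software RAID10 on a plain SCSI adapter is gmirror pairs striped together with gstripe. This is only a sketch: the device names are placeholders, the GEOM modules (geom_mirror, geom_stripe) must be loaded, and the stripe size is an assumption.]

  kldload geom_mirror geom_stripe
  # mirror pairs (assuming da0..da3 are the raw disks behind the new adapter)
  gmirror label -v -b round-robin gm0 da0 da1
  gmirror label -v -b round-robin gm1 da2 da3
  # stripe the two mirrors together into a RAID10 volume, 64KB stripes
  gstripe label -v -s 65536 st0 mirror/gm0 mirror/gm1
  newfs -U /dev/stripe/st0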
Luke Lonergan wrote:

>>> Transfer rates:
>>>         outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
>>>         middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
>>>         inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
>>
>> I have 6 Ultra320 72G 10k disks
>
> Yah - ouch. With 6 drives in a RAID10, you should expect 3 drives worth of
> sequential scan performance, or anywhere from 100MB/s to 180MB/s. You're
> getting from half to 1/3 of the performance you'd get with a decent RAID
> controller.
>
> If you add a simple SCSI adapter like the common LSI U320 adapter to your
> DL380 G3 and then run software RAID, you will get more than 150MB/s with
> less CPU consumption. I'd also expect you'd get down to about 2ms access
> times.
>
> This might not be easy for you to do, and you might prefer hardware RAID
> adapters, but I don't have a recommendation for you there. I'd stay away
> from the HP line.

This is my new PostgreSQL 8.1.3 server, so I have many options (in fact, any option) to choose from, and I want maximum performance. If I understood you well, do you mean using something like vinum?

I forgot to mention that the 6 disks are in an MSA500 G2 external storage enclosure; additionally I have two 36G U320 10k disks in RAID 10 for the OS, installed in the server slots.

---
Miguel
Miguel,

On 3/20/06 1:51 PM, "Miguel" <mmiranda@123.com.sv> wrote:

> I forgot to mention that the 6 disks are in an MSA500 G2 external
> storage enclosure; additionally I have two 36G U320 10k disks in RAID 10
> for the OS, installed in the server slots.

I just checked online and I think the MSA500 G2 has its own SCSI RAID controllers, so you are actually just using the 5i as a SCSI attach, which it's not good at (no reordering/command queueing, etc). So, just using a simple SCSI adapter to connect to the MSA might be a big win.

- Luke
Luke Lonergan wrote:

> Miguel,
>
> On 3/20/06 1:51 PM, "Miguel" <mmiranda@123.com.sv> wrote:
>
>> I forgot to mention that the 6 disks are in an MSA500 G2 external
>> storage enclosure; additionally I have two 36G U320 10k disks in RAID 10
>> for the OS, installed in the server slots.
>
> I just checked online and I think the MSA500 G2 has its own SCSI RAID
> controllers,

Yes, it has its own redundant controller.

> so you are actually just using the 5i as a SCSI attach, which
> it's not good at (no reordering/command queueing, etc). So, just using a
> simple SCSI adapter to connect to the MSA might be a big win.

I will try an LSI U320 and will let you know if I get any performance gain. Thanks for your advice.

---
Miguel
>> If you do put on FreeBSD 6, I'd love to see the output of
>> "diskinfo -v -t" on your RAID volume(s).
>
> Not directly related ...
> I have an HP DL380 G3 with a Smart Array 5i controller (1+0), these are my
> results
> [...]
> is this good enough?

Is that on a loaded box or a mostly quiet box? Those numbers seem rather low for my tastes. For comparison, here are numbers from a Dell 1850 with a built-in PERC 4e/Si RAID in a two disk mirror. All numbers below are on mostly or totally quiet disk systems.

amrd0
        512             # sectorsize
        73274490880     # mediasize in bytes (68G)
        143114240       # mediasize in sectors
        8908            # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   0.756718 sec =    3.027 msec
        Half stroke:      250 iter in   0.717824 sec =    2.871 msec
        Quarter stroke:   500 iter in   1.972368 sec =    3.945 msec
        Short forward:    400 iter in   1.193179 sec =    2.983 msec
        Short backward:   400 iter in   1.322440 sec =    3.306 msec
        Seq outer:       2048 iter in   0.271402 sec =    0.133 msec
        Seq inner:       2048 iter in   0.271151 sec =    0.132 msec
Transfer rates:
        outside:       102400 kbytes in   1.080339 sec =    94785 kbytes/sec
        middle:        102400 kbytes in   1.166021 sec =    87820 kbytes/sec
        inside:        102400 kbytes in   1.461498 sec =    70065 kbytes/sec

And for the *real* disks.... In the following two cases, I used a Dell 1425SC with 1GB RAM and connected the controllers to the same Dell PowerVault 14-disk U320 array (one controller at a time, obviously). For each controller, each pair of the mirror was on the opposite channel of the controller for optimal speed. Disk 0 is a RAID1 of two drives, and disk 1 is a RAID10 of the remaining 12 drives. All running FreeBSD 6.0 RELEASE.

First I tested the Adaptec 2230SLP and got these:

aacd0
        512             # sectorsize
        36385456128     # mediasize in bytes (34G)
        71065344        # mediasize in sectors
        4423            # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   2.288389 sec =    9.154 msec
        Half stroke:      250 iter in   1.657302 sec =    6.629 msec
        Quarter stroke:   500 iter in   2.756597 sec =    5.513 msec
        Short forward:    400 iter in   1.205275 sec =    3.013 msec
        Short backward:   400 iter in   1.249310 sec =    3.123 msec
        Seq outer:       2048 iter in   0.412770 sec =    0.202 msec
        Seq inner:       2048 iter in   0.428585 sec =    0.209 msec
Transfer rates:
        outside:       102400 kbytes in   1.204412 sec =    85021 kbytes/sec
        middle:        102400 kbytes in   1.347325 sec =    76002 kbytes/sec
        inside:        102400 kbytes in   2.036832 sec =    50274 kbytes/sec

aacd1
        512             # sectorsize
        218307231744    # mediasize in bytes (203G)
        426381312       # mediasize in sectors
        26541           # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   0.856699 sec =    3.427 msec
        Half stroke:      250 iter in   1.475651 sec =    5.903 msec
        Quarter stroke:   500 iter in   2.693270 sec =    5.387 msec
        Short forward:    400 iter in   1.127831 sec =    2.820 msec
        Short backward:   400 iter in   1.216876 sec =    3.042 msec
        Seq outer:       2048 iter in   0.416340 sec =    0.203 msec
        Seq inner:       2048 iter in   0.436471 sec =    0.213 msec
Transfer rates:
        outside:       102400 kbytes in   1.245798 sec =    82196 kbytes/sec
        middle:        102400 kbytes in   1.169033 sec =    87594 kbytes/sec
        inside:        102400 kbytes in   1.390840 sec =    73625 kbytes/sec

And the LSI 320-2X card:

amrd0
        512             # sectorsize
        35999711232     # mediasize in bytes (34G)
        70311936        # mediasize in sectors
        4376            # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   0.737130 sec =    2.949 msec
        Half stroke:      250 iter in   0.694498 sec =    2.778 msec
        Quarter stroke:   500 iter in   2.040667 sec =    4.081 msec
        Short forward:    400 iter in   1.418592 sec =    3.546 msec
        Short backward:   400 iter in   0.896076 sec =    2.240 msec
        Seq outer:       2048 iter in   0.292390 sec =    0.143 msec
        Seq inner:       2048 iter in   0.300836 sec =    0.147 msec
Transfer rates:
        outside:       102400 kbytes in   1.102025 sec =    92920 kbytes/sec
        middle:        102400 kbytes in   1.247608 sec =    82077 kbytes/sec
        inside:        102400 kbytes in   1.905603 sec =    53736 kbytes/sec

amrd1
        512             # sectorsize
        215998267392    # mediasize in bytes (201G)
        421871616       # mediasize in sectors
        26260           # Cylinders according to firmware.
        255             # Heads according to firmware.
        63              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   0.741648 sec =    2.967 msec
        Half stroke:      250 iter in   1.021720 sec =    4.087 msec
        Quarter stroke:   500 iter in   2.220321 sec =    4.441 msec
        Short forward:    400 iter in   0.945948 sec =    2.365 msec
        Short backward:   400 iter in   1.036555 sec =    2.591 msec
        Seq outer:       2048 iter in   0.378911 sec =    0.185 msec
        Seq inner:       2048 iter in   0.457275 sec =    0.223 msec
Transfer rates:
        outside:       102400 kbytes in   0.986572 sec =   103794 kbytes/sec
        middle:        102400 kbytes in   0.998528 sec =   102551 kbytes/sec
        inside:        102400 kbytes in   0.857322 sec =   119442 kbytes/sec
Vivek Khera wrote:

>>> If you do put on FreeBSD 6, I'd love to see the output of "diskinfo
>>> -v -t" on your RAID volume(s).
>>
>> Not directly related ...
>> I have an HP DL380 G3 with a Smart Array 5i controller (1+0), these are
>> my results
>> [...]
>> is this good enough?
>
> Is that on a loaded box or a mostly quiet box? Those numbers seem
> rather low for my tastes. For comparison, here are numbers from a
> Dell 1850 with a built-in PERC 4e/Si RAID in a two disk mirror. All
> numbers below are on mostly or totally quiet disk systems.

My numbers are on a totally quiet box; I've just installed it.

> amrd0
>         512             # sectorsize
>         73274490880     # mediasize in bytes (68G)
>         143114240       # mediasize in sectors
>         8908            # Cylinders according to firmware.
>         255             # Heads according to firmware.
>         63              # Sectors according to firmware.
>
> Seek times:
>         Full stroke:      250 iter in   0.756718 sec =    3.027 msec
>         Half stroke:      250 iter in   0.717824 sec =    2.871 msec
>         Quarter stroke:   500 iter in   1.972368 sec =    3.945 msec
>         Short forward:    400 iter in   1.193179 sec =    2.983 msec
>         Short backward:   400 iter in   1.322440 sec =    3.306 msec
>         Seq outer:       2048 iter in   0.271402 sec =    0.133 msec
>         Seq inner:       2048 iter in   0.271151 sec =    0.132 msec
> Transfer rates:
>         outside:       102400 kbytes in   1.080339 sec =    94785 kbytes/sec
>         middle:        102400 kbytes in   1.166021 sec =    87820 kbytes/sec
>         inside:        102400 kbytes in   1.461498 sec =    70065 kbytes/sec

Umm, in my box I see better seek times but worse transfer rates; does that make sense? I think I have something wrong. The question I can't answer is: what tuning am I missing?

---
Miguel
On Mar 20, 2006, at 6:04 PM, Miguel wrote:

> Umm, in my box I see better seek times but worse transfer rates; does
> that make sense?
> I think I have something wrong. The question I can't answer is: what
> tuning am I missing?

Well, I forgot to mention I have 15k RPM disks, so the transfers should be faster. I did no tuning to the disk configurations. I think your controller is either just not supported well in FreeBSD, or is bad in general...

I *really* wish LSI would make a low-profile card that would fit in a Sun X4100... as it stands the only choice for dual channel cards is the Adaptec 2230SLP...
Vivek Khera wrote:

> On Mar 20, 2006, at 6:04 PM, Miguel wrote:
>
>> Umm, in my box I see better seek times but worse transfer rates; does
>> that make sense?
>> I think I have something wrong. The question I can't answer is: what
>> tuning am I missing?
>
> Well, I forgot to mention I have 15k RPM disks, so the transfers
> should be faster.
>
> I did no tuning to the disk configurations. I think your controller
> is either just not supported well in FreeBSD, or is bad in general...

:-( I guess you are right. I made a really bad choice; I'd better look at Dell next time. Thanks.

---
Miguel
This is a 2-disk Linux software RAID1 with two 7200RPM IDE drives, 1 PATA and 1 SATA:

apollo13 ~ # hdparm -t /dev/md0
/dev/md0:
 Timing buffered disk reads:  156 MB in  3.02 seconds =  51.58 MB/sec
apollo13 ~ # hdparm -t /dev/md0
/dev/md0:
 Timing buffered disk reads:  168 MB in  3.06 seconds =  54.87 MB/sec

This is a 5-disk Linux software RAID5 with four 7200RPM IDE drives and one 5400RPM, 3 SATA and 2 PATA:

apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
 Timing buffered disk reads:  348 MB in  3.17 seconds = 109.66 MB/sec
apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
 Timing buffered disk reads:  424 MB in  3.00 seconds = 141.21 MB/sec
apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
 Timing buffered disk reads:  426 MB in  3.00 seconds = 141.88 MB/sec
apollo13 ~ # hdparm -t /dev/md2
/dev/md2:
 Timing buffered disk reads:  426 MB in  3.01 seconds = 141.64 MB/sec

The machine is a desktop Athlon 64 3000+, buggy nforce3 chipset, 1G DDR400, Gentoo Linux 2.6.15-ck4 running in 64 bit mode. The bottleneck is the PCI bus.

Expensive SCSI hardware RAID cards with expensive 10Krpm harddisks should not get humiliated by such a simple (and cheap) setup. (I'm referring to the 12-drive RAID10 mentioned before, not the other one which was a simple 2-disk mirror.) Tom's Hardware benchmarked some hardware RAIDs and got humongous transfer rates... hm?
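[For anyone wanting to reproduce a setup like this, Linux software RAID of this kind is normally assembled with mdadm. A minimal sketch; the device names are placeholders and partitioning is assumed to be done already.]

  # two-disk mirror (md0: one SATA + one PATA disk) and a five-disk RAID5 (md2)
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/hda1
  mdadm --create /dev/md2 --level=5 --raid-devices=5 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/hdb1 /dev/hdc1
  # quick sequential-read check, as above
  hdparm -t /dev/md0 /dev/md2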
Scott Marlowe wrote:
> On Mon, 2006-03-20 at 08:45, Jim C. Nasby wrote:
>
>> On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:
>>
>>>> Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
>>>> Swap: 4096M Total, 216K Used, 4096M Free
>>>
>>> That doesn't look good. Is this machine freshly rebooted, or has it
>>> been running postgres for a while? 179M cache and 199M buffer with 2.6
>>> gig inactive is horrible for a machine running a 10 gig database.
>>
>> No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
>> Free. It's the same as 'active' memory except that it's pages that
>> haven't been accessed in X amount of time (between 100 and 200 ms, I
>> think). When free memory starts getting low, FBSD will start moving
>> pages from the inactive queue to the free queue (possibly resulting in
>> writes to disk along the way).
>>
>> IIRC, Cache is the directory cache, and Buf is disk buffers, which is
>> somewhat akin to shared_buffers in PostgreSQL.
>
> So, then, the inact is pretty much the same as kernel buffers in Linux?

I think FreeBSD 'Inactive' corresponds pretty closely to Linux's 'Inactive Dirty'|'Inactive Laundered'|'Inactive Free'.

From what I can see, 'Buf' is a bit misleading, e.g. read a 1G file randomly and you increase 'Inactive' by about 1G - 'Buf' might get to 200M. However, read the file again and you'll see zero I/O in vmstat or gstat. From reading the FreeBSD architecture docs, I think 'Buf' consists of those pages from 'Inactive' or 'Active' that were last kvm mapped for read/write operations. However 'Buf' is restricted to a fairly small size (various sysctls), so it really only provides a lower bound on the file buffer cache activity.

Sorry to not really answer your question Scott - how are Linux kernel buffers actually defined?

Cheers

Mark
Mark Kirkwood wrote:
>
> I think FreeBSD 'Inactive' corresponds pretty closely to Linux's
> 'Inactive Dirty'|'Inactive Laundered'|'Inactive Free'.

Hmmm - on second thoughts I think I've got that wrong :-(, since in Linux all the file buffer pages appear in 'Cached', don't they...

(I also notice that 'Inactive Laundered' does not seem to be mentioned in vanilla - read non-Redhat - 2.6 kernels.)

So I think it's more correct to say FreeBSD 'Inactive' is similar to Linux 'Inactive' plus some or most of Linux 'Cached'.

A good discussion of how the FreeBSD vm works is here:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/vm.html

In particular:

"FreeBSD reserves a limited amount of KVM to hold mappings from struct bufs, but it should be made clear that this KVM is used solely to hold mappings and does not limit the ability to cache data."

Cheers

Mark
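[If anyone wants to see that KVM limit on their own box, the relevant knobs are exposed via sysctl. The names below are from memory and may differ slightly across FreeBSD versions; treat this as a sketch.]

  # current and maximum KVA used for buffer mappings, in bytes
  sysctl vfs.bufspace vfs.maxbufspace vfs.hibufspace
  # number of buffer headers configured at boot
  sysctl kern.nbuf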
On Tue, Mar 21, 2006 at 03:51:35PM +1200, Mark Kirkwood wrote:
> Mark Kirkwood wrote:
> >
> > I think FreeBSD 'Inactive' corresponds pretty closely to Linux's
> > 'Inactive Dirty'|'Inactive Laundered'|'Inactive Free'.
>
> Hmmm - on second thoughts I think I've got that wrong :-(, since in
> Linux all the file buffer pages appear in 'Cached', don't they...
>
> (I also notice that 'Inactive Laundered' does not seem to be mentioned
> in vanilla - read non-Redhat - 2.6 kernels.)
>
> So I think it's more correct to say FreeBSD 'Inactive' is similar to
> Linux 'Inactive' plus some or most of Linux 'Cached'.
>
> A good discussion of how the FreeBSD vm works is here:
>
> http://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/vm.html
>
> In particular:
>
> "FreeBSD reserves a limited amount of KVM to hold mappings from struct
> bufs, but it should be made clear that this KVM is used solely to hold
> mappings and does not limit the ability to cache data."

It's worth noting that starting in either 2.4 or 2.6, Linux pretty much adopted the FreeBSD VM system (or so I've been told).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mon, Mar 20, 2006 at 07:46:13PM +0000, Alex Hayward wrote:
> On Mon, 20 Mar 2006, Jim C. Nasby wrote:
>
> > No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as
> > Free. It's the same as 'active' memory except that it's pages that
> > haven't been accessed in X amount of time (between 100 and 200 ms, I
> > think). When free memory starts getting low, FBSD will start moving
> > pages from the inactive queue to the free queue (possibly resulting in
> > writes to disk along the way).
> >
> > IIRC, Cache is the directory cache, and Buf is disk buffers, which is
> > somewhat akin to shared_buffers in PostgreSQL.
>
> I don't believe that's true. I'm not an expert in FreeBSD's VM internals,
> but this is how I believe it works:
>
> Active pages are pages currently mapped in to a process's address space.
>
> Inactive pages are pages which are marked dirty (must be written to
> backing store before they can be freed) and which are not mapped in to a
> process's address space. They're still associated with a VM object of
> some kind

Actually, a page that is in the inactive queue *may* be dirty. In fact, if you start with a freshly booted system (or one that's been recently starved of memory) and read in a large file, you'll see the inactive queue grow even though the pages haven't been dirtied.

> - like part of a process's virtual address space or as part of the cache
> for a file on disk. If it's still part of a process's virtual address
> space and is accessed, a fault is generated. The page is then put back in
> to the address mappings.
>
> Cached pages are like inactive pages but aren't dirty. They can be either
> re-mapped or freed immediately.
>
> Free pages are properly free. Wired pages are unswappable. Buf I'm not
> sure about. It doesn't represent the amount of memory used to cache files
> on disk, I'm sure of that. The sysctl -d description is 'KVA memory used
> for bufs', so I suspect that it's the amount of kernel virtual address
> space mapped to pages in the 'active', 'inactive' and 'cache' queues.
>
> --
> Alex Hayward
> Seatbooker

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mon, Mar 20, 2006 at 02:15:22PM -0500, Vivek Khera wrote:
> I think FreeBSD has a hard upper limit on the total RAM it will use
> for disk cache. I haven't been able to get reliable, irrefutable
> answers about it, though.

It does not. Any memory in the inactive queue is effectively your 'disk cache'. Pages start out in the active queue, and if they aren't used fairly frequently they will move into the inactive queue. From there they will be moved to the cache queue, but only if the cache queue falls below a certain threshold, because in order to go into the cache queue the page must be marked clean, possibly incurring a write to disk. AFAIK pages only go into the free queue if they have been completely released by all objects that were referencing them, so it's theoretically possible for that queue to go to 0.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Jim C. Nasby wrote:
> On Mon, Mar 20, 2006 at 02:15:22PM -0500, Vivek Khera wrote:
>
>> I think FreeBSD has a hard upper limit on the total RAM it will use
>> for disk cache. I haven't been able to get reliable, irrefutable
>> answers about it, though.
>
> It does not. Any memory in the inactive queue is effectively your 'disk
> cache'. Pages start out in the active queue, and if they aren't used
> fairly frequently they will move into the inactive queue. From there
> they will be moved to the cache queue, but only if the cache queue falls
> below a certain threshold, because in order to go into the cache queue
> the page must be marked clean, possibly incurring a write to disk. AFAIK
> pages only go into the free queue if they have been completely released
> by all objects that were referencing them, so it's theoretically
> possible for that queue to go to 0.

Exactly.

The so-called limit (controllable via various sysctls) is on the amount of memory used for kvm mapped pages, not cached pages, i.e. it's a subset of the cached pages that are set up for immediate access (the others merely need to be shifted from the 'Inactive' queue to this one before they can be operated on - a relatively cheap operation).

So it's really all about accounting, in a sense - whether pages end up in the 'Buf' or 'Inactive' queue, they are still cached!

Cheers

Mark
On Tue, Mar 21, 2006 at 11:03:26PM +1200, Mark Kirkwood wrote:
> Jim C. Nasby wrote:
> > On Mon, Mar 20, 2006 at 02:15:22PM -0500, Vivek Khera wrote:
> >
> >> I think FreeBSD has a hard upper limit on the total RAM it will use
> >> for disk cache. I haven't been able to get reliable, irrefutable
> >> answers about it, though.
> >
> > It does not. Any memory in the inactive queue is effectively your 'disk
> > cache'. Pages start out in the active queue, and if they aren't used
> > fairly frequently they will move into the inactive queue. From there
> > they will be moved to the cache queue, but only if the cache queue falls
> > below a certain threshold, because in order to go into the cache queue
> > the page must be marked clean, possibly incurring a write to disk. AFAIK
> > pages only go into the free queue if they have been completely released
> > by all objects that were referencing them, so it's theoretically
> > possible for that queue to go to 0.
>
> Exactly.
>
> The so-called limit (controllable via various sysctls) is on the amount
> of memory used for kvm mapped pages, not cached pages, i.e. it's a
> subset of the cached pages that are set up for immediate access (the
> others merely need to be shifted from the 'Inactive' queue to this
> one before they can be operated on - a relatively cheap operation).
>
> So it's really all about accounting, in a sense - whether pages end up in
> the 'Buf' or 'Inactive' queue, they are still cached!

So what's the difference between Buf and Active then? Just that active means it's a code page, or that it's been directly mapped into a process's memory (perhaps via mmap)?
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Mon, Mar 20, 2006 at 01:27:56PM -0800, Luke Lonergan wrote:
> >> Transfer rates:
> >>         outside:       102400 kbytes in   2.075984 sec =    49326 kbytes/sec
> >>         middle:        102400 kbytes in   2.100510 sec =    48750 kbytes/sec
> >>         inside:        102400 kbytes in   2.042313 sec =    50139 kbytes/sec
> >
> > I have 6 Ultra320 72G 10k disks
>
> Yah - ouch. With 6 drives in a RAID10, you should expect 3 drives worth of
> sequential scan performance, or anywhere from 100MB/s to 180MB/s. You're
> getting from half to 1/3 of the performance you'd get with a decent RAID
> controller.
>
> If you add a simple SCSI adapter like the common LSI U320 adapter to your
> DL380 G3 and then run software RAID, you will get more than 150MB/s with
> less CPU consumption. I'd also expect you'd get down to about 2ms access
> times.

FWIW, here's my dirt-simple workstation, with 2 Seagate SATA drives set up as a mirror using software (first the mirror, then one of the raw drives):

decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0
Password:
/dev/mirror/gm0
        512             # sectorsize
        300069051904    # mediasize in bytes (279G)
        586072367       # mediasize in sectors

Seek times:
        Full stroke:      250 iter in   1.416409 sec =    5.666 msec
        Half stroke:      250 iter in   1.404503 sec =    5.618 msec
        Quarter stroke:   500 iter in   2.887344 sec =    5.775 msec
        Short forward:    400 iter in   2.101949 sec =    5.255 msec
        Short backward:   400 iter in   2.373578 sec =    5.934 msec
        Seq outer:       2048 iter in   0.209539 sec =    0.102 msec
        Seq inner:       2048 iter in   0.347499 sec =    0.170 msec
Transfer rates:
        outside:       102400 kbytes in   3.183924 sec =    32162 kbytes/sec
        middle:        102400 kbytes in   3.216232 sec =    31838 kbytes/sec
        inside:        102400 kbytes in   4.242779 sec =    24135 kbytes/sec

decibel@noel.2[5:43]~:16>sudo diskinfo -vt /dev/ad4
/dev/ad4
        512             # sectorsize
        300069052416    # mediasize in bytes (279G)
        586072368       # mediasize in sectors
        581421          # Cylinders according to firmware.
        16              # Heads according to firmware.
        63              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in   5.835744 sec =   23.343 msec
        Half stroke:      250 iter in   4.364424 sec =   17.458 msec
        Quarter stroke:   500 iter in   6.981597 sec =   13.963 msec
        Short forward:    400 iter in   2.157210 sec =    5.393 msec
        Short backward:   400 iter in   2.330445 sec =    5.826 msec
        Seq outer:       2048 iter in   0.181176 sec =    0.088 msec
        Seq inner:       2048 iter in   0.198974 sec =    0.097 msec
Transfer rates:
        outside:       102400 kbytes in   1.715810 sec =    59680 kbytes/sec
        middle:        102400 kbytes in   1.937027 sec =    52865 kbytes/sec
        inside:        102400 kbytes in   3.260515 sec =    31406 kbytes/sec

No, I don't know why the transfer rates for the mirror are 1/2 those of the raw device. :(
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
On Tue, 21 Mar 2006, Jim C. Nasby wrote:

> On Tue, Mar 21, 2006 at 11:03:26PM +1200, Mark Kirkwood wrote:
> >
> > So it's really all about accounting, in a sense - whether pages end up in
> > the 'Buf' or 'Inactive' queue, they are still cached!
>
> So what's the difference between Buf and Active then? Just that active
> means it's a code page, or that it's been directly mapped into a
> process's memory (perhaps via mmap)?

I don't think that Buf and Active are mutually exclusive. Try adding up Active, Inactive, Cache, Wired, Buf and Free - it'll come to more than your physical memory.

Active gives an amount of physical memory. Buf gives an amount of kernel-space virtual memory which provides the kernel with a window on to pages in the other categories. In fact, I don't think that 'Buf' really belongs in the list as it doesn't represent a 'type' of page at all.

--
Alex Hayward
Seatbooker
On Tue, Mar 21, 2006 at 12:22:31PM +0000, Alex Hayward wrote:
> On Tue, 21 Mar 2006, Jim C. Nasby wrote:
>
> > So what's the difference between Buf and Active then? Just that active
> > means it's a code page, or that it's been directly mapped into a
> > process's memory (perhaps via mmap)?
>
> I don't think that Buf and Active are mutually exclusive. Try adding up
> Active, Inactive, Cache, Wired, Buf and Free - it'll come to more than
> your physical memory.
>
> Active gives an amount of physical memory. Buf gives an amount of
> kernel-space virtual memory which provides the kernel with a window on to
> pages in the other categories. In fact, I don't think that 'Buf' really
> belongs in the list as it doesn't represent a 'type' of page at all.

Ahhh, I get it... a KVM mapping (KVM = kernel virtual memory) is required any time the kernel wants to access a page that doesn't belong to it, right?

And actually, I just checked 4 machines and adding all the queues plus Buf together didn't add up to total memory except on one of them (there, adding just the queues came close: 1507.6MB on a 1.5GB machine).
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
Jim,

On 3/21/06 3:49 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:

> No, I don't know why the transfer rates for the mirror are 1/2 those of
> the raw device. :(

Well - lessee. Would those drives be attached to a Silicon Image (SII) SATA controller? A Highpoint?

I found in testing about 2 years ago that under Linux (looks like you're BSD), most SATA controllers other than the Intel PIIX are horribly broken from a performance standpoint, probably due to bad drivers, but I'm not sure.

Now I think whatever is commonly used by Nforce 4 implementations seems to work OK, but we don't count on them for RAID configurations yet.

- Luke
On Mar 20, 2006, at 6:27 PM, PFC wrote:

> Expensive SCSI hardware RAID cards with expensive 10Krpm harddisks
> should not get humiliated by such a simple (and cheap) setup. (I'm
> referring to the 12-drive RAID10 mentioned before, not the other
> one which was a simple 2-disk mirror.) Tom's Hardware benchmarked
> some hardware RAIDs and got humongous transfer rates... hm?

I'll put up my "slow" 12 disk SCSI array against your IDE array on a large parallel load any day.
On Mar 21, 2006, at 6:03 AM, Mark Kirkwood wrote:

> The so-called limit (controllable via various sysctls) is on the
> amount of memory used for kvm mapped pages, not cached pages, i.e.
> it's a subset of the cached pages that are set up for immediate
> access (the

Thanks... now that makes sense to me.
> decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0

Can anyone point me to where I can find diskinfo or an equivalent to run on my Debian system? I have been googling for the last hour but can't find it! I would like to analyse my own disk setup for comparison.

Thanks for any help

Adam
On Tue, Mar 21, 2006 at 07:25:07AM -0800, Luke Lonergan wrote:
> Jim,
>
> On 3/21/06 3:49 AM, "Jim C. Nasby" <jnasby@pervasive.com> wrote:
>
> > No, I don't know why the transfer rates for the mirror are 1/2 those of
> > the raw device. :(
>
> Well - lessee. Would those drives be attached to a Silicon Image (SII) SATA
> controller? A Highpoint?
>
> I found in testing about 2 years ago that under Linux (looks like you're
> BSD), most SATA controllers other than the Intel PIIX are horribly broken
> from a performance standpoint, probably due to bad drivers, but I'm not sure.
>
> Now I think whatever is commonly used by Nforce 4 implementations seems to
> work OK, but we don't count on them for RAID configurations yet.

atapci1: <nVidia nForce4 SATA150 controller>

And note that this is using FreeBSD gmirror, not the built-in RAID controller.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
>> Expensive SCSI hardware RAID cards with expensive 10Krpm harddisks
>> should not get humiliated by such a simple (and cheap) setup. (I'm
>> referring to the 12-drive RAID10 mentioned before, not the other one
>> which was a simple 2-disk mirror.) Tom's Hardware benchmarked some
>> hardware RAIDs and got humongous transfer rates... hm?
>
> I'll put up my "slow" 12 disk SCSI array against your IDE array on a
> large parallel load any day.

Sure, and I have no doubt that yours will be immensely faster on parallel loads than mine, but still, it should also be the case on sequential scan... especially since I have desktop PCI and the original poster has a real server with PCI-X, I think.
Adam Witney wrote:
>
>> decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0
>
> Can anyone point me to where I can find diskinfo or an equivalent to run on
> my Debian system? I have been googling for the last hour but can't find it!
> I would like to analyse my own disk setup for comparison.

I guess you could use hdparm (the -t or -T flags do a simple benchmark).

Though iozone or bonnie++ are probably better.

Cheers

Mark
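[By way of illustration, rough Linux equivalents of the diskinfo test above; the device, file path, and sizes are placeholders.]

  # cached vs. buffered sequential read rate on a block device
  hdparm -T -t /dev/md0
  # filesystem-level sequential write/read test; size should exceed RAM
  iozone -s 4g -i 0 -i 1 -f /mnt/data/iozone.tmp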
On Wed, 22 Mar 2006, Mark Kirkwood wrote:

> Adam Witney wrote:
>>
>>> decibel@noel.2[5:43]~:15>sudo diskinfo -vt /dev/mirror/gm0
>>
>> Can anyone point me to where I can find diskinfo or an equivalent to run on
>> my Debian system? I have been googling for the last hour but can't find it!
>> I would like to analyse my own disk setup for comparison.
>
> I guess you could use hdparm (the -t or -T flags do a simple benchmark).
>
> Though iozone or bonnie++ are probably better.

You might also have a look at lmdd for sequential read/write performance, from the lmbench suite: http://sourceforge.net/projects/lmbench

Numbers from lmdd are seen on this list frequently.

--
Jeff Frost, Owner 	<jeff@frostconsultingllc.com>
Frost Consulting, LLC 	http://www.frostconsultingllc.com/
Phone: 650-780-7908	FAX: 650-649-1954
On Mar 21, 2006, at 2:04 PM, PFC wrote:

> especially since I have desktop PCI and the original poster has a
> real server with PCI-X, I think.

That was me :-) but yeah, I never seem to get full line speed for some reason. I don't know if it is because of inadequate measurement tools or what...
On Mar 21, 2006, at 12:59 PM, Jim C. Nasby wrote:

> atapci1: <nVidia nForce4 SATA150 controller>
>
> And note that this is using FreeBSD gmirror, not the built-in RAID
> controller.

I get a similar counter-intuitive slowdown with gmirror SATA disks on an IBM e326m I'm evaluating. If/when I buy one I'll get the onboard SCSI RAID instead. The IBM uses a ServerWorks chipset, which shows up to FreeBSD 6.0 as "generic ATA" and only does UDMA33 transfers.