Re: Bad iostat numbers - Mailing list pgsql-performance
From | Greg Smith |
---|---|
Subject | Re: Bad iostat numbers |
Date | |
Msg-id | Pine.GSO.4.64.0612032336250.19679@westnet.com |
In response to | Bad iostat numbers ("Carlos H. Reimer" <carlos.reimer@opendb.com.br>) |
Responses | Re: Bad iostat numbers |
List | pgsql-performance |
On Thu, 30 Nov 2006, Carlos H. Reimer wrote:

> I would like to discover how much cache is present in
> the controller, how can I find this value from Linux?

As far as I know there is no cache on an Adaptec 39320. The write-back cache Linux was reporting on was the one in the drives, which is 8MB; see http://www.seagate.com/cda/products/discsales/enterprise/tech/1,1593,541,00.html Be warned that running your database with the combination of an uncached controller plus disks with write caching is dangerous to your database integrity.

There is a common problem with the Linux driver for this card (aic7902) where it enters what they're calling an "Infinite Interrupt Loop". That seems to match your readings:

> Here is a typical iostat -x:
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
> sda 0.00 7.80 0.40 6.40 41.60 113.60 20.80 56.80
> avgrq-sz avgqu-sz await svctm %util
> 22.82 570697.50 10.59 147.06 100.00

An avgqu-sz of 570697.50 is extremely large. That explains why the utilization is 100%: there's a massive number of I/O operations queued up that aren't getting flushed out. The read and write data says these drives are barely doing anything, as 20kB/s and 57kB/s are practically idle; they're not even remotely close to saturated.

See http://lkml.org/lkml/2005/10/1/47 for a suggested workaround that may reduce the magnitude of this issue; lowering the card's speed to U160 in the BIOS was also listed as a useful workaround. You might get better results by upgrading to a newer Linux kernel, and just rebooting to clear out the garbage might help if you haven't tried that yet.
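To make the reasoning about those readings concrete, here is a minimal Python sketch that checks one iostat -x device row for the same pattern: a huge queue and 100% utilization while almost no data moves. The function names and all three thresholds are hypothetical choices for illustration, not values from sysstat or from the driver reports, and the column order is assumed to match the older layout quoted above.

```python
# Column order assumed to match the older "iostat -x" layout quoted above.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rsec/s", "wsec/s",
           "rkB/s", "wkB/s", "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]


def parse_iostat_row(row):
    """Split one device row into (device, {column name: float value})."""
    fields = row.split()
    device = fields[0]
    values = [float(v) for v in fields[1:]]
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    return device, dict(zip(COLUMNS, values))


def looks_stuck(stats, max_queue=1000.0, max_kb_per_s=1024.0, min_util=99.0):
    """Hypothetical heuristic: the device reports itself saturated with a
    very long queue, yet transfers almost no data. Thresholds are arbitrary
    illustrative assumptions, not tuned values."""
    throughput = stats["rkB/s"] + stats["wkB/s"]
    return (stats["avgqu-sz"] > max_queue
            and stats["%util"] >= min_util
            and throughput < max_kb_per_s)


# The two quoted output lines for sda, joined into a single row.
row = ("sda 0.00 7.80 0.40 6.40 41.60 113.60 20.80 56.80 "
       "22.82 570697.50 10.59 147.06 100.00")
device, stats = parse_iostat_row(row)
print(device, looks_stuck(stats))  # prints "sda True"
```

A busy but healthy disk would normally fail the throughput test or the queue test, which is the distinction drawn above: 100% utilization alone does not mean the drives are saturated.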
On the pessimistic side, other people reporting issues with this controller are:

http://lkml.org/lkml/2005/12/17/55
http://www.ussg.iu.edu/hypermail/linux/kernel/0512.2/0390.html
http://www.linuxforums.org/forum/peripherals-hardware/59306-scsi-hangs-boot.html

and even under FreeBSD at
http://lists.freebsd.org/pipermail/aic7xxx/2003-August/003973.html

This Adaptec card just barely works under Linux, which happens regularly with their controllers, and my guess is that you've run into one of the ways it goes crazy sometimes. I just chuckled when checking http://linux.adaptec.com/ again and noticing they can't even be bothered to keep that server up at all. According to http://www.adaptec.com/en-US/downloads/linux_source/linux_source_code?productId=ASC-39320-R&dn=Adaptec+SCSI+Card+39320-R the driver for your card is "*minimally tested* for Linux Kernel v2.6 on all platforms." Adaptec doesn't care about Linux support on their products; if you want a SCSI controller that actually works under Linux, get an LSI MegaRAID.

If this were really a Postgres problem, I wouldn't expect %iowait=1.10. Were the database engine waiting to read/write data, that number would be dramatically higher. Whatever is generating all these I/O requests, it's not waiting for them to complete like the database would be. Besides the driver problems that I'm very suspicious of, I'd suspect a runaway process writing garbage to the disks might also cause this behavior.

> Ive taken a look in the /var/log/messages and found some temperature
> messages about the disk drives:
> Nov 30 11:08:07 totall smartd[1620]: Device: /dev/sda, Temperature changed 2
> Celsius to 51 Celsius since last report
> Can this temperature influence in the performance?

That's close to the upper tolerance for this drive (55 degrees), which means the drive is being cooked and will likely wear out quickly. But that won't slow it down, and you'd get much scarier messages out of smartd if the drives had a real problem. You should improve cooling in this case if you want the drives to have a healthy life; odds are low that this is relevant to your performance issue, though.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD