Thread: "iowait" bug?
I just discovered this on a LinkedIn user group:

http://bugzilla.kernel.org/show_bug.cgi?id=12309

Is anyone here seeing evidence of this in PostgreSQL??

--
M. Edward (Ed) Borasky
http://www.linkedin.com/in/edborasky

I've never met a happy clam. In fact, most of them were pretty steamed.
2009/3/21 M. Edward (Ed) Borasky <zznmeb@gmail.com>:
> I just discovered this on a LinkedIn user group:
>
> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> Is anyone here seeing evidence of this in PostgreSQL??

I've been hit by an I/O wait problem, as described here:
https://bugzilla.redhat.com/show_bug.cgi?id=444759

I mentioned it in that other bug report, but no one seems to have followed up on that lead.

Regards,
Laurent
On Sun, Mar 22, 2009 at 8:49 AM, Laurent Wandrebeck <l.wandrebeck@gmail.com> wrote:
> 2009/3/21 M. Edward (Ed) Borasky <zznmeb@gmail.com>:
>> I just discovered this on a LinkedIn user group:
>>
>> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>>
>> Is anyone here seeing evidence of this in PostgreSQL??
> I've been hit by an I/O wait problem, as described here:
> https://bugzilla.redhat.com/show_bug.cgi?id=444759
> I mentioned it in that other bug report, but no one seems to have followed up on that lead.

We applied this MWI (Memory-Write-Invalidate) patch on 3 PostgreSQL servers and saw a great performance improvement. The hardware is a 3ware controller, 8 SAS HDDs, an octo-core (2x4) Xeon and 32GB RAM, running a custom 2.6.18 kernel.

--
Laurent Laborde
http://www.over-blog.com/
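For concreteness, the MWI patch being discussed here boils down to the controller driver asking the PCI layer to enable Memory-Write-Invalidate for the card. A minimal sketch of what such a change looks like, assuming a generic PCI storage driver probe routine (the function name and placement are illustrative, not the actual 3w-9xxx patch):

/*
 * Illustrative sketch only: the probe routine below is hypothetical.
 * The real change lives in the controller driver's own probe path.
 */
#include <linux/pci.h>

static int example_controller_probe(struct pci_dev *pdev,
                                    const struct pci_device_id *id)
{
        int err;

        err = pci_enable_device(pdev);
        if (err)
                return err;

        pci_set_master(pdev);

        /*
         * Ask the PCI core to enable Memory-Write-Invalidate.  This is
         * best-effort: pci_try_set_mwi() can fail (for instance if the
         * cache line size cannot be set), and the driver is expected to
         * keep working without it.
         */
        if (pci_try_set_mwi(pdev))
                dev_info(&pdev->dev, "MWI not enabled, continuing without it\n");

        /* ... normal controller setup would continue here ... */
        return 0;
}

pci_try_set_mwi() is deliberately the non-fatal variant: it returns an error code rather than failing the probe, so enabling it is a pure hint to the controller about how to burst writes onto the bus.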
On Fri, 20 Mar 2009, M. Edward (Ed) Borasky wrote:
> I just discovered this on a LinkedIn user group:
> http://bugzilla.kernel.org/show_bug.cgi?id=12309

I would bet there are at least 3 different bugs in that one. That bug report got a lot of press via Slashdot a few months ago, and it has picked up all sorts of people who have I/O wait issues, but they don't all have the same cause. The 3ware-specific problem Laurent mentioned is an example; that's not the same thing most of the people there are running into, since the typical reporter there has disks attached directly to their motherboard. The irony here is that #12309 was a fork of #7372, made to start over with a clean discussion slate because the same thing happened to that earlier one.

The original problem reported there showed up in 2.6.20, so I've been able to avoid this whole thing by sticking to the stock RHEL5 kernel (2.6.18) on most of the production systems I deal with. (Except for my system with an Areca card--that one needs 2.6.22 or later to be stable, and seems to have no unexpected I/O wait issues. I think this is because the card takes over the lowest level of I/O scheduling from Linux when it pushes from its cache onto the disks.)

Some of the people there reported significant improvement by tuning the pdflush tunables; that's something I've had to do a few times now on systems to get rid of unexpected write lulls. I wrote up a walkthrough on one of them at http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html that goes over how to tell if you're running into that problem and what to do about it; something else I wrote on that topic already made it into the bug report, in comment #150.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
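For anyone who wants to see where those knobs live without reading the whole walkthrough, here is a minimal sketch (not taken from Greg's post; it just reads the standard /proc/sys/vm entries) that prints the current pdflush-related settings:

/* Print the dirty-writeback tunables the thread is talking about. */
#include <stdio.h>

static void show(const char *name)
{
        char path[128];
        char buf[64];
        FILE *f;

        snprintf(path, sizeof(path), "/proc/sys/vm/%s", name);
        f = fopen(path, "r");
        if (!f) {
                printf("%-28s (not available)\n", name);
                return;
        }
        if (fgets(buf, sizeof(buf), f))
                printf("%-28s %s", name, buf);
        fclose(f);
}

int main(void)
{
        show("dirty_background_ratio");    /* % of RAM dirty before background writeback starts */
        show("dirty_ratio");               /* % of RAM dirty before writers are forced to block  */
        show("dirty_expire_centisecs");    /* how old dirty data may get before it must be written */
        show("dirty_writeback_centisecs"); /* how often the writeback threads wake up            */
        return 0;
}

Lowering dirty_background_ratio and dirty_ratio (via sysctl or by writing to the same files) makes the kernel start flushing dirty pages sooner; that is the general direction this kind of tuning takes, with the right values depending on how much RAM and controller cache the box has.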
2009/3/22 Greg Smith <gsmith@gregsmith.com>:
> On Fri, 20 Mar 2009, M. Edward (Ed) Borasky wrote:
>
>> I just discovered this on a LinkedIn user group:
>> http://bugzilla.kernel.org/show_bug.cgi?id=12309
>
> I would bet there are at least 3 different bugs in that one. That bug report got a lot of press via Slashdot a few months ago, and it has picked up all sorts of people who have I/O wait issues, but they don't all have the same cause. The 3ware-specific problem Laurent mentioned is an example; that's not the same thing most of the people there are running into, since the typical reporter there has disks attached directly to their motherboard. The irony here is that #12309 was a fork of #7372, made to start over with a clean discussion slate because the same thing happened to that earlier one.

That I/O wait problem is not 3ware-specific. A friend of mine hit the same problem (and the same fix) with aacraid. I'd bet a couple of coins that the controllers showing this problem are the ones that do not set MWI.

Quickly grepping the Linux sources (2.6.28.8) for pci_try_set_mwi (only disk controller drivers shown here):

230:pata_cs5530.c
3442:sata_mv.c
2016:3w-9xxx.c
147:qla_init.c
2412:lpfc_init.c
171:cs5530.c

> The original problem reported there showed up in 2.6.20, so I've been able to avoid this whole thing by sticking to the stock RHEL5 kernel (2.6.18) on most of the production systems I deal with. (Except for my system with an Areca card--that one needs 2.6.22 or later to be stable, and seems to have no unexpected I/O wait issues. I think this is because the card takes over the lowest level of I/O scheduling from Linux when it pushes from its cache onto the disks.)

I thought about the Completely Fair Scheduler at first, but that one came in around 2.6.21. Some tests were done with different I/O schedulers, and they do not seem to be the real cause of the I/O wait. A bad interaction between a hardware RAID card's cache and the system asking the card to write at the same time could be a reason. Unfortunately, I've also run into it on a now-retired box at work that was running a single disk plugged into the motherboard controller. So there's something else under the hood... but my (very) limited kernel knowledge can't help more here.

> Some of the people there reported significant improvement by tuning the pdflush tunables; that's something I've had to do a few times now on systems to get rid of unexpected write lulls. I wrote up a walkthrough on one of them at http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html that goes over how to tell if you're running into that problem and what to do about it; something else I wrote on that topic already made it into the bug report, in comment #150.

I think that forcing the system to write smaller amounts of data more often just hides the problem rather than correcting it. But well, that's just a feeling, not science. I hope some real hacker will be able to spot the problem(s) so they can be fixed. Anyway, I keep a couple of coins on MWI as a source of the problem :-)

Regards,
Laurent
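If you want to test that bet on a live box, the MWI enable is bit 4 of the 16-bit PCI command register at offset 0x04 in configuration space, which the kernel exposes through sysfs. A rough sketch (the device address below is only a placeholder; substitute the one lspci reports for your controller):

/* Check whether Memory-Write-Invalidate is enabled on one PCI device. */
#include <stdio.h>
#include <stdint.h>

#define PCI_CONFIG "/sys/bus/pci/devices/0000:03:00.0/config"  /* placeholder address */
#define PCI_COMMAND_OFFSET     0x04
#define PCI_COMMAND_INVALIDATE 0x0010  /* Memory-Write-Invalidate enable bit */

int main(void)
{
        FILE *f = fopen(PCI_CONFIG, "rb");
        uint8_t cmd[2];

        if (!f) {
                perror("open config space");
                return 1;
        }
        if (fseek(f, PCI_COMMAND_OFFSET, SEEK_SET) != 0 ||
            fread(cmd, 1, sizeof(cmd), f) != sizeof(cmd)) {
                perror("read command register");
                fclose(f);
                return 1;
        }
        fclose(f);

        /* PCI config space is little-endian. */
        uint16_t command = (uint16_t)cmd[0] | ((uint16_t)cmd[1] << 8);
        printf("PCI command register: 0x%04x, MWI %s\n", command,
               (command & PCI_COMMAND_INVALIDATE) ? "enabled" : "disabled");
        return 0;
}

The same bit shows up in the Control line of lspci -vv output as MemWINV+ or MemWINV-, which is quicker if you only want a yes/no answer.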
On Mon, 23 Mar 2009, Laurent Wandrebeck wrote:

> I thought about the Completely Fair Scheduler at first, but that one came in around 2.6.21.

CFS showed up in 2.6.23.

> I think that forcing the system to write smaller amounts of data more often just hides the problem rather than correcting it.

That's one possibility. I've been considering things like whether the OS is getting bogged down managing things like the elevator sorting for outstanding writes. If there is something about that process that gets really inefficient in proportion to the size of the pending queue, it would both match the kinds of symptoms people are reporting and explain why things improve just from reducing the maximum size of that queue by lowering the pdflush tunables.

Anyway, the point I was trying to make is that there sure seem to be multiple problems mixed into that one bug report, and it's starting to look just as unmanageably messy as the older bug that had to be abandoned. It would have been nice if somebody had kicked out all the diversions it wandered into, to keep the focus a bit better. Anybody using an SSD device, USB, or ext4 should have been punted somewhere else, for example; there are plenty of reports that don't require any of those things.

--
* Greg Smith  gsmith@gregsmith.com  http://www.gregsmith.com  Baltimore, MD
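One way to see whether that pending write queue is actually ballooning on a given system is to watch the Dirty and Writeback lines in /proc/meminfo while the workload runs. A rough one-second poll, as a sketch (any monitoring tool that graphs those two counters does the same job):

/* Poll the dirty-page and writeback counters once a second (stop with Ctrl-C). */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        char line[256];

        for (;;) {
                FILE *f = fopen("/proc/meminfo", "r");
                if (!f) {
                        perror("/proc/meminfo");
                        return 1;
                }
                while (fgets(line, sizeof(line), f)) {
                        if (strncmp(line, "Dirty:", 6) == 0 ||
                            strncmp(line, "Writeback:", 10) == 0)
                                fputs(line, stdout);
                }
                fclose(f);
                puts("---");
                fflush(stdout);
                sleep(1);
        }
}

A Dirty figure that climbs steadily and then drains in one long burst, with I/O wait spiking during the drain, is the pattern the pdflush tuning discussed above is meant to smooth out.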