Re: Why we are going to have to go DirectIO - Mailing list pgsql-hackers
From | Jonathan Corbet |
---|---|
Subject | Re: Why we are going to have to go DirectIO |
Date | |
Msg-id | 20131204133139.5dad25c9@lwn.net |
In response to | Re: Why we are going to have to go DirectIO (Josh Berkus <josh@agliodbs.com>) |
List | pgsql-hackers |
On Wed, 04 Dec 2013 11:07:04 -0800
Josh Berkus <josh@agliodbs.com> wrote:

> On 12/04/2013 07:33 AM, Jonathan Corbet wrote:
> > Wow, Josh, I'm surprised to hear this from you.
>
> Well, I figured it was too angry to propose for an LWN article. ;-)

So you're going to make us write it for you :)

> > The active/inactive list mechanism works great for the vast majority of
> > users. The second-use algorithm prevents a lot of pathological behavior,
> > like wiping out your entire cache by copying a big file or running a
> > backup. We *need* that kind of logic in the kernel.
>
> There's a large body of research on 2Q algorithms going back to the 80s,
> which is what this is. As far as I can tell, the modification was
> performed without any reading of this research, since that would have
> easily shown that 50/50 was unlikely to be a good division, and that in
> fact there is nothing which would work except a tunable setting, because
> workloads are different.

In general, the movement of useful information between academia and real-world programming seems to be minimal at best. Neither side seems to find much that is useful or interesting in what the other is doing. Unfortunate.

For those interested in the details...

(1) It's not quite 50/50; that's one bound for how the balance is allowed to go.

(2) Anybody trying to add tunables to the kernel tends to run into resistance. Exposing thousands of knobs tends to lead to a situation where you *have* to be an expert on all those knobs to get decent behavior out of your system. So there is a big emphasis on having the kernel tune itself whenever possible. Here is a situation where that is not always happening, but a fix (which introduces no knob) is in the works.

<irrelevant_aside>
As an example, I've never done much with the PostgreSQL knobs on the LWN server. I just don't have the time to mess with it, and things Work Well Enough.
</irrelevant_aside>

> However, this particular issue concerns me less than the general
> attitude that it's OK to push in experimental IO changes which can't be
> disabled by users into release kernels, as exemplified by several
> problematic and inadequately tested IO changes in the 3.X kernels --
> most notably the pdflush bug. It speaks of a policy that the Linux IO
> stack is not production software, and it's OK to tinker with it in ways
> that break things for many users.

Bugs and regressions happen, and I won't say that we do a good enough job in that regard. There has been some concern recently that we're accepting too much marginal stuff. We have problems getting enough people to adequately review code -- I think I've heard of another project or two with similar issues :). But nobody sees the kernel as experimental or feels that the introduction of bugs is an acceptable thing.

> I also wasn't exaggerating the reception I got when I tried to talk
> about IO and PostgreSQL at LinuxCon and other events. The majority of
> Linux hackers I've talked to simply don't want to be bothered with
> PostgreSQL's performance needs, and I've heard similar things from my
> colleagues at the MySQL variants. Greg KH was the only real exception.
>
> Heck, I went to a meeting of filesystem geeks at LinuxCon and the main
> feedback I received, from Linux FS developers (Chris and Ted), was
> "PostgreSQL should implement its own storage and use DirectIO, we don't
> know why you're even trying to use the Linux IO stack."

I think you're talking to the wrong people. Nothing you've described is a filesystem problem; you're contending with memory management problems. Chris and Ted weren't helpful because there's actually little they can do to help you. I would be happy to introduce you to some people who would be more likely to take your problems to heart.
Mel Gorman, for example, is working on putting together a set of MM benchmarks in the hopes of quantifying changes and catching regressions before new code is merged. He's one of the people who has to deal with performance regressions when they show up in enterprise kernels, and I get the sense he'd rather do less of that.

Perhaps even better: the next filesystem, storage, and memory management summit is March 24-25. A session on your pain points there would bring in a substantial portion of the relevant developers at all levels. LSFMM is arguably the most productive kernel event I see over the course of a year; it's where I would go first to make progress on this issue. I'm not an LSFMM organizer, but I would be happy to work to make such a session happen if somebody from the PostgreSQL community wanted to be there.

> > This code has been a bit slow getting into the mainline for a few reasons,
> > but one of the chief ones is this: nobody is saying from the sidelines
> > that they need it! If somebody were saying "Postgres would work a lot
> > better with this code in place" and had some numbers to demonstrate that,
> > we'd be far more likely to see it get into an upcoming release.
>
> Well, Citus did that; do you need more evidence?

Yes, they did that -- one week ago. This patch has been in the works for almost two years. And Citus has not taken anything to the kernel community, so somebody else will have to do that for them. I might be able to help in that regard.

> In addition to testing, though, I have yet to find a way to learn about
> new changes to IO or memory performance in the Linux Kernel without
> reading all of the traffic on LKML and all Linux commit messages and
> filtering them myself. If there were a better way to look for this
> information, Linux would be more likely to get feedback in a timely
> fashion. And yeah, I know that Postgres has the same issue.

Gee, if only there were a web site where one could read about changes to the Linux kernel :)

Seriously, though, one of the best things to do would be to make a point of picking up a kernel around -rc3 (right around now, say, for 3.13) and running a few benchmarks on it. If you report a performance regression at that stage, it will get attention.

Thanks,

jon