Re: Why we are going to have to go DirectIO - Mailing list pgsql-hackers
From | Josh Berkus |
---|---|
Subject | Re: Why we are going to have to go DirectIO |
Date | |
Msg-id | 529F7D58.1060301@agliodbs.com |
In response to | Why we are going to have to go DirectIO (Josh Berkus <josh@agliodbs.com>) |
Responses |
Re: Why we are going to have to go DirectIO
Re: Why we are going to have to go DirectIO |
List | pgsql-hackers |
On 12/04/2013 07:33 AM, Jonathan Corbet wrote:
> Wow, Josh, I'm surprised to hear this from you.

Well, I figured it was too angry to propose for an LWN article. ;-)

> The active/inactive list mechanism works great for the vast majority of
> users. The second-use algorithm prevents a lot of pathological behavior,
> like wiping out your entire cache by copying a big file or running a
> backup. We *need* that kind of logic in the kernel.

There's a large body of research on 2Q algorithms going back to the 80s, which is what this is. As far as I can tell, the modification was performed without any reading of this research, since that would have easily shown that 50/50 was unlikely to be a good division, and that in fact there is nothing which would work except a tunable setting, because workloads are different. Certainly the "what happens if a single file is larger than the entire recency bucket" question is addressed and debated.

As an example, PostgreSQL would want to shrink the frequency list to 0%, because we already implement our own frequency list, and we already demonstrated back in version 8.1 that a 3-list system was ineffective.

I can save Johannes some time: don't implement ARC. Not only is it under IBM patent, it's not effective in real-world situations. Both Postgres and Apache tried it in the early aughts.

However, this particular issue concerns me less than the general attitude that it's OK to push in experimental IO changes which can't be disabled by users into release kernels, as exemplified by several problematic and inadequately tested IO changes in the 3.X kernels -- most notably the pdflush bug. It speaks of a policy that the Linux IO stack is not production software, and it's OK to tinker with it in ways that break things for many users.

I also wasn't exaggerating the reception I got when I tried to talk about IO and PostgreSQL at LinuxCon and other events.
The majority of Linux hackers I've talked to simply don't want to be bothered with PostgreSQL's performance needs, and I've heard similar things from my colleagues at the MySQL variants. Greg KH was the only real exception.

Heck, I went to a meeting of filesystem geeks at LinuxCon and the main feedback I received, from Linux FS developers (Chris and Ted), was "PostgreSQL should implement its own storage and use DirectIO, we don't know why you're even trying to use the Linux IO stack." That's why I gave up on working through community channels; I face enough uphill battles in *this* project.

> This code has been a bit slow getting into the mainline for a few reasons,
> but one of the chief ones is this: nobody is saying from the sidelines
> that they need it! If somebody were saying "Postgres would work a lot
> better with this code in place" and had some numbers to demonstrate that,
> we'd be far more likely to see it get into an upcoming release.

Well, Citus did that; do you need more evidence?

> In the end, Linux is quite responsive to the people who participate in its
> development, even as testers and bug reporters. It responds rather less
> well to people who find problems in enterprise kernels years later,
> granted.

All infrastructure software, including Postgres, has the issue that most enterprise users are using a version which was released years ago. As a result, some performance issues simply aren't going to be found until that version has been out for a couple of years. This leads to a Catch-22: enterprise users are reluctant to upgrade because of potential performance regressions, and as a result the median "enterprise" version gets further and further behind current development, and as a result the performance regressions are never fixed.

We encounter this in PostgreSQL (I have customers who are still on 8.4 or 9.1 because of specific regressions), and it's even worse in the Linux world, where RHEL is still on 2.6.
We work really hard to avoid performance regressions in Postgres versions, because we know we can't test for them adequately, and often can't fix them in release versions after the fact. But you know what? 2.6, overall, still performs better than any kernel in the 3.X series, at least for Postgres.

> The amount of automated testing, including performance testing, has
> increased markedly in the last couple of years. I bet that it would not
> be hard at all to get somebody like Fengguang Wu to add some
> Postgres-oriented I/O tests to his automatic suite:
>
> https://lwn.net/Articles/571991/
>
> Then we would all have a much better idea of how kernel releases are
> affecting one of our most important applications; developers would pay
> attention to that information.

Oh, good! I was working with Greg on having an automated pgBench run, but doing it on Wu's testing platform would be even better. I still need to get some automated stats digestion, since I want to at least make sure that the tests would show the three major issues which we encountered in recent Linux kernels so far. Of course, I have a "free time" issue, which is being discussed on the other fork of this thread.

In addition to testing, though, I have yet to find a way to learn about new changes to IO or memory performance in the Linux Kernel without reading all of the traffic on LKML and all Linux commit messages and filtering them myself. If there were a better way to look for this information, Linux would be more likely to get feedback in a timely fashion. And yeah, I know that Postgres has the same issue.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com