Re: Spread checkpoint sync - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Spread checkpoint sync |
Date | |
Msg-id | 1295124798.3282.5.camel@ebony |
In response to | Re: Spread checkpoint sync (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
On Sat, 2011-01-15 at 09:15 -0500, Robert Haas wrote:
> On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote:
> >> Robert Haas wrote:
> >> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> >> >
> >> > > One of the ideas Simon and I had been considering at one point was adding
> >> > > some better de-duplication logic to the fsync absorb code, which I'm
> >> > > reminded by the pattern here might be helpful independently of other
> >> > > improvements.
> >> > >
> >> >
> >> > Hopefully I'm not stepping on any toes here, but I thought this was an
> >> > awfully good idea and had a chance to take a look at how hard it would
> >> > be today while en route from point A to point B. The answer turned
> >> > out to be "not very", so PFA a patch that seems to work. I tested it
> >> > by attaching gdb to the background writer while running pgbench, and
> >> > it eliminated the backend fsyncs without even breaking a sweat.
> >> >
> >>
> >> No toe damage, this is great, I hadn't gotten to coding for this angle
> >> yet at all. Suffering from an overload of ideas and (mostly wasted)
> >> test data, so thanks for exploring this concept and proving it works.
> >
> > No toe damage either, but are we sure we want the de-duplication logic
> > and in this place?
> >
> > I was originally of the opinion that de-duplicating the list would save
> > time in the bgwriter, but that guess was wrong by about two orders of
> > magnitude, IIRC. The extra time in the bgwriter wasn't even noticeable.
>
> Well, the point of this is not to save time in the bgwriter - I'm not
> surprised to hear that wasn't noticeable. The point is that when the
> fsync request queue fills up, backends start performing an fsync *for
> every block they write*, and that's about as bad for performance as
> it's possible to be. So it's worth going to a little bit of trouble
> to try to make sure it doesn't happen.
> It didn't happen *terribly*
> frequently before, but it does seem to be common enough to worry about
> - e.g. on one occasion, I was able to reproduce it just by running
> pgbench -i -s 25 or something like that on a laptop.
>
> With this patch applied, there's no performance impact vs. current
> code in the very, very common case where space remains in the queue -
> 999 times out of 1000, writing to the fsync queue will be just as fast
> as ever. But in the unusual case where the queue has been filled up,
> compacting the queue is much much faster than performing an fsync, and
> the best part is that the compaction is generally massive. I was
> seeing things like "4096 entries compressed to 14". So clearly even
> if the compaction took as long as the fsync itself it would be worth
> it, because the next 4000+ guys who come along again go through the
> fast path. But in fact I think it's much faster than an fsync.
>
> In order to get pathological behavior even with this patch applied,
> you'd need to have NBuffers pending fsync requests and they'd all have
> to be different. I don't think that's theoretically impossible, but
> Greg's research seems to indicate that even on busy systems we don't
> come even a little bit close to the circumstances that would cause it
> to occur in practice. Every other change we might make in this area
> will further improve this case, too: for example, doing an absorb
> after each fsync would presumably help, as would the more drastic step
> of splitting the bgwriter into two background processes (one to do
> background page cleaning, and the other to do checkpoints, for
> example). But even without those sorts of changes, I think this is
> enough to effectively eliminate the full fsync queue problem in
> practice, which seems worth doing independently of anything else.

You've persuaded me.

--
 Simon Riggs           http://www.2ndQuadrant.com/books/
 PostgreSQL Development, 24x7 Support, Training and Services
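
For readers following the thread, the behaviour Robert describes (leave the fast path alone while the queue has room, and de-duplicate the queue only when it is full) might be sketched roughly as below. This is an illustration only and not the attached patch: FsyncRequest, compact_queue and enqueue_request are hypothetical names, the (rel, segno) key is an assumed request shape, and the linear scan over kept entries stands in for whatever lookup structure the real code uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request shape: one pending fsync per (relation, segment). */
typedef struct {
    unsigned rel;    /* illustrative relation identifier */
    unsigned segno;  /* illustrative segment number within the relation */
} FsyncRequest;

static int same_request(const FsyncRequest *a, const FsyncRequest *b)
{
    return a->rel == b->rel && a->segno == b->segno;
}

/*
 * Remove duplicate entries in place, keeping the first occurrence of each
 * request in its original order. Returns the new number of entries.
 * Cost is O(n * unique), which is cheap when most entries are duplicates.
 */
size_t compact_queue(FsyncRequest *queue, size_t n)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;

        /* Look for queue[i] among the entries already kept. */
        for (j = 0; j < kept; j++)
            if (same_request(&queue[i], &queue[j]))
                break;
        if (j == kept)
            queue[kept++] = queue[i];
    }
    return kept;
}

/*
 * Append a request. Compaction runs only when the queue is full, so the
 * common case pays nothing extra. Returns 1 on success, or 0 if the queue
 * is still full after compaction (the caller would then have to perform
 * the fsync itself).
 */
int enqueue_request(FsyncRequest *queue, size_t *n, size_t capacity,
                    FsyncRequest req)
{
    assert(*n <= capacity);

    if (*n == capacity) {
        *n = compact_queue(queue, *n);
        if (*n == capacity)
            return 0;   /* every pending request was distinct */
    }
    queue[(*n)++] = req;
    return 1;
}
```

In this sketch, a full four-slot queue holding three copies of one request and one copy of another compacts to two entries before the new request is appended. A full queue in which every entry is distinct makes enqueue_request return 0, which corresponds to the pathological case mentioned above.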