Re: Group commit, revised - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Group commit, revised |
Date | |
Msg-id | CAEYLb_UqTYVZWUCRgSkVS99pRFdNaORSXaXnLXaUbpsTz-pumg@mail.gmail.com |
In response to | Re: Group commit, revised (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Group commit, revised |
List | pgsql-hackers |
On 16 January 2012 08:11, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Impressive results. How about uploading the PDF to the community wiki?

Sure. http://wiki.postgresql.org/wiki/Group_commit .

> I think it might be simpler if it wasn't the background writer that's
> responsible for "driving" the group commit queue, but the backends
> themselves. When a flush request comes in, you join the queue, and if
> someone else is already doing the flush, sleep until the driver wakes you
> up. If no-one is doing the flush yet (ie. the queue is empty), start doing
> it yourself. You'll need a state variable to keep track who's driving the
> queue, but otherwise I think it would be simpler as there would be no
> dependency on WAL writer.

I think this replaces one problem with another. You've now effectively
elevated a nominated backend to the status of an auxiliary process - do
you intend to have the postmaster look after it, as with any other
auxiliary process? I'm not sure that that is a more difficult problem to
solve, but I suspect so. At least my proposal can have any one of the
backends, both currently participating in group commit and yet to, wake
up the WAL Writer.

> I tend think of the group commit facility as a bus. Passengers can hop on
> board at any time, and they take turns on who drives the bus. When the first
> passengers hops in, there is no driver so he takes the driver seat. When the
> next passenger hops in, he sees that someone is driving the bus already, so
> he sits down, and places a big sign on his forehead stating the bus stop
> where he wants to get off, and goes to sleep. When the driver has reached
> his own bus stop, he wakes up all the passengers who wanted to get off at
> the same stop or any of the earlier stops [1]. He also wakes up the
> passenger whose bus stop is the farthest from the current stop, and gets off
> the bus.
> The woken-up passengers who have already reached their stops can
> immediately get off the bus, and the one who has not notices that no-one is
> driving the bus anymore, so he takes the driver seat.
>
> [1] in a real bus, a passenger would not be happy if he's woken up too late
> and finds himself at the next stop instead of the one where he wanted to go,
> but for group commit, that is fine.
>
> In this arrangement, you could use the per-process semaphore for the
> sleep/wakeups, instead of latches. I'm not sure if there's any difference,
> but semaphores are more tried and tested, at least.

Yes, and I expect that this won't be the last time someone uses a bus
analogy in relation to this! The proposed patch is heavily based on sync
rep, which I'd have imagined was more tried and tested than any proposed
completely alternative implementation, as it is basically a
generalisation of exactly the same principle, WAL Writer changes
notwithstanding. I would have imagined that that aspect would be
particularly approved of.

> wal_writer_delay is still needed for controlling how often asynchronous
> commits are flushed to disk.

That had occurred to me of course, but has anyone ever actually tweaked
wal_writer_delay with adjusting the behaviour of asynchronous commits in
mind? I'm pretty sure that the answer is no. I have a slight preference
for obsoleting it as a consequence of introducing group commit, but I
don't think that it matters that much.

>> Auxiliary processes cannot use group commit. The changes made prevent
>> them from availing of commit_siblings/commit_delay parallelism,
>> because it doesn't exist anymore.
>
> Auxiliary processes have PGPROC entries too. Why can't they participate?

It was deemed to be a poor design decision to effectively create a
dependency on the WAL Writer among other auxiliary processes, as to do
so would perhaps compromise the way in which the postmaster notices and
corrects isolated failures.
Maybe I'll revisit that assessment, but I am not convinced that it's
worth the very careful analysis of the implications of such an
unprecedented dependency, without there being some obvious advantage.
If it's a question of their being deprived of commit_siblings "group
commit", well, we know from experience that people didn't tend to touch
it a whole lot anyway.

>> Group commit is sometimes throttled, which seems appropriate - if a
>> backend requests that the WAL Writer flush an LSN deemed too far from
>> the known flushed point, that request is rejected and the backend goes
>> through another path, where XLogWrite() is called.
>
> Hmm, if the backend doing the big flush gets the WALWriteLock before a bunch
> of group committers, the group committers will have to wait until the big
> flush is finished, anyway. I presume the idea of the throttling is to avoid
> the situation where a bunch of small commits need to wait for a huge flush
> to finish.

Exactly. Of course, you're never going to see that situation with
pgbench. I don't have much data to inform exactly what the right
trade-off is here, or some generic approximation of it across platforms
and hardware - other people will know more about this than I do. While I
have a general sense that the cost of flushing a single page of data is
the same as flushing a considerably larger amount of data, I don't have
a good understanding of how that cost behaves for larger amounts still,
where the question of modelling some trade-off between throughput and
latency arises, especially given everything else that affects it, such
as whether or not we're using full_page_writes, the hardware and so on.
Something simple will probably work well.

> Perhaps the big flusher should always join the queue, but use some heuristic
> to first flush up to the previous commit request, to wake up others quickly,
> and do another flush to flush its own request after that.
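
A minimal sketch of the heuristic quoted above, written in Python purely for illustration (the patch itself is C). The names `BIG_FLUSH_THRESHOLD`, `is_big_flush` and `plan_flushes` are hypothetical and do not come from the patch, and LSNs are modelled as plain integers. What counts as a "big" flush is exactly the open question here, so the 64 kB figure is only a placeholder.

```python
from typing import List

# Hypothetical threshold: how far (in bytes of WAL) a requested LSN may be
# ahead of the known flushed point before the request counts as "big".
# The thread does not settle on a value; 64 kB is only a placeholder.
BIG_FLUSH_THRESHOLD = 64 * 1024


def is_big_flush(flushed_lsn: int, request_lsn: int,
                 threshold: int = BIG_FLUSH_THRESHOLD) -> bool:
    """Return True if the request is deemed too far from the flushed point."""
    return request_lsn - flushed_lsn > threshold


def plan_flushes(flushed_lsn: int, queued_lsns: List[int], own_lsn: int,
                 threshold: int = BIG_FLUSH_THRESHOLD) -> List[int]:
    """Return the LSNs to flush up to, in order.

    A small request is satisfied by a single flush. A big request first
    flushes up to the latest earlier commit request in the queue, so that
    those waiters can be woken quickly, and then flushes its own request.
    """
    if not is_big_flush(flushed_lsn, own_lsn, threshold):
        return [own_lsn]
    # Requests that are still unflushed and lie before our own request.
    earlier = [lsn for lsn in queued_lsns if flushed_lsn < lsn < own_lsn]
    if not earlier:
        return [own_lsn]
    return [max(earlier), own_lsn]
```

With the flushed point at 1000 and queued requests at 1100 and 1200, a request far ahead of the flushed point would be planned as two flushes: first up to 1200, then up to its own LSN. A request within the threshold is planned as a single flush.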

Maybe, but we should decide what a big flusher looks like first. That
way, if we can't figure out a way to do what you describe with it in
time for 9.2, we can at least do what I'm already doing.

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
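
The bus arrangement quoted earlier in this message can be sketched with a lock and a condition variable. This is a toy model in Python, not PostgreSQL code: `GroupCommitBus`, `flush_fn` and `run_demo` are hypothetical names, LSNs are plain integers, and `flush_fn` stands in for the real WAL write and fsync. One simplification to note: the driver wakes every sleeper, and whichever unsatisfied passenger gets the lock first becomes the next driver, rather than specifically the one with the farthest stop.

```python
import threading
import time
from typing import Callable, List


class GroupCommitBus:
    """Toy model of backend-driven group commit (the "bus" analogy)."""

    def __init__(self, flush_fn: Callable[[int], None]) -> None:
        self._cond = threading.Condition()
        self._flush_fn = flush_fn        # assumed stand-in for write + fsync
        self._driver_active = False      # state variable: is someone driving?
        self.flushed_lsn = 0             # known flushed point
        self.flush_calls: List[int] = [] # LSNs actually flushed, in order

    def flush(self, lsn: int) -> None:
        """Block until WAL is flushed at least up to lsn."""
        with self._cond:
            while True:
                if self.flushed_lsn >= lsn:
                    return               # our stop has been reached
                if not self._driver_active:
                    self._driver_active = True
                    break                # empty driver seat: take it
                self._cond.wait()        # passenger: sleep until woken

        # Driver: flush outside the lock so new passengers can board.
        flushed = False
        try:
            self._flush_fn(lsn)
            flushed = True
        finally:
            with self._cond:
                if flushed:
                    self.flushed_lsn = max(self.flushed_lsn, lsn)
                    self.flush_calls.append(lsn)
                self._driver_active = False
                self._cond.notify_all()  # wake everyone; one may drive next


def run_demo(lsns: List[int]) -> GroupCommitBus:
    """Start one thread per flush request and wait for all of them."""
    bus = GroupCommitBus(lambda lsn: time.sleep(0.001))
    threads = [threading.Thread(target=bus.flush, args=(lsn,)) for lsn in lsns]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return bus
```

Calling `run_demo([100, 200, 300, 400, 500])` leaves `flushed_lsn` at 500. How many flushes that takes depends on thread scheduling, but it is never more than the number of requests, and requests that arrive while a flush to a later LSN is in progress return without flushing at all.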