Re: CLOG contention - Mailing list pgsql-hackers
From | Simon Riggs
---|---
Subject | Re: CLOG contention
Date |
Msg-id | CA+U5nM+8wNw2H6sKNLwFfj2Zvy_yXLd6s8-YgGs1mhhw-ZDoUw@mail.gmail.com
In response to | Re: CLOG contention (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: CLOG contention
List | pgsql-hackers
On Wed, Dec 21, 2011 at 7:25 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I am not arguing

This seems like a normal and cool technical discussion to me here.

> I'm merely saying
> that the specific plan of having multiple SLRUs for CLOG doesn't
> appeal to me -- mostly because I think it will make life difficult for
> pg_upgrade without any compensating advantage. If we're going to go
> that route, I'd rather build something into the SLRU machinery
> generally that allows for the cache to be less than fully-associative,
> with all of the savings in terms of lock contention that this entails.
> Such a system could be used by any SLRU, not just CLOG, if it proved
> to be helpful; and it would avoid any on-disk changes, with, as far as
> I can see, basically no downside.

Partitioning will give us more buffers and more LWLocks, to spread the contention when we access the buffers. I use that word because it's what we call the technique already used in the buffer manager and lock manager. If you wish to call this "less than fully-associative" I really don't mind, as long as we're discussing the same overall concept, so we can then focus on an implementation of that concept, which no doubt has many ways of doing it.

More buffers per lock does reduce the lock contention somewhat, but not by much. So for me, it seems essential that we have more LWLocks to solve the problem, which is where partitioning comes in.

My perspective is that there is clog contention in many places, not just the ones you identified. The main places I see are:

* Access to older pages (identified by you upthread). More buffers addresses this problem.

* Committing requires us to hold an exclusive lock on a page, so there is contention from nearly all sessions for the same page. The only way to solve that is by striping pages, so that one page in the current clog architecture would be striped across N pages, with consecutive xids in separate partitions. Notably, this addresses Tom's concern that there is a much higher request rate on very recent pages - each page would be split into N pages, so reducing contention.

* We allocate a new clog page every 32k xids. At the rates you have now measured, we will do this every 1-2 seconds. When we do this, we must allocate a new page, which means writing the LRU page, which will be dirty, since we fill 8 buffers in 16 seconds (or even 32 buffers in about a minute), yet only flush buffers at checkpoint every 5 minutes. We then need to write an XLogRecord for the new page. All of that happens while we have XidGenLock held, and while this is happening nothing can commit, or check clog. That causes nearly all work to halt for about a second, perhaps longer while the traffic queue clears. This is more obvious when writing to logged tables, since the XLogInsert for the new clog page is then very badly contended. If we partition, then we will be able to continue accessing most of the clog pages.

So I think we need

* more buffers
* clog page striping
* partitioning

And I would characterise what I am suggesting as "partitioning + striping", with the free benefit that we increase the number of buffers as well via partitioning.

With all of that in mind, it's relatively easy to rewrite the clog code so we allocate N SLRUs rather than just 1. That means we just touch the clog code. Striping adjacent xids onto separate pages in any other way would gut the SLRU code. We could just partition, but then we won't address Tom's concern, as you say.
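To make the striping idea concrete, here is a rough sketch of the xid-to-partition mapping I have in mind. The names and the partition count are made up for illustration - this is not the attached hack - and it assumes the usual 32k xids per clog page and a power-of-two number of partitions:

```c
/*
 * Sketch only: stripe consecutive xids across N independent clog SLRUs,
 * so that commits of adjacent xids touch different pages protected by
 * different LWLocks.  Names and constants here are illustrative.
 */
#include <stdint.h>

typedef uint32_t TransactionId;

#define CLOG_XACTS_PER_PAGE   32768   /* 2 status bits per xact, 8kB pages */
#define NUM_CLOG_PARTITIONS   4       /* power of two keeps the arithmetic cheap */

/* Which of the N clog SLRUs holds this xid's status. */
static inline int
ClogXidToPartition(TransactionId xid)
{
    return xid % NUM_CLOG_PARTITIONS;
}

/* Page number within that partition's SLRU. */
static inline int
ClogXidToLocalPage(TransactionId xid)
{
    return (xid / NUM_CLOG_PARTITIONS) / CLOG_XACTS_PER_PAGE;
}

/* Entry offset within that page. */
static inline int
ClogXidToPageIndex(TransactionId xid)
{
    return (xid / NUM_CLOG_PARTITIONS) % CLOG_XACTS_PER_PAGE;
}
```

With four partitions, xids 0-3 land on four different pages under four different locks, while each partition still sees a dense, consecutive stream of xid/4 values, so the existing SLRU machinery itself needs very little change.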
That is based upon code analysis and hacking something together while thinking - if it helps discussion I post that hack here, but it's not working yet. I don't think reusing code from bufmgr/lockmgr would help either.

Yes, you're right that I'm suggesting we change the clog data structures, and that therefore we'd need to change pg_upgrade as well. But that seems like a relatively simple piece of code, given the clear mapping between old and new structures. It would be able to run quickly at upgrade time.

--
Simon Riggs  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services