Thread: slow startup due to LWLockAssign() spinlock
Hi, On larger, multi-socket, machines, startup takes a fair bit of time. As I was profiling anyway I looked into it and noticed that just about all of it is spent in LWLockAssign() called by InitBufferPool(). Starting with shared_buffers=48GB on the server Nate Boley provided, takes about 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. Simply modifying LWLockAssign() to not take the spinlock when !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making LWLockAssign() prettier it seems enough of a speedup to be worthwile nonetheless. Since this code is also hit when do an emergency restart, I'd say it has practical relevance... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Andres Freund <andres@2ndquadrant.com> writes: > On larger, multi-socket, machines, startup takes a fair bit of time. As > I was profiling anyway I looked into it and noticed that just about all > of it is spent in LWLockAssign() called by InitBufferPool(). Starting > with shared_buffers=48GB on the server Nate Boley provided, takes about > 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > Simply modifying LWLockAssign() to not take the spinlock when > !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > LWLockAssign() prettier it seems enough of a speedup to be worthwile > nonetheless. Hm. This patch only works if the postmaster itself never assigns any LWLocks except during startup. That's *probably* all right, but it seems a bit scary. Is there any cheap way to make the logic actually be what your comment claims, namely "Interlocking is not necessary during postmaster startup"? I guess we could invent a ShmemInitInProgress global flag ... regards, tom lane
On 2014-02-03 11:22:45 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On larger, multi-socket, machines, startup takes a fair bit of time. As > > I was profiling anyway I looked into it and noticed that just about all > > of it is spent in LWLockAssign() called by InitBufferPool(). Starting > > with shared_buffers=48GB on the server Nate Boley provided, takes about > > 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > > Simply modifying LWLockAssign() to not take the spinlock when > > !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > > LWLockAssign() prettier it seems enough of a speedup to be worthwile > > nonetheless. > > Hm. This patch only works if the postmaster itself never assigns any > LWLocks except during startup. That's *probably* all right, but it > seems a bit scary. Is there any cheap way to make the logic actually > be what your comment claims, namely "Interlocking is not necessary during > postmaster startup"? I guess we could invent a ShmemInitInProgress global > flag ... I'd be fine with inventing such a flag, I couldn't find one and decided that this alone didn't merit it, since it seems to be really unlikely that we will start to allocate such resources after startup in the postmaster. Unless we're talking about single user mode obviously, but the spinlock isn't necessary there anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-02-03 11:22:45 -0500, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On larger, multi-socket, machines, startup takes a fair bit of time. As > > I was profiling anyway I looked into it and noticed that just about all > > of it is spent in LWLockAssign() called by InitBufferPool(). Starting > > with shared_buffers=48GB on the server Nate Boley provided, takes about > > 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > > Simply modifying LWLockAssign() to not take the spinlock when > > !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > > LWLockAssign() prettier it seems enough of a speedup to be worthwile > > nonetheless. > > Hm. This patch only works if the postmaster itself never assigns any > LWLocks except during startup. That's *probably* all right, but it > seems a bit scary. Is there any cheap way to make the logic actually > be what your comment claims, namely "Interlocking is not necessary during > postmaster startup"? I guess we could invent a ShmemInitInProgress global > flag ... So, here's a flag implementing things with that flag. I kept your name, as it's more in line with ipci.c's naming, but it looks kinda odd besides proc_exit_inprogress. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On Tue, Feb 4, 2014 at 12:58:49AM +0100, Andres Freund wrote: > On 2014-02-03 11:22:45 -0500, Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > On larger, multi-socket, machines, startup takes a fair bit of time. As > > > I was profiling anyway I looked into it and noticed that just about all > > > of it is spent in LWLockAssign() called by InitBufferPool(). Starting > > > with shared_buffers=48GB on the server Nate Boley provided, takes about > > > 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > > > Simply modifying LWLockAssign() to not take the spinlock when > > > !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > > > LWLockAssign() prettier it seems enough of a speedup to be worthwile > > > nonetheless. > > > > Hm. This patch only works if the postmaster itself never assigns any > > LWLocks except during startup. That's *probably* all right, but it > > seems a bit scary. Is there any cheap way to make the logic actually > > be what your comment claims, namely "Interlocking is not necessary during > > postmaster startup"? I guess we could invent a ShmemInitInProgress global > > flag ... > > So, here's a flag implementing things with that flag. I kept your name, > as it's more in line with ipci.c's naming, but it looks kinda odd > besides proc_exit_inprogress. Uh, where are we on this? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 2014-04-16 19:33:52 -0400, Bruce Momjian wrote: > On Tue, Feb 4, 2014 at 12:58:49AM +0100, Andres Freund wrote: > > On 2014-02-03 11:22:45 -0500, Tom Lane wrote: > > > Andres Freund <andres@2ndquadrant.com> writes: > > > > On larger, multi-socket, machines, startup takes a fair bit of time. As > > > > I was profiling anyway I looked into it and noticed that just about all > > > > of it is spent in LWLockAssign() called by InitBufferPool(). Starting > > > > with shared_buffers=48GB on the server Nate Boley provided, takes about > > > > 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > > > > Simply modifying LWLockAssign() to not take the spinlock when > > > > !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > > > > LWLockAssign() prettier it seems enough of a speedup to be worthwile > > > > nonetheless. > > > > > > Hm. This patch only works if the postmaster itself never assigns any > > > LWLocks except during startup. That's *probably* all right, but it > > > seems a bit scary. Is there any cheap way to make the logic actually > > > be what your comment claims, namely "Interlocking is not necessary during > > > postmaster startup"? I guess we could invent a ShmemInitInProgress global > > > flag ... > > > > So, here's a flag implementing things with that flag. I kept your name, > > as it's more in line with ipci.c's naming, but it looks kinda odd > > besides proc_exit_inprogress. > > Uh, where are we on this? I guess it's waiting for the next CF :(. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 04/17/2014 12:06 PM, Andres Freund wrote: > On 2014-04-16 19:33:52 -0400, Bruce Momjian wrote: >> On Tue, Feb 4, 2014 at 12:58:49AM +0100, Andres Freund wrote: >>> On 2014-02-03 11:22:45 -0500, Tom Lane wrote: >>>> Andres Freund <andres@2ndquadrant.com> writes: >>>>> On larger, multi-socket, machines, startup takes a fair bit of time. As >>>>> I was profiling anyway I looked into it and noticed that just about all >>>>> of it is spent in LWLockAssign() called by InitBufferPool(). Starting >>>>> with shared_buffers=48GB on the server Nate Boley provided, takes about >>>>> 12 seconds. Nearly all of it spent taking the ShmemLock spinlock. >>>>> Simply modifying LWLockAssign() to not take the spinlock when >>>>> !IsUnderPostmaster speeds it up to 2 seconds. While certainly not making >>>>> LWLockAssign() prettier it seems enough of a speedup to be worthwile >>>>> nonetheless. >>>> >>>> Hm. This patch only works if the postmaster itself never assigns any >>>> LWLocks except during startup. That's *probably* all right, but it >>>> seems a bit scary. Is there any cheap way to make the logic actually >>>> be what your comment claims, namely "Interlocking is not necessary during >>>> postmaster startup"? I guess we could invent a ShmemInitInProgress global >>>> flag ... >>> >>> So, here's a flag implementing things with that flag. I kept your name, >>> as it's more in line with ipci.c's naming, but it looks kinda odd >>> besides proc_exit_inprogress. >> >> Uh, where are we on this? > > I guess it's waiting for the next CF :(. Now that we have LWLock tranches in 9.4, it might be cleanest to have the buffer manager allocate a separate tranche for the buffer locks. We could also save some memory if we got rid of the LWLock pointers in BufferDesc altogether, and just used the buffer id as an index into the LWLock array (we could do that without tranches too, but would have to assume that the lock ids returned by LWLockAssign() are a contiguous range). Another idea is to add an LWLockAssignBatch(int) function that assigns a range of locks in one call. That would be very simple, and I think it would be less likely to break things than a new global flag. I would be OK with sneaking that into 9.4 still. - Heikki
On 2014-04-24 15:56:45 +0300, Heikki Linnakangas wrote: > On 04/17/2014 12:06 PM, Andres Freund wrote: > >On 2014-04-16 19:33:52 -0400, Bruce Momjian wrote: > >>On Tue, Feb 4, 2014 at 12:58:49AM +0100, Andres Freund wrote: > >>>On 2014-02-03 11:22:45 -0500, Tom Lane wrote: > >>>>Andres Freund <andres@2ndquadrant.com> writes: > >>>>>On larger, multi-socket, machines, startup takes a fair bit of time. As > >>>>>I was profiling anyway I looked into it and noticed that just about all > >>>>>of it is spent in LWLockAssign() called by InitBufferPool(). Starting > >>>>>with shared_buffers=48GB on the server Nate Boley provided, takes about > >>>>>12 seconds. Nearly all of it spent taking the ShmemLock spinlock. > >>>>>Simply modifying LWLockAssign() to not take the spinlock when > >>>>>!IsUnderPostmaster speeds it up to 2 seconds. While certainly not making > >>>>>LWLockAssign() prettier it seems enough of a speedup to be worthwile > >>>>>nonetheless. > >>>> > >>>>Hm. This patch only works if the postmaster itself never assigns any > >>>>LWLocks except during startup. That's *probably* all right, but it > >>>>seems a bit scary. Is there any cheap way to make the logic actually > >>>>be what your comment claims, namely "Interlocking is not necessary during > >>>>postmaster startup"? I guess we could invent a ShmemInitInProgress global > >>>>flag ... > >>> > >>>So, here's a flag implementing things with that flag. I kept your name, > >>>as it's more in line with ipci.c's naming, but it looks kinda odd > >>>besides proc_exit_inprogress. > >> > >>Uh, where are we on this? > > > >I guess it's waiting for the next CF :(. > > Now that we have LWLock tranches in 9.4, it might be cleanest to have the > buffer manager allocate a separate tranche for the buffer locks. We could > also save some memory if we got rid of the LWLock pointers in BufferDesc > altogether, and just used the buffer id as an index into the LWLock array > (we could do that without tranches too, but would have to assume that the > lock ids returned by LWLockAssign() are a contiguous range). I tried that, and it's nontrivial from a performance POV because it influences how a buffer descriptor fits into cacheline(s). I think this needs significant experimentation. My experimentation hinted that it'd be a good idea to put the content lwlock inline, but the io one not since it's accessed much less frequently. IIRC I could fit the remainder of the buffer descriptor into one cacheline after putting the io locks into a separate array. I wonder if we can't somehow get rid of the io locks entirely... > Another idea is to add an LWLockAssignBatch(int) function that assigns a > range of locks in one call. That would be very simple, and I think it would > be less likely to break things than a new global flag. I would be OK with > sneaking that into 9.4 still. I don't really see the advantage tbh. Assuming we always can avoid the spinlock initially seems simple enough - and I have significant doubts that anything but buffer locks will need enough locks that it matters for other users. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-04-24 15:56:45 +0300, Heikki Linnakangas wrote: >> Another idea is to add an LWLockAssignBatch(int) function that assigns a >> range of locks in one call. That would be very simple, and I think it would >> be less likely to break things than a new global flag. I would be OK with >> sneaking that into 9.4 still. > I don't really see the advantage tbh. Assuming we always can avoid the > spinlock initially seems simple enough - and I have significant doubts > that anything but buffer locks will need enough locks that it matters > for other users. FWIW, I like the LWLockAssignBatch idea a lot better than the currently proposed patch. LWLockAssign is a low-level function that has no business making risky assumptions about the context it's invoked in. The other ideas are 9.5 material at this point, since they involve research --- but I agree with Heikki that LWLockAssignBatch could be snuck into 9.4 still. regards, tom lane
On 2014-04-24 11:02:44 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-04-24 15:56:45 +0300, Heikki Linnakangas wrote: > >> Another idea is to add an LWLockAssignBatch(int) function that assigns a > >> range of locks in one call. That would be very simple, and I think it would > >> be less likely to break things than a new global flag. I would be OK with > >> sneaking that into 9.4 still. > > > I don't really see the advantage tbh. Assuming we always can avoid the > > spinlock initially seems simple enough - and I have significant doubts > > that anything but buffer locks will need enough locks that it matters > > for other users. > > FWIW, I like the LWLockAssignBatch idea a lot better than the currently > proposed patch. LWLockAssign is a low-level function that has no business > making risky assumptions about the context it's invoked in. I don't think LWLockAssignBatch() is that easy without introducing layering violations. It can't just return a pointer out of the main lwlock array that then can be ++ed clientside because MainLWLockArray's stride isn't sizeof(LWLock). We could just add a LWLockAssignStartup(), that'd be pretty trivial. Whoever uses it later gets to keep the pieces... I guess if it's not that, the whole thing is 9.5 material. Once those locks are in a separate tranche the whole thing is moot anyway. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 04/24/2014 07:24 PM, Andres Freund wrote: > On 2014-04-24 11:02:44 -0400, Tom Lane wrote: >> Andres Freund <andres@2ndquadrant.com> writes: >>> On 2014-04-24 15:56:45 +0300, Heikki Linnakangas wrote: >>>> Another idea is to add an LWLockAssignBatch(int) function that assigns a >>>> range of locks in one call. That would be very simple, and I think it would >>>> be less likely to break things than a new global flag. I would be OK with >>>> sneaking that into 9.4 still. >> >>> I don't really see the advantage tbh. Assuming we always can avoid the >>> spinlock initially seems simple enough - and I have significant doubts >>> that anything but buffer locks will need enough locks that it matters >>> for other users. >> >> FWIW, I like the LWLockAssignBatch idea a lot better than the currently >> proposed patch. LWLockAssign is a low-level function that has no business >> making risky assumptions about the context it's invoked in. > > I don't think LWLockAssignBatch() is that easy without introducing > layering violations. It can't just return a pointer out of the main > lwlock array that then can be ++ed clientside because MainLWLockArray's > stride isn't sizeof(LWLock). Well, it could copy the pointers to an array of pointers that the caller provides. Or palloc an array and return that. Allocating a large enough array to hold NBuffers locks might not be nice, but if you do it in batches of, say, 1k locks, that would make it fast enough. Makes the caller a bit more complicated, but still might be worth it. - Heikki
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-04-24 11:02:44 -0400, Tom Lane wrote: >> FWIW, I like the LWLockAssignBatch idea a lot better than the currently >> proposed patch. LWLockAssign is a low-level function that has no business >> making risky assumptions about the context it's invoked in. > I don't think LWLockAssignBatch() is that easy without introducing > layering violations. It can't just return a pointer out of the main > lwlock array that then can be ++ed clientside because MainLWLockArray's > stride isn't sizeof(LWLock). Meh. I knew this business of using pointers instead of indexes would have some downsides. We could return the array stride ... kinda ugly, but since there's probably only one consumer for this API, it's not *that* bad. Could wrap the stride-increment in a macro, perhaps. regards, tom lane
On 2014-04-24 12:43:13 -0400, Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On 2014-04-24 11:02:44 -0400, Tom Lane wrote: > >> FWIW, I like the LWLockAssignBatch idea a lot better than the currently > >> proposed patch. LWLockAssign is a low-level function that has no business > >> making risky assumptions about the context it's invoked in. > > > I don't think LWLockAssignBatch() is that easy without introducing > > layering violations. It can't just return a pointer out of the main > > lwlock array that then can be ++ed clientside because MainLWLockArray's > > stride isn't sizeof(LWLock). > > Meh. I knew this business of using pointers instead of indexes would > have some downsides. > > We could return the array stride ... kinda ugly, but since there's > probably only one consumer for this API, it's not *that* bad. Could > wrap the stride-increment in a macro, perhaps. I think I am just going to wait for 9.5 where I sure hope we can allocate the buffer lwlocks outside the main array... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-04-24 23:28:14 +0200, Andres Freund wrote: > On 2014-04-24 12:43:13 -0400, Tom Lane wrote: > > Andres Freund <andres@2ndquadrant.com> writes: > > > On 2014-04-24 11:02:44 -0400, Tom Lane wrote: > > >> FWIW, I like the LWLockAssignBatch idea a lot better than the currently > > >> proposed patch. LWLockAssign is a low-level function that has no business > > >> making risky assumptions about the context it's invoked in. > > > > > I don't think LWLockAssignBatch() is that easy without introducing > > > layering violations. It can't just return a pointer out of the main > > > lwlock array that then can be ++ed clientside because MainLWLockArray's > > > stride isn't sizeof(LWLock). > > > > Meh. I knew this business of using pointers instead of indexes would > > have some downsides. > > > > We could return the array stride ... kinda ugly, but since there's > > probably only one consumer for this API, it's not *that* bad. Could > > wrap the stride-increment in a macro, perhaps. > > I think I am just going to wait for 9.5 where I sure hope we can > allocate the buffer lwlocks outside the main array... For reference (and backup), here's my current patch for that. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services