Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes) - Mailing list pgsql-hackers
From: The Hermit Hacker
Subject: Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes)
Date:
Msg-id: Pine.BSF.4.05.9901302150410.13391-100000@thelab.hub.org
In response to: Reducing sema usage (was Postmaster dies with many child processes) (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sat, 30 Jan 1999, Tom Lane wrote:

> I said:
> > Another thing we ought to look at is changing the use of semaphores so
> > that Postgres uses a fixed number of semaphores, not a number that
> > increases as more and more backends are started.  Kernels are
> > traditionally configured with very low limits for the SysV IPC
> > resources, so having a big appetite for semaphores is a Bad Thing.
>
> I've been looking into this issue today, and it looks possible but messy.
>
> The source of the problem is the lock manager
> (src/backend/storage/lmgr/proc.c), which wants to be able to wake up a
> specific process that is blocked on a lock.  I had first thought that it
> would be OK to wake up any one of the processes waiting for a lock, but
> after looking at the lock manager that seems a bad idea --- considerable
> thought has gone into the queuing order of waiting processes, and we
> don't want to give that up.  So we need to preserve this ability.
>
> The way it's currently done is that each extant backend has its own
> SysV-style semaphore, and when you want to wake up a particular backend
> you just V() its semaphore.  (BTW, the semaphores get allocated in
> chunks of 16, so an out-of-semaphores condition will always occur when
> trying to start the 16*N+1'th backend...)  This is simple and reliable
> but fails if you want to have more backends than the kernel has SysV
> semaphores.  Unfortunately kernels are usually configured with not
> very many semaphores --- 64 or so is typical.  Also, running the system
> down to nearly zero free semaphores is likely to cause problems for
> other subsystems even if Postgres itself doesn't run out.
>
> What seems practical to do instead is this:
> * At postmaster startup, allocate a fixed number of semaphores for
>   use by all child backends.  ("Fixed" can really mean "configurable",
>   of course, but the point is we won't ask for more later.)
> * The semaphores aren't dedicated to use by particular backends.
>   Rather, when a backend needs to block, it finds a currently free
>   semaphore and grabs it for the duration of its wait.  The number
>   of the semaphore a backend is using to wait with would be recorded
>   in its PROC struct, and we'd also need an array of per-sema data
>   to keep track of free and in-use semaphores.
> * This works with very little extra overhead until we have more
>   simultaneously-blocked backends than we have semaphores.  When that
>   happens (which we hope is really seldom), we overload semaphores ---
>   that is, we use the same sema to block two or more backends.  Then
>   the V() operation by the lock's releaser might wake the wrong backend.
>   So, we need an extra field in the LOCK struct to identify the intended
>   wake-ee.  When a backend is released in ProcSleep, it has to look at
>   the lock it is waiting on to see if it is supposed to be wakened
>   right now.  If not, it V()s its shared semaphore a second time (to
>   release the intended wakee), then P()s the semaphore again to go
>   back to sleep itself.  There probably has to be a delay in here,
>   to ensure that the intended wakee gets woken and we don't have its
>   bed-mates indefinitely trading wakeups among the wrong processes.
>   This is why we don't want this scenario happening often.
>
> I think this could be made to work, but it would be a delicate and
> hard-to-test change in what is already pretty subtle code.
>
> A considerably more straightforward approach is just to forget about
> incremental allocation of semaphores and grab all we could need at
> postmaster startup.  ("OK, Mac, you told me to allow up to N backends?
> Fine, I'm going to grab N semaphores at startup, and if I can't get them
> I won't play.")  This would force the DB admin to either reconfigure the
> kernel or reduce MaxBackendId to something the kernel can support right
> off the bat, rather than allowing the problem to lurk undetected until
> too many clients are started simultaneously.
> (Note there are still
> potential gotchas with running out of processes, swap space, or file
> table slots, so we wouldn't have really guaranteed that N backends can
> be started safely.)
>
> If we make MaxBackendId settable from a postmaster command-line switch
> then this second approach is probably not too inconvenient, though it
> surely isn't pretty.
>
> Any thoughts about which way to jump?  I'm sort of inclined to take
> the simpler approach myself...

I'm inclined to agree...get rid of the 'hard coded' max, make it a settable option at run time, and 'reserve the semaphores' on startup...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org    secondary: scrappy@{freebsd|postgresql}.org