Reducing sema usage (was Postmaster dies with many child processes) - Mailing list pgsql-hackers
From: Tom Lane
Subject: Reducing sema usage (was Postmaster dies with many child processes)
Msg-id: 5737.917741514@sss.pgh.pa.us
In response to: Re: [HACKERS] Postmaster dies with many child processes (spinlock/semget failed) (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:
  Re: Reducing sema usage (was Postmaster dies with many child processes)
  Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes)
List: pgsql-hackers
I said:

> Another thing we ought to look at is changing the use of semaphores so
> that Postgres uses a fixed number of semaphores, not a number that
> increases as more and more backends are started.  Kernels are
> traditionally configured with very low limits for the SysV IPC
> resources, so having a big appetite for semaphores is a Bad Thing.

I've been looking into this issue today, and it looks possible but messy.

The source of the problem is the lock manager (src/backend/storage/lmgr/proc.c), which wants to be able to wake up a specific process that is blocked on a lock.  I had first thought that it would be OK to wake up any one of the processes waiting for a lock, but after looking at the lock manager that seems a bad idea --- considerable thought has gone into the queuing order of waiting processes, and we don't want to give that up.  So we need to preserve this ability.

The way it's currently done is that each extant backend has its own SysV-style semaphore, and when you want to wake up a particular backend you just V() its semaphore.  (BTW, the semaphores get allocated in chunks of 16, so an out-of-semaphores condition will always occur when trying to start the 16*N+1'th backend...)  This is simple and reliable but fails if you want to have more backends than the kernel has SysV semaphores.  Unfortunately kernels are usually configured with not very many semaphores --- 64 or so is typical.  Also, running the system down to nearly zero free semaphores is likely to cause problems for other subsystems even if Postgres itself doesn't run out.

What seems practical to do instead is this:

* At postmaster startup, allocate a fixed number of semaphores for use by all child backends.  ("Fixed" can really mean "configurable", of course, but the point is we won't ask for more later.)

* The semaphores aren't dedicated to use by particular backends.  Rather, when a backend needs to block, it finds a currently free semaphore and grabs it for the duration of its wait.
The number of the semaphore a backend is using to wait with would be recorded in its PROC struct, and we'd also need an array of per-sema data to keep track of free and in-use semaphores.

* This works with very little extra overhead until we have more simultaneously-blocked backends than we have semaphores.  When that happens (which we hope is really seldom), we overload semaphores --- that is, we use the same sema to block two or more backends.  Then the V() operation by the lock's releaser might wake the wrong backend.  So, we need an extra field in the LOCK struct to identify the intended wake-ee.  When a backend is released in ProcSleep, it has to look at the lock it is waiting on to see if it is supposed to be wakened right now.  If not, it V()s its shared semaphore a second time (to release the intended wakee), then P()s the semaphore again to go back to sleep itself.  There probably has to be a delay in here, to ensure that the intended wakee gets woken and we don't have its bed-mates indefinitely trading wakeups among the wrong processes.  This is why we don't want this scenario happening often.

I think this could be made to work, but it would be a delicate and hard-to-test change in what is already pretty subtle code.

A considerably more straightforward approach is just to forget about incremental allocation of semaphores and grab all we could need at postmaster startup.  ("OK, Mac, you told me to allow up to N backends?  Fine, I'm going to grab N semaphores at startup, and if I can't get them I won't play.")  This would force the DB admin to either reconfigure the kernel or reduce MaxBackendId to something the kernel can support right off the bat, rather than allowing the problem to lurk undetected until too many clients are started simultaneously.  (Note there are still potential gotchas with running out of processes, swap space, or file table slots, so we wouldn't have really guaranteed that N backends can be started safely.)
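[Editor's note: the per-backend wakeup mechanism described above can be sketched in a few lines of C.  This is a minimal illustration, not the actual proc.c code; SEMAS_PER_SET, sem_op, and demo_wake_backend are made-up names, and a Linux-style SysV IPC environment is assumed.]

```c
/* Minimal sketch of the current scheme: semaphores come in chunks of
 * 16, each backend owns one, and a lock releaser wakes a specific
 * backend by V()ing that backend's semaphore.  All identifiers here
 * are illustrative, not the real PostgreSQL ones. */
#include <sys/ipc.h>
#include <sys/sem.h>

#define SEMAS_PER_SET 16                 /* allocated in chunks of 16 */

static int sem_op(int semid, int semnum, int delta)
{
    struct sembuf op;
    op.sem_num = (unsigned short) semnum;
    op.sem_op  = (short) delta;
    op.sem_flg = 0;
    return semop(semid, &op, 1);
}

/* Allocate one chunk, V() then P() semaphore `n` (the wakeup followed
 * by the now-non-blocking sleep), then clean up.  Returns 0 on success. */
int demo_wake_backend(int n)
{
    int semid = semget(IPC_PRIVATE, SEMAS_PER_SET, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;
    if (sem_op(semid, n, 1) != 0 ||      /* releaser: V() that backend */
        sem_op(semid, n, -1) != 0)       /* backend: P() now succeeds  */
    {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    return semctl(semid, 0, IPC_RMID) < 0 ? -1 : 0;
}
```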
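[Editor's note: the overloaded-semaphore wakeup dance is easier to follow in a toy single-threaded model.  The counter below stands in for a SysV semaphore, and intended_wakee is the proposed extra LOCK field; ToySema, ToyLock, and check_wakeup are invented names for illustration only.]

```c
/* Toy model of the overload scheme: two or more backends share one
 * semaphore, so a wakened backend must check whether it is the
 * intended wake-ee; if not, it re-V()s to pass the wakeup along and
 * goes back to sleep. */
typedef struct { int count; } ToySema;

static void V(ToySema *s) { s->count++; }

/* Nonblocking P(): succeeds only if a wakeup is pending. */
static int try_P(ToySema *s)
{
    if (s->count > 0) { s->count--; return 1; }
    return 0;
}

typedef struct { int intended_wakee; } ToyLock;

/* What a wakened backend would do on waking: proceed if it is the
 * intended wake-ee, otherwise release the real target and go back to
 * sleep.  Returns 1 if this backend may proceed. */
int check_wakeup(ToyLock *lock, ToySema *shared, int my_backend_id)
{
    if (lock->intended_wakee == my_backend_id)
        return 1;              /* we were the target: run */
    V(shared);                 /* pass the wakeup to the intended wakee */
    return 0;                  /* caller P()s again and sleeps */
}
```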
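[Editor's note: the simpler "grab everything up front" approach might look like the sketch below, which mirrors the existing 16-per-set chunking.  grab_semas_upfront and the MAX_SETS cap are invented for illustration; real code would report the failure and abort postmaster startup.]

```c
/* Sketch: at postmaster startup, grab enough 16-semaphore sets for
 * the configured backend limit, and refuse to start ("I won't play")
 * if the kernel can't supply them. */
#include <sys/ipc.h>
#include <sys/sem.h>

#define SEMAS_PER_SET 16
#define MAX_SETS      64         /* arbitrary cap for this sketch */

/* Fill semids[] with enough sets for nbackends backends; on any
 * failure, release whatever was acquired and return -1 so startup can
 * be aborted.  Returns 0 and sets *nsets on success. */
int grab_semas_upfront(int nbackends, int semids[], int *nsets)
{
    int need = (nbackends + SEMAS_PER_SET - 1) / SEMAS_PER_SET;
    int i;

    if (need > MAX_SETS)
        return -1;
    for (i = 0; i < need; i++) {
        semids[i] = semget(IPC_PRIVATE, SEMAS_PER_SET, IPC_CREAT | 0600);
        if (semids[i] < 0) {
            while (--i >= 0)             /* give back what we got */
                semctl(semids[i], 0, IPC_RMID);
            return -1;                   /* kernel too small: don't start */
        }
    }
    *nsets = need;
    return 0;
}
```

This makes the failure visible at startup (when the DBA can react) instead of under peak load.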
If we make MaxBackendId settable from a postmaster command-line switch then this second approach is probably not too inconvenient, though it surely isn't pretty.

Any thoughts about which way to jump?  I'm sort of inclined to take the simpler approach myself...

            regards, tom lane