Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes) - Mailing list pgsql-hackers

From: The Hermit Hacker
Subject: Re: [HACKERS] Reducing sema usage (was Postmaster dies with many child processes)
Msg-id: Pine.BSF.4.05.9901302150410.13391-100000@thelab.hub.org
In response to: Reducing sema usage (was Postmaster dies with many child processes)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Sat, 30 Jan 1999, Tom Lane wrote:

> I said:
> > Another thing we ought to look at is changing the use of semaphores so
> > that Postgres uses a fixed number of semaphores, not a number that
> > increases as more and more backends are started.  Kernels are
> > traditionally configured with very low limits for the SysV IPC
> > resources, so having a big appetite for semaphores is a Bad Thing.
> 
> I've been looking into this issue today, and it looks possible but messy.
> 
> The source of the problem is the lock manager
> (src/backend/storage/lmgr/proc.c), which wants to be able to wake up a
> specific process that is blocked on a lock.  I had first thought that it
> would be OK to wake up any one of the processes waiting for a lock, but
> after looking at the lock manager that seems a bad idea --- considerable
> thought has gone into the queuing order of waiting processes, and we
> don't want to give that up.  So we need to preserve this ability.
> 
> The way it's currently done is that each extant backend has its own
> SysV-style semaphore, and when you want to wake up a particular backend
> you just V() its semaphore.  (BTW, the semaphores get allocated in
> chunks of 16, so an out-of-semaphores condition will always occur when
> trying to start the 16*N+1'th backend...)  This is simple and reliable
> but fails if you want to have more backends than the kernel has SysV
> semaphores.  Unfortunately kernels are usually configured with not
> very many semaphores --- 64 or so is typical.  Also, running the system
> down to nearly zero free semaphores is likely to cause problems for
> other subsystems even if Postgres itself doesn't run out.
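To make the mechanism concrete: the per-backend scheme boils down to something like the sketch below.  This is only an illustration in terms of the raw SysV calls -- sema_P/sema_V are made-up helper names, and the real code in proc.c wraps considerably more bookkeeping and error handling around the same idea.

    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    /* P(): block the caller on semaphore semNum of set semId.
     * Assumes the semaphore was initialized to zero, so the caller
     * sleeps until some other process V()s it. */
    static void
    sema_P(int semId, int semNum)
    {
        struct sembuf op;

        op.sem_num = semNum;    /* this backend's slot in the set        */
        op.sem_op  = -1;        /* decrement: waits while count is zero  */
        op.sem_flg = 0;
        semop(semId, &op, 1);   /* error handling omitted in this sketch */
    }

    /* V(): wake whichever backend is sleeping on semNum of set semId. */
    static void
    sema_V(int semId, int semNum)
    {
        struct sembuf op;

        op.sem_num = semNum;
        op.sem_op  = 1;         /* increment: releases one sleeper */
        op.sem_flg = 0;
        semop(semId, &op, 1);
    }

Waking a specific backend is then just sema_V() on the semaphore recorded for that backend, which is exactly why each backend currently needs a semaphore of its own.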
> 
> What seems practical to do instead is this:
> * At postmaster startup, allocate a fixed number of semaphores for
>   use by all child backends.  ("Fixed" can really mean "configurable",
>   of course, but the point is we won't ask for more later.)
> * The semaphores aren't dedicated to use by particular backends.
>   Rather, when a backend needs to block, it finds a currently free
>   semaphore and grabs it for the duration of its wait.  The number
>   of the semaphore a backend is using to wait with would be recorded
>   in its PROC struct, and we'd also need an array of per-sema data
>   to keep track of free and in-use semaphores.
> * This works with very little extra overhead until we have more
>   simultaneously-blocked backends than we have semaphores.  When that
>   happens (which we hope is really seldom), we overload semaphores ---
>   that is, we use the same sema to block two or more backends.  Then
>   the V() operation by the lock's releaser might wake the wrong backend.
>   So, we need an extra field in the LOCK struct to identify the intended
>   wakee.  When a backend is released in ProcSleep, it has to look at
>   the lock it is waiting on to see if it is supposed to be wakened
>   right now.  If not, it V()s its shared semaphore a second time (to
>   release the intended wakee), then P()s the semaphore again to go
>   back to sleep itself.  There probably has to be a delay in here,
>   to ensure that the intended wakee gets woken and we don't have its
>   bed-mates indefinitely trading wakeups among the wrong processes.
>   This is why we don't want this scenario happening often.
> 
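To picture the overloaded case, a backend sharing a semaphore would run something like the loop below each time it is V()'d.  The names here (SketchProc, SketchLock, intendedWakee, semaId, semaNum) are hypothetical stand-ins, reusing the sema_P/sema_V helpers sketched earlier; the real PROC and LOCK structs and ProcSleep carry far more state than this.

    #include <unistd.h>    /* usleep(), for the back-off delay */

    /* Hypothetical stand-ins for the relevant bits of PROC and LOCK;
     * sema_P()/sema_V() are the SysV wrappers from the earlier sketch. */
    typedef struct SketchProc
    {
        int semaId;          /* SysV sema set this backend is waiting in */
        int semaNum;         /* which semaphore within that set          */
    } SketchProc;

    typedef struct SketchLock
    {
        SketchProc *intendedWakee;   /* the proposed new LOCK field */
    } SketchLock;

    static void
    sketch_sleep_on_lock(SketchProc *me, SketchLock *lock)
    {
        for (;;)
        {
            sema_P(me->semaId, me->semaNum);    /* sleep until someone V()s */

            if (lock->intendedWakee == me)
                break;                          /* the wakeup really was ours */

            /* Wrong backend got the V(): pass it along and go back to sleep. */
            sema_V(me->semaId, me->semaNum);
            usleep(10000);                      /* brief delay so the intended
                                                   wakee gets a chance to run */
        }
    }
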
> I think this could be made to work, but it would be a delicate and
> hard-to-test change in what is already pretty subtle code.
> 
> A considerably more straightforward approach is just to forget about
> incremental allocation of semaphores and grab all we could need at
> postmaster startup.  ("OK, Mac, you told me to allow up to N backends?
> Fine, I'm going to grab N semaphores at startup, and if I can't get them
> I won't play.")  This would force the DB admin to either reconfigure the
> kernel or reduce MaxBackendId to something the kernel can support right
> off the bat, rather than allowing the problem to lurk undetected until
> too many clients are started simultaneously.  (Note there are still
> potential gotchas with running out of processes, swap space, or file
> table slots, so we wouldn't have really guaranteed that N backends can
> be started safely.)
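In code terms, the grab-it-all-up-front approach is roughly the sketch below.  reserve_backend_semas and its arguments are placeholder names; a real version would also have to respect the kernel's per-set limit (which is why the existing code allocates in sets of 16) and release the semaphores again at shutdown.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <sys/ipc.h>
    #include <sys/sem.h>

    /* Reserve one semaphore per allowed backend at postmaster startup,
     * or refuse to start at all.  maxBackends would come from the
     * proposed command-line switch; ipcKey is whatever key the
     * postmaster already uses for its IPC resources. */
    static int
    reserve_backend_semas(int maxBackends, key_t ipcKey)
    {
        int semId = semget(ipcKey, maxBackends, IPC_CREAT | IPC_EXCL | 0600);

        if (semId < 0)
        {
            fprintf(stderr,
                    "postmaster: unable to allocate %d semaphores; "
                    "lower the backend limit or reconfigure the kernel\n",
                    maxBackends);
            exit(1);
        }
        return semId;
    }
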
> 
> If we make MaxBackendId settable from a postmaster command-line switch
> then this second approach is probably not too inconvenient, though it
> surely isn't pretty.
> 
> Any thoughts about which way to jump?  I'm sort of inclined to take
> the simpler approach myself...

I'm inclined to agree...get rid of the 'hard coded' max, make it a
settable option at run time, and 'reserve the semaphores' at startup...

Marc G. Fournier                                
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 


