Thread: PostgreSQL and Solaris 7?
Hello, I am running 6.4.2 on Sparc/Solaris 2.7. Occasionally (far more often than I would like - which is never :), the postmaster refuses connections. Upon investigation, there are 48 postmaster processes running. If one kills the parent and restarts, all is well for some time. Attached is a chunk of output from postmaster, unfortunately it does not timestamp, so it is hard to see when what happened. I'm most disturbed about the device space message. None of my disks are even close to full, nor have been... Help! Advice? Ways to gather further info? Is this a Solaris 7 thing? Also, is the 48 number configurable? It almost looks as though that is the maximum number of outstanding queries I can have at one time? TIA, DwD -- Daryl W. Dunbar http://www.com, Where the Web Begins! mailto:daryl@www.com
On Tue, 9 Feb 1999, Daryl W. Dunbar wrote: > Hello, > > I am running 6.4.2 on Sparc/Solaris 2.7. Occasionally (far more > often than I would like - which is never :), the postmaster refuses > connections. Upon investigation, there are 48 postmaster processes > running. If one kills the parent and restarts, all is well for some > time. Attached is a chunk of output from postmaster, unfortunately > it does not timestamp, so it is hard to see when what happened. I'm > most disturbed about the device space message. None of my disks are > even close to full, nor have been... > > Help! Advice? Ways to gather further info? Is this a Solaris 7 > thing? We just got our Solaris 7 CDs, and I'm going to be installing it on one of our quieter machines in coming weeks, but... have you tried doing a 'truss -p' on the parent process? Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Reproduced here too. (Solaris 2.6) >On Tue, 9 Feb 1999, Daryl W. Dunbar wrote: > >> Hello, >> >> I am running 6.4.2 on Sparc/Solaris 2.7. Occasionally (far more >> often than I would like - which is never :), the postmaster refuses >> connections. Upon investigation, there are 48 postmaster processes >> running. If one kills the parent and restarts, all is well for some >> time. Attached is a chunk of output from postmaster, unfortunately >> it does not timestamp, so it is hard to see when what happened. I'm >> most disturbed about the device space message. None of my disks are >> even close to full, nor have been... >> >> Help! Advice? Ways to gather further info? Is this a Solaris 7 >> thing? > >We just got our Solaris 7 CDs, and I'm going to be installing it on one of >our quieter machines in coming weeks, but... > >have you tried doing a 'truss -p' on the parent process? Increasing # of semaphores should solve the problem, I guess. I'm going to try that as soon as I find the way to increase semaphores. --- Tatsuo Ishii
I'm also getting these types of errors in the log: NOTICE: Deadlock detected -- See the lock(l) manual page for a possible cause. ERROR: WaitOnLock: error on wakeup - Aborting this transaction NOTICE: Deadlock detected -- See the lock(l) manual page for a possible cause. ERROR: WaitOnLock: error on wakeup - Aborting this transaction also some of these: IpcSemaphoreCreate: semget failed (No space left on device) key=5432614, num=16, permission=600 IpcSemaphoreCreate: semget failed (No space left on device) key=5432714, num=16, permission=600 and many, many of these: FATAL: pq_putnchar: fputc() failed: errno=32 FATAL: pq_putnchar: fputc() failed: errno=32 My disks are virtually empty and I can't see where I'm getting a broken pipe. All communications are local using DBI/DBD::Pg w/Perl 5.005_02. Any thoughts? I'm thinking it might have to do with sockets/streams since 7 uses sockets and all prior solaris 2.x used streams. Maybe it's time to override some configure options? DwD -----Original Message----- From: owner-pgsql-hackers@postgreSQL.org [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tatsuo Ishii Sent: Tuesday, February 09, 1999 11:01 PM To: The Hermit Hacker Cc: Daryl W. Dunbar; pgsql-hackers@postgreSQL.org Subject: Re: [HACKERS] PostgreSQL and Solaris 7? Reproduced here too. (Solaris 2.6) >On Tue, 9 Feb 1999, Daryl W. Dunbar wrote: > >> Hello, >> >> I am running 6.4.2 on Sparc/Solaris 2.7. Occasionally (far more >> often than I would like - which is never :), the postmaster refuses >> connections. Upon investigation, there are 48 postmaster processes >> running. If one kills the parent and restarts, all is well for some >> time. Attached is a chunk of output from postmaster, unfortunately >> it does not timestamp, so it is hard to see when what happened. I'm >> most disturbed about the device space message. None of my disks are >> even close to full, nor have been... >> >> Help! Advice? Ways to gather further info? Is this a Solaris 7 >> thing? > >We just got our Solaris 7 CDs, and I'm going to be installing it on one of >our quieter machines in coming weeks, but... > >have you tried doing a 'truss -p' on the parent process? Increasing # of semaphores should solve the problem, I guess. I'm going to try that as soon as I find the way to increase semaphores. --- Tatsuo Ishii
On Wed, 10 Feb 1999, Tatsuo Ishii wrote: > Reproduced here too. (Solaris 2.6) > > >On Tue, 9 Feb 1999, Daryl W. Dunbar wrote: > > > >> Hello, > >> > >> I am running 6.4.2 on Sparc/Solaris 2.7. Occasionally (far more > >> often than I would like - which is never :), the postmaster refuses > >> connections. Upon investigation, there are 48 postmaster processes > >> running. If one kills the parent and restarts, all is well for some > >> time. Attached is a chunk of output from postmaster, unfortunately > >> it does not timestamp, so it is hard to see when what happened. I'm > >> most disturbed about the device space message. None of my disks are > >> even close to full, nor have been... > >> > >> Help! Advice? Ways to gather further info? Is this a Solaris 7 > >> thing? > > > >We just got our Solaris 7 CDs, and I'm going to be installing it on one of > >our quieter machines in coming weeks, but... > > > >have you tried doing a 'truss -p' on the parent process? > > Increasing # of semaphores should solve the problem, I guess. I'm > going to try that as soon as I find the way to increase semaphores. >From oen of our servers at work, using Oracle /etc/system: set shmsys:shminfo_shmmax=16777216 set shmsys:shminfo_shmmin=1 set shmsys:shminfo_shmmni=100 set shmsys:shminfo_shmseg=51 There are appropriate values for sem also...I actually have a document at work that explains it all...let me see if I can dig it up and add it to the WWW site or something... Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
On Wed, 10 Feb 1999, The Hermit Hacker wrote: > /etc/system: > > set shmsys:shminfo_shmmax=16777216 > set shmsys:shminfo_shmmin=1 > set shmsys:shminfo_shmmni=100 > set shmsys:shminfo_shmseg=51 Don't forget to reboot. The file /etc/system is read by kernel only on boot. Oleg. ---- Oleg Broytmann http://members.xoom.com/phd2/ phd2@earthling.net Programmers don't die, they justGOSUB without RETURN.
>> Increasing # of semaphores should solve the problem, I guess. I'm >> going to try that as soon as I find the way to increase semaphores. > >>From oen of our servers at work, using Oracle > >/etc/system: > >set shmsys:shminfo_shmmax=16777216 >set shmsys:shminfo_shmmin=1 >set shmsys:shminfo_shmmni=100 >set shmsys:shminfo_shmseg=51 > > >There are appropriate values for sem also...I actually have a document at >work that explains it all...let me see if I can dig it up and add it to >the WWW site or something... Great. I checked my Solaris box using sysdef and got: * IPC Semaphores * 10 entries in semaphore map (SEMMAP) 10 semaphore identifiers (SEMMNI) 60 semaphores in system (SEMMNS) 30 undo structures in system (SEMMNU) 25 max semaphores per id (SEMMSL) 10 max operations per semop call (SEMOPM) 10 max undo entries per process (SEMUME)32767 semaphore maximum value (SEMVMX)16384 adjust on exit max value (SEMAEM) There are so many tunable paramters! I expect cleaner explanations for these kernel variables from Marc's document. -- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: >>> Increasing # of semaphores should solve the problem, I guess. I'm >>> going to try that as soon as I find the way to increase semaphores. Right --- the messages Daryl is seeing indicate he's running out of semaphores or semaphore IDs, not file space. (Someone was too lazy to create a separate kernel errno for out-of-semaphores.) : IpcSemaphoreCreate: semget failed (No space left on device) ^^^^^^^^^^^^^ My guess is that the "WaitOnLock" and "stuck spinlock" complaints are artifacts of not being able to recover from the out-of-semaphores condition. I hope to make this a little more robust in time for 6.5. > I checked my Solaris box using sysdef and got: > 10 entries in semaphore map (SEMMAP) > 10 semaphore identifiers (SEMMNI) > 60 semaphores in system (SEMMNS) > 25 max semaphores per id (SEMMSL) These settings are far too small if you hope to go beyond a couple dozen Postgres backends. Postgres requires one semaphore per backend, which it (presently) allocates in groups of 16. Thus you cannot get past 48 backends with these kernel settings --- starting the 49th backend requires allocating semaphores 49-64, but your system is set up to allow only 60 semas total. (If your platform doesn't have a TEST_AND_SET implementation then several more semas are needed for spinlock emulation, but I assume that's not a problem on Solaris.) SEMMNI should also be bumped up, since you could not get past 10*16 backends with it set at 10 --- and that's not allowing for anything else to be using semaphores! It'd be foolish not to leave at least a couple dozen semas and sema IDs free at Postgres' peak usage. I dunno what SEMMAP is (no such parameter in my kernel) but it probably needs to be at least as large as SEMMNI, possibly larger. To run more than 64 backends you will also need to increase Postgres' internal MaxBackendId constant. Somewhere along here you are also likely to run into other kernel configuration limits, like the total number of processes, total processes for a single user, total number of open files, etc. These are all fixable but you don't want to reboot the system to install new values very often. We need a chapter in the installation guide that covers all this stuff in more detail... offhand I don't even know how many open files to allow per backend. regards, tom lane
In order to increase the number of semaphores, I added the following entries to /etc/system and rebooted: * * Increase the total number of Semaphores per user and system set semsys:seminfo_semmap=128 set semsys:seminfo_semmni=128 set semsys:seminfo_semmns=8192 set semsys:seminfo_semmnu=8192 set semsys:seminfo_semmsl=64 set semsys:seminfo_semopm=32 set semsys:seminfo_semume=32 These values are considerably larger than the defaults but more in line with the Linux defaults. Please note I have 512MB of memory on my Sparc, so I was not concerned about allocating too much table space. Thanks all for your help, this seems to have solved my problem. DwD > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Wednesday, February 10, 1999 10:01 AM > To: t-ishii@sra.co.jp > Cc: Daryl W. Dunbar; pgsql-hackers@postgreSQL.org > Subject: Re: [HACKERS] PostgreSQL and Solaris 7? > > > Tatsuo Ishii <t-ishii@sra.co.jp> writes: > >>> Increasing # of semaphores should solve the problem, > I guess. I'm > >>> going to try that as soon as I find the way to > increase semaphores. > > Right --- the messages Daryl is seeing indicate he's > running out of > semaphores or semaphore IDs, not file space. (Someone > was too lazy > to create a separate kernel errno for out-of-semaphores.) > > : IpcSemaphoreCreate: semget failed (No space left on device) > ^^^^^^^^^^^^^ > > My guess is that the "WaitOnLock" and "stuck spinlock" > complaints are > artifacts of not being able to recover from the out-of-semaphores > condition. I hope to make this a little more robust in > time for 6.5. > > > I checked my Solaris box using sysdef and got: > > > 10 entries in semaphore map (SEMMAP) > > 10 semaphore identifiers (SEMMNI) > > 60 semaphores in system (SEMMNS) > > 25 max semaphores per id (SEMMSL) > > These settings are far too small if you hope to go beyond > a couple dozen > Postgres backends. Postgres requires one semaphore per > backend, which > it (presently) allocates in groups of 16. Thus you > cannot get past 48 > backends with these kernel settings --- starting the 49th > backend requires > allocating semaphores 49-64, but your system is set up to > allow only 60 > semas total. > > (If your platform doesn't have a TEST_AND_SET implementation then > several more semas are needed for spinlock emulation, but I assume > that's not a problem on Solaris.) > > SEMMNI should also be bumped up, since you could not get > past 10*16 > backends with it set at 10 --- and that's not allowing > for anything > else to be using semaphores! It'd be foolish not to > leave at least > a couple dozen semas and sema IDs free at Postgres' peak usage. > > I dunno what SEMMAP is (no such parameter in my kernel) but it > probably needs to be at least as large as SEMMNI, possibly larger. > > To run more than 64 backends you will also need to > increase Postgres' > internal MaxBackendId constant. Somewhere along here you are also > likely to run into other kernel configuration limits, > like the total > number of processes, total processes for a single user, > total number > of open files, etc. These are all fixable but you don't want to > reboot the system to install new values very often. > > We need a chapter in the installation guide that covers > all this stuff > in more detail... offhand I don't even know how many open files to > allow per backend. > > regards, tom lane >
> > I checked my Solaris box using sysdef and got: > > > 10 entries in semaphore map (SEMMAP) > > 10 semaphore identifiers (SEMMNI) > > 60 semaphores in system (SEMMNS) > > 25 max semaphores per id (SEMMSL) > > These settings are far too small if you hope to go beyond a couple dozen > Postgres backends. Postgres requires one semaphore per backend, which > it (presently) allocates in groups of 16. Thus you cannot get past 48 > backends with these kernel settings --- starting the 49th backend requires > allocating semaphores 49-64, but your system is set up to allow only 60 > semas total. > > (If your platform doesn't have a TEST_AND_SET implementation then > several more semas are needed for spinlock emulation, but I assume > that's not a problem on Solaris.) > > SEMMNI should also be bumped up, since you could not get past 10*16 > backends with it set at 10 --- and that's not allowing for anything > else to be using semaphores! It'd be foolish not to leave at least > a couple dozen semas and sema IDs free at Postgres' peak usage. > > I dunno what SEMMAP is (no such parameter in my kernel) but it > probably needs to be at least as large as SEMMNI, possibly larger. Ok. If I consider 64 backends, at least following settings would be required from your suggestion: 64 entries in semaphore map (SEMMAP) 64 semaphore identifiers (SEMMNI) 64 semaphores in system (SEMMNS) 25 max semaphores per id (SEMMSL) Is this correct? > To run more than 64 backends you will also need to increase Postgres' > internal MaxBackendId constant. Somewhere along here you are also > likely to run into other kernel configuration limits, like the total > number of processes, total processes for a single user, total number > of open files, etc. These are all fixable but you don't want to > reboot the system to install new values very often. > > We need a chapter in the installation guide that covers all this stuff > in more detail... offhand I don't even know how many open files to > allow per backend. PostgreSQL seems to eat up file descriptor as many as possible. I observe 30-40 or more files are opened by a backend. This is definitelty a problem when thinking about large number of backends. My solution is using limit or ulimit command to lower the number of avaliable file descriptors when starting postmaster. The lowered number must be 20 or greater, since PostgreSQL's "virtual file descriptor" system reserves at least 20 open files per backend(backend/storage/file/fd.c). --- Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes: > Ok. If I consider 64 backends, at least following settings would be > required from your suggestion: > 64 entries in semaphore map (SEMMAP) > 64 semaphore identifiers (SEMMNI) > 64 semaphores in system (SEMMNS) > 25 max semaphores per id (SEMMSL) > Is this correct? No. You do need SEMMNS >= 64 of course, but Postgres only needs a sema identifier for each block of 16 semas, so SEMMNI >= 4 will work. According to my references, the recommended value of SEMMAP is SEMMNI+2 (it's for keeping track of unused "holes" between allocated sema-ID groups, so that seems like it ought to be enough). SEMMSL could be as low as 16, though I see no reason to reduce the default value. In reality, of course, you had better leave some slop for other Unix programs to be able to grab semas of their own. I'd suggest at least doubling the minimum SEMMNS and SEMMNI. (On my HP box, ipcs shows various root-owned subsystems using 6 sema IDs with a total of 8 semas. So I'd need at least SEMMNS = 72, SEMMNI = 10 to run 64 backends --- with no margin for error.) I never understood why the default sema configuration values were so small anyway --- it's not like a semaphore uses a huge amount of kernel space... regards, tom lane
As promised, sorry for the delay... http://sunsolve.Sun.COM/private-cgi/us/doc2html?infodoc/2270+91876015215168 This gives an explnation of it all for Solaris... On Thu, 11 Feb 1999, Tom Lane wrote: > Tatsuo Ishii <t-ishii@sra.co.jp> writes: > > Ok. If I consider 64 backends, at least following settings would be > > required from your suggestion: > > > 64 entries in semaphore map (SEMMAP) > > 64 semaphore identifiers (SEMMNI) > > 64 semaphores in system (SEMMNS) > > 25 max semaphores per id (SEMMSL) > > > Is this correct? > > No. You do need SEMMNS >= 64 of course, but Postgres only needs a > sema identifier for each block of 16 semas, so SEMMNI >= 4 will work. > According to my references, the recommended value of SEMMAP is SEMMNI+2 > (it's for keeping track of unused "holes" between allocated sema-ID > groups, so that seems like it ought to be enough). SEMMSL could be as > low as 16, though I see no reason to reduce the default value. > > In reality, of course, you had better leave some slop for other Unix > programs to be able to grab semas of their own. I'd suggest at least > doubling the minimum SEMMNS and SEMMNI. (On my HP box, ipcs shows > various root-owned subsystems using 6 sema IDs with a total of 8 semas. > So I'd need at least SEMMNS = 72, SEMMNI = 10 to run 64 backends --- > with no margin for error.) > > I never understood why the default sema configuration values were so > small anyway --- it's not like a semaphore uses a huge amount of kernel > space... > > regards, tom lane > Marc G. Fournier Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org