Thread: Solaris source code
I have purchased the Solaris source code from Sun for $80.  (I could have
downloaded it for free after faxing them an 11-page contract, but I decided
I wanted the CDs.)  See the slashdot story at:

    http://slashdot.org/article.pl?sid=01/06/30/1224257&mode=thread

My hope is that I can use the source code to help debug Solaris PostgreSQL
problems.  It includes source for the kernel and all user programs.  The
code is similar to the *BSD kernels.  It is basically Unix SVR4 with Sun's
enhancements.  It has both AT&T and Sun copyrights on the files.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
At 04:30 PM 7/5/01 -0400, Bruce Momjian wrote:
> I have purchased the Solaris source code from Sun for $80.  (I could
> have downloaded it for free after faxing them an 11-page contract, but I
> decided I wanted the CDs.)  See the slashdot story at:
>
>     http://slashdot.org/article.pl?sid=01/06/30/1224257&mode=thread
>
> My hope is that I can use the source code to help debug Solaris
> PostgreSQL problems.  It includes source for the kernel and all user
> programs.  The code is similar to the *BSD kernels.  It is basically
> Unix SVR4 with Sun's enhancements.  It has both AT&T and Sun copyrights
> on the files.

Bruce,

We are about to roll out PostgreSQL on Solaris, and I am interested in any
Solaris-specific gotchas.  Do you have some specifics in mind, or was this
just a general preventive-maintenance type of step?

-- 
Naomi Walker
Chief Information Officer
Eldorado Computing, Inc.
602-604-3100 ext 242
> Bruce,
>
> We are about to roll out PostgreSQL on Solaris, and I am interested in
> any Solaris-specific gotchas.  Do you have some specifics in mind, or
> was this just a general preventive-maintenance type of step?

Preventive.  I have heard Solaris has higher context-switching overhead,
and that may affect us because we use processes instead of threads.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Thu, Jul 05, 2001 at 02:03:31PM -0700, Naomi Walker wrote:
> We are about to roll out PostgreSQL on Solaris, and I am interested
> in any Solaris-specific gotchas.  Do you have some specifics in mind,
> or was this just a general preventive-maintenance type of step?

There have been reports of trouble with Unix sockets on Solaris.  You can
use TCP sockets, which might be slower; or change, in
src/backend/libpq/pqcomm.c, the line

    listen(fd, SOMAXCONN);

to

    listen(fd, 1024);

(Cf. Stevens, "Unix Network Programming, Volume 1", pp. 96 and 918.)

I don't know of any reason (and Stevens doesn't hint at one) not to fold
this change into the mainline sources.  However, we haven't heard from the
people who had trouble with Unix sockets whether this change actually
fixes their problems.  The effect of the change is to make it much less
likely for a connection request to be rejected when connections are being
opened very frequently.

Nathan Myers
ncm@zembu.com
On Thu, Jul 05, 2001 at 04:30:40PM -0400, Bruce Momjian allegedly wrote:
> I have purchased the Solaris source code from Sun for $80.  (I could
> have downloaded it for free after faxing them an 11 page contract, but I
> decided I wanted the CD's.)  See the slashdot story at:
>
>     http://slashdot.org/article.pl?sid=01/06/30/1224257&mode=thread
>
> My hope is that I can use the source code to help debug Solaris
> PostgreSQL problems.  It includes source for the kernel and all user
> programs.  The code is similar to *BSD kernels.  It is basically Unix
> SvR4 with Sun's enhancements.  It has both AT&T and Sun copyrights on
> the files.

Cool.  It would be nice to know why the regression tests fail on Solaris
when using a UNIX socket.

Cheers,

Mathijs
On Thu, Jul 05, 2001 at 02:03:31PM -0700, Naomi Walker allegedly wrote:
> We are about to roll out PostgreSQL on Solaris, and I am interested in
> any Solaris-specific gotchas.  Do you have some specifics in mind, or
> was this just a general preventive-maintenance type of step?

PostgreSQL 7.1 fails the regression tests when using a UNIX socket, which
is faster than a TCP/IP socket (when both the client and the server are
running on the same machine).

We're running a few small PostgreSQL databases on Solaris and we're going
to implement a bigger one in the near future.  If you connect via TCP/IP
sockets, you should be safe.  We're using JDBC to connect to the database,
and JDBC always uses a TCP/IP socket.  So far we haven't run into any real
problems, although PostgreSQL did crash once, for unknown reasons
(probably because someone was messing with it).

Not really helpful, I guess.  Doing some testing of your own is highly
recommended ;)

Cheers,

Mathijs
On Mon, Jul 09, 2001 at 02:03:16PM -0700, Nathan Myers allegedly wrote:
> On Mon, Jul 09, 2001 at 02:24:17PM +0200, Mathijs Brands wrote:
> > PostgreSQL 7.1 fails the regression tests when using a UNIX socket,
> > which is faster than a TCP/IP socket (when both the client and the
> > server are running on the same machine).
>
> Have you tried increasing the argument to listen in libpq/pqcomm.c
> from SOMAXCONN to 1024?  I think many people would be very interested
> in your results.

OK, I tried using 1024 (and later 128) instead of SOMAXCONN (defined to
be 5 on Solaris) in src/backend/libpq/pqcomm.c and ran a few regression
tests on two different Sparc boxes (Solaris 7 and 8).  The regression
test still fails, but for a different reason: the abstime test fails, and
not only on Solaris but also on FreeBSD (4.3-RELEASE).
*** ./expected/abstime.out    Thu May  3 21:00:37 2001
--- ./results/abstime.out     Tue Jul 10 10:34:18 2001
***************
*** 47,56 ****
   | Sun Jan 14 03:14:21 1973 PST
   | Mon May 01 00:30:30 1995 PDT
   | epoch
-  | current
   | -infinity
   | Sat May 10 23:59:12 1947 PST
! (6 rows)
  
  SELECT '' AS six, ABSTIME_TBL.*
     WHERE ABSTIME_TBL.f1 > abstime '-infinity';
--- 47,55 ----
   | Sun Jan 14 03:14:21 1973 PST
   | Mon May 01 00:30:30 1995 PDT
   | epoch
   | -infinity
   | Sat May 10 23:59:12 1947 PST
! (5 rows)
  
  SELECT '' AS six, ABSTIME_TBL.*
     WHERE ABSTIME_TBL.f1 > abstime '-infinity';
======================================================================

I've checked the FreeBSD and Linux headers and they've got SOMAXCONN set
to 128.  Here's a snippet from the Linux listen(2) manpage:

    BUGS
    If the socket is of type AF_INET, and the backlog argument is
    greater than the constant SOMAXCONN (128 in Linux 2.0 & 2.2), it is
    silently truncated to SOMAXCONN.  Don't rely on this value in
    portable applications since BSD (and some BSD-derived systems)
    limit the backlog to 5.

I've checked Solaris 2.6, 7 and 8, and the kernels have a default value of
128 for the number of backlog connections.  This number can be increased
to 1000 (maybe even larger).  On Solaris 2.4 and 2.5 it is apparently set
to 32.  Judging from Adrian Cockcroft's Solaris tuning guide, Sun has been
using a default value of 128 from Solaris 2.5.1 on.  You do need some
patches for 2.5.1: patches 103582 & 103630 (SPARC) or patches 103581 &
10361 (X86).  Later versions of Solaris don't need any patches.

You can check (and set) the number of backlog connections by using the
following commands:

Solaris 2.3, 2.4, 2.5 and unpatched 2.5.1:

    /usr/sbin/ndd /dev/tcp tcp_conn_req_max        (untested)

Solaris 2.5.1 (patched), 2.6, 7 and 8:

    /usr/sbin/ndd /dev/tcp tcp_conn_req_max_q

It'd probably be a good idea to use a value of 128 for the number of
backlog connections, and not SOMAXCONN.
If the requested number of backlog connections is bigger than the number
the kernel allows, it should be truncated.  Of course, there's no
guarantee that this won't cause problems on arcane platforms such as
Ultrix (if it is still supported).  The Apache survival guide has more
info on TCP/IP tuning for several platforms and includes information on
the listen backlog.

Cheers,

Mathijs

Ps. Just checked IRIX 6.5 - it's got the backlog set to 1000 connections.

-- 
And the beast shall be made legion.  Its numbers shall be increased a
thousand thousand fold.  The din of a million keyboards like unto a great
storm shall cover the earth, and the followers of Mammon shall tremble.
Mathijs Brands <mathijs@ilse.nl> writes:
> OK, I tried using 1024 (and later 128) instead of SOMAXCONN (defined to
> be 5 on Solaris) in src/backend/libpq/pqcomm.c and ran a few regression
> tests on two different Sparc boxes (Solaris 7 and 8).  The regression
> test still fails, but for a different reason.  The abstime test fails;
> not only on Solaris but also on FreeBSD (4.3-RELEASE).

The abstime diff is to be expected (if you look closely, the test is
comparing 'current' to 'June 30, 2001'.  Ooops).  If that's the only diff,
then you are in good shape.

Based on this and previous discussions, I am strongly tempted to remove
the use of SOMAXCONN and instead use, say,

    #define PG_SOMAXCONN 1000

defined in config.h.in.  That would leave room for configure to twiddle
it, if that proves necessary.  Does anyone know of a platform where this
would cause problems?  AFAICT, all versions of listen(2) are claimed to
be willing to reduce the passed parameter to whatever they can handle.

            regards, tom lane
> Based on this and previous discussions, I am strongly tempted to remove
> the use of SOMAXCONN and instead use, say,
>
>     #define PG_SOMAXCONN 1000
>
> defined in config.h.in.  That would leave room for configure to twiddle
> it, if that proves necessary.  Does anyone know of a platform where this
> would cause problems?  AFAICT, all versions of listen(2) are claimed to
> be willing to reduce the passed parameter to whatever they can handle.

Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN is
less than 1000?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Tue, Jul 10, 2001 at 05:06:28PM -0400, Bruce Momjian wrote:
> Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN
> is less than 1000?

All the OSes we know of fold it to 128, currently.  We can jump it to
10240 now, or later when there are 20GHz CPUs.

If you want to make it more complicated, it would be more useful to be
able to set the value lower for runtime environments where PG is
competing for OS resources with another daemon that deserves higher
priority.

Nathan Myers
ncm@zembu.com
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN
> is less than 1000?

Why bother?

If you've got some plausible scenario where 1000 is too small, we could
just as easily make it 10000.  I don't see the need for yet another
configure test for this.

            regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if
> > SOMAXCONN is less than 1000?
>
> Why bother?
>
> If you've got some plausible scenario where 1000 is too small, we could
> just as easily make it 10000.  I don't see the need for yet another
> configure test for this.

I was thinking:

    #if SOMAXCONN >= 1000
    #define PG_SOMAXCONN SOMAXCONN
    #else
    #define PG_SOMAXCONN 1000
    #endif

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
ncm@zembu.com (Nathan Myers) writes:
> All the OSes we know of fold it to 128, currently.  We can jump it
> to 10240 now, or later when there are 20GHz CPUs.

> If you want to make it more complicated, it would be more useful to
> be able to set the value lower for runtime environments where PG is
> competing for OS resources with another daemon that deserves higher
> priority.

Hmm, good point.  Does anyone have a feeling for the amount of kernel
resources that are actually sucked up by an accept-queue entry?  If 128
is the customary limit, is it actually worth worrying about whether we
are setting it to 128 vs. something smaller?

            regards, tom lane
On Tue, Jul 10, 2001 at 06:36:21PM -0400, Tom Lane wrote:
> Hmm, good point.  Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry?  If 128
> is the customary limit, is it actually worth worrying about whether
> we are setting it to 128 vs. something smaller?

I don't think the issue is the resources consumed by an accept-queue
entry.  Rather, it's a tuning knob to help shed load at the entry point
to the system, before significant resources have been committed.  An
administrator would tune it according to actual system and traffic
characteristics.  It is easy enough for somebody to change, if they care,
that it seems to me we have already devoted more time to it than it
deserves right now.

Nathan Myers
ncm@zembu.com
> Hmm, good point.  Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry?  If 128
> is the customary limit, is it actually worth worrying about whether
> we are setting it to 128 vs. something smaller?

All I can say is: keep in mind that Solaris uses SVR4 streams, which are
quite a bit heavier than the BSD-based sockets.  I don't know any
numbers.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Hmm, good point.  Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry?  If 128
> is the customary limit, is it actually worth worrying about whether
> we are setting it to 128 vs. something smaller?

Not much in the way of kernel resources is required by an entry on the
accept queue.  Basically a socket structure and maybe a couple of
addresses, typically about 200 bytes or so.

But I wouldn't worry about it, and I wouldn't worry about Nathan's
suggestion for making the limit configurable, because Postgres
connections don't spend time on the queue.  The postgres server will be
picking them off as fast as it can.  If the server can't pick connections
off fast enough, then your system has other problems; reducing the size
of the queue won't help those problems.  A large queue will help when a
large number of connections arrives simultaneously--it will permit
Postgres to deal with them appropriately, rather than causing the system
to discard them on its terms.

(Matters might be different if the Postgres server were written to not
call accept when it had the maximum number of connections active, and to
just leave connections on the queue in that case.  But that's not how it
works today.)

Ian

---------------------------(end of broadcast)---------------------------
TIP 842: "When the only tool you have is a hammer, you tend to treat
everything as if it were a nail."  -- Abraham Maslow
Ian Lance Taylor <ian@zembu.com> writes:
> But I wouldn't worry about it, and I wouldn't worry about Nathan's
> suggestion for making the limit configurable, because Postgres
> connections don't spend time on the queue.  The postgres server will
> be picking them off as fast as it can.  If the server can't pick
> connections off fast enough, then your system has other problems;

Right.  Okay, it seems like just making it a hand-configurable entry in
config.h.in is good enough for now.  When and if we find that that's
inadequate in a real-world situation, we can improve on it...

            regards, tom lane
Quick rundown of our configuration:

    Red Hat 7.1 (no changes or extras added by us)
    PostgreSQL 7.1.2 and CVS HEAD from 07/10/2001
    3.8 GB database size

I included two pgsql versions because this happens on both.

Here's the problem we're having:

We run a vacuumdb from the server on the entire database.  Some large
tables are vacuumed very quickly, but the vacuum process hangs or takes
more than a few hours on a specific table (we haven't let it finish
before).  The vacuum process works quickly on a table (loginhistory) with
2.8 million records, but is extremely slow on a table (inbox) with 1.1
million records (the table with 1.1 million records is actually larger in
KB size than the other table).

We've tried to vacuum the inbox table separately ('vacuum inbox' within
psql), but this still takes hours (again, we have never let it complete;
we need to use the database for development as well).

We noticed two things that are significant to this situation.  The server
logs the following:

DEBUG:  --Relation msginbox--
DEBUG:  Pages 129921: Changed 26735, reaped 85786, Empty 0, New 0;
Tup 1129861: Vac 560327, Keep/VTL 0/0, Crash 0, UnUsed 51549, MinLen 100,
MaxLen 2032; Re-using: Free/Avail. Space 359061488/359059332;
EndEmpty/Avail. Pages 0/85785. CPU 11.18s/5.32u sec.
DEBUG:  Index msginbox_pkey: Pages 4749; Tuples 1129861: Deleted 76360.
CPU 0.47s/6.70u sec.
DEBUG:  Index msginbox_fromto: Pages 5978; Tuples 1129861: Deleted 0.
CPU 0.37s/6.15u sec.
DEBUG:  Index msginbox_search: Pages 4536; Tuples 1129861: Deleted 0.
CPU 0.32s/6.30u sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES

The last few lines (XLogWrite ...) repeat for ever and ever and ever.
With 7.1.2 this never stops unless we run out of disk space or cancel the
query.  With CVS HEAD this still continues, but the log files don't
consume all disk space; we still have to cancel it or it might run
forever.

Perhaps we need to let it run until it completes, but we thought that we
might be doing something wrong, or have some data (we're converting data
from MS SQL Server) that isn't friendly.

The major issue we're facing with this is that any read or write access
to the table being vacuumed times out (obviously, because the table is
still locked).  We plan to use PostgreSQL in our production service, but
we can't until we get this resolved.

We're at a loss, not being familiar enough with PostgreSQL and its source
code.  Can anyone please offer some advice or suggestions?

Thanks,

Mark
Tom Lane writes:
> Right.  Okay, it seems like just making it a hand-configurable entry
> in config.h.in is good enough for now.  When and if we find that
> that's inadequate in a real-world situation, we can improve on it...

Would anything computed from the maximum number of allowed connections
make sense?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes:
> Would anything computed from the maximum number of allowed connections
> make sense?

[ looks at code ... ]  Hmm, MaxBackends is indeed set before we arrive at
the listen(), so it'd be possible to use MaxBackends to compute the
parameter.  Offhand I would think that MaxBackends, or at most
2*MaxBackends, would be a reasonable value.

Question, though: is this better than having a hardwired constant?  The
only case I can think of where it might not be is if some platform out
there throws an error from listen() when the parameter is too large for
it, rather than silently reducing the value to what it can handle.  A
value set in config.h.in would be simpler to adapt for such a platform.

BTW, while I'm thinking about it: why doesn't pqcomm.c test for a failure
return from the listen() call?  Is this just an oversight, or is there a
good reason to ignore errors?

            regards, tom lane
> [ looks at code ... ]  Hmm, MaxBackends is indeed set before we arrive
> at the listen(), so it'd be possible to use MaxBackends to compute the
> parameter.  Offhand I would think that MaxBackends or at most
> 2*MaxBackends would be a reasonable value.

Don't we have MaxBackends configurable at runtime?  If so, any constant
we put in config.h will be inaccurate.  Seems we have to track
MaxBackends.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Don't we have MaxBackends configurable at runtime?

Not after postmaster start, so passing it to the initial listen()
shouldn't be a problem.

The other concern I had could be addressed by making the listen parameter
be MIN(MaxBackends, PG_SOMAXCONN), where PG_SOMAXCONN is set in config.h
--- but now we could make the default value really large, say 10000.  The
only reason to change it would be if you had a kernel that barfed on
large listen() parameters.

Have we beat this issue to death yet, or is it still twitching?

            regards, tom lane
> Not after postmaster start, so passing it to the initial listen()
> shouldn't be a problem.
>
> The other concern I had could be addressed by making the listen
> parameter be MIN(MaxBackends, PG_SOMAXCONN) where PG_SOMAXCONN
> is set in config.h --- but now we could make the default value
> really large, say 10000.  The only reason to change it would be
> if you had a kernel that barfed on large listen() parameters.

Sounds good to me.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom Lane writes:
> The other concern I had could be addressed by making the listen
> parameter be MIN(MaxBackends, PG_SOMAXCONN) where PG_SOMAXCONN
> is set in config.h --- but now we could make the default value
> really large, say 10000.  The only reason to change it would be
> if you had a kernel that barfed on large listen() parameters.

We'll never find that out if we don't try it.  If you're concerned about
cooperating with other listen()ing processes, set it to MaxBackends * 2;
if you're not, set it to INT_MAX and watch.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
We increased shared memory in the Linux kernel, which decreased the
vacuumdb time from 40 minutes to 14 minutes on a 450 MHz processor.  We
calculate that on our dual 1 GHz box with gigabit Ethernet SAN connection
this will go down to under 5 minutes.  This is acceptable to us.

Sorry about the unnecessary post.

On Wednesday 11 July 2001 09:16, Mark wrote:
> Quick rundown of our configuration:
>     Red Hat 7.1 (no changes or extras added by us)
>     PostgreSQL 7.1.2 and CVS HEAD from 07/10/2001
>     3.8 GB database size
>
> I included two pgsql versions because this happens on both.
>
> Here's the problem we're having:
>
> We run a vacuumdb from the server on the entire database.  Some large
> tables are vacuumed very quickly, but the vacuum process hangs or takes
> more than a few hours on a specific table (we haven't let it finish
> before).
> Thanks,
>
> Mark

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Wed, Jul 11, 2001 at 12:26:43PM -0400, Tom Lane wrote:
> Question, though: is this better than having a hardwired constant?
> The only case I can think of where it might not be is if some platform
> out there throws an error from listen() when the parameter is too large
> for it, rather than silently reducing the value to what it can handle.
> A value set in config.h.in would be simpler to adapt for such a
> platform.

The question is really whether you ever want a client to get a "rejected"
result from an open attempt, or whether you'd rather they got a report
from the back end telling them they can't log in.  The second is more
polite but a lot more expensive.  That expense might really matter if you
have MaxBackends already running.  I doubt most clients have tested
either failure case more thoroughly than the other (or at all), but the
lower-level code is more likely to have been cut-and-pasted from
well-tested code. :-)

Maybe PG should avoid accept()ing connections once it has MaxBackends
back ends already running (as hinted at by Ian), so that the listen()
parameter actually has some meaningful effect, and excess connections can
be rejected more cheaply.  That might also make it easier to respond more
adaptively to true load than we do now.
> BTW, while I'm thinking about it: why doesn't pqcomm.c test for a
> failure return from the listen() call?  Is this just an oversight,
> or is there a good reason to ignore errors?

The failure of listen() seems impossible.  In the Linux, NetBSD, and
Solaris man pages, none of the error returns mentioned are possible with
PG's current use of the function.  It seems as if the most that might be
needed now would be to add a comment to the call to socket() noting that
if any other address families are supported (besides AF_INET and
AF_LOCAL, aka AF_UNIX), the call to listen() might need to be looked at.
AF_INET6 (which PG will need to support someday) doesn't seem to change
matters.  Probably if listen() did fail, then one or other of bind(),
accept(), and read() would fail too.

Nathan Myers
ncm@zembu.com