Thread: Solaris source code
I have purchased the Solaris source code from Sun for $80.  (I could have
downloaded it for free after faxing them an 11-page contract, but I decided
I wanted the CDs.)  See the slashdot story at:

    http://slashdot.org/article.pl?sid=01/06/30/1224257&mode=thread

My hope is that I can use the source code to help debug Solaris PostgreSQL
problems.  It includes source for the kernel and all user programs.  The
code is similar to the *BSD kernels.  It is basically Unix SVR4 with Sun's
enhancements.  It has both AT&T and Sun copyrights on the files.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
At 04:30 PM 7/5/01 -0400, Bruce Momjian wrote:
> I have purchased the Solaris source code from Sun for $80.  (I could
> have downloaded it for free after faxing them an 11-page contract, but I
> decided I wanted the CDs.)  See the slashdot story at:
>
>     http://slashdot.org/article.pl?sid=01/06/30/1224257&mode=thread
>
> My hope is that I can use the source code to help debug Solaris
> PostgreSQL problems.  It includes source for the kernel and all user
> programs.  The code is similar to the *BSD kernels.  It is basically
> Unix SVR4 with Sun's enhancements.  It has both AT&T and Sun copyrights
> on the files.

Bruce,

We are about to roll out PostgreSQL on Solaris, and I am interested in any
Solaris-specific gotchas.  Do you have some specifics in mind, or was this
just a general preventive-maintenance type of step?

-- 
Naomi Walker
Chief Information Officer
Eldorado Computing, Inc.
602-604-3100 ext 242
> Bruce,
>
> We are about to roll out PostgreSQL on Solaris, and I am interested in
> any Solaris-specific gotchas.  Do you have some specifics in mind, or
> was this just a general preventive-maintenance type of step?

Preventive.  I have heard Solaris has higher context-switching overhead,
and that may affect us because we use processes instead of threads.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Thu, Jul 05, 2001 at 02:03:31PM -0700, Naomi Walker wrote:
> We are about to roll out PostgreSQL on Solaris, and I am interested
> in any Solaris-specific gotchas.  Do you have some specifics in mind,
> or was this just a general preventive-maintenance type of step?

There have been reports of trouble with Unix sockets on Solaris.  You can
use TCP sockets, which might be slower; or change, in
src/backend/libpq/pqcomm.c, the line

    listen(fd, SOMAXCONN);

to

    listen(fd, 1024);

(Cf. Stevens, "Unix Network Programming, Volume 1", pp. 96 and 918.)

I don't know of any reason (and Stevens doesn't hint at one) not to fold
this change into the mainline sources.  However, we haven't heard from the
people who had trouble with Unix sockets whether this change actually
fixes their problems.  The effect of the change is to make it much less
likely for a connection request to be rejected when connections are being
opened very frequently.

Nathan Myers
ncm@zembu.com
On Thu, Jul 05, 2001 at 04:30:40PM -0400, Bruce Momjian allegedly wrote:
> I have purchased the Solaris source code from Sun for $80.  (I could
> have downloaded it for free after faxing them an 11 page contract, but I
> decided I wanted the CD's.)  See the slashdot story at:
>
>     http://slashdot.org/article.pl?sid=01/06/30/1224257&mode=thread
>
> My hope is that I can use the source code to help debug Solaris
> PostgreSQL problems.  It includes source for the kernel and all user
> programs.  The code is similar to *BSD kernels.  It is basically Unix
> SvR4 with Sun's enhancements.  It has both AT&T and Sun copyrights on
> the files.

Cool.  It would be nice to know why the regression tests fail on Solaris
when using a UNIX socket.

Cheers,

Mathijs
On Thu, Jul 05, 2001 at 02:03:31PM -0700, Naomi Walker allegedly wrote:
> We are about to roll out PostgreSQL on Solaris, and I am interested in
> any Solaris-specific gotchas.  Do you have some specifics in mind, or
> was this just a general preventive-maintenance type of step?

PostgreSQL 7.1 fails the regression tests when using a UNIX socket, which
is faster than a TCP/IP socket (when both the client and the server are
running on the same machine).

We're running a few small PostgreSQL databases on Solaris and we're going
to implement a bigger one in the near future.  If you connect via TCP/IP
sockets, you should be safe.  We're using JDBC to connect to the database,
and JDBC always uses a TCP/IP socket.  So far we haven't run into any real
problems, although PostgreSQL did crash once, for unknown reasons
(probably because someone was messing with it).

Not really helpful, I guess.  Doing some testing of your own is highly
recommended ;)

Cheers,

Mathijs
On Mon, Jul 09, 2001 at 02:03:16PM -0700, Nathan Myers allegedly wrote:
> On Mon, Jul 09, 2001 at 02:24:17PM +0200, Mathijs Brands wrote:
> > PostgreSQL 7.1 fails the regression tests when using a UNIX socket,
> > which is faster than a TCP/IP socket (when both the client and the
> > server are running on the same machine).
>
> Have you tried increasing the argument to listen in libpq/pqcomm.c
> from SOMAXCONN to 1024?  I think many people would be very interested
> in your results.

OK, I tried using 1024 (and later 128) instead of SOMAXCONN (defined to
be 5 on Solaris) in src/backend/libpq/pqcomm.c and ran a few regression
tests on two different Sparc boxes (Solaris 7 and 8).  The regression
test still fails, but for a different reason: the abstime test fails, and
not only on Solaris but also on FreeBSD (4.3-RELEASE).
*** ./expected/abstime.out    Thu May  3 21:00:37 2001
--- ./results/abstime.out     Tue Jul 10 10:34:18 2001
***************
*** 47,56 ****
   | Sun Jan 14 03:14:21 1973 PST
   | Mon May 01 00:30:30 1995 PDT
   | epoch
-  | current
   | -infinity
   | Sat May 10 23:59:12 1947 PST
! (6 rows)
  
  SELECT '' AS six, ABSTIME_TBL.*
     WHERE ABSTIME_TBL.f1 > abstime '-infinity';
--- 47,55 ----
   | Sun Jan 14 03:14:21 1973 PST
   | Mon May 01 00:30:30 1995 PDT
   | epoch
   | -infinity
   | Sat May 10 23:59:12 1947 PST
! (5 rows)
  
  SELECT '' AS six, ABSTIME_TBL.*
     WHERE ABSTIME_TBL.f1 > abstime '-infinity';
======================================================================

I've checked the FreeBSD and Linux headers and they've got SOMAXCONN set
to 128.  Here's a snippet from the Linux listen(2) manpage:

    BUGS
    If the socket is of type AF_INET, and the backlog argument is
    greater than the constant SOMAXCONN (128 in Linux 2.0 & 2.2), it is
    silently truncated to SOMAXCONN.  Don't rely on this value in
    portable applications since BSD (and some BSD-derived systems)
    limit the backlog to 5.

I've checked Solaris 2.6, 7 and 8, and the kernels have a default value of
128 for the number of backlog connections.  This number can be increased
to 1000 (maybe even larger).  On Solaris 2.4 and 2.5 it is apparently set
to 32.  Judging from Adrian Cockcroft's Solaris tuning guide, Sun has been
using a default value of 128 from Solaris 2.5.1 on.  You do need some
patches for 2.5.1: patches 103582 & 103630 (SPARC) or patches 103581 &
10361 (X86).  Later versions of Solaris don't need any patches.

You can check (and set) the number of backlog connections by using the
following commands:

Solaris 2.3, 2.4, 2.5 and unpatched 2.5.1:

    /usr/sbin/ndd /dev/tcp tcp_conn_req_max        (untested)

Solaris 2.5.1 (patched), 2.6, 7 and 8:

    /usr/sbin/ndd /dev/tcp tcp_conn_req_max_q

It'd probably be a good idea to use a value of 128 for the number of
backlog connections, and not SOMAXCONN.
If the requested number of backlog connections is bigger than the number
the kernel allows, it should be truncated.  Of course, there's no
guarantee that this won't cause problems on arcane platforms such as
Ultrix (if it is still supported).  The Apache survival guide has more
info on TCP/IP tuning for several platforms and includes information on
the listen backlog.

Cheers,

Mathijs

Ps. Just checked IRIX 6.5 - it's got the backlog set to 1000 connections.

-- 
And the beast shall be made legion.  Its numbers shall be increased a
thousand thousand fold.  The din of a million keyboards like unto a great
storm shall cover the earth, and the followers of Mammon shall tremble.
Mathijs Brands <mathijs@ilse.nl> writes:
> OK, I tried using 1024 (and later 128) instead of SOMAXCONN (defined to
> be 5 on Solaris) in src/backend/libpq/pqcomm.c and ran a few regression
> tests on two different Sparc boxes (Solaris 7 and 8).  The regression
> test still fails, but for a different reason.  The abstime test fails;
> not only on Solaris but also on FreeBSD (4.3-RELEASE).

The abstime diff is to be expected (if you look closely, the test is
comparing 'current' to 'June 30, 2001'.  Ooops).  If that's the only diff,
then you are in good shape.

Based on this and previous discussions, I am strongly tempted to remove
the use of SOMAXCONN and instead use, say,

    #define PG_SOMAXCONN 1000

defined in config.h.in.  That would leave room for configure to twiddle
it, if that proves necessary.  Does anyone know of a platform where this
would cause problems?  AFAICT, all versions of listen(2) are claimed to
be willing to reduce the passed parameter to whatever they can handle.

            regards, tom lane
> Based on this and previous discussions, I am strongly tempted to remove
> the use of SOMAXCONN and instead use, say,
>
>     #define PG_SOMAXCONN 1000
>
> defined in config.h.in.  That would leave room for configure to twiddle
> it, if that proves necessary.  Does anyone know of a platform where this
> would cause problems?  AFAICT, all versions of listen(2) are claimed to
> be willing to reduce the passed parameter to whatever they can handle.

Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN is
less than 1000?

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Tue, Jul 10, 2001 at 05:06:28PM -0400, Bruce Momjian wrote:
> Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN
> is less than 1000?

All the OSes we know of fold it to 128, currently.  We can jump it to
10240 now, or later when there are 20GHz CPUs.

If you want to make it more complicated, it would be more useful to be
able to set the value lower for runtime environments where PG is
competing for OS resources with another daemon that deserves higher
priority.

Nathan Myers
ncm@zembu.com
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if SOMAXCONN
> is less than 1000?

Why bother?

If you've got some plausible scenario where 1000 is too small, we could
just as easily make it 10000.  I don't see the need for yet another
configure test for this.

            regards, tom lane
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if
> > SOMAXCONN is less than 1000?
>
> Why bother?
>
> If you've got some plausible scenario where 1000 is too small, we could
> just as easily make it 10000.  I don't see the need for yet another
> configure test for this.

I was thinking:

    #if SOMAXCONN >= 1000
    #define PG_SOMAXCONN SOMAXCONN
    #else
    #define PG_SOMAXCONN 1000
    #endif

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
ncm@zembu.com (Nathan Myers) writes:
> All the OSes we know of fold it to 128, currently.  We can jump it
> to 10240 now, or later when there are 20GHz CPUs.

> If you want to make it more complicated, it would be more useful to
> be able to set the value lower for runtime environments where PG is
> competing for OS resources with another daemon that deserves higher
> priority.

Hmm, good point.  Does anyone have a feeling for the amount of kernel
resources that are actually sucked up by an accept-queue entry?  If 128
is the customary limit, is it actually worth worrying about whether we
are setting it to 128 vs. something smaller?

            regards, tom lane
On Tue, Jul 10, 2001 at 06:36:21PM -0400, Tom Lane wrote:
> Hmm, good point.  Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry?  If 128
> is the customary limit, is it actually worth worrying about whether
> we are setting it to 128 vs. something smaller?

I don't think the issue is the resources consumed by an accept-queue
entry.  Rather, it's a tuning knob to help shed load at the entry point
to the system, before significant resources have been committed.  An
administrator would tune it according to actual system and traffic
characteristics.  It is easy enough for somebody to change, if they care,
that it seems to me we have already devoted more time to it than it
deserves right now.

Nathan Myers
ncm@zembu.com
> Hmm, good point.  Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry?  If 128
> is the customary limit, is it actually worth worrying about whether
> we are setting it to 128 vs. something smaller?

All I can say is: keep in mind that Solaris uses SVR4 streams, which are
quite a bit heavier than the BSD-based sockets.  I don't know any
numbers.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Hmm, good point.  Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry?  If 128
> is the customary limit, is it actually worth worrying about whether
> we are setting it to 128 vs. something smaller?

Not much in the way of kernel resources is required by an entry on the
accept queue.  Basically a socket structure and maybe a couple of
addresses, typically about 200 bytes or so.

But I wouldn't worry about it, and I wouldn't worry about Nathan's
suggestion for making the limit configurable, because Postgres
connections don't spend time on the queue.  The postgres server will be
picking them off as fast as it can.  If the server can't pick connections
off fast enough, then your system has other problems; reducing the size
of the queue won't help those problems.  A large queue will help when a
large number of connections arrives simultaneously--it will permit
Postgres to deal with them appropriately, rather than causing the system
to discard them on its terms.

(Matters might be different if the Postgres server were written to not
call accept when it had the maximum number of connections active, and to
just leave connections on the queue in that case.  But that's not how it
works today.)

Ian

---------------------------(end of broadcast)---------------------------
TIP 842: "When the only tool you have is a hammer, you tend to treat
everything as if it were a nail."  -- Abraham Maslow
Ian Lance Taylor <ian@zembu.com> writes:
> But I wouldn't worry about it, and I wouldn't worry about Nathan's
> suggestion for making the limit configurable, because Postgres
> connections don't spend time on the queue.  The postgres server will
> be picking them off as fast as it can.  If the server can't pick
> connections off fast enough, then your system has other problems;

Right.  Okay, it seems like just making it a hand-configurable entry in
config.h.in is good enough for now.  When and if we find that that's
inadequate in a real-world situation, we can improve on it...

            regards, tom lane
Quick rundown of our configuration:

    Red Hat 7.1 (no changes or extras added by us)
    PostgreSQL 7.1.2 and CVS HEAD from 07/10/2001
    3.8 GB database size

I included two pgsql versions because this happens on both.

Here's the problem we're having:

We run a vacuumdb from the server on the entire database.  Some large
tables are vacuumed very quickly, but the vacuum process hangs or takes
more than a few hours on a specific table (we haven't let it finish
before).  The vacuum process works quickly on a table (loginhistory) with
2.8 million records, but is extremely slow on a table (inbox) with 1.1
million records (the table with 1.1 million records is actually larger in
KB size than the other table).

We've tried to vacuum the inbox table separately ('vacuum inbox' within
psql), but this still takes hours (again, we have never let it complete;
we need to use the database for development as well).

We noticed two things that are significant to this situation.  The server
logs the following:

DEBUG:  --Relation msginbox--
DEBUG:  Pages 129921: Changed 26735, reaped 85786, Empty 0, New 0;
Tup 1129861: Vac 560327, Keep/VTL 0/0, Crash 0, UnUsed 51549, MinLen 100,
MaxLen 2032; Re-using: Free/Avail. Space 359061488/359059332;
EndEmpty/Avail. Pages 0/85785. CPU 11.18s/5.32u sec.
DEBUG:  Index msginbox_pkey: Pages 4749; Tuples 1129861: Deleted 76360.
CPU 0.47s/6.70u sec.
DEBUG:  Index msginbox_fromto: Pages 5978; Tuples 1129861: Deleted 0.
CPU 0.37s/6.15u sec.
DEBUG:  Index msginbox_search: Pages 4536; Tuples 1129861: Deleted 0.
CPU 0.32s/6.30u sec.
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES
DEBUG: XLogWrite: new log file created - consider increasing WAL_FILES

The last few lines (XLogWrite ...) repeat for ever and ever and ever.
With 7.1.2 this never stops unless we run out of disk space or cancel the
query.  With CVS HEAD this still continues, but the log files don't
consume all disk space; we still have to cancel it or it might run
forever.

Perhaps we need to let it run until it completes, but we thought that we
might be doing something wrong, or have some data (we're converting data
from MS SQL Server) that isn't friendly.

The major issue we're facing with this is that any read or write access
to the table being vacuumed times out (obviously, because the table is
still locked).  We plan to use PostgreSQL in our production service, but
we can't until we get this resolved.

We're at a loss, not being familiar enough with PostgreSQL and its source
code.  Can anyone please offer some advice or suggestions?

Thanks,

Mark
Tom Lane writes:
> Right.  Okay, it seems like just making it a hand-configurable entry
> in config.h.in is good enough for now.  When and if we find that
> that's inadequate in a real-world situation, we can improve on it...

Would anything computed from the maximum number of allowed connections
make sense?

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes:
> Would anything computed from the maximum number of allowed connections
> make sense?

[ looks at code ... ]  Hmm, MaxBackends is indeed set before we arrive at
the listen(), so it'd be possible to use MaxBackends to compute the
parameter.  Offhand I would think that MaxBackends, or at most
2*MaxBackends, would be a reasonable value.

Question, though: is this better than having a hardwired constant?  The
only case I can think of where it might not be is if some platform out
there throws an error from listen() when the parameter is too large for
it, rather than silently reducing the value to what it can handle.  A
value set in config.h.in would be simpler to adapt for such a platform.

BTW, while I'm thinking about it: why doesn't pqcomm.c test for a failure
return from the listen() call?  Is this just an oversight, or is there a
good reason to ignore errors?

            regards, tom lane
> [ looks at code ... ]  Hmm, MaxBackends is indeed set before we arrive
> at the listen(), so it'd be possible to use MaxBackends to compute the
> parameter.  Offhand I would think that MaxBackends or at most
> 2*MaxBackends would be a reasonable value.

Don't we have MaxBackends configurable at runtime?  If so, any constant
we put in config.h will be inaccurate.  Seems we have to track
MaxBackends.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Don't we have MaxBackends configurable at runtime?

Not after postmaster start, so passing it to the initial listen()
shouldn't be a problem.

The other concern I had could be addressed by making the listen parameter
be MIN(MaxBackends, PG_SOMAXCONN), where PG_SOMAXCONN is set in config.h
--- but now we could make the default value really large, say 10000.  The
only reason to change it would be if you had a kernel that barfed on
large listen() parameters.

Have we beat this issue to death yet, or is it still twitching?

            regards, tom lane
> Not after postmaster start, so passing it to the initial listen()
> shouldn't be a problem.
>
> The other concern I had could be addressed by making the listen
> parameter be MIN(MaxBackends, PG_SOMAXCONN) where PG_SOMAXCONN
> is set in config.h --- but now we could make the default value
> really large, say 10000.  The only reason to change it would be
> if you had a kernel that barfed on large listen() parameters.

Sounds good to me.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom Lane writes:
> The other concern I had could be addressed by making the listen
> parameter be MIN(MaxBackends, PG_SOMAXCONN) where PG_SOMAXCONN
> is set in config.h --- but now we could make the default value
> really large, say 10000.  The only reason to change it would be
> if you had a kernel that barfed on large listen() parameters.

We'll never find that out if we don't try it.  If you're concerned about
cooperating with other listen()ing processes, set it to MaxBackends * 2;
if you're not, set it to INT_MAX and watch.

-- 
Peter Eisentraut   peter_e@gmx.net   http://funkturm.homeip.net/~peter
We increased shared memory in the Linux kernel, which decreased the
vacuumdb time from 40 minutes to 14 minutes on a 450 MHz processor.  We
calculate that on our dual 1 GHz box with gigabit Ethernet SAN connection
this will go down to under 5 minutes.  This is acceptable to us.

Sorry about the unnecessary post.

On Wednesday 11 July 2001 09:16, Mark wrote:
> Quick rundown of our configuration:
>     Red Hat 7.1 (no changes or extras added by us)
>     PostgreSQL 7.1.2 and CVS HEAD from 07/10/2001
>     3.8 GB database size
>
> I included two pgsql versions because this happens on both.
>
> Here's the problem we're having:
>
> We run a vacuumdb from the server on the entire database.  Some large
> tables are vacuumed very quickly, but the vacuum process hangs or takes
> more than a few hours on a specific table (we haven't let it finish
> before).
> Thanks,
>
> Mark

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
On Wed, Jul 11, 2001 at 12:26:43PM -0400, Tom Lane wrote:
> Question, though: is this better than having a hardwired constant?
> The only case I can think of where it might not be is if some platform
> out there throws an error from listen() when the parameter is too large
> for it, rather than silently reducing the value to what it can handle.
> A value set in config.h.in would be simpler to adapt for such a
> platform.

The question is really whether you ever want a client to get a "rejected"
result from an open attempt, or whether you'd rather they got a report
from the back end telling them they can't log in.  The second is more
polite but a lot more expensive.  That expense might really matter if you
have MaxBackends already running.  I doubt most clients have tested
either failure case more thoroughly than the other (or at all), but the
lower-level code is more likely to have been cut-and-pasted from
well-tested code. :-)

Maybe PG should avoid accept()ing connections once it has MaxBackends
back ends already running (as hinted at by Ian), so that the listen()
parameter actually has some meaningful effect, and excess connections can
be rejected more cheaply.  That might also make it easier to respond more
adaptively to true load than we do now.
> BTW, while I'm thinking about it: why doesn't pqcomm.c test for a
> failure return from the listen() call?  Is this just an oversight,
> or is there a good reason to ignore errors?

The failure of listen() seems impossible.  In the Linux, NetBSD, and
Solaris man pages, none of the error returns mentioned are possible with
PG's current use of the function.  It seems as if the most that might be
needed now would be to add a comment to the call to socket() noting that
if any other address families are supported (besides AF_INET and
AF_LOCAL, aka AF_UNIX), the call to listen() might need to be looked at.
AF_INET6 (which PG will need to support someday) doesn't seem to change
matters.  Probably if listen() did fail, then one or other of bind(),
accept(), and read() would fail too.

Nathan Myers
ncm@zembu.com