Thread: backend crashing on NetBSD 1.3.2/i386
Until earlier this week, the various snapshots have been working fine
on my system, NetBSD 1.3.2/i386. As of a couple of days ago the
backend started to crash. I hoped this was a temporary glitch with
recent patches which would disappear in a day or so as other stuff got
sorted out. Nothing seems to have changed, though, over the last
several days, and connections fail with the backend crashing.

Everything seems to compile fine; the only warnings during backend
compilation are given below. The select warning in s_lock.c requires
an #include <unistd.h> to remove it, but that doesn't fix the crashing
problem. I would suggest a patch for that, but I'm not sure what
systems have unistd.h and what don't, so I'm not sure if the obvious
thing of putting that line in s_lock.c is the right thing to do.

Can anyone with more experience tracking down crashing backends give
some guidance? I hate to see 6.4 shipped with one of the supported
backends crashing! Unfortunately, I'm not sure where to look. I
didn't notice any suspicious patches coming through in the last few
days, but I don't see everything that is committed. I'm also not sure
if the NetBSD/vax patches could have affected NetBSD/i386 stuff.

Any help is greatly appreciated!

Cheers,
Brook

===========================================================================

Warnings found during backend compile; directories noted, but lots of
commands deleted. In the past I have seen warnings about some of the
bison/lexer stuff, so I tend to ignore them; I'm not sure if these are
different than the "normal" warnings.

gmake[2]: Entering directory `/usr/pkgsrc-local/databases/postgresql-current/work/pgsql/src/backend/bootstrap'
gcc -I../../include -I../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I.. -Wno-error -c bootparse.c -o bootparse.o
/usr/pkg/share/bison.simple: In function `Int_yyparse':
/usr/pkg/share/bison.simple:327: warning: implicit declaration of function `Int_yyerror'
/usr/pkg/share/bison.simple:387: warning: implicit declaration of function `Int_yylex'
gcc -I../../include -I../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I.. -Wno-error -c bootscanner.c -o bootscanner.o
lex.Int_yy.c:683: warning: no previous prototype for `Int_yylex'
bootscanner.l:137: warning: no previous prototype for `Int_yyerror'

gmake[2]: Entering directory `/usr/pkgsrc-local/databases/postgresql-current/work/pgsql/src/backend/parser'
gcc -I../../include -I../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I.. -Wno-error -c gram.c -o gram.o
/usr/pkg/share/bison.simple: In function `yyparse':
/usr/pkg/share/bison.simple:327: warning: implicit declaration of function `yyerror'
/usr/pkg/share/bison.simple:387: warning: implicit declaration of function `yylex'
gcc -I../../include -I../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I.. -Wno-error -c scan.c -o scan.o
lex.yy.c:820: warning: no previous prototype for `yylex'
scan.l:426: warning: no previous prototype for `yyerror'
lex.yy.c:2174: warning: `yy_flex_realloc' defined but not used

gmake[3]: Entering directory `/usr/pkgsrc-local/databases/postgresql-current/work/pgsql/src/backend/storage/buffer'
gcc -I../../../include -I../../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I../.. -c s_lock.c -o s_lock.o
s_lock.c: In function `s_lock':
s_lock.c:70: warning: implicit declaration of function `select'

gmake[2]: Entering directory `/usr/pkgsrc-local/databases/postgresql-current/work/pgsql/src/backend/utils'
gcc -I../../../include -I../../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I../.. -c network.c -o network.o
network.c: In function `network_network':
network.c:392: warning: unused variable `ptr'
gcc -I../../../include -I../../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I../.. -c inet_net_ntop.c -o inet_net_ntop.o
inet_net_ntop.c: In function `inet_net_ntop_ipv4':
inet_net_ntop.c:192: warning: unused variable `m'
gcc -I../../../include -I../../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I../.. -c inet_net_pton.c -o inet_net_pton.o
inet_net_pton.c: In function `inet_cidr_pton_ipv4':
inet_net_pton.c:104: warning: `tmp' might be used uninitialized in this function

gmake[3]: Entering directory `/usr/pkgsrc-local/databases/postgresql-current/work/pgsql/src/backend/utils/fmgr'
gcc -I../../../include -I../../../backend -I/usr/pkg/include -I/usr/pkg/include/tcl8.0 -I/usr/pkg/include/tk8.0 -O2 -pipe -Wall -Wmissing-prototypes -I../.. -c dfmgr.c -o dfmgr.o
dfmgr.c:283: warning: no previous prototype for `trigger_dynamic'
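
For reference, the "implicit declaration of function `select'" warning only means that no header declaring select() was in scope at the call site. The sketch below is not taken from s_lock.c; it assumes select() is being used there as a short delay, which is a common idiom, and delay_usec is a hypothetical name. Older BSD systems declare select() in <unistd.h>, as noted in the message above, while current POSIX systems declare it in <sys/select.h>, so a portable fix may need a configure-time check rather than an unconditional include.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/select.h>  /* POSIX home of select(); may be absent on very old systems */
#include <unistd.h>      /* where older BSDs declare select() */

/*
 * Hypothetical helper: sleep for roughly `usec` microseconds by calling
 * select() with no descriptors and only a timeout.
 * Returns 0 when the timeout expires, -1 on error (errno is set).
 */
int delay_usec(long usec)
{
    struct timeval tv;

    tv.tv_sec = usec / 1000000L;
    tv.tv_usec = usec % 1000000L;
    return select(0, NULL, NULL, NULL, &tv);
}
```

Calling delay_usec(10000) would pause the caller for about ten milliseconds. With the declaration in scope the compiler can check the argument types, which is all the warning was asking for.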
> Until earlier this week, the various snapshots have been working fine
> on my system, NetBSD 1.3.2/i386. As of a couple of days ago the
> backend started to crash. I hoped this was a temporary glitch with
> recent patches which would disappear in a day or so as other stuff got
> sorted out. Nothing seems to have changed, though, over the last
> several days, and connections fail with the backend crashing.

You have to run the backend using gdb, and get a backtrace of the crash,
or check the postmaster logs for any information.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Brook Milligan <brook@trillium.NMSU.Edu> writes:
> Until earlier this week, the various snapshots have been working fine
> on my system, NetBSD 1.3.2/i386. As of a couple of days ago the
> backend started to crash.

When did this start, exactly?

> I hoped this was a temporary glitch with
> recent patches which would disappear in a day or so as other stuff got
> sorted out. Nothing seems to have changed, though, over the last
> several days, and connections fail with the backend crashing.

A reproducible crash at startup ought to be pretty easy to nail down.
Does it produce a corefile? If so fire up gdb and get a backtrace so
we can see where the crash occurs. (The trace would give more info if
you compiled with -g, but even without it would be helpful.)

One thing that comes to mind quickly is that some of the changes this
week required an initdb to be fully effective. If you forgot the initdb
maybe a crash at startup would result; I'm not sure.

			regards, tom lane
I'm running an installation on NetBSD/i386 1.3.2 that I upgraded
(using 'cvs update') just a couple of hours ago, and it's behaving
just fine. I've loaded a few megabytes of data into it, and done a
bit of updating and querying, with no problems.

Tom Lane <tgl@sss.pgh.pa.us> writes:
> One thing that comes to mind quickly is that some of the changes this
> week required an initdb to be fully effective. If you forgot the initdb
> maybe a crash at startup would result; I'm not sure.

That may be it. I always use pg_dump and initdb when I update the
systems I run a current snapshot of PostgreSQL on, just to be sure.

-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"
   You have to run the backend using gdb, and get a backtrace of the crash,
   or check the postmaster logs for any information.

No core dump that I can find. But, the logs are reporting errors from
semget in ipc.c. Specifically, it is complaining of not enough space,
but I have no full filesystems. This stuff is part of the locking
code isn't it? Did that get tweaked with the NetBSD/vax patches
recently?

Cheers,
Brook
   > Until earlier this week, the various snapshots have been working fine
   > on my system, NetBSD 1.3.2/i386. As of a couple of days ago the
   > backend started to crash.

   When did this start, exactly?

Sometime between Monday and Thursday, but Marc just trashed the old
BETA* files so I can't go back and check. I did manage to catch the
BETA3 just before it disappeared and it exhibits the problem.

   One thing that comes to mind quickly is that some of the changes this
   week required an initdb to be fully effective. If you forgot the initdb
   maybe a crash at startup would result; I'm not sure.

Nope. For testing the new version I've been doing a clean install
every time, including initdb and everything.

Cheers,
Brook
   I'm running an installation on NetBSD/i386 1.3.2 that I upgraded
   (using 'cvs update') just a couple of hours ago, and it's behaving
   just fine. I've loaded a few megabytes of data into it, and done a
   bit of updating and querying, with no problems.

Do you see the warnings about select() when compiling s_lock.c?

Cheers,
Brook
On Sun, 1 Nov 1998, Brook Milligan wrote:

>    You have to run the backend using gdb, and get a backtrace of the crash,
>    or check the postmaster logs for any information.
>
> No core dump that I can find. But, the logs are reporting errors from
> semget in ipc.c. Specifically, it is complaining of not enough space,
> but I have no full filesystems. This stuff is part of the locking
> code isn't it? Did that get tweaked with the NetBSD/vax patches
> recently?

semget deals with shared memory, not file systems...under FreeBSD, you
do:

%ipcs
Message Queues:
T     ID     KEY        MODE       OWNER    GROUP

Shared Memory:
T     ID     KEY        MODE       OWNER    GROUP
m 131072 2063597841 --rw-rw-rw-  scrappy    staff

Semaphores:
T     ID     KEY        MODE       OWNER    GROUP

To see what is being used. ipcrm to remove 'stale' handles...sounds
like your system isn't releasing when you kill off the postgres daemon...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
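
As background to the ipcs/ipcrm advice: System V IPC objects are not tied to the lifetime of the process that created them, so they stay allocated until something removes them explicitly or the machine reboots. Strictly speaking, semget() allocates semaphore sets, and shmget() is its shared memory counterpart. The sketch below is only an illustration of that lifecycle, not code from the PostgreSQL source; create_and_remove_semset is a hypothetical name, and the 0600 mode is an arbitrary choice. It creates a private semaphore set and then releases it with semctl(IPC_RMID), the programmatic equivalent of removing a semaphore entry with ipcrm.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <errno.h>

/*
 * Hypothetical helper: allocate a private set of `nsems` semaphores and
 * remove it again. Returns 0 on success; on failure returns -1 and
 * leaves errno set by semget() or semctl().
 */
int create_and_remove_semset(int nsems)
{
    int semid = semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);

    if (semid == -1)
        return -1;      /* e.g. ENOSPC when the system-wide limit is hit */

    /* Without this call the set would outlive the calling process. */
    if (semctl(semid, 0, IPC_RMID) == -1)
        return -1;

    return 0;
}
```

If the process exits or is killed between the semget() and the semctl(IPC_RMID), the set stays behind and shows up under "Semaphores:" in the ipcs listing.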
   semget deals with shared memory, not file systems...under FreeBSD,

The error message said "No space on device" because semget returned
ENOSPC, hence my initial confusion about what was going on.

   To see what is being used. ipcrm to remove 'stale' handles...sounds
   like your system isn't releasing when you kill off the postgres daemon...

That was it! Thanks, Marc. I have no idea where the extra semaphores
came from (they weren't owned by pgsql), but everything works again
after they were deleted (that is, BETA5 passes regression on NetBSD
1.3.2/i386).

Does a kill signal to the postmaster prevent the cleanup of these?
Should postmasters be killed with HUP?

Sorry for the diversion. Now back to your regularly scheduled
release. :)

Thanks for the help.

Cheers,
Brook
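
The confusion described here comes from the generic text attached to ENOSPC, which is worded for filesystems. For semget(), POSIX defines ENOSPC as meaning that creating the set would exceed the system-imposed limit on semaphores. A minimal sketch of how a caller could report that more helpfully follows; semget_error_hint and its message text are hypothetical and are not how ipc.c reports errors.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: translate an errno value returned by semget()
 * into a hint that makes sense for semaphores. Anything other than
 * ENOSPC falls back to the generic strerror() text.
 */
const char *semget_error_hint(int err)
{
    switch (err) {
    case ENOSPC:
        /* strerror(ENOSPC) typically talks about a full device instead. */
        return "system-wide semaphore limit reached; check ipcs for stale sets";
    default:
        return strerror(err);
    }
}
```

A caller would pass errno straight after a failed semget(), for example fprintf(stderr, "semget: %s\n", semget_error_hint(errno)).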
Brook Milligan <brook@trillium.NMSU.Edu> writes:
> Do you see the warnings about select() when compiling s_lock.c?

Yup.

-tih
--
Popularity is the hallmark of mediocrity. --Niles Crane, "Frasier"