Thread: Re: sick DB - ??
As a followup - the line from top:

 1641 postgres 105   0  2684K  1384K CPU1   0   8:26 99.02% 99.02% postgres

As you can see, it's barely taking up any RAM - the process is going nuts
right off the bat..

On Wed, 18 Jul 2001, Pete Leonard wrote:

> Postgres 7.1.2, FreeBSD 3.4
>
> Box got sick, had to bounce it.  Postgres wasn't brought down in a
> graceful fashion..
>
> restart didn't bring the DB back properly, so as the postgres user, did
> the following:
>
> /usr/local/pgsql/bin/postmaster -d5 start
>
> it dumps the initial environment variables, and then returns nothing.  CPU
> is pegged at 100%.  No reporting, no information as to what's happening.
>
> Solutions?  Is the DB corrupted badly?  Where do I go from here?
>
> thanks,
>
> --pete
Followup ^2 -

The reason this happened was that for whatever reason (we're still
investigating), /tmp was writeable only by root.  I only noticed this when
using initdb to create a new data directory.  postmaster offered no
suggestion that there was a problem here, even when running at -d5.

chmod 777 /tmp fixed everything.

My best guess (I don't know how postmaster is operating, I didn't run any
of the system-level diagnostic tools to check) is that if postmaster fails
on opening a pipe/tmpfile, rather than check the error properly, it changes
the filename and tries again ad infinitum?

Perhaps printing some error code (especially at debug level 5) would help?

thanks,

--pete

On Wed, 18 Jul 2001, Pete Leonard wrote:

> As a followup - the line from top:
>
>  1641 postgres 105   0  2684K  1384K CPU1   0   8:26 99.02% 99.02% postgres
>
> As you can see, it's barely taking up any RAM - the process is going nuts
> right off the bat..
>
> On Wed, 18 Jul 2001, Pete Leonard wrote:
>
> > Postgres 7.1.2, FreeBSD 3.4
> >
> > Box got sick, had to bounce it.  Postgres wasn't brought down in a
> > graceful fashion..
> >
> > restart didn't bring the DB back properly, so as the postgres user, did
> > the following:
> >
> > /usr/local/pgsql/bin/postmaster -d5 start
> >
> > it dumps the initial environment variables, and then returns nothing.  CPU
> > is pegged at 100%.  No reporting, no information as to what's happening.
> >
> > Solutions?  Is the DB corrupted badly?  Where do I go from here?
> >
> > thanks,
> >
> > --pete
Pete Leonard <pete@hero.com> writes:
>> restart didn't bring the DB back properly, so as the postgres user, did
>> the following:
>> /usr/local/pgsql/bin/postmaster -d5 start
>> it dumps the initial environment variables, and then returns nothing.  CPU
>> is pegged at 100%.  No reporting, no information as to what's happening.

This is kind of a random guess, but we recently noticed that 7.1 has a
bug whereby the postmaster can go into an infinite loop at startup if
the $PGDATA directory is not writable.  Check permissions.

It might also be a good idea to remove the old postmaster.pid file by
hand.

			regards, tom lane
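
[Editorial sketch, not part of the original message.]  The "check
permissions" advice above can be scripted roughly as follows.  This is a
minimal Python sketch under stated assumptions: the helper name
unwritable_dirs and the placeholder data directory path are made up for
illustration, and it is not part of any PostgreSQL tooling.  It only tests
whether the current user could create a file in each directory.

```python
import os


def unwritable_dirs(paths):
    """Return the subset of paths the current user cannot create files in."""
    problems = []
    for p in paths:
        # Creating a file needs write and search (execute) permission
        # on the directory itself.
        if not (os.path.isdir(p) and os.access(p, os.W_OK | os.X_OK)):
            problems.append(p)
    return problems


if __name__ == "__main__":
    # The fallback path is a placeholder, not a known install location.
    data_dir = os.environ.get("PGDATA", "/path/to/pgdata")
    for p in unwritable_dirs([data_dir, "/tmp"]):
        print("not writable by this user:", p)
```

Run it as the postgres user, since the answer depends on who is asking.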
On Wed, Jul 18, 2001 at 09:36:38AM -0700, Pete Leonard wrote:
> chmod 777 /tmp fixed everything.

That should be 1777.

mrc
-- 
     Mike Castle      dalgoda@ix.netcom.com      www.netcom.com/~dalgoda/
    We are all of us living in the shadow of Manhattan.  -- Watchmen
fatal ("You are in a maze of twisty compiler features, all different"); -- gcc
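
[Editorial sketch, not part of the original message.]  Mode 1777 is 777
plus the sticky bit: everyone may create files in the directory, but users
can only remove or rename their own.  A minimal Python sketch of checking
for it follows; the function name tmp_mode_ok is hypothetical.

```python
import os
import stat


def tmp_mode_ok(path="/tmp"):
    """Return True if path is a directory with mode 1777 (rwxrwxrwt)."""
    st = os.stat(path)
    if not stat.S_ISDIR(st.st_mode):
        return False
    # S_IMODE keeps the permission bits plus setuid/setgid/sticky.
    return stat.S_IMODE(st.st_mode) == 0o1777


if __name__ == "__main__":
    print("/tmp has mode 1777:", tmp_mode_ok())
```

A plain 777 directory fails this check because the sticky bit is missing.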
Pete Leonard <pete@hero.com> writes:
> The reason this happened was that for whatever reason (we're still
> investigating), /tmp was writeable only by root.

Ah.  Hadn't thought about it before, but the
infinite-loop-on-nonwritable-$PGDATA bug would also trigger for
nonwritable /tmp.  (The bug was actually in CreateLockFile, which is
used both to create a lockfile in $PGDATA and one in /tmp.  Sigh.)

This is fixed in current sources.  If we were going to do a 7.1.3 then
I'd backpatch the fix into the REL7_1 branch, but at this point I
suspect there won't be a 7.1.3 --- we'll probably go into 7.2 beta in
another five or six weeks, so there's not much point.

			regards, tom lane
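
[Editorial sketch, not part of the original message.]  To illustrate the
class of bug described above, here is a lock-file creation loop in Python.
It is not the PostgreSQL code (CreateLockFile is C, and its details are not
shown in this thread); the names create_lock_file, read_pid, pid_alive and
max_tries are hypothetical.  The point is that only "file already exists"
justifies another attempt.  Any other error, such as a permission failure
on a non-writable directory, is raised at once, and the retries are bounded.

```python
import os


def pid_alive(pid):
    """Best-effort check whether a process with this PID exists."""
    try:
        os.kill(pid, 0)  # signal 0 only tests for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def read_pid(path):
    """Return the PID stored on the first line of path, or None if unusable."""
    try:
        with open(path) as f:
            pid = int(f.read().split()[0])
    except (ValueError, IndexError):
        return None
    return pid if pid > 0 else None


def create_lock_file(path, max_tries=100):
    """Create path exclusively and write our PID into it."""
    for _ in range(max_tries):
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
        except FileExistsError:
            # The only case worth retrying: an old lock file is in the way.
            try:
                old_pid = read_pid(path)
            except FileNotFoundError:
                continue  # it vanished in the meantime, try again
            if old_pid is not None and pid_alive(old_pid):
                raise RuntimeError(f"{path} is held by live process {old_pid}")
            os.unlink(path)  # stale lock: remove it and retry
            continue
        # Any other OSError (for example PermissionError when the
        # directory is not writable) propagates to the caller here
        # instead of being retried.
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
        return path
    raise RuntimeError(f"could not create {path} after {max_tries} tries")
```

With a directory the caller cannot write to, this raises PermissionError on
the first attempt rather than spinning at 100% CPU.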