Thread: 7.4 Crashed... Why?

From: Hunter Hillegas <lists@lastonepicked.com>

Looks like my copy of 7.4, which seems to have been running fine until now,
crashed last night at about 1am.

All my alarms went off and I got it started again, but I'd like to know what
happened so I can sleep soundly. :-)

Here's the serverlog entry:

LOG:  recycled transaction log file "0000000000000028"
FATAL:  lock file "/usr/local/pgsql/data/postmaster.pid" already exists
HINT:  Is another postmaster (PID 1010) running in data directory
"/usr/local/pgsql/data"?

Not much. Looks like it tried to restart itself, found the old pid file and
crapped out... Or something. Why would it restart itself?

Any ideas?

We're running on RedHat 7.2.

Thanks,
Hunter
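
The HINT in the log above names the PID stored in the lock file. A minimal sketch of how one might check whether that PID still belongs to a live process follows. The function name check_postmaster_pid is hypothetical, and the sketch assumes the first line of postmaster.pid holds the postmaster's PID as a plain positive integer.

```shell
#!/bin/sh
# Hypothetical helper: report whether the PID recorded in
# <datadir>/postmaster.pid belongs to a live process.
# Assumption: the first line of the lock file is the postmaster's PID.
check_postmaster_pid() {
    pidfile="$1/postmaster.pid"
    if [ ! -f "$pidfile" ]; then
        echo "no lock file"
        return 1
    fi
    pid=$(head -n 1 "$pidfile")
    # Reject anything that is not a plain run of digits.
    case "$pid" in
        ''|*[!0-9]*)
            echo "unreadable lock file"
            return 2
            ;;
    esac
    # kill -0 delivers no signal; it only tests whether the process exists
    # and whether the caller is allowed to signal it (run as root or postgres).
    if kill -0 "$pid" 2>/dev/null; then
        echo "running (PID $pid)"
        return 0
    fi
    echo "stale lock file (PID $pid not running)"
    return 3
}

# Example call, using the data directory from the log:
# check_postmaster_pid /usr/local/pgsql/data
```

A "running" result only says that some process has that PID. Confirming it is really a postmaster would need a further look at the process list.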


Re: 7.4 Crashed... Why?

From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Fri, 05 Dec 2003 22:18:33 -0500

Hunter Hillegas <lists@lastonepicked.com> writes:
> Here's the serverlog entry:
> LOG:  recycled transaction log file "0000000000000028"
> FATAL:  lock file "/usr/local/pgsql/data/postmaster.pid" already exists
> HINT:  Is another postmaster (PID 1010) running in data directory
> "/usr/local/pgsql/data"?

> Not much. Looks like it tried to restart itself, found the old pid file and
> crapped out... Or something. Why would it restart itself?

The postmaster *never* restarts itself.  What the above looks like to me
is some random script decided to try to start a new postmaster, and the
new postmaster quite properly refused to do anything because there
already was a running postmaster.  You should look into your cron jobs
and see what sort of interesting stuff might lurk there.

            regards, tom lane
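
One way to act on the suggestion above is to search the scheduled jobs and init scripts for anything that could launch a postmaster. The sketch below is only an illustration: the function name find_pg_starters is hypothetical, and the paths in the usage comment are typical Red Hat locations assumed here, not taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper: list lines in the given files or directories that
# mention a command able to start a postmaster.
find_pg_starters() {
    # -r  recurse into directories
    # -n  print the line number of each match
    # -E  extended regex, matching either command name
    grep -rnE 'pg_ctl|postmaster' "$@" 2>/dev/null
}

# Example call (assumed locations; adjust for the system at hand):
# find_pg_starters /etc/crontab /etc/cron.d /etc/cron.daily \
#     /var/spool/cron /etc/rc.d
```

The function returns a non-zero status when nothing matches, so it can also be used in a conditional.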

Re: 7.4 Crashed... Why?

From: Hunter Hillegas <lists@lastonepicked.com>

Thanks Tom.

Good to know that the postmaster doesn't restart itself.

I did find a cron job that was running at the suspect time, but all it
does is the following:

DATE=`date +%Y%m%d`
DB1=/root/database_backup/db1_db.$DATE
su - postgres -c "/usr/local/pgsql/bin/pg_dump db1" >> $DB1
gzip $DB1

Is it possible this could cause some strange behavior? This backup script
has been running for a year (every night) w/o any trouble.

Very strange.

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Fri, 05 Dec 2003 22:18:33 -0500
> To: Hunter Hillegas <lists@lastonepicked.com>
> Cc: PostgreSQL <pgsql-general@postgresql.org>
> Subject: Re: [GENERAL] 7.4 Crashed... Why?


Re: 7.4 Crashed... Why?

From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Sat, 06 Dec 2003 20:36:39 -0500

Hunter Hillegas <lists@lastonepicked.com> writes:
> I did find a cron job that was running in the suspect time... But all it
> does is the following:
> DATE=`date +%Y%m%d`
> DB1=/root/database_backup/db1_db.$DATE
> su - postgres -c "/usr/local/pgsql/bin/pg_dump db1" >> $DB1
> gzip $DB1
> Is it possible this could cause some strange behavior? This backup script
> has been running for a year (every night) w/o any trouble.

The cron script itself certainly looks unexceptional.  But if "su - postgres"
executes postgres' ~/.profile or other shell-startup scripts (I think it
does so on some platforms but not others), maybe you had some weird
behavior recently added to those scripts?

            regards, tom lane
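
A quick way to test the startup-script theory above is to scan the postgres user's shell startup files for anything that starts a server. This is a rough sketch: the function name scan_startup_files is hypothetical, and the file list assumes sh or bash as the login shell.

```shell
#!/bin/sh
# Hypothetical helper: print lines in a user's shell startup files that
# mention pg_ctl or postmaster. $1 is the home directory to inspect.
scan_startup_files() {
    home="$1"
    found=1
    # Assumed file list for sh/bash; other shells read other files.
    for f in .profile .bash_profile .bash_login .bashrc; do
        # Passing /dev/null as a second file makes grep print the file name.
        if [ -f "$home/$f" ] &&
           grep -nE 'pg_ctl|postmaster' "$home/$f" /dev/null; then
            found=0
        fi
    done
    # 0 if any file matched, 1 otherwise.
    return "$found"
}

# Example call:
# scan_startup_files ~postgres
```

System-wide files such as /etc/profile may also be read by a login shell and can be checked the same way with grep.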

Re: 7.4 Crashed... Why?

From: Hunter Hillegas <lists@lastonepicked.com>

I verified that no shell startup scripts are running, and the backup
proceeded normally last night with no crash.

I'm starting to think that I won't be able to track this down and
will have to hope it doesn't happen again.

> From: Tom Lane <tgl@sss.pgh.pa.us>
> Date: Sat, 06 Dec 2003 20:36:39 -0500
> To: Hunter Hillegas <lists@lastonepicked.com>
> Cc: PostgreSQL <pgsql-general@postgresql.org>
> Subject: Re: [GENERAL] 7.4 Crashed... Why?