Thread: Database server restarting

Database server restarting

From

"shoaib"

Date:

04 May 2003, 22:57:00

Hello Everybody,

We are using PostgreSQL 7.2.2. Our system runs 24 hours a day, with a preventive reboot once a day. Sometimes I get this error, and after it the system hangs. Can anybody help with this?

DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: pq_recvbuf: unexpected EOF on client connection
DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG: checkpoint record is at 3/85EA18B0
DEBUG: redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG: next transaction id: 4111285; next oid: 7557242
DEBUG: database system was not properly shut down; automatic recovery in progress
DEBUG: ReadRecord: record with zero length at 3/85EA18F0
DEBUG: redo is not required
DEBUG: recycled transaction log file 0000000300000083
DEBUG: recycled transaction log file 0000000300000084
DEBUG: database system is ready
DEBUG: pq_recvbuf: unexpected EOF on client connection

Regards

Shoaib

Re: Database server restarting

From

"Nigel J. Andrews"

Date:

05 May 2003, 07:08:19

On Mon, 5 May 2003, shoaib wrote:

> Hello Everybody,
>
> We are using postgressql 7.2.2 . our system running is 24 hours day it a
> preventive reboot once a day.

Odd concept. What is this reboot preventing?

> some time I am getting this error and after
> it the sytem hang .Can any body help in this.
>
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  database system was interrupted at 2003-05-03 04:17:19 SGT
> DEBUG:  checkpoint record is at 3/85EA18B0
> DEBUG:  redo record is at 3/85EA18B0; undo record is at 0/0; shutdown
> FALSE
> DEBUG:  next transaction id: 4111285; next oid: 7557242
> DEBUG:  database system was not properly shut down; automatic recovery
> in progress

It looks like your preventative daily reboot is not preventing the problems it
is causing. It is possible that the postmaster is not being shut down properly
because, for example, there is a client still connected and the shutdown script
isn't forcing a fast shutdown. See the pg_ctl manpage for information on the
switches.
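
A minimal sketch of the fast shutdown mentioned above, written as a small Python helper that a pre-reboot script could call. `pg_ctl stop -m fast` is the documented fast-shutdown mode; the data directory path and the helper names `build_stop_command` and `stop_postmaster` are assumptions for illustration, not details from this thread.

```python
import subprocess

# Hypothetical data directory; substitute the real PGDATA of the installation.
PGDATA = "/var/lib/pgsql/data"


def build_stop_command(pgdata, mode="fast"):
    """Return the pg_ctl argument list for stopping the postmaster.

    'smart' waits for clients to disconnect, 'fast' rolls back open
    transactions and disconnects clients, 'immediate' aborts without a
    clean shutdown (recovery then runs on the next start).
    """
    if mode not in ("smart", "fast", "immediate"):
        raise ValueError("unknown shutdown mode: %r" % mode)
    return ["pg_ctl", "stop", "-D", pgdata, "-m", mode]


def stop_postmaster(pgdata=PGDATA, mode="fast"):
    """Run pg_ctl and return its exit status (0 means success)."""
    return subprocess.call(build_stop_command(pgdata, mode))
```

The reboot script would call `stop_postmaster()` as the postgres user and only go on to reboot once it returns 0.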

As for worrying about the messages, there's no real error message in there,
aside from the 'EOF on client connection', just the normal messages on start up
from a bad shutdown. If you're worried, I would look at solving whatever the
answer to the daily reboot question shows is the problem.
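
To see how often the server comes up after an unclean stop, the startup messages can be pulled out of the log with a short script. This is only a sketch: the function name `find_interruptions` is made up for illustration, and as far as I know the timestamp in the "interrupted at" line is when the control file was last updated, so it approximates rather than pinpoints the moment of the stop.

```python
import re

# Matches the line written at startup after an unclean stop.
INTERRUPTED = re.compile(r"database system was interrupted at (.+)$")


def find_interruptions(lines):
    """Return the timestamps reported by 'was interrupted at' log lines."""
    stamps = []
    for line in lines:
        match = INTERRUPTED.search(line.rstrip())
        if match:
            stamps.append(match.group(1))
    return stamps


sample = [
    "DEBUG: pq_recvbuf: unexpected EOF on client connection",
    "DEBUG: database system was interrupted at 2003-05-03 04:17:19 SGT",
    "DEBUG: database system is ready",
]
print(find_interruptions(sample))  # ['2003-05-03 04:17:19 SGT']
```

Comparing those timestamps with the cron schedule shows whether the stops line up with the 1 AM reboot or with something else.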

> DEBUG:  ReadRecord: record with zero length at 3/85EA18F0
> DEBUG:  redo is not required
> DEBUG:  recycled transaction log file 0000000300000083
> DEBUG:  recycled transaction log file 0000000300000084
> DEBUG:  database system is ready
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
>
>
> Regards
>
> Shoaib
>
>

--
Nigel J. Andrews

Re: Database server restarting

From

"shoaib"

Date:

05 May 2003, 07:19:24

Thanks a lot for your prompt reply.
We reboot the server to clean up the system buffers. Before rebooting,
I will shut down the database server. Can you provide any further clue
as to why it suddenly restarted at 4:17 AM? Our preventive maintenance
runs at 1 AM.
Another process, which reads data from some flat files and loads it into
the database, ended at 4:13 AM on the same day.
Your help is really appreciated.

Regards
Shoaib

Re: Database server restarting

From

"Nigel J. Andrews"

Date:

05 May 2003, 07:41:45

On Mon, 5 May 2003, shoaib wrote:

> Thanks a lot for your prompt reply.
> We are rebooting the server for cleaning up the buffers of the
> system.Before rebooting I will shutdown database server.Can you provide
> any futher clue why suddenly at 4.17 aM it restarted.Our preventive
> maintenance run at 1 AM.
> And another process of Reading data from some flat files and updating it
> to database ended at 4.13 AM on the same day.

Hmmm...I assumed the 4:17 was from the scheduled reboot. It's a more difficult
issue if that was from the postmaster exiting by itself. Did the data loading
process end normally? It's a good few minutes, but in the scheme of things 4
minutes for the postmaster to be restarted automatically maybe isn't such a
long time.

I'm still drawn to this daily reboot process though. You do it to clean up the
system buffers. Why? Is there perhaps some instability in the system if the
system uses lots of memory? What is the hardware/os? Have you run hardware
diagnostics? If it's Intel/PC-like, there is a program called memtest86 which is
good at checking the memory. Be warned though: if you need that 24-hour uptime,
running memtest86 properly is going to cost you a good few hours.


--
Nigel J. Andrews

Re: Database server restarting

From

Dennis Gearon

Date:

05 May 2003, 11:12:33

Modern OSes shouldn't need rebooting unless something else is wrong. What's the quality of your hardware? Any
applications compiled on bad hardware?

Sigh, is it a Windows environment?

Re: Database server restarting

From

"shoaib"

Date:

06 May 2003, 01:42:53

Our server reboots at 1 AM. The job I mentioned starts at 4 AM and
ended at 4:13 AM. This process is database-intensive: around 10,000
records are updated or inserted. Could it be the cause of this problem?
After this happened, my server just hangs.

Last night I faced the same problem again on another server, and it was
after yet another database-intensive process.
The system has 1 GB RAM, a 1 GHz processor, RAID 1, and Red Hat Linux 7.3.
We are about to install 70 such servers.

Please help.

regards
Shoaib

Re: Database server restarting

From

Martijn van Oosterhout

Date:

06 May 2003, 02:15:52

On Tue, May 06, 2003 at 01:41:37PM +0800, shoaib wrote:
> Our server reboots at 1 aM in the morning and the job I mentioned starts
> at 4 aM in the morning and the job ended at 4.13 AM. This process is
> database extensive around 10000 records are updated / inserted.Can it be
> the cause of this problem.  After this thing happened my server just
> hangs.

When you say hang, do you mean the entire server stops responding, i.e. you
can't log in any more, no web requests, etc.?

If so, it's got nothing to do with postgres, as a user program simply can't
hang the machine like that (unless you run out of memory, in which case it's
just really slow rather than hung).

> Last night I faced the same problem again on another server and it was
> after yet another DB extensive process.
> The  system has 1 GB RAM, 1 GHZ processor and RAID 1 installed on it and
> Red Hat linux 7.3.
> We are about to install 70 such servers.

When you say reboot, are you doing the proper shutdown sequence (shutdown -r
now) or are you just pulling the plug?

Please explain what "hangs" means. Also, rebooting every day seems to be a massive
waste of time. UNIX machines don't need that kind of maintenance.

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> "the West won the world not by the superiority of its ideas or values or
> religion but rather by its superiority in applying organized violence.
> Westerners often forget this fact, non-Westerners never do."
>   - Samuel P. Huntington

Re: Database server restarting

From

"shoaib"

Date:

06 May 2003, 02:30:13

When I say it hangs, I mean I am not even able to log in at the server
console.
No SSH, no login from remote machines.

Regards

Shoaib

Re: Database server restarting

From

Shridhar Daithankar

Date:

06 May 2003, 02:30:38

On Tuesday 06 May 2003 11:11, shoaib wrote:
> Our server reboots at 1 aM in the morning and the job I mentioned starts
> at 4 aM in the morning and the job ended at 4.13 AM. This process is
> database extensive around 10000 records are updated / inserted.Can it be
> the cause of this problem.  After this thing happened my server just
> hangs.
>
> Last night I faced the same problem again on another server and it was
> after yet another DB extensive process.
> The  system has 1 GB RAM, 1 GHZ processor and RAID 1 installed on it and
> Red Hat linux 7.3.
> We are about to install 70 such servers.

I am sure there is something not very correct here. You should not need a
server restart. I would like to see your database configuration options,
patterns in data access and min/max/avg load on each server.

10K records isn't much. Certainly not for that kind of hardware..

I am still bothered by the fact that you reboot your server daily. I can't find
a good reason for it in the description above.

 Shridhar

Re: Database server restarting

From

Martijn van Oosterhout

Date:

06 May 2003, 02:40:14

On Tue, May 06, 2003 at 02:28:57PM +0800, shoaib wrote:
> When I say hangs it means ..I am not even able to login at the server
> console also.
> No ssh, no login form remote machines.

Well, that's not postgresql's fault. It can't hang a machine like that. You
should look elsewhere for the exact cause. I'm assuming here that consoles
that are still logged in don't respond either? Maybe leave a top running to
capture the list of processes just before it dies? Any cronjobs about the
time it dies?

What other processes run at about that time?
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/

Re: Database server restarting

From

"shoaib"

Date:

06 May 2003, 02:48:00

There are some cron jobs running at the same time.
One server connects over SSH to our application server, and one cron job
reads the DB and writes some data into flat files. But at the time this
problem happens, these jobs are not writing any data. Last night when the
server went down, the other server was trying to connect over SSH, it was
probably running some cron job, and a heavy DB process was running. I
cannot run top because I cannot log in to the server, even from the
console.

Regards
Shoaib

Re: Database server restarting

From

Shridhar Daithankar

Date:

06 May 2003, 03:13:01

On Tuesday 06 May 2003 12:16, shoaib wrote:
> There are some cron jobs running at the same time...
> One server does SSH into our application server and on cron job is
> reading the DB and writing some data into flat files. But by the time
> this problem is happening these jobs are not writing any data. Last
> night when the server went down the other server wa trying to do SsH and
> probably it was running some cron job and a heavy DB process was
> running.I can not do a top bcoz I can not login into server even from
> console.

How long did you wait? If the server is doing heavy disk processing, a login
could take up to 10 minutes under the worst conditions. Just don't give up
after a minute or so.

 Shridhar

Re: Database server restarting

From

"Nigel J. Andrews"

Date:

06 May 2003, 03:44:09

On Tue, 6 May 2003, shoaib wrote:

> There are some cron jobs running at the same time...
> One server does SSH into our application server and on cron job is
> reading the DB and writing some data into flat files. But by the time
> this problem is happening these jobs are not writing any data. Last
> night when the server went down the other server wa trying to do SsH and
> probably it was running some cron job and a heavy DB process was
> running.I can not do a top bcoz I can not login into server even from
> console.

Do you mean you have no login privileges on the machine, or that you are only
trying to log in once you see a problem? If the former, then I can't see how
there's any way you can make progress with this. If the latter, forget that;
it isn't helping, since by then you can't get any processes running. What you
should do is log in _now_, run 'top' and leave it running. It may be that when
the problem occurs the session running top will stop and so show the
information from that time. However, it may also be that it doesn't stop, and
when you come into the office n hours later you find it merrily ticking away
showing you the current information. Therefore, investigate ways to log
the information if you aren't sat there when the problem is occurring.
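
One way to log that information unattended is a small loop that appends a timestamped snapshot to a file and forces it to disk, so the last entry written before a hang survives. This is a sketch under assumptions: the function names and the 30-second interval are invented for illustration, and `/proc/loadavg` and `ps aux` are assumed to be available, as on a typical Linux system.

```python
import os
import subprocess
import time


def take_snapshot():
    """Return the load average plus a process listing as one text block."""
    with open("/proc/loadavg") as handle:  # Linux-specific
        load = handle.read().strip()
    listing = subprocess.check_output(["ps", "aux"]).decode("utf-8", "replace")
    return "load: %s\n%s" % (load, listing)


def append_snapshot(path, snapshot, now=None):
    """Append one timestamped snapshot and force it to disk."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    with open(path, "a") as log:
        log.write("=== %s ===\n%s\n" % (stamp, snapshot))
        log.flush()
        os.fsync(log.fileno())  # so the last entry survives a hard hang


def monitor(path, interval=30):
    """Log a snapshot every `interval` seconds until interrupted."""
    while True:
        append_snapshot(path, take_snapshot())
        time.sleep(interval)
```

Started well before the 4 AM job, for example with `monitor("/var/log/hang-watch.log")` (a hypothetical path), the tail of the file after the next hang shows what was running just before the machine stopped responding.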

Also take a look at procinfo, it may be helpful as well.

One thing that might be a problem is the number of open file descriptors, you
could be running into the system limit of those. That sort of thing can
sometimes make a system unstable.
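
A quick check of the system-wide file handle usage on Linux, as a sketch. It assumes `/proc/sys/fs/file-nr` holds three numbers (allocated handles, allocated but unused handles, and the maximum), which is my reading of the kernel's proc documentation; the function names are made up for illustration.

```python
def parse_file_nr(text):
    """Parse the contents of file-nr into (allocated, free, maximum)."""
    allocated, free, maximum = (int(field) for field in text.split())
    return allocated, free, maximum


def handle_usage(text):
    """Return the fraction of the system-wide handle limit in use."""
    allocated, free, maximum = parse_file_nr(text)
    return (allocated - free) / float(maximum)


def read_file_nr(path="/proc/sys/fs/file-nr"):
    """Read the raw counter line from the proc filesystem."""
    with open(path) as handle:
        return handle.read()
```

Calling `handle_usage(read_file_nr())` periodically and logging the result would show whether usage creeps toward 1.0 before a hang.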

I'd still be interested to know whether the hardware has been tested
properly. Are there any known problems with RH 7.3's kernel and your particular
hardware, such as the RAID device?

One interesting thing you say though; the same thing happens on a second
server. That to me suggests either something like a kernel/hardware problem
such as the RAID or you have a bug in your own software. Perhaps an endless
loop? Perhaps endlessly trying to obtain a file descriptor? A heavy CPU usage
process shouldn't bring the machine down, but it can make it look very
unresponsive.

--
Nigel J. Andrews

Re: Database server restarting

From

"shoaib"

Date:

06 May 2003, 04:40:20

When I log in at a console, I can see the prompt, but after I type in the
login name the system just doesn't respond; it never reaches the password
prompt.

Regards
Shoaib

Re: Database server restarting

From

"Nigel J. Andrews"

Date:

06 May 2003, 04:55:07

On Tue, 6 May 2003, shoaib wrote:

> When I login a console, I can see the prompt but after typing in login
> name system just don't respond it does not come to password prompt.

You may have to wait a long time which isn't very good because a) by the time
the system has enough resources to proceed with your log in it's not in the
same state it was in at the problem time (obviously) and b) the login process
may well timeout the login attempt before it even gets to the stage of asking
for the password.

You really do need to be logged in before the problem occurs. Indeed, have more
than one session running, run system monitoring utilities like top and procinfo
and also one you can type into without stopping those utilities.

If you can get the system to again you may also find it useful to run your
cronjobs by hand to verify them individually and to then try and replicate the
early morning conditions at whatever time you can test things. If you're having
to wait overnight every time just to take a look at a new piece of the puzzle
you're locked into that timetable for generating and testing a solution.

--
Nigel Andrews

Re: Database server restarting

From

"shoaib"

Date:

06 May 2003, 05:01:24

Hello,

Thanks for your kind help.

But is there any particular reason for the database to behave like
this?
DEBUG:  pq_recvbuf: unexpected EOF on client connection
DEBUG:  pq_recvbuf: unexpected EOF on client connection
DEBUG:  pq_recvbuf: unexpected EOF on client connection
DEBUG:  pq_recvbuf: unexpected EOF on client connection
DEBUG:  database system was interrupted at 2003-05-03 04:17:19 SGT
DEBUG:  checkpoint record is at 3/85EA18B0
DEBUG:  redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
DEBUG:  next transaction id: 4111285; next oid: 7557242
DEBUG:  database system was not properly shut down; automatic recovery in progress
DEBUG:  ReadRecord: record with zero length at 3/85EA18F0
DEBUG:  redo is not required
DEBUG:  recycled transaction log file 0000000300000083
DEBUG:  recycled transaction log file 0000000300000084
DEBUG:  database system is ready
DEBUG:  pq_recvbuf: unexpected EOF on client connection

Is there any particular reason for this?

Regards
Shoaib

Re: Database server restarting

From

"Nigel J. Andrews"

Date:

06 May 2003, 05:10:34

On Tue, 6 May 2003, shoaib wrote:

> Hello,
>
> Thanks for you kind help.
>
> But is there any particular reason for database to do such kind of
> behavior.
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
> DEBUG:  database system was interrupted at 2003-05-03 04:17:19 SGT
> DEBUG:  checkpoint record is at 3/85EA18B0
> DEBUG:  redo record is at 3/85EA18B0; undo record is at 0/0; shutdown
> FALSE
> DEBUG:  next transaction id: 4111285; next oid: 7557242
> DEBUG:  database system was not properly shut down; automatic recovery
> in progress
> DEBUG:  ReadRecord: record with zero length at 3/85EA18F0
> DEBUG:  redo is not required
> DEBUG:  recycled transaction log file 0000000300000083
> DEBUG:  recycled transaction log file 0000000300000084
> DEBUG:  database system is ready
> DEBUG:  pq_recvbuf: unexpected EOF on client connection
>
> Is there any particular reason for this thing.

Well, there are probably lots of potential causes but consider something like
this:

process A starts up
process A uses N MB of memory
process A loops
process A uses N+1 MB of memory
...
process B starts up and connects to DB
memory available is 1MB
process A loops
process A uses N+1 MB of memory
process B wants 10KB more memory
process B dies for want of memory allocation checks
DB notes the unexpected EOF on the connection from B
process A loops
process A wants N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
process A retries N+1 MB of memory
...
system can't start any other process for lack of memory resources

You've got high system load, inability for processes to claim more memory and
errors about programs exiting at unexpected times.
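
To see whether memory really is running out in the way sketched above, free memory and free swap can be logged from a session that is already open before the problem time. This is only a minimal sketch: mem_snapshot is a made-up helper name, and it assumes a Linux /proc/meminfo with MemFree: and SwapFree: lines.

```shell
#!/bin/sh
# Print the MemFree and SwapFree lines from a meminfo-style file.
# Assumption: the file has "MemFree:" and "SwapFree:" lines as Linux
# /proc/meminfo does. The optional argument exists only so the function
# can be tried against a saved copy.
mem_snapshot() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^(MemFree|SwapFree):/ { printf "%s %s kB\n", $1, $2 }' "$meminfo"
}

# Example use (commented out so that sourcing this file does not loop):
# append a timestamped snapshot to mem.log once a minute.
# while true; do date; mem_snapshot; sleep 60; done >> mem.log 2>&1
```

A steady fall in both figures in the minutes before the machine stops responding would fit the scenario above; flat figures would point somewhere else.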

--
Nigel Andrews

Re: Database server restarting

From

Shridhar Daithankar

Date:

06 May 2003, 05:33:47

On Tuesday 06 May 2003 14:25, Nigel J. Andrews wrote:
> On Tue, 6 May 2003, shoaib wrote:
> > When I login a console, I can see the prompt but after typing in login
> > name system just don't respond it does not come to password prompt.
>
> You may have to wait a long time which isn't very good because a) by the
> time the system has enough resources to proceed with your log in it's not
> in the same state it was in at the problem time (obviously) and b) the
> login process may well timeout the login attempt before it even gets to the
> stage of asking for the password.

I have two suggestions for the OP, if he is interested in experimenting with
alternatives, assuming the problem is with a heavy DB process.

1) Try FreeBSD 4.8 and postgresql from ports. I have a gut feeling that BSD
would be more responsive under heavy disk load than linux. No concrete
evidence.. just a gut feeling..

2) Try a recent kernel.. I suggest you get 2.4.20 from kernel.org and apply
patches from http://members.optusnet.com.au/ckolivas/kernel/. Just get the
base patch that includes O(1), pre-empt and low-latency.. That should be good
enough..

Basically with either of these, the unresponsiveness that you are facing
should be gone and you should be able to debug the problem..

 HTH

 Shridhar

Re: Database server restarting

From

Tom Lane

Date:

06 May 2003, 10:22:33

"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
>> But is there any particular reason for database to do such kind of
>> behavior.
>> DEBUG:  pq_recvbuf: unexpected EOF on client connection
>> DEBUG:  pq_recvbuf: unexpected EOF on client connection
>> DEBUG:  pq_recvbuf: unexpected EOF on client connection
>> DEBUG:  pq_recvbuf: unexpected EOF on client connection
>> DEBUG:  database system was interrupted at 2003-05-03 04:17:19 SGT
>> DEBUG:  checkpoint record is at 3/85EA18B0

> You've got high system load, inability for processes to claim more memory and
> errors about programs exiting at unexpected times.

What strikes me about the above trace is that we see "database system
was interrupted" without any prior failure.  That says to me that
something killed the postmaster itself --- if a database child process
died, the postmaster would have logged the fact.

That leaves me with two questions: what killed the postmaster, and what
restarted it?

If Nigel's guess is right that the system is under heavy memory
pressure, and this is a Linux box, then the kernel itself might have
kill -9'd the postmaster to try to get out of a memory shortage.
I can't think of very many other theories (though I do recall at
least one self-inflicted problem, from someone whose "maintenance
script" kill -9'd the postmaster for random reasons...)

I'd also like to know whether the system is configured to auto-restart
the postmaster, and if so how, and does it do any mucking about (like
removing lockfiles) while it's doing so?
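
One way to check the kernel-kill theory is to search the system log for out-of-memory messages around the time of the interruption. A rough sketch only: the log path /var/log/messages and the search phrase are assumptions, since the exact wording of the kernel message differs between kernel versions, and oom_lines is a made-up helper name.

```shell
#!/bin/sh
# Print log lines that mention an out-of-memory condition.
# Assumptions: the kernel log ends up in /var/log/messages and the
# message contains the phrase "out of memory" in some capitalisation.
oom_lines() {
    logfile="${1:-/var/log/messages}"
    grep -i 'out of memory' "$logfile"
}

# Example use: oom_lines /var/log/messages
```

No output does not rule the theory out, because a machine that starved may not have managed to write the message to disk.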

            regards, tom lane

Re: Database server restarting

From

Dennis Gearon

Date:

06 May 2003, 11:51:05

Look in the archives about disk and memory testing.

memtest86 and some other program.

shoaib wrote:
> Our server reboots at 1 aM in the morning and the job I mentioned starts
> at 4 aM in the morning and the job ended at 4.13 AM. This process is
> database extensive around 10000 records are updated / inserted.Can it be
> the cause of this problem.  After this thing happened my server just
> hangs.
>
> Last night I faced the same problem again on another server and it was
> after yet another DB extensive process.
> The  system has 1 GB RAM, 1 GHZ processor and RAID 1 installed on it and
> Red Hat linux 7.3.
> We are about to install 70 such servers.
>
> Please help.
>
> regards
> Shoaib
>
> -----Original Message-----
> From: Dennis Gearon [mailto:gearond@cvc.net]
> Sent: Monday, May 05, 2003 11:13 PM
> To: shoaib
> Cc: 'Nigel J. Andrews'; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Database server restarting
>
> Modern OS's shouldn' need rebooting, unless something else is wrong.
> What's the quality of your hardware? Any applications compiled on bad
> hardware?
>
> sigh, is it a windows environment?
>
> shoaib wrote:
>
>>Thanks a lot for your prompt reply.
>>We are rebooting the server for cleaning up the buffers of the
>>system.Before rebooting I will shutdown database server.Can you provide
>>any futher clue why suddenly at 4.17 aM it restarted.Our preventive
>>maintenance run at 1 AM.
>>And another process of Reading data from some flat files and updating it
>>to database ended at 4.13 AM on the same day.
>>Your help is really appreciated.
>>
>>Regards
>>Shoaib
>>
>>-----Original Message-----
>>From: Nigel J. Andrews [mailto:nandrews@investsystems.co.uk]
>>Sent: Monday, May 05, 2003 7:08 PM
>>To: shoaib
>>Cc: pgsql-general@postgresql.org
>>Subject: Re: [GENERAL] Database server restarting
>>
>>On Mon, 5 May 2003, shoaib wrote:
>>
>>>Hello Everybody,
>>>
>>>We are using postgressql 7.2.2 . our system running is 24 hours day it a
>>>preventive reboot once a day.
>>
>>Odd concept. What is this reboot preventing?
>>
>>>some time I am getting this error and after
>>>it the sytem hang .Can any body help in this.
>>>
>>>DEBUG:  pq_recvbuf: unexpected EOF on client connection
>>>DEBUG:  pq_recvbuf: unexpected EOF on client connection
>>>DEBUG:  pq_recvbuf: unexpected EOF on client connection
>>>DEBUG:  pq_recvbuf: unexpected EOF on client connection
>>>DEBUG:  database system was interrupted at 2003-05-03 04:17:19 SGT
>>>DEBUG:  checkpoint record is at 3/85EA18B0
>>>DEBUG:  redo record is at 3/85EA18B0; undo record is at 0/0; shutdown FALSE
>>>DEBUG:  next transaction id: 4111285; next oid: 7557242
>>>DEBUG:  database system was not properly shut down; automatic recovery in progress
>>
>>It looks like your preventative daily reboot is not preventing the problems it
>>is causing. It is possible that the postmaster is not being shutdown properly
>>because, for example, there is a client still connected and the shutdown script
>>isn't forcing a fast shutdown. See pg_ctl manpage for infomation on the
>>switches.
>>
>>As for worrying about the messages, there's no real error message in there,
>>aside from the 'EOF on client connection', just the normal messages on start up
>>from a bad shutdown. If you're worried, I would look at solving whatever the
>>answer to the daily reboot question shows is the problem.
>>
>>>DEBUG:  ReadRecord: record with zero length at 3/85EA18F0
>>>DEBUG:  redo is not required
>>>DEBUG:  recycled transaction log file 0000000300000083
>>>DEBUG:  recycled transaction log file 0000000300000084
>>>DEBUG:  database system is ready
>>>DEBUG:  pq_recvbuf: unexpected EOF on client connection
>>>
>>>Regards
>>>
>>>Shoaib

Re: Database server restarting

From

"scott.marlowe"

Date:

06 May 2003, 12:49:15

Here's a little script that will run top every so often and log the output
to a file you can read later when the machine's recovered.

#!/bin/bash
# Append one batch-mode snapshot of top to log.txt every 60 seconds.
while true; do
        top -bn 1 >>log.txt
        sleep 60
done

Just run it in your home directory.  Make sure your /home partition has
enough space.  Under heavy load each 60 seconds you'll be adding about 2k
to 5k to that file.  Change the sleep 60 to something smaller if you want
it to run more often.  No warranties implied, use at your own risk. :-)

On Tue, 6 May 2003, shoaib wrote:

> There are some cron jobs running at the same time...
> One server does SSH into our application server and on cron job is
> reading the DB and writing some data into flat files. But by the time
> this problem is happening these jobs are not writing any data. Last
> night when the server went down the other server wa trying to do SsH and
> probably it was running some cron job and a heavy DB process was
> running.I can not do a top bcoz I can not login into server even from
> console.
>
> Regards
> shaoib
>
>
> -----Original Message-----
> From: Martijn van Oosterhout [mailto:kleptog@svana.org]
> Sent: Tuesday, May 06, 2003 2:40 PM
> To: shoaib
> Cc: gearond@cvc.net; 'Nigel J. Andrews'; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Database server restarting
>
> On Tue, May 06, 2003 at 02:28:57PM +0800, shoaib wrote:
> > When I say hangs it means ..I am not even able to login at the server
> > console also.
> > No ssh, no login form remote machines.
>
> Well, that's not postgresql's fault. It can't hang a machine like that.
> You should look elsewhere for the exact cause. I'm assuming here that
> consoles that are still logged in don't respond either? Maybe leave a top
> running to capture the list of processes just before it dies? Any cronjobs
> about the time it dies?
>
> What other processes run at about that time?
>

Re: Database server restarting

From

"scott.marlowe"

Date:

06 May 2003, 13:53:25

On Tue, 6 May 2003, shoaib wrote:

> When I login a console, I can see the prompt but after typing in login
> name system just don't respond it does not come to password prompt.

FYI, for future reference, this is generally referred to as being
non-responsive, not hanging.  Hanging means the server has truly crashed,
and is no longer answering pings, etc...  Usually hanging servers mean bad
hardware.  Non-responsive servers often mean that you've increased the
load too high for the server to handle, and it's busily swapping out
resources left and right to try and stay up and running.

And there is NO reason to reboot a RedHat Linux 7.x box every night.  Mine
routinely get 100 days of uptime between reboots, sometimes 200 days.
Usually by then we're either upgrading to a new version or installing a
new kernel and have to reboot.

Leaving the OS up is actually a good thing, as it keeps the buffers from
getting cleared out.  Note that if all you want is for OS cache buffers to
flush, just write a short C program that mallocs huge chunks of memory
until you start swapping a bit.  But that's counterproductive.
Postgresql flushes buffers when it's writing, so you don't have to worry
about dataloss, and the data in those buffers takes a while to load.

 11:42am  up 36 days,  1:20,  4 users,  load average: 0.27, 0.28, 0.32
195 processes: 194 sleeping, 1 running, 0 zombie, 0 stopped
CPU0 states: 21.0% user,  0.0% system,  0.0% nice, 78.0% idle
CPU1 states:  1.0% user,  8.0% system,  0.0% nice, 89.0% idle
Mem:  1543980K av, 1535472K used,    8508K free,  265928K shrd,   48872K buff
Swap: 2048208K av,  164524K used, 1883684K free                  871720K cached

Note the 870 Meg of cached data.  It takes my server at least a day of
running before it can use the extra memory as cache, and rebooting it
would make it start over.

Unlike Windows machines, Unix machines tend to run faster the longer
they're left up.

Re: Database server restarting

From

"Nigel J. Andrews"

Date:

06 May 2003, 14:00:40

On Tue, 6 May 2003, scott.marlowe wrote:

> Here's a little script that will run top every so often and log the output
> to a file you can read later when the machine's recovered.
>
> #!/bin/bash
> # Append one batch-mode snapshot of top to log.txt every 60 seconds.
> while true; do
>         top -bn 1 >>log.txt
>         sleep 60
> done
>
> Just run it in your home directory.  Make sure your /home partition has
> enough space.  Under heavy load each 60 seconds you'll be adding about 2k
> to 5k to that file.  Change the sleep 60 to something smaller if you want
> it to run more often.  No warranties implied, use at your own risk. :-)

The problem with that is that it is starting up new processes each
iteration. At the least you need to redirect stderr to the log file as
well. Should top fail to launch then that would provide some help with the
problem but not as much as actually having the output of top. It would be much
better to just do a:

top -d 60 -b -n 600 > log.txt 2>&1

which would take snapshots for 10 hours, or just set a very large number
instead of 600 and interrupt it when wanted. The 60 second interval can easily
be changed then as well.

Then, of course, if the issue is disk activity, swap or otherwise, there's also
vmstat. What about file descriptor usage? It's possible to determine an
estimate of that by looking through /proc, in which case I'd say a simple shell
script would suffice and never mind the possible failures to start programs
like ls. Then what about if it's interrupt activity that's a problem? Not very
likely on modern hardware but even 10Mbps ethernet could bring a system almost
to its knees with interrupt activity on older stuff.
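
For the file descriptor estimate, a simple script along these lines might do. It is only a sketch: it assumes a Linux-style /proc with one numbered directory per process and an fd subdirectory in each, fd_usage is a made-up name, and without root it can only read the fd directories of your own processes.

```shell
#!/bin/sh
# Print "pid open-fd-count" for each process directory under a /proc-style root.
# Assumption: each process has a numbered directory containing an fd/
# subdirectory with one entry per open descriptor.
fd_usage() {
    procroot="${1:-/proc}"
    for dir in "$procroot"/[0-9]*; do
        [ -d "$dir/fd" ] || continue
        count=$(ls "$dir/fd" 2>/dev/null | wc -l)
        # $((count)) strips the padding some wc implementations add
        echo "${dir##*/} $((count))"
    done
}

# Example use: fd_usage | sort -k2 -n | tail -5
```

Run at intervals into a log file, it would show whether any one process keeps accumulating descriptors as the problem time approaches.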

I think the important point in this is that there is something making the
system unstable and the extra load produced by the postgresql cron jobs is
sufficient to make that something significant, where normally a daily reboot
prevents it getting to that stage. So again, it's the question of
'why reboot daily?'

--
Nigel Andrews

Re: Database server restarting

From

"shoaib"

Date:

06 May 2003, 22:01:47

I am not using any restart script (maybe I misunderstood you).
But I am starting postgres at the time of system boot up and this is the
script for that

#! /bin/sh
#
# Startup script to run Postgresql
#
#

start()
 {
    if [ `id -u` = 0 ] && ! echo $PATH | /bin/grep -q "/sbin" ; then
        PATH=/sbin:$PATH
    fi

    if [ `id -u` = 0 ] && ! echo $PATH | /bin/grep -q "/usr/sbin" ; then
        PATH=/usr/sbin:$PATH
    fi

    if [ `id -u` = 0 ] && ! echo $PATH | /bin/grep -q "/usr/local/sbin" ; then
        PATH=/usr/local/sbin:$PATH
    fi

    if ! echo $PATH | /bin/grep -q "/usr/X11R6/bin" ; then
        PATH="$PATH:/usr/X11R6/bin"
    fi

    PATH=$PATH:.:/usr/local/jdk/bin:/usr/local/pgsql/bin

    #FOR NON-RAID
    #PGDATA=/usr/local/pgsql/data
    #FOR RAID
        PGDATA=/data/pgsql/data

    export PATH PGDATA

    su -l postgres -s /bin/sh -c "/usr/local/pgsql/bin/pg_ctl start -D $PGDATA -o '-i' -s -l $PGDATA/simspgsql.log &"

    sleep 1
    if [ -f $PGDATA/postmaster.pid ]
    then
       echo "PostgreSQL started"
    else
       echo "PostgreSQL not started"
    fi
 }

stop()
 {
    su -l postgres -s /bin/sh -c "/usr/local/pgsql/bin/pg_ctl stop -D $PGDATA -s -m fast"
    sleep 1
    if [ -f $PGDATA/postmaster.pid ]
    then
       echo "PostgreSQL not stopped"
    else
       echo "PostgreSQL is currently stopped"
    fi
 }

restart()
{
    stop
    start
}

status()
{
    su -l postgres -s /bin/sh -c "/usr/local/pgsql/bin/pg_ctl status -D $PGDATA"
}

case "$1" in
 start)
   start
   ;;
 stop)
   stop
   ;;
 restart)
   restart
   ;;
 status)
   status
   ;;
 *)
   echo "Usage: $0 {start|stop|restart|status}"
esac
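
One detail that may be worth checking in a script of this shape: PGDATA is assigned inside start(), so stop() and status() only see a value if one happens to be inherited from the environment. A minimal sketch of setting it once at the top level follows. It reuses the paths from the script above; pg_ctl_cmd is a hypothetical helper that only prints the command line it would build, so it can be checked without touching the server.

```shell
#!/bin/sh
# Sketch only: PGDATA is set at the top level so that start, stop and
# status all build their command from the same value.
# The paths are the ones used in the script above (RAID data directory).
PGDATA=/data/pgsql/data
PGCTL=/usr/local/pgsql/bin/pg_ctl
export PGDATA

# Hypothetical helper: prints the pg_ctl command line for an action.
# A real init script would hand this string to su -c instead of printing it.
pg_ctl_cmd() {
    case "$1" in
        start)  echo "$PGCTL start -D $PGDATA -o '-i' -s -l $PGDATA/simspgsql.log" ;;
        stop)   echo "$PGCTL stop -D $PGDATA -s -m fast" ;;
        status) echo "$PGCTL status -D $PGDATA" ;;
        *)      return 1 ;;
    esac
}

# Example use: pg_ctl_cmd stop
```

If the stop command currently runs with an empty -D argument at reboot time, the server would not get a clean fast shutdown, which would be one possible source of the "not properly shut down" message. That is a guess from reading the script, not something the logs confirm.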


Let me know if there is any problem in it.

Regards,
Shoaib
