Thread: "show all" command crashes server

"show all" command crashes server

From
Grant Maxwell
Date:
Hi folks

First time poster here so please extend grace if I don't initially provide what is needed to help.

I am running postgresql 8.3.7 on debian lenny (postgresql-8.3_8.3.7-0lenny1_i386.deb).

I have three of these servers and generally they run well.

On this one server if I use the command "show all" in psql, phpPgAdmin or pgAdmin3 the postgresql server spits the dummy as follows:

postgres@theconsole:~$ psql
Welcome to psql 8.3.7, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=# show all;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


In the syslog is:

Sep 10 23:55:14 theconsole postgres[31118]: [3-2]     0: LOCATION:  reaper, postmaster.c:2156
Sep 10 23:55:15 theconsole postgres[31124]: [4-1]  [local] [unknown] [unknown] 0: LOG:  08P01: incomplete startup packet
Sep 10 23:55:15 theconsole postgres[31124]: [4-2]  [local] [unknown] [unknown] 0: LOCATION:  ProcessStartupPacket, postmaster.c:1396
Sep 10 23:55:36 theconsole postgres[31118]: [4-1]     0: LOG:  00000: server process (PID 31145) was terminated by signal 11: Segmentation fault
Sep 10 23:55:36 theconsole postgres[31118]: [4-2]     0: LOCATION:  LogChildExit, postmaster.c:2529
Sep 10 23:55:36 theconsole postgres[31118]: [5-1]     0: LOG:  00000: terminating any other active server processes
Sep 10 23:55:36 theconsole postgres[31118]: [5-2]     0: LOCATION:  HandleChildCrash, postmaster.c:2374
Sep 10 23:55:36 theconsole postgres[31118]: [6-1]     0: LOG:  00000: all server processes terminated; reinitializing
Sep 10 23:55:36 theconsole postgres[31118]: [6-2]     0: LOCATION:  PostmasterStateMachine, postmaster.c:2690
Sep 10 23:55:36 theconsole postgres[31146]: [7-1]     0: LOG:  00000: database system was interrupted; last known up at 2009-09-10 23:55:14 EST
Sep 10 23:55:36 theconsole postgres[31146]: [7-2]     0: LOCATION:  StartupXLOG, xlog.c:4836
Sep 10 23:55:36 theconsole postgres[31147]: [7-1]  [local] postgres postgres 0: FATAL:  57P03: the database system is in recovery mode
Sep 10 23:55:36 theconsole postgres[31147]: [7-2]  [local] postgres postgres 0: LOCATION:  ProcessStartupPacket, postmaster.c:1648
Sep 10 23:55:36 theconsole postgres[31146]: [8-1]     0: LOG:  00000: database system was not properly shut down; automatic recovery in progress
Sep 10 23:55:36 theconsole postgres[31146]: [8-2]     0: LOCATION:  StartupXLOG, xlog.c:5003
Sep 10 23:55:36 theconsole postgres[31146]: [9-1]     0: LOG:  00000: record with zero length at 2A/E734761C
Sep 10 23:55:36 theconsole postgres[31146]: [9-2]     0: LOCATION:  ReadRecord, xlog.c:3126
Sep 10 23:55:36 theconsole postgres[31146]: [10-1]     0: LOG:  00000: redo is not required
Sep 10 23:55:36 theconsole postgres[31146]: [10-2]     0: LOCATION:  StartupXLOG, xlog.c:5146
Sep 10 23:55:36 theconsole postgres[31150]: [7-1]     0: LOG:  00000: autovacuum launcher started
Sep 10 23:55:36 theconsole postgres[31150]: [7-2]     0: LOCATION:  AutoVacLauncherMain, autovacuum.c:520
Sep 10 23:55:36 theconsole postgres[31118]: [7-1]     0: LOG:  00000: database system is ready to accept connections


This is 100% repeatable.

The database seems to work fine until this command is run; then it is instant death.

Any help would be appreciated.

regards
Grant

Re: "show all" command crashes server

From
Richard Huxton
Date:
Grant Maxwell wrote:
> Hi folks
>
> First time poster here so please extend grace if I don't initially
> provide what is needed to help.
>
> I am running postgresql 8.3.7 on debian lenny
> (postgresql-8.3_8.3.7-0lenny1_i386.deb)

Well that's useful.

> I have three of these servers and generally they run well.

As is that.

> On this one server if I use the command "show all" in psql, phpPgAdmin
> or pgAdmin3 the postgresql server spits the dummy as follows:

> postgres=# show all;
> server closed the connection unexpectedly

Hmm - some modules can provide their own config variables. Do you have
the same modules installed in all three servers?

Can you "show" individual variables?

--
  Richard Huxton
  Archonet Ltd

Re: "show all" command crashes server

From
Scott Marlowe
Date:
On Thu, Sep 10, 2009 at 8:37 AM, Grant Maxwell <grant.maxwell@maxan.com.au> wrote:
> Hi folks
> First time poster here so please extend grace if I don't initially provide
> what is needed to help.
> I am running postgresql 8.3.7 on debian lenny
> (postgresql-8.3_8.3.7-0lenny1_i386.deb).
> I have three of these servers and generally they run well.
SNIP
> Sep 10 23:55:36 theconsole postgres[31118]: [4-1]     0: LOG:  00000: server
> process (PID 31145) was terminated by signal 11: Segmentation fault

Signal 11 is a segmentation fault, which can be caused by bad hardware
or corrupted / buggy binaries.  I'd try reinstalling the pgsql binaries
and see if that helps.

Re: "show all" command crashes server

From
Grant Maxwell
Date:

On 11/09/2009, at 1:09 AM, Richard Huxton wrote:

>> On this one server if I use the command "show all" in psql, phpPgAdmin
>> or pgAdmin3 the postgresql server spits the dummy as follows:
>
>> postgres=# show all;
>> server closed the connection unexpectedly
>
> Hmm - some modules can provide their own config variables. Do you have
> the same modules installed in all three servers?

How can I determine what modules are installed?
I do know that pgmemcache is installed on this server, but it was
there before the problems started and it works OK.

>
> Can you "show" individual variables?
I ran "show all" on one of the other servers, wrote a script that
issues a separate "show" statement for each resulting variable, and ran
it on the problem server.
It ran without a fault.

I then took the postgresql.conf file from the problem server, grabbed
all the config lines and submitted them one at a time (again with a
script), and that also worked fine.
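
A minimal sketch of that kind of script, in Python, assuming the "show all" output from the healthy server was saved in psql's default aligned format (name | setting | description). The function name show_statements is made up for illustration; this is not the script that was actually used.

```python
def show_statements(show_all_output):
    """Turn saved psql 'SHOW ALL' output into one 'SHOW <name>;' per variable."""
    statements = []
    for line in show_all_output.splitlines():
        # Data rows and the header contain '|'; the '----+----' separator
        # and the '(N rows)' footer do not, so they are skipped here.
        if "|" not in line:
            continue
        name = line.split("|", 1)[0].strip()
        # Skip the header row and anything without a variable name.
        if not name or name == "name":
            continue
        statements.append("SHOW {};".format(name))
    return statements
```

The resulting lines could be written to a file and run on the problem server with psql -f, so that each variable is fetched by its own statement.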

regards
Grant Maxwell


Re: "show all" command crashes server

From
Tom Lane
Date:
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> On 11/09/2009, at 1:09 AM, Richard Huxton wrote:
>> Hmm - some modules can provide their own config variables. Do you have
>> the same modules installed in all three servers?

>         How can I determine what modules are installed ?

The contents of the local_preload_libraries and shared_preload_libraries
parameters would probably be enough ...
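
On a running server, "show shared_preload_libraries;" and "show local_preload_libraries;" in psql report the live values. For comparing the config files of several servers, a rough Python sketch follows. The function name preload_libraries is hypothetical, and the parsing is deliberately simplified: it assumes the "name = value" form and no '#' inside a quoted value.

```python
def preload_libraries(conf_text):
    """Extract the preload library lists from the text of a postgresql.conf."""
    wanted = ("shared_preload_libraries", "local_preload_libraries")
    found = {name: [] for name in wanted}
    for raw in conf_text.splitlines():
        # Drop comments; a fully commented-out line becomes empty.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key in found:
            # Values are quoted, comma-separated library names.
            value = value.strip("'\"")
            found[key] = [lib.strip() for lib in value.split(",") if lib.strip()]
    return found
```

A parameter that is commented out or set to an empty string comes back as an empty list, which makes it easy to spot the one server that differs.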

            regards, tom lane

Re: "show all" command crashes server

From
Grant Maxwell
Date:
On 11/09/2009, at 8:17 AM, Tom Lane wrote:

> Grant Maxwell <grant.maxwell@maxan.com.au> writes:
>> On 11/09/2009, at 1:09 AM, Richard Huxton wrote:
>>> Hmm - some modules can provide their own config variables. Do you have
>>> the same modules installed in all three servers?
>
>>         How can I determine what modules are installed ?
>
> The contents of the local_preload_libraries and shared_preload_libraries
> parameters would probably be enough ...
>
>             regards, tom lane
>
On the problem server:
    shared_preload_libraries = 'pgmemcache'
    #local_preload_libraries = ''

On the others both are empty.

For good measure I removed pgmemcache but the problem persists.
I have now put it back.

regards
Grant


Re: "show all" command crashes server

From
Tom Lane
Date:
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> On the problem server:
>     shared_preload_libraries = 'pgmemcache'
>     #local_preload_libraries = ''

> on the others both are empty.

Sounds like a smoking gun to me.

> For good measure I removed pgmemcache but the problem persists.

Did you restart the postmaster afterwards?  shared_preload_libraries
is only considered at postmaster start.

            regards, tom lane

Re: "show all" command crashes server

From
Grant Maxwell
Date:
On 11/09/2009, at 8:36 AM, Tom Lane wrote:

> Grant Maxwell <grant.maxwell@maxan.com.au> writes:
>> On the problem server:
>>     shared_preload_libraries = 'pgmemcache'
>>     #local_preload_libraries = ''
>
>> on the others both are empty.
>
> Sounds like a smoking gun to me.
>
>> For good measure I removed pgmemcache but the problem persists.
>
> Did you restart the postmaster afterwards?  shared_preload_libraries
> is only considered at postmaster start.

    yep - full restart.
>
>             regards, tom lane


Re: "show all" command crashes server

From
"Greg Sabino Mullane"
Date:
>> shared_preload_libraries = 'pgmemcache'
...
> Sounds like a smoking gun to me.

Yep, known problem with pgmemcache. Bruce and I poked around
with this about a year ago. Bruce, I think you were going
to throw the problem at some EDB people - did that ever happen?

I seem to recall we fixed that particular problem as well
during the codeathon at OpenSQL Camp.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200909102039
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

Re: "show all" command crashes server

From
Tom Lane
Date:
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
> On 11/09/2009, at 8:36 AM, Tom Lane wrote:
>> Did you restart the postmaster afterwards?

>     yep - full restart.

okay, next step is to collect a stack trace ...
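
One common way to get such a trace is to let the crashing backend write a core file and then ask gdb for a backtrace. The Python sketch below only wraps that gdb call. It assumes gdb is installed and that core dumps were enabled before the postmaster was started; the function names and any paths passed in are placeholders, not values taken from this server.

```python
import subprocess


def backtrace_command(postgres_binary, core_file):
    """Build a gdb command line that prints the stack of a core file and exits."""
    # -batch makes gdb exit after running the commands given with -ex;
    # "bt" is gdb's backtrace command.
    return ["gdb", "-batch", "-ex", "bt", postgres_binary, core_file]


def collect_backtrace(postgres_binary, core_file):
    """Run gdb and return whatever it printed (requires gdb on the PATH)."""
    result = subprocess.run(
        backtrace_command(postgres_binary, core_file),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

The trace is much more readable when debug symbols for the server binaries are installed.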

            regards, tom lane

Re: "show all" command crashes server *** FIXED ***

From
Grant Maxwell
Date:
First of all thanks to those who provided input.

This problem is now fixed and I thought I would post this solution so that others might benefit in the future.

For the sake of completeness:

The error was that if "show all" was run on this postgresql (version 8.3) server, postgres would crash and then recover.
Otherwise the server "seemed" healthy.

The postgres log showed:
Sep 10 23:55:36 theconsole postgres[31118]: [4-1]     0: LOG:  00000: server process (PID 31145) was terminated by signal 11: Segmentation fault
Sep 10 23:55:36 theconsole postgres[31118]: [4-2]     0: LOCATION:  LogChildExit, postmaster.c:2529
Sep 10 23:55:36 theconsole postgres[31118]: [5-1]     0: LOG:  00000: terminating any other active server processes
Sep 10 23:55:36 theconsole postgres[31118]: [5-2]     0: LOCATION:  HandleChildCrash, postmaster.c:2374
Sep 10 23:55:36 theconsole postgres[31118]: [6-1]     0: LOG:  00000: all server processes terminated; reinitializing
Sep 10 23:55:36 theconsole postgres[31118]: [6-2]     0: LOCATION:  PostmasterStateMachine, postmaster.c:2690
Sep 10 23:55:36 theconsole postgres[31146]: [7-1]     0: LOG:  00000: database system was interrupted; last known up at 2009-09-10 23:55:14 EST
Sep 10 23:55:36 theconsole postgres[31146]: [7-2]     0: LOCATION:  StartupXLOG, xlog.c:4836
Sep 10 23:55:36 theconsole postgres[31147]: [7-1]  [local] postgres postgres 0: FATAL:  57P03: the database system is in recovery mode
Sep 10 23:55:36 theconsole postgres[31147]: [7-2]  [local] postgres postgres 0: LOCATION:  ProcessStartupPacket, postmaster.c:1648
Sep 10 23:55:36 theconsole postgres[31146]: [8-1]     0: LOG:  00000: database system was not properly shut down; automatic recovery in progress
Sep 10 23:55:36 theconsole postgres[31146]: [8-2]     0: LOCATION:  StartupXLOG, xlog.c:5003
Sep 10 23:55:36 theconsole postgres[31146]: [9-1]     0: LOG:  00000: record with zero length at 2A/E734761C
Sep 10 23:55:36 theconsole postgres[31146]: [9-2]     0: LOCATION:  ReadRecord, xlog.c:3126
Sep 10 23:55:36 theconsole postgres[31146]: [10-1]     0: LOG:  00000: redo is not required
Sep 10 23:55:36 theconsole postgres[31146]: [10-2]     0: LOCATION:  StartupXLOG, xlog.c:5146
Sep 10 23:55:36 theconsole postgres[31150]: [7-1]     0: LOG:  00000: autovacuum launcher started
Sep 10 23:55:36 theconsole postgres[31150]: [7-2]     0: LOCATION:  AutoVacLauncherMain, autovacuum.c:520
Sep 10 23:55:36 theconsole postgres[31118]: [7-1]     0: LOG:  00000: database system is ready to accept connections

SOLUTION:
Increase the memory on the server.

WHY:
We had installed splunk on the server recently (a month before). It was running OK.
The combination of splunk and the other tasks running had pushed memory usage too close to the limit.
What we did not notice was that swap had been almost completely consumed - nasty.

RESULT:
We shut it all down, doubled the memory and voila - problem gone.

It goes to show that when hunting problems we should not ignore the basic environmental elements.
It also goes to show that our monitoring system was not looking at this relatively new server.
(this confession is not an invitation for a spanking)
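
For anyone wanting a basic check of this kind, here is a minimal Python sketch that works out how much swap is in use from the text of /proc/meminfo on Linux. The function names and the 0.9 threshold are illustrative assumptions, not part of any monitoring system mentioned in this thread.

```python
def swap_used_fraction(meminfo_text):
    """Return the fraction of swap in use, given the text of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])  # values are reported in kB
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return (total - free) / total


def swap_nearly_full(path="/proc/meminfo", threshold=0.9):
    """True when swap usage on this host is at or above the assumed threshold."""
    with open(path) as handle:
        return swap_used_fraction(handle.read()) >= threshold
```

Run from cron or a monitoring agent, a check like swap_nearly_full() would flag a host in the state described above before it starts misbehaving.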

again thanks for the help
Grant

Re: "show all" command crashes server *** FIXED ***

From
Tom Lane
Date:
Grant Maxwell <grant.maxwell@maxan.com.au> writes:
>     The error was that if "show all" was run on this postgresql (version
> 8.3) server, postgres would crash and then recover.

>     The postgres log showed:
> Sep 10 23:55:36 theconsole postgres[31118]: [4-1]     0: LOG:  00000:
> server process (PID 31145) was terminated by signal 11: Segmentation
> fault

>     We had installed splunk on the server recently (a month before).
> It was running OK.
>     The combination of splunk and the other tasks running had pushed
> memory usage too close to the limit.
>     What we did not notice was that swap had been almost completely
> consumed - nasty
>     We shut it all down, doubled the memory and voila - problem gone.

Hmm.  A segfault in that case seems to indicate that something somewhere
is failing to check for a null result from malloc().  Which is a bug we
ought to fix.  Is there any chance of getting a core dump stack trace
from one of those crashes?

            regards, tom lane