Thread: VACUUM ANALYZE FAILS on 7.0.3

VACUUM ANALYZE FAILS on 7.0.3

From
"Dave Cramer"
Date:
When I run VACUUM ANALYZE it fails and all the backend connections are
closed. Has anyone else run into this problem?


--DC--


Re: VACUUM ANALYZE FAILS on 7.0.3

From
Tom Lane
Date:
"Dave Cramer" <Dave@micro-automation.net> writes:
> When I run VACUUM ANALYZE it fails and all the backend connections are
> closed. Has anyone else run into this problem?

There should be a core dump file from the crashed backend in your
database subdirectory --- can you provide a backtrace from it?

            regards, tom lane
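
The backtrace requested above can be pulled from a core file non-interactively. What follows is a minimal sketch, assuming a reasonably recent gdb that accepts -batch and -ex. The core_backtrace helper, the GDB variable, and the paths in the usage line are illustrative and do not come from the thread.

```shell
#!/bin/sh
# Assumption: gdb is on PATH. GDB can be overridden, e.g. for testing.
GDB=${GDB:-gdb}

# Print a backtrace from a core file.
#   $1 = the executable that crashed (the backend binary)
#   $2 = the core file it left behind
core_backtrace() {
    # -batch: exit once the commands have run; -ex bt: run "bt" (backtrace).
    "$GDB" -batch -ex bt "$1" "$2"
}
```

Example call (hypothetical paths): core_backtrace /usr/bin/postgres "$PGDATA/base/yourdbname/core"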

Re: VACUUM ANALYZE FAILS on 7.0.3

From
"Dave Cramer"
Date:
Tom,

No core dump, but here is the message from VACUUM VERBOSE ANALYZE:

NOTICE:  --Relation pg_rewrite--
pqReadData() -- backend closed the channel unexpectedly.
        This probably means the backend terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


--DC--
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Dave Cramer" <Dave@micro-automation.net>
Cc: <pgsql-general@postgresql.org>
Sent: Tuesday, January 23, 2001 5:06 PM
Subject: Re: [GENERAL] VACUUM ANALYZE FAILS on 7.0.3


Re: VACUUM ANALYZE FAILS on 7.0.3

From
Tom Lane
Date:
"Dave Cramer" <Dave@micro-automation.net> writes:
> No core dump but here is the message from vacuum verbose analyze;

> NOTICE:  --Relation pg_rewrite--
> pqReadData() -- backend closed the channel unexpectedly.

Can't tell much from that, except that the backend crashed, which
*should* leave a core dump.

On some platforms (eg, most Linux distros), processes started from
system boot scripts are by default started under "ulimit -c 0", which
prevents core dumps.  To get more information about what's happening,
I recommend restarting the postmaster with "ulimit -c unlimited" so that
crashed backends will leave core files.  While you're at it, make sure
you are starting the postmaster without -S, and redirect its stdout and
stderr into some convenient logfile.  The postmaster log might also
contain useful info about what's going wrong...

            regards, tom lane
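
The advice above could be applied in a start script along these lines. This is only a sketch: the run_postmaster_logged helper, the PGLOG variable, and the default paths are assumptions rather than anything stated in the thread, and raising the limit can still fail if the hard limit is already 0.

```shell
#!/bin/sh
# Hypothetical defaults; adjust to the local installation.
PGDATA=${PGDATA:-/var/lib/pgsql/data}
PGLOG=${PGLOG:-/tmp/postmaster.log}
POSTMASTER=${POSTMASTER:-/usr/bin/postmaster}

# Run the postmaster in the foreground with core dumps enabled,
# appending its stdout and stderr to $PGLOG.
run_postmaster_logged() {
    (
        # Raise the core-size soft limit for this subshell and its children.
        ulimit -c unlimited || echo "warning: could not raise core limit"
        # No -S (silent mode), so messages stay on stdout/stderr.
        "$POSTMASTER" -D "$PGDATA"
    ) >>"$PGLOG" 2>&1
}
```

From a boot script it would typically be started in the background: run_postmaster_logged &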

Re: VACUUM ANALYZE FAILS on 7.0.3

From
"Dave Cramer"
Date:
OK, I tried setting ulimit to unlimited inside the script. This didn't help;
there is still no core dump.
I did get logging working, though:

/usr/bin/postmaster: CleanupProc: pid 20146 exited with status 139
Server process (pid 20146) exited with status 139 at Tue Jan 23 22:32:26 2001
Terminating any active server processes...
Server processes were terminated at Tue Jan 23 22:32:26 2001
Reinitializing shared memory and semaphores
010123.22:32:26.086 [19777] shmem_exit(0)


--DC--
----- Original Message -----
From: "Tom Lane" <tgl@sss.pgh.pa.us>
To: "Dave Cramer" <Dave@micro-automation.net>
Cc: <pgsql-general@postgresql.org>
Sent: Tuesday, January 23, 2001 5:52 PM
Subject: Re: [GENERAL] VACUUM ANALYZE FAILS on 7.0.3


Re: VACUUM ANALYZE FAILS on 7.0.3

From
Tom Lane
Date:
"Dave Cramer" <Dave@micro-automation.net> writes:
> Ok, I tried setting ulimit to unlimited inside the script. This didn't help,
> still no core dump
> I did get logging working though:

> /usr/bin/postmaster: CleanupProc: pid 20146 exited with status 139
> Server process (pid 20146) exited with status 139 at Tue Jan 23 22:32:26
> 2001

Well, that is for *sure* a backend crashing --- with signal 11, which
is SIGSEGV on most Unixen.  There should be a coredump.  Poke around in
your OS documentation and see if you can figure out what's preventing
the core file from being written.

(BTW, you are looking in the right place, no? $PGDATA/base/yourdbname/core)

            regards, tom lane
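
The arithmetic behind "status 139 means signal 11", and the search for the core file, can be sketched as follows. The helper names and the PGDATA default are assumptions for illustration. The masking works whether 139 is a raw wait() status (where, on Linux, the 0x80 bit flags a core dump) or a shell-style 128+N exit code.

```shell
#!/bin/sh
PGDATA=${PGDATA:-/var/lib/pgsql/data}   # hypothetical default

# Extract the signal number from a status such as 139.
# The low 7 bits hold the signal number in both interpretations.
status_to_signal() {
    echo $(( $1 & 127 ))
}

# List candidate core files under the per-database directories.
find_cores() {
    find "$PGDATA/base" -type f -name 'core*' 2>/dev/null
}
```

status_to_signal 139 prints 11; on most systems "kill -l 11" then names the signal.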

Re: VACUUM ANALYZE FAILS on 7.0.3

From
"Rob Arnold"
Date:
I had a machine with a bad CPU that did that, but I don't know that this is
your case.

Perhaps a bad index?

Can you determine whether only one table is the cause (e.g. by running
VACUUM ANALYZE <table> on each table in turn)?

Anything in the logs?

--rob
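
The per-table check suggested above could be scripted roughly like this. It is a sketch under assumptions: the vacuum_each_table helper and the PSQL variable are hypothetical, the catalog query simply lists ordinary relations, and after a backend crash the server may briefly refuse connections, so a failure on the table right after the culprit is not conclusive.

```shell
#!/bin/sh
PSQL=${PSQL:-psql}   # overridable, e.g. for testing

# Run VACUUM ANALYZE on each table of database $1, one table per
# psql invocation, and report which ones fail.
vacuum_each_table() {
    db=$1
    "$PSQL" -A -t -d "$db" \
        -c "SELECT relname FROM pg_class WHERE relkind = 'r'" |
    while read -r rel; do
        [ -n "$rel" ] || continue
        # </dev/null keeps psql from consuming the table list on stdin.
        if "$PSQL" -d "$db" -c "VACUUM ANALYZE \"$rel\"" \
               </dev/null >/dev/null 2>&1; then
            echo "ok: $rel"
        else
            echo "FAILED: $rel"
        fi
    done
}
```

Example call (database name is hypothetical): vacuum_each_table yourdbname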

----- Original Message -----
From: "Dave Cramer" <Dave@micro-automation.net>
To: <pgsql-general@postgresql.org>
Sent: Tuesday, January 23, 2001 1:35 PM
Subject: VACUUM ANALYZE FAILS on 7.0.3


Re: VACUUM ANALYZE FAILS on 7.0.3

From
Tom Lane
Date:
"Dave Cramer" <Dave@micro-automation.net> writes:
> #0  0x40171b80 in strcoll () at strcoll.c:228
> 228     strcoll.c: No such file or directory.

Hm.  This may be a variant of the problem that Jukka Honkela
<fatal@ees2.oulu.fi> has reported.  For him, strcoll() is crashing
even though it's being passed perfectly valid strings to compare.
We suppose that the locale data structures that strcoll() uses are
being clobbered sometime before the crash, but haven't figured out
how or where.

What's interesting is that he saw it under 7.1beta.  We'd assumed that
it was a new bug in 7.1 code, but if you're seeing the same thing on
7.0.3 then it's not new.  (But then why aren't there more reports?
Maybe the problem only occurs on a particular platform, or a particular
libc version, and/or with particular locale environment settings?  What
is your platform & environment, anyway?)

Anyway, if you have time to dig in and see exactly why strcoll is
crashing, another pair of eyes on the problem would surely help.

            regards, tom lane
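
The platform and locale details asked about above can be gathered with a few standard commands. This is a minimal sketch; the report_platform helper is hypothetical, and which variables actually matter for the strcoll() crash is not established in the thread.

```shell
#!/bin/sh
# Print the OS identification and the locale-related environment
# that the postmaster would inherit if started from this shell.
report_platform() {
    echo "uname: $(uname -srm)"
    echo "LANG=${LANG-<unset>}"
    echo "LC_ALL=${LC_ALL-<unset>}"
    echo "LC_COLLATE=${LC_COLLATE-<unset>}"
    echo "LC_CTYPE=${LC_CTYPE-<unset>}"
    # Effective settings as the C library resolves them, if locale(1) exists.
    if command -v locale >/dev/null 2>&1; then
        locale
    fi
    return 0
}
```

The output is most useful when the function runs in the same environment that starts the postmaster, since a boot script may set different locale variables than a login shell.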