Thread: postgres dies while doing vacuum analyze

postgres dies while doing vacuum analyze

From
Manuel Sugawara
Date:
Guys,

I just installed a new database on my server, and while running vacuum
analyze, postgres dies with the following message:

[...]
NOTICE:  Index pg_rewrite_oid_index: Pages 2; Tuples 16. CPU 0.00s/0.00u sec.
NOTICE:  Index pg_rewrite_rulename_index: Pages 2; Tuples 16. CPU 0.00s/0.00u sec.
NOTICE:  --Relation pg_toast_17058--
NOTICE:  Pages 4: Changed 0, reaped 0, Empty 0, New 0; Tup 17: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 219,
MaxLen 2034; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
 
NOTICE:  Index pg_toast_17058_idx: Pages 2; Tuples 17. CPU 0.00s/0.00u sec.
NOTICE:  Analyzing...
pqReadData() -- backend closed the channel unexpectedly.
This probably means the backend terminated abnormally
before or while processing the request.
 
The connection to the server was lost. Attempting reset: Failed.
!# 

The postgres version is 7.1.2, and the database was initialized with

$ LANG=es_MX /usr/bin/initdb -D /var/lib/pgsql/data -E latin1 

It is running on Red Hat Linux 7.1 (i686) with the 2.4.2-2 kernel.
Here is the backtrace from gdb:

(gdb) bt
#0  strcoll () at strcoll.c:229
#1  0x081348e7 in varstr_cmp () at eval.c:41
#2  0x0813493f in varstr_cmp () at eval.c:41
#3  0x08134b7c in text_gt () at eval.c:41
#4  0x08148ca2 in FunctionCall2 () at eval.c:41
#5  0x080b3b09 in analyze_rel () at eval.c:41
#6  0x080b3795 in analyze_rel () at eval.c:41
#7  0x080afa76 in vacuum () at eval.c:41
#8  0x080af9c7 in vacuum () at eval.c:41
#9  0x0810a3ca in ProcessUtility () at eval.c:41
#10 0x0810808b in pg_exec_query_string () at eval.c:41
#11 0x081091ce in PostgresMain () at eval.c:41
#12 0x080f208b in PostmasterMain () at eval.c:41
#13 0x080f1c45 in PostmasterMain () at eval.c:41
#14 0x080f0d0c in PostmasterMain () at eval.c:41
#15 0x080f0684 in PostmasterMain () at eval.c:41
#16 0x080cf3c8 in main () at eval.c:41
#17 0x401e2177 in __libc_start_main (main=0x80cf260 <main>, argc=3, ubp_av=0xbffffa7c, init=0x8065c20 <_init>,
fini=0x8154bb0 <_fini>, rtld_fini=0x4000e184 <_dl_fini>, stack_end=0xbffffa6c) at ../sysdeps/generic/libc-start.c:129
 
(gdb) 

It seems like a problem with my locale settings. The
strange thing is that postgres dies while analyzing a system
table; however, I'm able to vacuum my tables individually:

$ for t in `psql dep dep -c '\dt' -t -A | cut -d\| -f1`; do psql dep -c "vacuum analyze $t"; done

Any ideas?

best regards,
Manuel.


Re: postgres dies while doing vacuum analyze

From
Tom Lane
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:
> [ vacuum analyze dies ]
> It is running on Redhat Linux 7.1 i686 with 2.4.2-2 kernel.
> Here is the back trace from gdb

> (gdb) bt
> #0  strcoll () at strcoll.c:229

We've heard reports before of strcoll() crashing on apparently valid
input.  It seems to be a Red Hat-specific problem; the three reports
I have in my notes are from people running RH 7.0 (check the archives
from 1/1/01, 1/24/01, 3/1/01 if you want to see the prior reports).

It's possible that Postgres is doing something that confuses RH's
locale library, but I dunno what.  Since no other platform is reporting
it, it could also be a plain old bug in that locale library.

We need some RH-er to burrow in with a debugger and figure out what's
going wrong.  The previous reporters don't seem to have done anything;
are you the man to fix it?
        regards, tom lane


Re: postgres dies while doing vacuum analyze

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Manuel Sugawara <masm@fciencias.unam.mx> writes:
> > [ vacuum analyze dies ]
> > It is running on Redhat Linux 7.1 i686 with 2.4.2-2 kernel.
> > Here is the back trace from gdb
> 
> > (gdb) bt
> > #0  strcoll () at strcoll.c:229
> 
> We've heard reports before of strcoll() crashing on apparently valid
> input. 

We haven't AFAIK, but would be very interested if it can be reproduced.


-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: postgres dies while doing vacuum analyze

From
Manuel Sugawara
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Manuel Sugawara <masm@fciencias.unam.mx> writes:
> > [ vacuum analyze dies ]
> > It is running on Redhat Linux 7.1 i686 with 2.4.2-2 kernel.
> > Here is the back trace from gdb
>
> > (gdb) bt
> > #0  strcoll () at strcoll.c:229
>
> We've heard reports before of strcoll() crashing on apparently valid
> input.  It seems to be a Red Hat-specific problem; the three reports
> I have in my notes are from people running RH 7.0 (check the archives
> from 1/1/01, 1/24/01, 3/1/01 if you want to see the prior reports).
>
> It's possible that Postgres is doing something that confuses RH's
> locale library, but I dunno what.  Since no other platform is reporting
> it, it could also be a plain old bug in that locale library.

After looking into strcoll, I found the bug. Attached is a tarball
including a patch for strcoll, glibc.spec, and a small program that
shows the bug. Hopefully Trond can forward this to the glibc and RPM
experts.

best regards,
Manuel.

>
> We need some RH-er to burrow in with a debugger and figure out what's
> going wrong.  The previous reporters don't seem to have done anything;
> are you the man to fix it?
>
>             regards, tom lane



Re: postgres dies while doing vacuum analyze

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:

> Tom Lane <tgl@sss.pgh.pa.us> writes:
> 
> > Manuel Sugawara <masm@fciencias.unam.mx> writes:
> > > [ vacuum analyze dies ]
> > > It is running on Redhat Linux 7.1 i686 with 2.4.2-2 kernel.
> > > Here is the back trace from gdb
> > 
> > > (gdb) bt
> > > #0  strcoll () at strcoll.c:229
> > 
> > We've heard reports before of strcoll() crashing on apparently valid
> > input.  It seems to be a Red Hat-specific problem; the three reports
> > I have in my notes are from people running RH 7.0 (check the archives
> > from 1/1/01, 1/24/01, 3/1/01 if you want to see the prior reports).
> > 
> > It's possible that Postgres is doing something that confuses RH's
> > locale library, but I dunno what.  Since no other platform is reporting
> > it, it could also be a plain old bug in that locale library.
> 
> After looking into strcoll, I found the bug. Attached is a tarball
> including a patch for strcoll, glibc.spec, and a small program that
> shows the bug.

Will do... what is the expected result of the testcase? It seems to
work alright for me, but I'm running a slightly newer version than we
have released yet... (glibc-2.2.3-11, look in rawhide).


-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: postgres dies while doing vacuum analyze

From
Tom Lane
Date:
teg@redhat.com (Trond Eivind Glomsrød) writes:
> Will do... what is the expected result of the testcase?

Given a sufficiently large discrepancy between the string lengths,
a core dump is the likely result.  Try increasing the "16k" numbers
if it doesn't crash for you.

Good work, Manuel!  I'm surprised this hasn't been found before, because
you'd think it'd be biting lots of people ...
        regards, tom lane


Re: postgres dies while doing vacuum analyze

From
Manuel Sugawara
Date:
teg@redhat.com (Trond Eivind Glomsrød) writes:

> Will do... what is the expected result of the testcase? It seems to
> work alright for me, but I'm running a slightly newer version than we
> have released yet... (glibc-2.2.3-11, look in rawhide).

A core dump, at least on glibc-2.2.2-10. Try a locale
other than C or POSIX.

masm@dep1$ LC_COLLATE=es_MX ./strcoll-bug
es_MX
zsh: 25041 segmentation fault (core dumped)  LC_COLLATE=es_MX ./strcoll-bug
masm@dep1$ LC_COLLATE=C ./strcoll-bug
C
strcoll returned -1
masm@dep1$ 

regards,
Manuel.                           



Re: postgres dies while doing vacuum analyze

From
teg@redhat.com (Trond Eivind Glomsrød)
Date:
Manuel Sugawara <masm@fciencias.unam.mx> writes:

> teg@redhat.com (Trond Eivind Glomsrød) writes:
> 
> > Will do... what is the expected result of the testcase? It seems to
> > work alright for me, but I'm running a slightly newer version than we
> > have released yet... (glibc-2.2.3-11, look in rawhide).
> 
> A core dump, at least on glibc-2.2.2-10. Try a locale
> other than C or POSIX.
> 
> masm@dep1$ LC_COLLATE=es_MX ./strcoll-bug
> es_MX
> zsh: 25041 segmentation fault (core dumped)  LC_COLLATE=es_MX ./strcoll-bug
> masm@dep1$ LC_COLLATE=C ./strcoll-bug
> C
> strcoll returned -1
> masm@dep1$ 

OK, this works on my system: no coredump, correct results. I'll
take a look at the glibc sources to verify that, but it looks like
this was fixed by drepper@redhat.com and included in glibc 2.2.3:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=36539

-- 
Trond Eivind Glomsrød
Red Hat, Inc.


Re: postgres dies while doing vacuum analyze

From
Manuel Sugawara
Date:
teg@redhat.com (Trond Eivind Glomsrød) writes:

[...]
> OK, this works with my system - no coredump, correct results. I'll
> take a look at the glibc sources to verify that, but it looks like
> this was fixed by drepper@redhat.com and included in glibc 2.2.3:
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=36539

Yes, it is already fixed in glibc-2.2.3. Is it safe to install this
version on my 7.1 systems, or should I use my own RPMs?

regards,
Manuel.



Re: postgres dies while doing vacuum analyze

From
Trond Eivind Glomsrød
Date:
On 16 Jun 2001, Manuel Sugawara wrote:

> teg@redhat.com (Trond Eivind Glomsrød) writes:
>
> [...]
> > OK, this works with my system - no coredump, correct results. I'll
> > take a look at the glibc sources to verify that, but it looks like
> > this was fixed by drepper@redhat.com and included in glibc 2.2.3:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=36539
>
> Yes, it is already fixed in glibc-2.2.3. Is it safe to install this
> version on my 7.1 systems

2.2.3-11 should be safe; we would be very interested to hear
otherwise.


-- 
Trond Eivind Glomsrød
Red Hat, Inc.



Re: postgres dies while doing vacuum analyze

From
mordicus
Date:
Manuel Sugawara wrote:

> Guys,
> 
> Just installed a new data base in my server and while running vacuum
> analyze postgres dies with the following message:
> 
> [...]
> NOTICE:  Index pg_rewrite_oid_index: Pages 2; Tuples 16. CPU 0.00s/0.00u
> sec.
> NOTICE:  Index pg_rewrite_rulename_index: Pages 2; Tuples 16. CPU
> 0.00s/0.00u sec.
> NOTICE:  --Relation pg_toast_17058--
> NOTICE:  Pages 4: Changed 0, reaped 0, Empty 0, New 0; Tup 17: Vac 0,
> Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 219, MaxLen 2034; Re-using:
> Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. CPU 0.00s/0.00u sec.
> NOTICE:  Index pg_toast_17058_idx: Pages 2; Tuples 17. CPU 0.00s/0.00u
> sec.
> NOTICE:  Analyzing...
> pqReadData() -- backend closed the channel unexpectedly.
> This probably means the backend terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !#
> 
> The postgres version is 7.1.2 and the data base was initialized with
> 
> $ LANG=es_MX /usr/bin/initdb -D /var/lib/pgsql/data -E latin1
> 
> It is running on Redhat Linux 7.1 i686 with 2.4.2-2 kernel.
> Here is the back trace from gdb

Try a 2.4.5 kernel. I had the same problem with SuSE 7.1 and the 2.4.2 kernel;
since updating, no more problems.