Thread: Re: pg_upgrade segfault (was: pg_migrator segfault)

Re: pg_upgrade segfault (was: pg_migrator segfault)

From
hernan gonzalez
Date:
2010/11/2 hernan gonzalez <hgonzalez@gmail.com>
2010/11/2 Grzegorz Jaśkiewicz <gryzman@gmail.com>

try gdb --args ./pg_upgrade -d /var/pgsql-8_4_3/data/ -D
/var/pgsql-9_0_1/data/ -b /var/pgsql-8_4_3/bin/ -B
/var/pgsql-9_0_1/bin/ --check -P 5433 -v -g -G debug
and when it fails, type in 'bt' and paste it here please.

--
GJ

I read somewhere that a program can segfault because of some allocation problem that doesn't show up inside gdb (because some more memory is allocated there, or whatever).

Running gdb on the core file generated by the segfault, it outputs this:

Program terminated with signal 11, Segmentation fault.
#0  0xb7df84ed in _int_realloc () from /lib/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.11.1-4.i686
(gdb) bt
#0  0xb7df84ed in _int_realloc () from /lib/libc.so.6
#1  0xb7df88a0 in realloc () from /lib/libc.so.6
#2  0xb7db2a5e in __add_to_environ () from /lib/libc.so.6
#3  0xb7db27b7 in putenv () from /lib/libc.so.6
#4  0x0804aa11 in putenv2 ()
#5  0x0804af93 in get_control_data ()
#6  0x08049801 in check_cluster_compatibility ()
#7  0x0804eb88 in main ()


Hernán J. González

Re: pg_upgrade segfault (was: pg_migrator segfault)

From
Tom Lane
Date:
hernan gonzalez <hgonzalez@gmail.com> writes:
> Running gdb on the core file generated by the segfault, it outputs this:

> Program terminated with signal 11, Segmentation fault.
> #0  0xb7df84ed in _int_realloc () from /lib/libc.so.6
> Missing separate debuginfos, use: debuginfo-install glibc-2.11.1-4.i686
> (gdb) bt
> #0  0xb7df84ed in _int_realloc () from /lib/libc.so.6
> #1  0xb7df88a0 in realloc () from /lib/libc.so.6
> #2  0xb7db2a5e in __add_to_environ () from /lib/libc.so.6
> #3  0xb7db27b7 in putenv () from /lib/libc.so.6
> #4  0x0804aa11 in putenv2 ()
> #5  0x0804af93 in get_control_data ()
> #6  0x08049801 in check_cluster_compatibility ()
> #7  0x0804eb88 in main ()

Hmm, this suggests that pg_upgrade has managed to clobber malloc's
internal data structures, probably by writing past the end of an
allocated chunk.  You should be able to identify where if you can
run pg_upgrade under valgrind or ElectricFence.

            regards, tom lane

Re: pg_upgrade segfault (was: pg_migrator segfault)

From
hernan gonzalez
Date:
In pg_upgrade/controldata.c, in the putenv2 function:

        char       *envstr = (char *) pg_malloc(ctx, strlen(var) +  strlen(val) + 1);
        sprintf(envstr, "%s=%s", var, val);

Shouldn't it be "+ 2" instead of "+ 1"? (One for the '=', plus one for the terminating null character.)

I think that fixes it.
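
For illustration, here is a minimal standalone sketch of the corrected size calculation. It uses plain malloc rather than pg_upgrade's pg_malloc(ctx, ...), and the helper name make_env_entry is made up for this example, so treat it as a sketch of the idea and not as the actual patch:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not from pg_upgrade): build a "var=val" string
 * suitable for passing to putenv().  Returns NULL if malloc fails.
 */
char *
make_env_entry(const char *var, const char *val)
{
    /* strlen(var) + one byte for '=' + strlen(val) + one byte for '\0' */
    size_t      len = strlen(var) + strlen(val) + 2;
    char       *envstr = malloc(len);

    if (envstr == NULL)
        return NULL;

    /* snprintf never writes more than len bytes, including the '\0' */
    snprintf(envstr, len, "%s=%s", var, val);
    return envstr;
}
```

With "+ 1" the buffer has room for the '=' but not for the terminator, so sprintf writes one byte past the end of the allocation.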


Hernán J. González
http://hjg.com.ar/

Re: pg_upgrade segfault (was: pg_migrator segfault)

From
hernan gonzalez
Date:
Replacing that 1 with 2 seems to be enough to make it work for me.

But it's not enough to make valgrind happy (it still reports 4 "definitely lost" blocks, all from that putenv2 function). Perhaps that's related to this comment:

         /*
          * Do not free envstr because it becomes part of the environment
          *  on some operating systems.  See port/unsetenv.c::unsetenv.
          */

Hernán J. González
http://hjg.com.ar/

Re: pg_upgrade segfault (was: pg_migrator segfault)

From
Tom Lane
Date:
hernan gonzalez <hgonzalez@gmail.com> writes:
> In pg_upgrade/controldata.c, in the putenv2 function:
>         char       *envstr = (char *) pg_malloc(ctx, strlen(var) + strlen(val) + 1);
>         sprintf(envstr, "%s=%s", var, val);

> Shouldn't it be "+ 2" instead of "+ 1"?

Yup, it sure should.  So probably the reason you're the first one to see
it is that the problem would depend on the exact lengths of the strings
being used here :-(
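
One plausible way to see why the string lengths matter: malloc typically rounds each request up to its own chunk granularity, so a one-byte overrun often lands in unused padding and only does damage when the request exactly fills the chunk. A rough sketch for inspecting that slack, assuming glibc (malloc_usable_size is a glibc extension declared in <malloc.h>, not standard C); the helper name slack_bytes is made up for this example:

```c
#include <malloc.h>             /* malloc_usable_size(): glibc extension */
#include <stdlib.h>

/*
 * Hypothetical helper: how many bytes beyond the requested size the
 * allocator actually made usable for this request.  Returns 0 if the
 * allocation fails.
 */
size_t
slack_bytes(size_t request)
{
    void       *p = malloc(request);
    size_t      slack;

    if (p == NULL)
        return 0;

    /* usable size is never smaller than the request, so no underflow */
    slack = malloc_usable_size(p) - request;
    free(p);
    return slack;
}
```

Calling slack_bytes() with strlen(var) + strlen(val) + 1 for a few variable/value pairs should show which combinations leave no spare byte for the terminator. The exact numbers depend on the allocator and platform.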

> But it's not enough to get valgrind happy (It still reports 4 "definitely
> lost" blocks, all from that putenv2 function).

That's expected; those blocks aren't supposed to get freed.
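
To illustrate why: POSIX specifies that the string passed to putenv() itself becomes part of the environment, so the buffer must stay alive afterwards. A small sketch, with a made-up variable name PG_DEMO_VAR and a hypothetical helper putenv_aliasing_demo, neither of which comes from pg_upgrade:

```c
#define _XOPEN_SOURCE 700       /* make sure putenv() is declared */
#include <stdlib.h>
#include <string.h>

/*
 * The buffer is static here so it stays valid for the life of the
 * process; a malloc'd buffer would have to be left unfreed for the
 * same reason.
 */
static char demo_entry[] = "PG_DEMO_VAR=abc";

/*
 * Hypothetical demo: returns the value seen by getenv() after the
 * buffer handed to putenv() has been modified in place.
 */
const char *
putenv_aliasing_demo(void)
{
    if (putenv(demo_entry) != 0)
        return NULL;

    /* Overwrite the value part ("abc") inside our own buffer. */
    memcpy(demo_entry + strlen("PG_DEMO_VAR="), "xyz", 3);

    return getenv("PG_DEMO_VAR");
}
```

On a system that follows the POSIX behavior, getenv() sees the modified value, which shows that the environment points at the caller's buffer instead of holding a copy. Freeing that buffer would leave a dangling pointer in the environment.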

            regards, tom lane