Thread: beta6 pg_restore core dumps

beta6 pg_restore core dumps

From
Tatsuo Ishii
Date:
pg_restore crushes if dump data includes large objects...
--
Tatsuo Ishii

[t-ishii@srapc1474 7.1]$ createdb test
CREATE DATABASE
[t-ishii@srapc1474 7.1]$ psql -c "select lo_import('/boot/vmlinuz')" testlo_import 
-----------    20736
(1 row)

[t-ishii@srapc1474 7.1]$ pg_dump -F c -b test > test.db
[t-ishii@srapc1474 7.1]$ createdb test2
CREATE DATABASE
[t-ishii@srapc1474 7.1]$ pg_restore -d test2 test.db
Segmentation fault (core dumped)
[t-ishii@srapc1474 7.1]$ gdb pg_restore core
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
[snip]
#0  0x804abd4 in _enableTriggersIfNecessary (AH=0x8057d30, te=0x0,    ropt=0x8057c90) at pg_backup_archiver.c:474
474        ahprintf(AH, "UPDATE pg_class SET reltriggers = "
(gdb) where
#0  0x804abd4 in _enableTriggersIfNecessary (AH=0x8057d30, te=0x0,    ropt=0x8057c90) at pg_backup_archiver.c:474
#1  0x804a8c0 in RestoreArchive (AHX=0x8057d30, ropt=0x8057c90)   at pg_backup_archiver.c:336
#2  0x804a03e in main (argc=4, argv=0x7ffff864) at pg_restore.c:312
#3  0x2ab9796b in __libc_start_main (main=0x8049a40 <main>, argc=4,    argv=0x7ffff864, init=0x8049394 <_init>,
fini=0x8052d2c<_fini>,    rtld_fini=0x2aab5d00 <_dl_fini>, stack_end=0x7ffff85c)   at
../sysdeps/generic/libc-start.c:92
(gdb) 


Re: beta6 pg_restore core dumps

From
Tom Lane
Date:
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> pg_restore crushes if dump data includes large objects...

This is probably the same problem that Martin Renters reported
yesterday.  I have a patch that seems to fix it on my machine,
but I haven't heard back from Martin whether it solves his case
completely.  In particular, he said something about memory leaks...
        regards, tom lane


*** pg_backup_custom.c.orig    Fri Feb  9 17:32:26 2001
--- pg_backup_custom.c    Fri Mar 16 17:24:59 2001
***************
*** 521,531 ****         if (blkLen > (ctx->inSize - 1)) {             free(ctx->zlibIn);             ctx->zlibIn =
NULL;
!             ctx->zlibIn = (char*)malloc(blkLen);             if (!ctx->zlibIn)                 die_horribly(AH, "%s:
failedto allocate decompression buffer\n", progname); 
 
!             ctx->inSize = blkLen;             in = ctx->zlibIn;         } 
--- 521,531 ----         if (blkLen > (ctx->inSize - 1)) {             free(ctx->zlibIn);             ctx->zlibIn =
NULL;
!             ctx->zlibIn = (char*)malloc(blkLen+1);             if (!ctx->zlibIn)                 die_horribly(AH,
"%s:failed to allocate decompression buffer\n", progname); 
 
!             ctx->inSize = blkLen+1;             in = ctx->zlibIn;         } 


Re: beta6 pg_restore core dumps

From
Tom Lane
Date:
After looking more closely I see that pg_restore has two different
buffer overrun conditions in this one routine.  Attached is take two
of my patch.

This would be a lot simpler and cleaner if _PrintData() simply didn't
append a zero byte to the buffer contents.  Philip, is it actually
necessary for it to do that?
        regards, tom lane


*** pg_backup_custom.c.orig    Fri Feb  9 17:32:26 2001
--- pg_backup_custom.c    Sat Mar 17 12:25:17 2001
***************
*** 150,156 ****     if (ctx->zp == NULL)     die_horribly(AH, "%s: unable to allocate zlib stream archive
context",progname);
 
!     ctx->zlibOut = (char*)malloc(zlibOutSize);     ctx->zlibIn = (char*)malloc(zlibInSize);     ctx->inSize =
zlibInSize;    ctx->filePos = 0;
 
--- 150,163 ----     if (ctx->zp == NULL)     die_horribly(AH, "%s: unable to allocate zlib stream archive
context",progname);
 
!     /*
!      * zlibOutSize is the buffer size we tell zlib it can output to.  We
!      * actually allocate one extra byte because some routines want to append
!      * a trailing zero byte to the zlib output.  The input buffer is expansible
!      * and is always of size ctx->inSize; zlibInSize is just the initial
!      * default size for it.
!      */
!     ctx->zlibOut = (char*)malloc(zlibOutSize+1);     ctx->zlibIn = (char*)malloc(zlibInSize);     ctx->inSize =
zlibInSize;    ctx->filePos = 0;
 
***************
*** 518,531 ****      blkLen = ReadInt(AH);     while (blkLen != 0) {
!         if (blkLen > (ctx->inSize - 1)) {             free(ctx->zlibIn);             ctx->zlibIn = NULL;
!             ctx->zlibIn = (char*)malloc(blkLen);             if (!ctx->zlibIn)                 die_horribly(AH, "%s:
failedto allocate decompression buffer\n", progname); 
 
!             ctx->inSize = blkLen;             in = ctx->zlibIn;         } 
--- 525,538 ----      blkLen = ReadInt(AH);     while (blkLen != 0) {
!         if (blkLen+1 > ctx->inSize) {             free(ctx->zlibIn);             ctx->zlibIn = NULL;
!             ctx->zlibIn = (char*)malloc(blkLen+1);             if (!ctx->zlibIn)                 die_horribly(AH,
"%s:failed to allocate decompression buffer\n", progname); 
 
!             ctx->inSize = blkLen+1;             in = ctx->zlibIn;         } 


Re: beta6 pg_restore core dumps

From
Tatsuo Ishii
Date:
Thanks, at least the problem I have reported seems gone after I
applied your patch.
--
Tatsuo Ishii

> After looking more closely I see that pg_restore has two different
> buffer overrun conditions in this one routine.  Attached is take two
> of my patch.
> 
> This would be a lot simpler and cleaner if _PrintData() simply didn't
> append a zero byte to the buffer contents.  Philip, is it actually
> necessary for it to do that?
> 
>             regards, tom lane
> 
> 
> *** pg_backup_custom.c.orig    Fri Feb  9 17:32:26 2001
> --- pg_backup_custom.c    Sat Mar 17 12:25:17 2001
> ***************
> *** 150,156 ****
>       if (ctx->zp == NULL)
>       die_horribly(AH, "%s: unable to allocate zlib stream archive context",progname);
>   
> !     ctx->zlibOut = (char*)malloc(zlibOutSize);
>       ctx->zlibIn = (char*)malloc(zlibInSize);
>       ctx->inSize = zlibInSize;
>       ctx->filePos = 0;
> --- 150,163 ----
>       if (ctx->zp == NULL)
>       die_horribly(AH, "%s: unable to allocate zlib stream archive context",progname);
>   
> !     /*
> !      * zlibOutSize is the buffer size we tell zlib it can output to.  We
> !      * actually allocate one extra byte because some routines want to append
> !      * a trailing zero byte to the zlib output.  The input buffer is expansible
> !      * and is always of size ctx->inSize; zlibInSize is just the initial
> !      * default size for it.
> !      */
> !     ctx->zlibOut = (char*)malloc(zlibOutSize+1);
>       ctx->zlibIn = (char*)malloc(zlibInSize);
>       ctx->inSize = zlibInSize;
>       ctx->filePos = 0;
> ***************
> *** 518,531 ****
>   
>       blkLen = ReadInt(AH);
>       while (blkLen != 0) {
> !         if (blkLen > (ctx->inSize - 1)) {
>               free(ctx->zlibIn);
>               ctx->zlibIn = NULL;
> !             ctx->zlibIn = (char*)malloc(blkLen);
>               if (!ctx->zlibIn)
>                   die_horribly(AH, "%s: failed to allocate decompression buffer\n", progname);
>   
> !             ctx->inSize = blkLen;
>               in = ctx->zlibIn;
>           }
>   
> --- 525,538 ----
>   
>       blkLen = ReadInt(AH);
>       while (blkLen != 0) {
> !         if (blkLen+1 > ctx->inSize) {
>               free(ctx->zlibIn);
>               ctx->zlibIn = NULL;
> !             ctx->zlibIn = (char*)malloc(blkLen+1);
>               if (!ctx->zlibIn)
>                   die_horribly(AH, "%s: failed to allocate decompression buffer\n", progname);
>   
> !             ctx->inSize = blkLen+1;
>               in = ctx->zlibIn;
>           }
>   


Re: beta6 pg_restore core dumps

From
Philip Warner
Date:
At 12:31 17/03/01 -0500, Tom Lane wrote:
>
>This would be a lot simpler and cleaner if _PrintData() simply didn't
>append a zero byte to the buffer contents.  Philip, is it actually
>necessary for it to do that?
>

Strictly, I think the answer is that it is not necessary. The output of the
uncompress may be a string, which could be passed to one of the str*
functions by a downstream call. AFAICT, this is not the case, and the code
should work without it, but it's probably safer in the long run to leave it
there. If you have strong feelings about removing it, I'll have a closer
look at the code, but my guess is that it was just me being paranoid (and
stuffing up).


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: beta6 pg_restore core dumps

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> At 12:31 17/03/01 -0500, Tom Lane wrote:
>> This would be a lot simpler and cleaner if _PrintData() simply didn't
>> append a zero byte to the buffer contents.  Philip, is it actually
>> necessary for it to do that?

> Strictly, I think the answer is that it is not necessary. The output of the
> uncompress may be a string, which could be passed to one of the str*
> functions by a downstream call. AFAICT, this is not the case, and the code
> should work without it, but it's probably safer in the long run to leave it
> there.

Considering that the data we are working with is binary, and may contain
nulls, any code that insisted on null-termination would probably be ipso
facto broken.
        regards, tom lane


Re: beta6 pg_restore core dumps

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>> Considering that the data we are working with is binary, and may contain
>> nulls, any code that insisted on null-termination would probably be ipso
>> facto broken.

> But we're not; this is the same code that sends the COPY output back to PG.

Oh, isn't this the code that pushes large-object bodies around?  I
should think the problem would've been noticed much sooner if not...
        regards, tom lane


Re: beta6 pg_restore core dumps

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
>> Oh, isn't this the code that pushes large-object bodies around?  I
>> should think the problem would've been noticed much sooner if not...

> It does both, which is why I was also surprised.

Hmm ... digging through the code, it does look like one of the possible
destinations is ExecuteSqlCommandBuf, which is a bit schizo about
whether it's dealing with a null-terminated string or not, but is likely
to get ill if handed one that isn't.

Okay, I'll commit what I have then.
        regards, tom lane


Re: beta6 pg_restore core dumps

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> Looking at Tatsuos original message, it looks like the lowest level call was:
> #0  0x804abd4 in _enableTriggersIfNecessary (AH=0x8057d30, te=0x0, 
>     ropt=0x8057c90) at pg_backup_archiver.c:474

> which probably has nothing to do with BLOBs.

Oh ... I had assumed it was just dying there because of collateral
damage from the buffer overrun stomp, but if you see an actual bug there
then by all means fix it ;-)
        regards, tom lane


Re: beta6 pg_restore core dumps

From
Philip Warner
Date:
At 21:08 17/03/01 -0500, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>>> Considering that the data we are working with is binary, and may contain
>>> nulls, any code that insisted on null-termination would probably be ipso
>>> facto broken.
>
>> But we're not; this is the same code that sends the COPY output back to PG.
>
>Oh, isn't this the code that pushes large-object bodies around?  I
>should think the problem would've been noticed much sooner if not...

It does both, which is why I was also surprised.



----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: beta6 pg_restore core dumps

From
Philip Warner
Date:
Looking at Tatsuos original message, it looks like the lowest level call was:

#0  0x804abd4 in _enableTriggersIfNecessary (AH=0x8057d30, te=0x0,    ropt=0x8057c90) at pg_backup_archiver.c:474

which probably has nothing to do with BLOBs. I think it's a different
problem entirely, caused by a mistake in my recent trigger enable/disable
code that only become apparent if BLOBs are being restored. I'll fix it
soon...




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: beta6 pg_restore core dumps

From
Philip Warner
Date:
At 20:57 17/03/01 -0500, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> At 12:31 17/03/01 -0500, Tom Lane wrote:
>>> This would be a lot simpler and cleaner if _PrintData() simply didn't
>>> append a zero byte to the buffer contents.  Philip, is it actually
>>> necessary for it to do that?
>
>> Strictly, I think the answer is that it is not necessary. The output of the
>> uncompress may be a string, which could be passed to one of the str*
>> functions by a downstream call. AFAICT, this is not the case, and the code
>> should work without it, but it's probably safer in the long run to leave it
>> there.
>
>Considering that the data we are working with is binary, and may contain
>nulls, any code that insisted on null-termination would probably be ipso
>facto broken.

But we're not; this is the same code that sends the COPY output back to PG.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: beta6 pg_restore core dumps

From
Philip Warner
Date:
At 21:24 17/03/01 -0500, Tom Lane wrote:
>Philip Warner <pjw@rhyme.com.au> writes:
>> Looking at Tatsuos original message, it looks like the lowest level call
was:
>> #0  0x804abd4 in _enableTriggersIfNecessary (AH=0x8057d30, te=0x0, 
>>     ropt=0x8057c90) at pg_backup_archiver.c:474
>
>> which probably has nothing to do with BLOBs.
>
>Oh ... I had assumed it was just dying there because of collateral
>damage from the buffer overrun stomp, but if you see an actual bug there
>then by all means fix it ;-)

Fixed. It happened for Tatsuo because of the test case he used. Any real,
full, database dump would have worked. It's just data-only ones that
failed, and the test case he cited was an implied data-only restore (there
were no tables or other metadata).






----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: beta6 pg_restore core dumps

From
Martin Renters
Date:
On Sat, Mar 17, 2001 at 12:31:20PM -0500, Tom Lane wrote:
> After looking more closely I see that pg_restore has two different
> buffer overrun conditions in this one routine.  Attached is take two
> of my patch.
> 
> This would be a lot simpler and cleaner if _PrintData() simply didn't
> append a zero byte to the buffer contents.  Philip, is it actually
> necessary for it to do that?

This patch seems to fix the problem I was seeing.

Martin