Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date
Msg-id CAH2-WzktoNj4uBhJq+5y9puLRq7bHuK=7S+MQKcbgnG4M6A9cg@mail.gmail.com
In response to Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-bugs
On Tue, Aug 1, 2017 at 12:45 AM, Daniel Verite <daniel@manitou-mail.org> wrote:
> The test that iterates over collations produces two kinds of core files,
> some of them are 289MB large, some others are 17GB large.
> shared_buffers is only 128MB and work_mem 128MB,
> so 289MB is not surprising but 17GB seems excessive.
> The box has 16GB of physical mem and 8GB of swap.
>
> I haven't checked all core files because they exhaust the disk
> space before completion of the test, but a typical backtrace for
> the biggest ones looks like the following, with the segfaults
> happening in memcpy:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  __memcpy_sse2_unaligned ()
>     at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:35
> (gdb) #0  __memcpy_sse2_unaligned ()
>     at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:35
> #1  0x00007fc1db02be6b in memcpy (__len=8589934592, __src=0x7fbc6f4a2010,
>     __dest=<optimized out>) at
> /usr/include/x86_64-linux-gnu/bits/string3.h:51
> #2  ucol_CEBuf_Expand (ci=<optimized out>, status=0x7ffd90777128,
>     b=0x7ffd907751e0) at ucol.cpp:7009
> #3  UCOL_CEBUF_PUT (status=0x7ffd90777128, ci=0x7ffd90776460, ce=1493173509,
>     b=0x7ffd907751e0) at ucol.cpp:7022
> #4  ucol_strcollRegular (sColl=sColl@entry=0x7ffd90776460,
>     tColl=tColl@entry=0x7ffd90776610, status=status@entry=0x7ffd90777128)
>     at ucol.cpp:7163
> #5  0x00007fc1db031177 in ucol_strcollRegularUTF8 (coll=0x1371af0,
>     source=source@entry=0x273d379 "콗喩zx㎍",
>     sourceLength=sourceLength@entry=11, target=<optimized out>,
>     targetLength=targetLength@entry=8, status=status@entry=0x7ffd90777128)
>     at ucol.cpp:8023
> #6  0x00007fc1db032d36 in ucol_strcollUseLatin1UTF8 (status=<optimized out>,
>     tLen=<optimized out>, target=<optimized out>, sLen=<optimized out>,
>     source=<optimized out>, coll=<optimized out>) at ucol.cpp:8108
> #7  ucol_strcollUTF8_52 (coll=<optimized out>,
>     source=source@entry=0x273d379 "콗喩zx㎍", sourceLength=<optimized out>,
>     sourceLength@entry=11, target=<optimized out>,
>     target@entry=0x273d409 "쳭喩zz", targetLength=targetLength@entry=8,
>     status=status@entry=0x7ffd90777128) at ucol.cpp:8770

Interesting. The "__len" argument to memcpy() is 8589934592 -- that's
2 ^ 33. (I'm not sure why it's the first memcpy() argument in the
stack trace, since it's supposed to be the last -- anyone seen that
before?)

Can you figure out what the optimized-out lengths are, by either
looking at registers within GDB, or building at a lower optimization
level?

Maybe this is a bug in ICU-52. For reference, here is ICU-52's
ucol_CEBuf_Expand() function:

static
void ucol_CEBuf_Expand(ucol_CEBuf *b, collIterate *ci, UErrorCode *status) {
    uint32_t  oldSize;
    uint32_t  newSize;
    uint32_t  *newBuf;

    ci->flags |= UCOL_ITER_ALLOCATED;
    oldSize = (uint32_t)(b->pos - b->buf);
    newSize = oldSize * 2;
    newBuf = (uint32_t*)uprv_malloc(newSize * sizeof(uint32_t));
    if(newBuf == NULL) {
        *status = U_MEMORY_ALLOCATION_ERROR;
    }
    else {
        uprv_memcpy(newBuf, b->buf, oldSize * sizeof(uint32_t));
        if (b->buf != b->localArray) {
            uprv_free(b->buf);
        }
        b->buf = newBuf;
        b->endp = b->buf + newSize;
        b->pos  = b->buf + oldSize;
    }
}

If "oldSize * sizeof(uint32_t)" becomes what we see as "__len", as I
believe it does, then that must mean that oldSize is 2 ^ 31. *Not* 2 ^
31 - 1 (INT_MAX). I think that this could be an off-by-one bug, since
ucol_strcollUTF8()/ucol_strcollUTF8_52() accepts an int32 argument for
sourceLength and targetLength. I'm not very confident of this, but it
does make a certain amount of sense. It could be that everyone else is
passing -1 as sourceLength and targetLength arguments, anyway, to
indicate that the buffer is NUL-terminated, as required by regular
strcoll().
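
The arithmetic above can be checked with a small sketch. The helper
names here (elements_from_bytes, doubled_size, fits_in_int32) are made
up for illustration and are not part of ICU; the only inputs taken from
the report are the "__len" value and the 4-byte element size from the
listing. One thing the sketch also shows, assuming oldSize really is
2 ^ 31: "newSize = oldSize * 2" would wrap to 0 in uint32_t arithmetic,
so the new buffer would be far smaller than the copy. I have not
confirmed that this is what happens in the crashing backend.

```c
#include <stdint.h>
#include <stddef.h>

/* Element count implied by a memcpy() byte length, assuming 4-byte
 * (uint32_t) collation elements as in the ucol_CEBuf_Expand() listing. */
static uint64_t elements_from_bytes(uint64_t len_bytes)
{
    return len_bytes / sizeof(uint32_t);
}

/* Mirrors "newSize = oldSize * 2" from the listing. Unsigned 32-bit
 * arithmetic wraps modulo 2^32, so 2^31 * 2 yields 0. */
static uint32_t doubled_size(uint32_t old_size)
{
    return old_size * 2;
}

/* Whether a count fits in the int32_t type used for the sourceLength
 * and targetLength arguments. */
static int fits_in_int32(uint64_t n)
{
    return n <= (uint64_t)INT32_MAX;
}
```

With the reported value, elements_from_bytes(8589934592) is 2147483648,
which is one more than INT32_MAX, and doubled_size() of that value is 0.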

Note also that the docs say this of ucol_strcollUTF8(): "When input
string contains malformed a UTF-8 byte sequence, this function treats
these bytes as REPLACEMENT CHARACTER (U+FFFD)". I'm not sure that
that's a very sensible way for it to fail.

I'd be interested to see if anything changed when -1 was passed as
both sourceLength and targetLength to ucol_strcollUTF8(). You'd have
to build Postgres yourself to test this, but it would just work, since
we don't actually avoid NUL termination, even though in principle we
could with ICU.
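
To illustrate why the two calling styles should agree when the buffers
are NUL-terminated anyway, here is a minimal sketch of the length
convention. Both functions are hypothetical stand-ins written for this
message, not ICU code: the comparison is plain byte-wise memcmp(), not
collation, and treating any negative length as "NUL-terminated" is my
simplification of the -1 convention.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in, NOT an ICU function: a negative length
 * (conventionally -1) means "the string is NUL-terminated". */
static size_t resolve_length(const char *s, int32_t len)
{
    return (len < 0) ? strlen(s) : (size_t)len;
}

/* Hypothetical byte-wise comparator that accepts either explicit
 * lengths or -1. Returns -1, 0 or 1. */
static int compare_with_lengths(const char *s, int32_t s_len,
                                const char *t, int32_t t_len)
{
    size_t sl = resolve_length(s, s_len);
    size_t tl = resolve_length(t, t_len);
    size_t n = (sl < tl) ? sl : tl;
    int c = memcmp(s, t, n);

    if (c != 0)
        return (c < 0) ? -1 : 1;
    if (sl == tl)
        return 0;
    return (sl < tl) ? -1 : 1;
}
```

If the explicit lengths match what strlen() would report, passing them
or passing -1 gives the same answer here; the experiment suggested above
would show whether the same holds for ICU-52 on the crashing inputs.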

--
Peter Geoghegan


