Re: BUG #8970: ts_parse incorrectly split numbers in digit token - Mailing list pgsql-bugs

From Marco Atzeri
Subject Re: BUG #8970: ts_parse incorrectly split numbers in digit token
Msg-id 52ED5627.4070005@gmail.com
In response to Re: BUG #8970: ts_parse incorrectly split numbers in digit token  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #8970: ts_parse incorrectly split numbers in digit token
List pgsql-bugs
On 26/01/2014 18:27, Tom Lane wrote:
> Marco Atzeri <marco.atzeri@gmail.com> writes:
>> On 26/01/2014 03:25, Alvaro Herrera wrote:
>>> To trace this, I would look at src/backend/tsearch/wparser_def.c;
>>> probably try compiling that file with WPARSER_TRACE defined, and compare
>>> the output of ts_parse() in something simple such as '345' in a working
>>> port with the failing one.  That might give you clues as to what is
>>> causing the failure.
>
>> [ trace ]
>
> As was suspected upthread, this shows that p_isdigit() is failing to
> recognize "3" as a digit.  So you've got broken locale support somewhere.
>
> There are two different implementations of p_isdigit in wparser_def.c,
> depending on whether USE_WIDE_UPPER_LOWER is defined.  It should be, in
> a Windows build, but maybe this is tracing back to a configure problem?
>
>             regards, tom lane
>

After debugging a bit, I think this is not a broken locale.

The first two times, the character passed to iswdigit() also contains a
portion of the next digit, so the result is always false.

Was it perhaps assumed that the size of a wide char is always 32 bits?

"Unlike Windows UTF-16 2-byte wide chars, wchar_t on Linux and OS X is 4
bytes UTF-32 (gcc/g++ and XCode). On cygwin it is 2 (cygwin uses Windows
APIs)."

Tested with "SELECT * FROM ts_parse('default', '345');":

--------------------------------------------------------------
Breakpoint 1, p_isdigit (prs=0x80100930)
     at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560     p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=3407923)
     at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35        return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$77 = 0x340033
(gdb) finish
Run till exit from #0  iswdigit (c=3407923)
     at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
     at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834                            if (item->isclass(prs) != 0)
Value returned is $78 = 0

Breakpoint 1, p_isdigit (prs=0x80100930)
     at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560     p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=3473460)
     at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35        return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$79 = 0x350034
(gdb) finish
Run till exit from #0  iswdigit (c=3473460)
     at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
     at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834                            if (item->isclass(prs) != 0)
Value returned is $80 = 0

Breakpoint 1, p_isdigit (prs=0x80100930)
     at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560     p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=53)
     at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35        return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$81 = 0x35
(gdb) finish
Run till exit from #0  iswdigit (c=53)
     at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
     at
/pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834                            if (item->isclass(prs) != 0)
Value returned is $82 = 1
-------------------------------------------------------------------------
