string_to_array() is confused by ambiguous field separator - Mailing list pgsql-bugs

From Tom Lane
Subject string_to_array() is confused by ambiguous field separator
Date
Msg-id 6008.1160169130@sss.pgh.pa.us
List pgsql-bugs
Good:

regression=# select string_to_array('123xx456xx789', 'xx');
 string_to_array
-----------------
 {123,456,789}
(1 row)

Not so good:

regression=# select string_to_array('123xx456xxx789', 'xx');
ERROR:  negative substring length not allowed

The proximate problem is that in the inner loop in text_position(),
if it finds a match but hasn't yet found matchnum of them, it advances
only one character instead of advancing over the whole match.  This
means it can report overlapping successive matches, which leads to an
invalid subscript calculation in text_to_array().  I think the correct
approach is to ignore overlapping matches, so that the result in the
second case would be
    {123,456,x789}
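
To make the overlap concrete, here is a minimal Python sketch of the two matching policies. It is a model of the behaviour described above, not the actual C code in text_position() or text_to_array(); the names find_matches and split_fields are made up for illustration.

```python
def find_matches(s, sep, skip_whole_match):
    """Return the start offsets of sep in s (hypothetical helper).

    skip_whole_match=False models the reported behaviour: after a hit the
    scan advances one character, so successive matches may overlap.
    skip_whole_match=True models the proposed fix: the scan resumes after
    the end of the match, so matches never overlap.
    """
    if not sep:
        raise ValueError("separator must not be empty")
    starts = []
    i = 0
    while i + len(sep) <= len(s):
        if s.startswith(sep, i):
            starts.append(i)
            i += len(sep) if skip_whole_match else 1
        else:
            i += 1
    return starts


def split_fields(s, sep):
    """Split s on non-overlapping occurrences of sep."""
    fields = []
    start = 0
    for pos in find_matches(s, sep, skip_whole_match=True):
        fields.append(s[start:pos])
        start = pos + len(sep)
    fields.append(s[start:])
    return fields


overlapping = find_matches('123xx456xxx789', 'xx', skip_whole_match=False)
print(overlapping)                             # [3, 8, 9]
print(split_fields('123xx456xxx789', 'xx'))    # ['123', '456', 'x789']
```

With overlapping matches at offsets 8 and 9, the field between them would start at 8 + 2 = 10 and end at 9, a length of -1, which corresponds to the "negative substring length" error in the failing query.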

There's another problem here, which is that the API of text_position()
is poorly chosen anyway: as defined, parsing a string of N fields
requires O(N^2) work.  It'd be better to pass it a starting character
number for the search instead of a field number to find, and to break
out the setup step so that we don't have to repeat the conversion to
pg_wchar format for each field.
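
The cost difference between the two API shapes can be sketched in Python as follows. This is only an illustration under assumptions: position_of_nth and position_from are hypothetical stand-ins for the current and proposed interfaces, and the pg_wchar conversion step is not modelled at all.

```python
def position_of_nth(s, sep, n):
    """Model of the current API shape: find the n-th match.

    Every call rescans from the beginning of s, so locating all
    N separators costs O(N^2) searches in total.
    """
    pos = -len(sep)
    for _ in range(n):
        pos = s.find(sep, pos + len(sep))
        if pos < 0:
            return -1
    return pos


def position_from(s, sep, start):
    """Model of the proposed API shape: search from a character offset."""
    return s.find(sep, start)


def split_by_match_number(s, sep):
    """Split using the n-th match lookup (quadratic in the field count)."""
    if not sep:
        raise ValueError("separator must not be empty")
    fields, start, n = [], 0, 1
    while True:
        pos = position_of_nth(s, sep, n)
        if pos < 0:
            fields.append(s[start:])
            return fields
        fields.append(s[start:pos])
        start = pos + len(sep)
        n += 1


def split_by_offset(s, sep):
    """Split by resuming each search where the last match ended."""
    if not sep:
        raise ValueError("separator must not be empty")
    fields, start = [], 0
    while True:
        pos = position_from(s, sep, start)
        if pos < 0:
            fields.append(s[start:])
            return fields
        fields.append(s[start:pos])
        start = pos + len(sep)
```

Both functions return the same fields; the second touches each part of the input once because the caller carries the resume offset instead of a match count.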

Any objections?

            regards, tom lane
