Re: chr() function leads to OOM / killed connection with 8.1, 8.2 - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: chr() function leads to OOM / killed connection with 8.1, 8.2
Date
Msg-id 469FC5B8.3050609@enterprisedb.com
Whole thread Raw
In response to Re: chr() function leads to OOM / killed connection with 8.1, 8.2  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: chr() function leads to OOM / killed connection with 8.1, 8.2  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> Tom Lane wrote:
>>> I can reproduce an out-of-memory condition (basically, replace() is
>>> going into an infinite loop because of the invalid input) but I'm
>>> not seeing any crash.
>
>> replace_text reads past the end of source string, byte by byte (or
>> character by character, not sure), and eventually tries to read from an
>> invalid address which causes a segfault. It happens here when start_posn
>> == 367368.
>
> Hm, must be memory-layout-dependent.  On mine, the output string buffer
> is growing fast enough to ensure there's still RAM to read, up till the
> kernel says no more.
>
> Anyway the problem is that pg_utf2wchar_with_len silently drops the
> trailing incomplete character in its input, causing text_position_next
> to think the pattern is empty, causing an infinite loop because
> curr_posn never advances.  replace_text already tried to guard against
> empty pattern, but it doesn't know about this case.
>
> What I intend to do to fix this is to modify the users of
> text_position_next to believe the string lengths saved by
> text_position_setup, rather than using TEXTLEN() to compute
> the lengths.  This will effectively make replace_text and
> friends consistently act as though the partial character isn't there.
>
> In the long run it might be better to make pg_utf2wchar_with_len throw
> an error for bad input, but I'm quite unsure of the consequences of
> that, in view of the existing comment "not ours to throw error".
> Anyway such a potentially-significant behavioral change doesn't seem
> like a good idea to back-patch.  (We seem to have this bug in one form
> or another clear back to 7.3...)

I agree we should do the above changes for the sake of robustness, but
isn't the real problem here that chr function can return invalid byte
sequences? That was actually discussed a while back (starting at
http://archives.postgresql.org/pgsql-hackers/2007-04/msg00010.php), but
that was inconclusive.

IMHO chr should at the very least not return invalid byte sequences.
Limiting it to ascii range is not a bad idea either, though that might
break some applications.

Is there any other known loopholes to get invalid data in the database?

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: chr() function leads to OOM / killed connection with 8.1, 8.2
Next
From: Tom Lane
Date:
Subject: Re: chr() function leads to OOM / killed connection with 8.1, 8.2