BUG #2895: Private Use Unicode character crashes server when using ILIKE - Mailing list pgsql-bugs

From James Russell
Subject BUG #2895: Private Use Unicode character crashes server when using ILIKE
Date
Msg-id 200701160616.l0G6GMQ7057017@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #2895: Private Use Unicode character crashes server when using ILIKE  (Michael Fuhr <mike@fuhr.org>)
Re: BUG #2895: Private Use Unicode character crashes server when using ILIKE  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged online:

Bug reference:      2895
Logged by:          James Russell
Email address:      internationalhobo@gmail.com
PostgreSQL version: 8.1.4
Operating system:   Linux (Fedora Core 5)
Description:        Private Use Unicode character crashes server when using
ILIKE
Details:

A UTF-8 text field contains (among others) the following Unicode character:
 0xf4808286 in UTF-8 (== 0x00100086 in UTF-32)

This corresponds to a character in the Unicode "Private Use" area, where the
codepoints are undefined and designated for proprietary uses
(0x100000-0x10FFFD).

If a text field contains a such a character, and if the search term contains
certain Unicode characters, then queries using ILIKE will crash the server.

We discovered this bug when a user searched for rows containing the "TM"
character (UTF-8 0xE284A2 / UTF-32 0x2122), which translated to the
following query:

SELECT * FROM foo WHERE bar ILIKE('%™%');

If the rows searched contain the row with the Private Use character, then
the server crashes during the query and psql returns the following:

"psql:recreatebug.sql:8: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
psql:recreatebug.sql:8: connection to server was lost"

This bug is 100% reproducible on multiple servers with different
configurations. Our server configuration does not have any changes from
default.

The following PHP script will generate a psql script that can reproduce the
problem. Run this script to generate the SQL script, then run the SQL script
in psql. The server will crash when the SELECT statement is run.

<?php
$sql = "
\c template1
CREATE DATABASE temp_db WITH ENCODING 'UNICODE';
\c temp_db
BEGIN;
CREATE TABLE foo (bar text);
INSERT INTO foo (bar) VALUES
('".chr(0xf4).chr(0x80).chr(0x82).chr(0x86)."');
SELECT * FROM foo WHERE bar ILIKE('%".chr(0xe2).chr(0x84).chr(0xa2)."%');

\c template1
DROP DATABASE temp_db;
";
file_put_contents('recreatebug.sql', $sql);


Additional notes:
- The bug does not appear to occur when using LIKE, just ILIKE.
- The bug does not appear to occur when searching for plain ASCII strings
like '%abc%'.
- I have not yet tried to reproduce the bug on the latest Postgres 8.2.x

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #2898: dynamic load support
Next
From: Laurence Rowe
Date:
Subject: Re: BUG #2848: information_schema.key_column_usage does not work