The following bug has been logged online:
Bug reference: 2895
Logged by: James Russell
Email address: internationalhobo@gmail.com
PostgreSQL version: 8.1.4
Operating system: Linux (Fedora Core 5)
Description: Private Use Unicode character crashes server when using
ILIKE
Details:
A UTF-8 text field contains (among others) the following Unicode character:
0xf4808286 in UTF-8 (== 0x00100086 in UTF-32)
This corresponds to a character in the Unicode "Private Use" area, where the
codepoints are undefined and designated for proprietary uses
(0x100000-0x10FFFD).
If a text field contains a such a character, and if the search term contains
certain Unicode characters, then queries using ILIKE will crash the server.
We discovered this bug when a user searched for rows containing the "TM"
character (UTF-8 0xE284A2 / UTF-32 0x2122), which translated to the
following query:
SELECT * FROM foo WHERE bar ILIKE('%â¢%');
If the rows searched contain the row with the Private Use character, then
the server crashes during the query and psql returns the following:
"psql:recreatebug.sql:8: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
psql:recreatebug.sql:8: connection to server was lost"
This bug is 100% reproducible on multiple servers with different
configurations. Our server configuration does not have any changes from
default.
The following PHP script will generate a psql script that can reproduce the
problem. Run this script to generate the SQL script, then run the SQL script
in psql. The server will crash when the SELECT statement is run.
<?php
$sql = "
\c template1
CREATE DATABASE temp_db WITH ENCODING 'UNICODE';
\c temp_db
BEGIN;
CREATE TABLE foo (bar text);
INSERT INTO foo (bar) VALUES
('".chr(0xf4).chr(0x80).chr(0x82).chr(0x86)."');
SELECT * FROM foo WHERE bar ILIKE('%".chr(0xe2).chr(0x84).chr(0xa2)."%');
\c template1
DROP DATABASE temp_db;
";
file_put_contents('recreatebug.sql', $sql);
Additional notes:
- The bug does not appear to occur when using LIKE, just ILIKE.
- The bug does not appear to occur when searching for plain ASCII strings
like '%abc%'.
- I have not yet tried to reproduce the bug on the latest Postgres 8.2.x