BUG #8821: pg_trgm segfault with Turkish locale database - Mailing list pgsql-bugs

From ufuk@paralaus.com
Subject BUG #8821: pg_trgm segfault with Turkish locale database
Date
Msg-id 20140113125641.30125.40578@wrigleys.postgresql.org
Whole thread Raw
Responses Re: BUG #8821: pg_trgm segfault with Turkish locale database  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      8821
Logged by:          Ufuk Kayserilioglu
Email address:      ufuk@paralaus.com
PostgreSQL version: 9.3.2
Operating system:   Linux - Ubuntu x86_64
Description:

PLATFORM:
--------

PostgreSQL 9.3.2 installed from precise-pgdg apt repository on Ubuntu
12.04.3 LTS and 13.10 x86_64

SETUP:
-----

Given a database with encoding UTF-8, locale tr_TR.UTF-8, 'pg_trgm'
extension enabled, and the following setup:

CREATE TABLE car_makers (maker TEXT);
INSERT INTO car_makers VALUES ('AUDI'), ('MINI');

Run any one of the following commands:

- SELECT maker <-> 'MAZDA' FROM car_makers;
- SELECT similarity(maker, 'MAZDA') FROM car_makers;
- SELECT show_trgm('III');

ACTUAL OUTCOME:
--------------

The database will crash and the log will output something along the lines
of:

LOG:  server process (PID 24941) was terminated by signal 11: Segmentation
fault
DETAIL:  Failed process was running: SELECT maker <-> 'MAZDA' FROM
car_makers;
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
FATAL:  the database system is in recovery mode
LOG:  all server processes terminated; reinitializing

EXPECTED OUTCOME:
----------------

The commands return a result and the database process stays up.

ANALYSIS:
--------

I initially ran into this issue on a table with more records and have
narrowed the case down to rows that have 4 letter words that contain the
uppercase I character (that's why only 'AUDI' and 'MINI' in the example). I
am not sure why comparison with 'MAZDA' triggers the crash (for example, a
comparison with 'MAZDA ' does not), but it seems other 5 char strings also
trigger it (eg. 'ABCDE' or '12345'). Also, there are other, longer
comparison strings like 'RENAULT-OYAK ' that also trigger it.

As I was trying to understand what exactly causes the problem, I tried to
understand the trigrams generated by the values in the table and various
test strings. I happened on the other crash during that experimentation.

As a result, it seems there is a problem with trigrams generation and/or
comparison when the Turkish locale is being used.

The issue ultimately seems to be linked to the fact that the lowercase
equivalent of "I" is "ı" (i without the dot), and the trigram codes does the
proper conversion to lowercase which turns the string into a multibyte
string.

The same problem does NOT exist on Mac OSX 10.9.1 with PostgreSQL installed
via Postgres.app. However, OSX has broken tr_TR locale support to begin with
(for example, sorting does not work properly.)

WORKAROUND:
----------

I was not able to find a reliable workaround since the code that does the
lowercase conversion seems to only take into consideration the locale of the
working database (which in my case needs to be tr_TR). Specifying an 'en_US'
collation on the 'maker' column or an explicit collation specifier in the
query do not work.

pgsql-bugs by date:

Previous
From: mmogadime@argility.com
Date:
Subject: BUG #8822: What's the work around for the following exception?
Next
From: Pavel Stehule
Date:
Subject: Re: BUG #8822: What's the work around for the following exception?