unicode searches failing that use % and LIKE operators - Mailing list pgsql-general

From Benjamin Weaver
Subject unicode searches failing that use % and LIKE operators
Date
Msg-id 20071022180038.CE858EB04D@webmail221.herald.ox.ac.uk
Responses Re: unicode searches failing that use % and LIKE operators  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Dear all,

I have the following problem: a compound search involving two wildcarded
search terms fails when one term consists of Latin characters and the other
of UTF-8-encoded Greek characters. This is strange, because similar searches
in which both terms are Greek, or both are Latin, succeed.

Both terms query a column of type text, called metadatafulltext. Searches of
this kind fail both via JDBC and via psql running in an xterm window (which
handles Unicode properly). The JDBC search will have put both terms in UTF-8.
The search uses the LIKE operator in conjunction with the % wildcard
character.

For example, the following search fails:

SELECT ..

FROM metadatafulltext...

WHERE metadatafulltext LIKE '%Jones%' AND metadatafulltext LIKE '%ALPHABETA%'
(where ALPHABETA is actually a Unicode Greek string, \u03b1\u03b2).


whereas searches using all Greek characters succeed:

WHERE metadatafulltext LIKE '%BETAEPSILONDELTA%' AND metadatafulltext LIKE
'%ALPHABETA%'


and equally, all-Latin searches also succeed:
WHERE metadatafulltext LIKE '%Jones%' AND metadatafulltext LIKE '%Smith%'
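
A minimal, self-contained test case along the following lines might help to
narrow the problem down. It is only a sketch: the temporary table like_demo and
its three sample rows are hypothetical and not taken from the real data, and
the Greek literals assume a session in which psql sends UTF-8. The two SHOW
commands report the encodings in use, which seems worth checking first.

```sql
-- Hypothetical reproduction; like_demo and its rows are illustrative only.

-- Report the encoding the database stores text in, and the encoding
-- the current client session sends and expects.
SHOW server_encoding;
SHOW client_encoding;

-- Scratch table with the same column name and type as in the failing search.
CREATE TEMPORARY TABLE like_demo (metadatafulltext text);

INSERT INTO like_demo (metadatafulltext) VALUES
    ('Jones αβ'),      -- Latin and Greek together
    ('Jones Smith'),   -- Latin only
    ('βεδ αβ');        -- Greek only

-- Mixed search: one Latin term and one Greek term (U+03B1 U+03B2).
SELECT metadatafulltext
FROM like_demo
WHERE metadatafulltext LIKE '%Jones%'
  AND metadatafulltext LIKE '%αβ%';
```

If matching behaves as intended, the mixed search should match only the first
row. Running the same statements via JDBC and via psql would show whether the
failure depends on the client or on the database itself.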



What must I do to ensure that mixed-term searches of the first kind succeed?


Thanks in advance,

Ben Weaver

--
Benjamin Weaver
Faculty Research Associate, Imaging Papyri Projects, Herculaneum Society, Oxford
email:  benjamin.weaver@classics.ox.ac.uk
phone:  (0)1865 610236

