Thread: pg_trgm: unicode string not working
I am using pg_trgm for spelling correction as prescribed in the documentation. But I see that it does not work for Unicode strings. The database was initialized with UTF8 encoding and the C locale.

Here is the table:

\d words
     Table "public.words"
 Column |  Type   | Modifiers
--------+---------+-----------
 word   | text    |
 ndoc   | integer |
 nentry | integer |
Indexes:
    "words_idx" gin (word gin_trgm_ops)

Query:

select word from words where word % 'कतद';

I get an error:

ERROR: GIN indexes do not support whole-index scans

Any idea what is wrong?

-Sushant.
Hi,

Next time, please post questions regarding the usage of postgres to the -general list, not to -hackers. The purpose of -hackers is to discuss the development of postgres proper, not the development of applications using postgres.

On Jun12, 2011, at 13:33, Sushant Sinha wrote:
> I am using pg_trgm for spelling correction as prescribed in the
> documentation. But I see that it does not work for Unicode strings. The
> database was initialized with UTF8 encoding and the C locale.

I think you need to use a locale (more precisely, a CTYPE) in which 'क', 'त', 'द' are considered to be alphanumeric.

You can specify the CTYPE when creating the database with
CREATE DATABASE ... LC_CTYPE = ...

> Here is the table:
> \d words
>      Table "public.words"
>  Column |  Type   | Modifiers
> --------+---------+-----------
>  word   | text    |
>  ndoc   | integer |
>  nentry | integer |
> Indexes:
>     "words_idx" gin (word gin_trgm_ops)
>
> Query: select word from words where word % 'कतद';
>
> I get an error:
>
> ERROR: GIN indexes do not support whole-index scans

pg_trgm probably ignores non-alphanumeric characters during comparison, so you end up with an empty search string, which translates to a whole-index scan. Postgres up to 9.0 does not support such scans for GIN indices.

Note that this restriction was removed in postgres 9.1, which is currently in beta. However, GIN indices must be re-created with REINDEX after upgrading from 9.0 to leverage that improvement.

best regards,
Florian Pflug
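A minimal sketch of how one might check this diagnosis from psql, assuming pg_trgm is installed in the current database. show_trgm() is the pg_trgm function that lists the trigrams extracted from a string. The database name words_db and the locale name en_US.UTF-8 are placeholders that do not come from the thread, and which locales exist depends on the operating system.

```sql
-- If the diagnosis above is right, a database with the C CTYPE should
-- return an empty array here, because every character is discarded
-- (an expectation to verify, not a confirmed result).
SELECT show_trgm('कतद');

-- For comparison: an ASCII word should yield a non-empty trigram set.
SELECT show_trgm('cat');

-- Create a database whose CTYPE classifies these characters as letters.
-- "words_db" and 'en_US.UTF-8' are hypothetical; TEMPLATE template0 is
-- needed when the requested locale differs from the template's locale.
CREATE DATABASE words_db
    ENCODING 'UTF8'
    LC_COLLATE 'en_US.UTF-8'
    LC_CTYPE 'en_US.UTF-8'
    TEMPLATE template0;
```

Running show_trgm('कतद') again in the new database should then show whether the chosen CTYPE keeps the characters.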
On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug <fgp@phlo.org> wrote:
> Note that this restriction was removed in postgres 9.1 which
> is currently in beta. However, GIN indices must be re-created
> with REINDEX after upgrading from 9.0 to leverage that
> improvement.

Does pg_upgrade know about this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug <fgp@phlo.org> wrote:
> > Note that this restriction was removed in postgres 9.1 which
> > is currently in beta. However, GIN indices must be re-created
> > with REINDEX after upgrading from 9.0 to leverage that
> > improvement.
>
> Does pg_upgrade know about this?

No, it does not. Under what circumstances should I issue a suggestion to reindex, and what should the text be?

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +
On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian <bruce@momjian.us> wrote:
> Robert Haas wrote:
>> On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug <fgp@phlo.org> wrote:
>> > Note that this restriction was removed in postgres 9.1 which
>> > is currently in beta. However, GIN indices must be re-created
>> > with REINDEX after upgrading from 9.0 to leverage that
>> > improvement.
>>
>> Does pg_upgrade know about this?
>
> No, it does not. Under what circumstances should I issue a suggestion
> to reindex, and what should the text be?

It sounds like GIN indexes need to be reindexed after upgrading from < 9.1 to >= 9.1.

--
Robert Haas
Robert Haas wrote:
> On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian <bruce@momjian.us> wrote:
> > Robert Haas wrote:
> >> On Sun, Jun 12, 2011 at 8:40 AM, Florian Pflug <fgp@phlo.org> wrote:
> >> > Note that this restriction was removed in postgres 9.1 which
> >> > is currently in beta. However, GIN indices must be re-created
> >> > with REINDEX after upgrading from 9.0 to leverage that
> >> > improvement.
> >>
> >> Does pg_upgrade know about this?
> >
> > No, it does not. Under what circumstances should I issue a suggestion
> > to reindex, and what should the text be?
>
> It sounds like GIN indexes need to be reindexed after upgrading from <
> 9.1 to >= 9.1.

I already have some GIN tests that I used for the 8.3 to 8.4 upgrade, so that is easy. But is the reindex required, or just suggested in order to get the new features?

--
Bruce Momjian
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian <bruce@momjian.us> wrote:
>> No, it does not. Under what circumstances should I issue a suggestion
>> to reindex, and what should the text be?

> It sounds like GIN indexes need to be reindexed after upgrading from <
> 9.1 to >= 9.1.

Only if you care whether they work for corner cases such as empty arrays ... corner cases which didn't work before 9.1, so very likely you don't care.

I'm not sure that pg_upgrade is a good vehicle for dispensing such advice, anyway. At least in the Red Hat packaging, end users will never read what it prints, unless maybe it fails outright and they're trying to debug why.

regards, tom lane
On Jun14, 2011, at 07:15, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Jun 13, 2011 at 7:47 PM, Bruce Momjian <bruce@momjian.us> wrote:
>>> No, it does not. Under what circumstances should I issue a suggestion
>>> to reindex, and what should the text be?
>
>> It sounds like GIN indexes need to be reindexed after upgrading from <
>> 9.1 to >= 9.1.
>
> Only if you care whether they work for corner cases such as empty
> arrays ... corner cases which didn't work before 9.1, so very likely
> you don't care.

We also already say "To fix this, do REINDEX INDEX ... " in the errhint of "old GIN indexes do not support whole-index scans nor searches for nulls".

best regards,
Florian Pflug
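For the table from the start of this thread, following that errhint after an upgrade to 9.1 would look roughly like this. It is only a sketch: words_idx is the index name shown in the original \d words output, and the statement is assumed to run in the upgraded database.

```sql
-- Rebuild the pre-9.1 GIN index so that it can serve whole-index scans.
-- REINDEX blocks writes to the table while it runs.
REINDEX INDEX words_idx;
```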
On Tue, Jun 14, 2011 at 1:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm not sure that pg_upgrade is a good vehicle for dispensing such
> advice, anyway. At least in the Red Hat packaging, end users will never
> read what it prints, unless maybe it fails outright and they're trying
> to debug why.

In my experience to date, that happens 100% of the time. :-(

--
Robert Haas