Question on Trigram GIST indexes - Mailing list pgsql-general

From ERR ORR
Subject Question on Trigram GIST indexes
Date
Msg-id CALtFtELEvreyXn3fgk9GPWDHvrRXNeVQsSgpuicZZL_oQ4oHmw@mail.gmail.com
Whole thread Raw
Responses Re: Question on Trigram GIST indexes
Re: Question on Trigram GIST indexes
List pgsql-general

@Moderators: I am reposting this because the original from 22 December apparently didn't arrive on the list.

I was trying to make Postgresql use a trigram gist index on a varchar field, but to no avail.

Specifically, I was trying to replicate what is done in this blog post:
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html


I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8.

My full table definition is

CREATE TABLE "TEST"
(
  "RECID" bigint NOT NULL DEFAULT next_id(),
  "TST_PAYLOAD" character varying(255),
  CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID")
  USING INDEX TABLESPACE local
)
WITH (
  OIDS=FALSE
);

CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX"
  ON "TEST"
  USING btree
  ("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops)
TABLESPACE local;

CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX"
  ON "TEST"
  USING gist
  ("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops)
TABLESPACE local;

CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX"
  ON "TEST"
  USING gin
  ("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops)
TABLESPACE local;


The COLLATE pg_catalog."default" clause is inserted by the DB (default is "Unicode"). I also tried to define the Trigram index with COLLATE pg_catalog."C"  but the behavior did not change. I did vacuum and analyze after creating each index.

The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase.

I have pg_tgrm installed - actually all extensions are present.

Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST index but do a full table scan instead.
(I am looking for names like 'SEATTLE' in this example)

I also tried dropping the btree index but that has no influence on the behavior.

I have texts/strings in different languages/charsets, so UTF8 looked like the best decision to me, instead of, say, ISO-8859-15, which is limited to just some European charsets. Specifically I am storing strings in European languages (corresponding to the ISO-8859 series) including diacrites line äöüñáéíóú ..., Russian, Arabic, Chinese etc. in one column instead of making different columns/tables and using them via a view because that's my use case and UTF8 should accommodate that IMHO (or is that an abuse of the DB?)

Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is possible? (Oracle doesn't allow that iirc).

Thanks for any insights, pointers ...

I'd be grateful if anybody could explain to me what I am doing wrong.

Thanks in advance.

pgsql-general by date:

Previous
From: John R Pierce
Date:
Subject: Re: recasting to timestamp from varchar
Next
From: Chris Travers
Date:
Subject: Counterintuitive locking behavior