Thread: Question on Trigram GIST indexes

Question on Trigram GIST indexes

From
ERR ORR
Date:
I was trying to make Postgresql use a trigram gist index on a varchar field, but to no avail.

Specifically, I was trying to replicate what is done in this blog post:
http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html


I use Postgresql 9.1.7 on Linux FC17 64bit, my locale is UTF8.

My full table definition is

CREATE TABLE "TEST"
(
  "RECID" bigint NOT NULL DEFAULT next_id(),
  "TST_PAYLOAD" character varying(255),
  CONSTRAINT "PK_TEST" PRIMARY KEY ("RECID")
  USING INDEX TABLESPACE local
)
WITH (
  OIDS=FALSE
);

CREATE INDEX "TEST_PAYLOAD_PATTERN_1_IDX"
  ON "TEST"
  USING btree
  ("TST_PAYLOAD" COLLATE pg_catalog."default" varchar_pattern_ops)
TABLESPACE local;

CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIST_1_IDX"
  ON "TEST"
  USING gist
  ("TST_PAYLOAD" COLLATE pg_catalog."default" gist_trgm_ops)
TABLESPACE local;

CREATE INDEX "TEST_PAYLOAD_TRIGRAM_GIN_1_IDX"
  ON "TEST"
  USING gin
  ("TST_PAYLOAD" COLLATE pg_catalog."default" gin_trgm_ops)
TABLESPACE local;


The field "TST_PAYLOAD" contains 26389 names of cities, all in uppercase.

I have pg_tgrm installed - actually all extensions are present.

Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index as it should.
Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST index but do a full table scan instead.
(I am looking for names like 'SEATTLE')

I also tried dropping the btree index but that has no influence on the behavior.

I'd be grateful if anybody could explain to me what I am doing wrong.

Thanks in advance.

Re: Question on Trigram GIST indexes

From
"Kevin Grittner"
Date:
ERR ORR wrote:

> Specifically, I was trying to replicate what is done in this blog post:
> http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html

> Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index
> as it should.
> Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST
> index but do a full table scan instead.
> (I am looking for names like 'SEATTLE')

Have you run VACUUM ANALYZE with the index and data in place (as
shown in the blog post?

Another conspicuous difference is your explicit use of a COLLATE
clause in the index declaration.

-Kevin


Fwd: Question on Trigram GIST indexes

From
ERR ORR
Date:
(forwarded to pgsql-general after it went to Kevin Grittner alone)

On 22 December 2012 22:46, Kevin Grittner <kgrittn@mail.com> wrote:
ERR ORR wrote:

> Specifically, I was trying to replicate what is done in this blog post:
> http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html

> Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree index
> as it should.
> Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the GIST
> index but do a full table scan instead.
> (I am looking for names like 'SEATTLE')

Have you run VACUUM ANALYZE with the index and data in place (as
shown in the blog post?

Another conspicuous difference is your explicit use of a COLLATE
clause in the index declaration.

-Kevin

a) Yes, I ran VACUUM ANALYZE after creating the indexes.
b) The COLLATE pg_catalog."default" clause is inserted by the DB, I run the CREATE INDEX command without that.
"Default" collation for all my DBs in Postgres is en_US.UTF-8 and both the system (Linux FC17) and the DB
use encoding UTF8. 

I have texts/strings in different languages/charsets, so UTF8 looked like the best decision to me, instead of, say, ISO-8859-15, which is limited to just some European charsets. Specifically I am storing strings in European languages (corresponding to the ISO-8859 series) including diacrites line äöüñáéíóú ..., Russian, Arabic, Chinese etc. in one column instead of making different columns/tables and using them via a view because that's my use case and UTF8 should accommodate that IMHO (or is that an abuse of the DB?)

Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is possible? (Oracle doesn't allow that iirc)

Thanks for any insights, pointers ...

R.

Re: Fwd: Question on Trigram GIST indexes

From
Tom Lane
Date:
ERR ORR <rd0002@gmail.com> writes:
>> Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree
>> index as it should.
>> Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the
>> GIST index but do a full table scan instead.

Are you sure it "should" use the index for that?  That query doesn't
look very selective to me --- it might well be deciding that a seqscan
is cheaper.  You could try forcing the issue with enable_seqscan = off
to see if the query is really unable to match the index, or it just
doesn't like the cost estimate.

> Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is
> possible? (Oracle doesn't allow that iirc)

FWIW, I think you do want the index to have the database's default
collation, otherwise it could only match LIKE clauses that explicitly
specify the same non-default collation.

            regards, tom lane


Re: Fwd: Question on Trigram GIST indexes

From
ERR ORR
Date:
* I think it "should" use that index based on trying to follow that exercise.
* The part about changing the collation was an idea in the course of trying out different things.
* enable_seqscan is off, and the sharedmem and temp_buffers are set so high that most things happen in RAM.

I wonder what it that the other gentleman, Merlin, found out in the documentation and if he would share that.

I've also tried this on another table I have, with and without other indexes, but no success :-(

Wondering ...


On 23 January 2013 04:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
ERR ORR <rd0002@gmail.com> writes:
>> Queries which use "WHERE "TST_PAYLOAD" LIKE 'SEAT%'" go to the btree
>> index as it should.
>> Queries which use "WHERE "TST_PAYLOAD" LIKE '%EAT%'" *should* use the
>> GIST index but do a full table scan instead.

Are you sure it "should" use the index for that?  That query doesn't
look very selective to me --- it might well be deciding that a seqscan
is cheaper.  You could try forcing the issue with enable_seqscan = off
to see if the query is really unable to match the index, or it just
doesn't like the cost estimate.

> Would it help to `ALTER DATABASE set lc_collate = 'C'`,supposing that is
> possible? (Oracle doesn't allow that iirc)

FWIW, I think you do want the index to have the database's default
collation, otherwise it could only match LIKE clauses that explicitly
specify the same non-default collation.

                        regards, tom lane