Thread: text type has no default operator class for GIN?

text type has no default operator class for GIN?

From
Bob Gobeille
Date:
CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
("ufile_name");

ERROR:  data type text has no default operator class for access method
"gin"
HINT:  You must specify an operator class for the index or define a
default operator class for the data type.

This is on a new 8.3 install.
Why is this happening?  Isn't creating a GIN index on a text type a
common thing to do?

I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text
Search).

Do I really have to CREATE OPERATOR CLASS to make this work?
I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html

many thanks,
bob

Re: text type has no default operator class for GIN?

From
Bob Gobeille
Date:
On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote:

> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
> ("ufile_name");
>
> ERROR:  data type text has no default operator class for access method
> "gin"
> HINT:  You must specify an operator class for the index or define a
> default operator class for the data type.
>
> This is on a new 8.3 install.
> Why is this happening?  Isn't creating a GIN index on a text type a
> common thing to do?
>
> I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text
> Search).
>
> Do I really have to CREATE OPERATOR CLASS to make this work?
> I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html


I still don't understand the above, but the following works:

create index ufile_name_ginidx on uploadtree using
gin(to_tsvector('english', ufile_name));

Bob

Re: text type has no default operator class for GIN?

From
Sam Mason
Date:
On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:
> On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote:
> >CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
> >("ufile_name");
> >
> >ERROR:  data type text has no default operator class for access method
> >"gin"
> >HINT:  You must specify an operator class for the index or define a
> >default operator class for the data type.
[..]
> I still don't understand the above, but the following works:
>
> create index ufile_name_ginidx on uploadtree using
> gin(to_tsvector('english', ufile_name));

Not sure if understand very well myself, but GIN indexes can only speed
up specific access patterns and these are exposed through various
different operators.

When PG refuses to create a GIN index on a plain TEXT column it's saying
that it doesn't how to use those operators with a values of TEXT type.
As soon as you pull this value apart (with the to_tsvector) you end up
with something that PG can get some traction on and all is good.

Maybe a useful question to ask is, what are you expecting PG do to when
you create a GIN index on this TEXT column?

--
  Sam  http://samason.me.uk/

Re: text type has no default operator class for GIN?

From
Tom Lane
Date:
Sam Mason <sam@samason.me.uk> writes:
> On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:
>>> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
>>> ("ufile_name");
>>> ERROR:  data type text has no default operator class for access method
>>> "gin"

> Not sure if understand very well myself, but GIN indexes can only speed
> up specific access patterns and these are exposed through various
> different operators.

What GIN indexes are good for is indexing equality queries on the
components of something the database otherwise thinks of as a single
object.  For instance you can GIN-index searches for arrays containing
a particular value as a member.

Now type text doesn't have any built-in notion of a component, other
than individual characters, which aren't normally that interesting
to search for.  What I suppose the OP has in mind is full-text
searching, which is looking for component *words*.  But "word" is a
very language- and context-dependent concept.  And defining which words
are to be considered equal for searching purposes is even more so.
If we'd hard-wired one notion of "word" into datatype text, it wouldn't
be very flexible.  The point of the tsvector layer is to have a
configurable way to extract searchable words from a chunk of text.
There are also some implementation advantages like not having to repeat
that processing constantly during a search --- but the main point is
having a place to define what a word is and what search equality means.

            regards, tom lane

Re: text type has no default operator class for GIN?

From
Sam Mason
Date:
On Tue, Aug 18, 2009 at 08:21:49PM -0400, Tom Lane wrote:
> Sam Mason <sam@samason.me.uk> writes:
> > On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote:
> >>> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN
> >>> ("ufile_name");
> >>> ERROR:  data type text has no default operator class for access method
> >>> "gin"
>
> > Not sure if understand very well myself, but GIN indexes can only speed
> > up specific access patterns and these are exposed through various
> > different operators.
>
> What GIN indexes are good for is indexing equality queries on the
> components of something the database otherwise thinks of as a single
> object.  For instance you can GIN-index searches for arrays containing
> a particular value as a member.

Yup, that's a much better description than I could muster!

> What I suppose the OP has in mind is full-text
> searching, which is looking for component *words*.  But "word" is a
> very language- and context-dependent concept.

That's what I was trying to get the OP to think about when I said "what
are you expecting PG do to when you create a GIN index on this TEXT
column" but reading it back now I was being my normal oblique self.

Writing nice emails is depressingly difficult!

--
  Sam  http://samason.me.uk/

Re: text type has no default operator class for GIN?

From
Bob Gobeille
Date:
On Aug 18, 2009, at 6:21 PM, Tom Lane wrote:

> What GIN indexes are good for is indexing equality queries on the
> components of something the database otherwise thinks of as a single
> object.  For instance you can GIN-index searches for arrays containing
> a particular value as a member.
>
> Now type text doesn't have any built-in notion of a component, other
> than individual characters, which aren't normally that interesting
> to search for.  What I suppose the OP has in mind is full-text
> searching, which is looking for component *words*.  But "word" is a
> very language- and context-dependent concept.  And defining which
> words
> are to be considered equal for searching purposes is even more so.
> If we'd hard-wired one notion of "word" into datatype text, it
> wouldn't
> be very flexible.  The point of the tsvector layer is to have a
> configurable way to extract searchable words from a chunk of text.
> There are also some implementation advantages like not having to
> repeat
> that processing constantly during a search --- but the main point is
> having a place to define what a word is and what search equality
> means.

Yes, I was looking for full text searching in english.  Since my
postgresql.conf contained:

default_text_search_config = 'pg_catalog.english'

doesn't this specify the parser, dictionary, and template to use for
full text searching in english?  I should have mentioned the above in
my post but since it was in the original conf file (debian install) I
didn't think of it.

Bob Gobeille
bobg@fossology.org




Re: text type has no default operator class for GIN?

From
Tom Lane
Date:
Bob Gobeille <bob.gobeille@hp.com> writes:
> Yes, I was looking for full text searching in english.  Since my
> postgresql.conf contained:

> default_text_search_config = 'pg_catalog.english'

> doesn't this specify the parser, dictionary, and template to use for
> full text searching in english?

It does, but that's a default for tsvector, not bare text.

            regards, tom lane