Thread: text type has no default operator class for GIN?
CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN ("ufile_name"); ERROR: data type text has no default operator class for access method "gin" HINT: You must specify an operator class for the index or define a default operator class for the data type. This is on a new 8.3 install. Why is this happening? Isn't creating a GIN index on a text type a common thing to do? I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text Search). Do I really have to CREATE OPERATOR CLASS to make this work? I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html many thanks, bob
On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote: > CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN > ("ufile_name"); > > ERROR: data type text has no default operator class for access method > "gin" > HINT: You must specify an operator class for the index or define a > default operator class for the data type. > > This is on a new 8.3 install. > Why is this happening? Isn't creating a GIN index on a text type a > common thing to do? > > I'm not seeing any mention of this in 8.3.7 docs cha 12 (Full Text > Search). > > Do I really have to CREATE OPERATOR CLASS to make this work? > I see http://www.postgresql.org/docs/8.3/interactive/sql-createopclass.html I still don't understand the above, but the following works: create index ufile_name_ginidx on uploadtree using gin(to_tsvector('english', ufile_name)); Bob
On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote: > On Aug 18, 2009, at 3:46 PM, Gobeille, Robert wrote: > >CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN > >("ufile_name"); > > > >ERROR: data type text has no default operator class for access method > >"gin" > >HINT: You must specify an operator class for the index or define a > >default operator class for the data type. [..] > I still don't understand the above, but the following works: > > create index ufile_name_ginidx on uploadtree using > gin(to_tsvector('english', ufile_name)); Not sure if understand very well myself, but GIN indexes can only speed up specific access patterns and these are exposed through various different operators. When PG refuses to create a GIN index on a plain TEXT column it's saying that it doesn't how to use those operators with a values of TEXT type. As soon as you pull this value apart (with the to_tsvector) you end up with something that PG can get some traction on and all is good. Maybe a useful question to ask is, what are you expecting PG do to when you create a GIN index on this TEXT column? -- Sam http://samason.me.uk/
Sam Mason <sam@samason.me.uk> writes: > On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote: >>> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN >>> ("ufile_name"); >>> ERROR: data type text has no default operator class for access method >>> "gin" > Not sure if understand very well myself, but GIN indexes can only speed > up specific access patterns and these are exposed through various > different operators. What GIN indexes are good for is indexing equality queries on the components of something the database otherwise thinks of as a single object. For instance you can GIN-index searches for arrays containing a particular value as a member. Now type text doesn't have any built-in notion of a component, other than individual characters, which aren't normally that interesting to search for. What I suppose the OP has in mind is full-text searching, which is looking for component *words*. But "word" is a very language- and context-dependent concept. And defining which words are to be considered equal for searching purposes is even more so. If we'd hard-wired one notion of "word" into datatype text, it wouldn't be very flexible. The point of the tsvector layer is to have a configurable way to extract searchable words from a chunk of text. There are also some implementation advantages like not having to repeat that processing constantly during a search --- but the main point is having a place to define what a word is and what search equality means. regards, tom lane
On Tue, Aug 18, 2009 at 08:21:49PM -0400, Tom Lane wrote: > Sam Mason <sam@samason.me.uk> writes: > > On Tue, Aug 18, 2009 at 03:50:47PM -0600, Bob Gobeille wrote: > >>> CREATE INDEX "ufile_name_search" ON "public"."uploadtree" USING GIN > >>> ("ufile_name"); > >>> ERROR: data type text has no default operator class for access method > >>> "gin" > > > Not sure if understand very well myself, but GIN indexes can only speed > > up specific access patterns and these are exposed through various > > different operators. > > What GIN indexes are good for is indexing equality queries on the > components of something the database otherwise thinks of as a single > object. For instance you can GIN-index searches for arrays containing > a particular value as a member. Yup, that's a much better description than I could muster! > What I suppose the OP has in mind is full-text > searching, which is looking for component *words*. But "word" is a > very language- and context-dependent concept. That's what I was trying to get the OP to think about when I said "what are you expecting PG do to when you create a GIN index on this TEXT column" but reading it back now I was being my normal oblique self. Writing nice emails is depressingly difficult! -- Sam http://samason.me.uk/
On Aug 18, 2009, at 6:21 PM, Tom Lane wrote: > What GIN indexes are good for is indexing equality queries on the > components of something the database otherwise thinks of as a single > object. For instance you can GIN-index searches for arrays containing > a particular value as a member. > > Now type text doesn't have any built-in notion of a component, other > than individual characters, which aren't normally that interesting > to search for. What I suppose the OP has in mind is full-text > searching, which is looking for component *words*. But "word" is a > very language- and context-dependent concept. And defining which > words > are to be considered equal for searching purposes is even more so. > If we'd hard-wired one notion of "word" into datatype text, it > wouldn't > be very flexible. The point of the tsvector layer is to have a > configurable way to extract searchable words from a chunk of text. > There are also some implementation advantages like not having to > repeat > that processing constantly during a search --- but the main point is > having a place to define what a word is and what search equality > means. Yes, I was looking for full text searching in english. Since my postgresql.conf contained: default_text_search_config = 'pg_catalog.english' doesn't this specify the parser, dictionary, and template to use for full text searching in english? I should have mentioned the above in my post but since it was in the original conf file (debian install) I didn't think of it. Bob Gobeille bobg@fossology.org
Bob Gobeille <bob.gobeille@hp.com> writes: > Yes, I was looking for full text searching in english. Since my > postgresql.conf contained: > default_text_search_config = 'pg_catalog.english' > doesn't this specify the parser, dictionary, and template to use for > full text searching in english? It does, but that's a default for tsvector, not bare text. regards, tom lane