Thread: prefix search in tsearch
[docs from cvs HEAD]

I found the text-search documentation a little unclear about 'prefix search'; specifically, the examples do not show that the so-called 'prefix' is first stemmed before it is used as a prefix.

For instance, the following can be a little surprising:

SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
 ?column?
----------
 t
(1 row)

Because prefix search is such an important feature, I think this should be better explained, which I hope the attached doc patch does.

(textsearch.sgml contains another mention and example of prefix search; perhaps it should be extended a little there too. I'm happy to do that as well, but I first wanted to see whether you agree that it is a little too obscure as it stands.)

Erik Rijkers
Attachment
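The match reported above becomes easier to see when both operands are printed instead of only the boolean result. A minimal sketch, assuming the stock `english` configuration is in effect; it is named explicitly here, whereas the original message relied on `default_text_search_config`:

```sql
-- Print the normalized document lexeme and the normalized prefix side by side.
-- Assumption: the 'english' configuration with its stock english_stem dictionary.
SELECT to_tsvector('english', 'postgraduate') AS document,
       to_tsquery('english', 'postgres:*')    AS query;
-- document: 'postgradu':1
-- query:    'postgr':*
```

Both sides are stemmed, and `postgr` is a prefix of `postgradu`, which is why the comparison returns `t`.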
Erik,

I think it would be clearer to say not 'stemmed' but 'processed according to the text search configuration'. Here is an example:

$SHAREDIR/tsearch_data/my_synonyms.syn contains one line:

one 1

CREATE TEXT SEARCH DICTIONARY my_synonym (
    TEMPLATE = synonym,
    SYNONYMS = my_synonyms
);

ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword
    WITH my_synonym, english_stem;

test=# select 'one'::tsvector @@ to_tsquery('english','one:*');
 ?column?
----------
 f
(1 row)

This returns false because 'one' in the query was processed by the my_synonym dictionary, as ts_debug shows:

test=# select ts_debug('english','one');
                                   ts_debug
------------------------------------------------------------------------------
 (asciiword,"Word, all ASCII",one,"{my_synonym,english_stem}",my_synonym,{1})
(1 row)

On Tue, 31 Aug 2010, Erik Rijkers wrote:
> I found the text-search documentation a little unclear about 'prefix search'; specifically, the
> examples do not show that the so-called 'prefix' is first stemmed, before it is used as prefix.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
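In the synonym example, the two operands are produced differently: the cast `'one'::tsvector` stores the lexeme as written, whereas `to_tsquery` runs the prefix through the configured dictionaries. The sketch below is one way to inspect that. It assumes the `my_synonym` dictionary and the `asciiword` mapping from the message are installed; outputs are left out because they depend on that setup. The closing statements restore the mapping, under the further assumption that `asciiword` was previously mapped to `english_stem` alone.

```sql
-- Inspect what the prefix turns into after configuration processing.
SELECT to_tsquery('english', 'one:*');

-- Run the document through the same configuration, so that both sides
-- are normalized by the same dictionaries before the comparison.
SELECT to_tsvector('english', 'one') @@ to_tsquery('english', 'one:*');

-- Undo the experiment (assumes asciiword was mapped only to english_stem before).
ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR asciiword WITH english_stem;
DROP TEXT SEARCH DICTIONARY my_synonym;
```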
I applied a modified documentation patch (attached) that includes Oleg's suggestions.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 2bf411d..10f0e59 100644
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT 'super:*'::tsquery;
*** 3847,3853 ****
   'super':*
  </programlisting>
       This query will match any word in a <type>tsvector</> that begins
!      with <quote>super</>.
      </para>

      <para>
--- 3847,3874 ----
   'super':*
  </programlisting>
       This query will match any word in a <type>tsvector</> that begins
!      with <quote>super</>.
!     </para>
!
!     <para>
!      Note that text search configuration processing happens before
!      comparisons, which means this comparison returns <literal>true</>:
! <programlisting>
! SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
!  ?column?
! ----------
!  t
! (1 row)
! </programlisting>
!      because <literal>postgres</> gets stemmed to <literal>postgr</>:
! <programlisting>
! SELECT to_tsquery('postgres:*');
!  to_tsquery
! ------------
!  'postgr':*
! (1 row)
! </programlisting>
!      which then matches <literal>postgraduate</>.
      </para>

      <para>
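For contrast with the example in the patch, here is a sketch of the same comparison with stemming taken out of the picture. It assumes the built-in `simple` configuration, which lower-cases tokens but does not stem them, and plain casts, which apply no text search configuration at all.

```sql
-- 'simple' does no stemming, so the prefix stays 'postgres' and no longer matches.
SELECT to_tsvector('simple', 'postgraduate') @@ to_tsquery('simple', 'postgres:*');  -- f

-- Casting bypasses text search configurations entirely; the result is the same.
SELECT 'postgraduate'::tsvector @@ 'postgres:*'::tsquery;  -- f

-- An unprocessed prefix that really is a prefix of the lexeme still matches.
SELECT 'postgraduate'::tsvector @@ 'postgrad:*'::tsquery;  -- t
```

The surprising `t` in the documented example therefore comes from the configuration's processing of both operands, not from the prefix operator itself.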
I came up with some better wording, which I have applied:

     This query will match any word in a <type>tsvector</> that begins
     with <quote>super</>. Note that prefixes are first processed by
     text search configurations, which means this comparison returns
     true:

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +