Thread: prefix search in tsearch

prefix search in tsearch

From
"Erik Rijkers"
Date:
[docs from cvs HEAD]

I found the text-search documentation a little unclear about 'prefix search'; specifically, the
examples do not show that the so-called 'prefix' is first stemmed before it is used as a prefix.

For instance, the following can be a little surprising:

SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
 ?column?
----------
 t
(1 row)

Because prefix search is such an important feature, I think this should be explained better,
which I hope the attached doc patch does.

(textsearch.sgml contains another mention and example of prefix search; perhaps it should be
extended a little there too, which I'm happy to do as well, but I first wanted to see whether you
agree that it is a little too obscure as it stands.)
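
To make the order of operations concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model, not PostgreSQL's implementation: the lookup table TOY_STEMS is a hypothetical stand-in for the english stemmer, 'postgr' is the lexeme that to_tsquery('postgres:*') reports, and 'postgradu' is assumed to be the stem of 'postgraduate'.

```python
# Toy model only: a lookup table stands in for the real english stemmer.
TOY_STEMS = {
    "postgres": "postgr",         # lexeme reported by to_tsquery('postgres:*')
    "postgraduate": "postgradu",  # assumed stem of the document word
}


def normalize(word):
    """Return the lexeme for a word, falling back to the lowercased word."""
    word = word.lower()
    return TOY_STEMS.get(word, word)


def prefix_matches(document_words, query_word):
    """Normalize both sides first, then do the prefix comparison."""
    prefix = normalize(query_word)
    lexemes = [normalize(w) for w in document_words]
    return any(lexeme.startswith(prefix) for lexeme in lexemes)


# The raw string 'postgraduate' does not start with 'postgres' ...
print("postgraduate".startswith("postgres"))         # False
# ... but the stemmed prefix 'postgr' does match the stemmed lexeme.
print(prefix_matches(["postgraduate"], "postgres"))  # True
```

The point of the sketch is only that the comparison happens between normalized forms, which is why the result can differ from a plain string prefix test.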


Erik Rijkers

Attachment

Re: prefix search in tsearch

From
Oleg Bartunov
Date:
Erik,

I think it would be clearer to say not 'stemmed' but 'processed
according to the configuration'. Here is an example:

$SHAREDIR/tsearch_data/my_synonyms.syn  contains one line:
one 1


CREATE TEXT SEARCH DICTIONARY my_synonym (
     TEMPLATE = synonym,
     SYNONYMS = my_synonyms
);

ALTER TEXT SEARCH CONFIGURATION english
     ALTER MAPPING FOR asciiword
     WITH my_synonym, english_stem;


test=# select 'one'::tsvector @@ to_tsquery('english','one:*');
  ?column?
----------
  f
(1 row)

This returns false because the query term 'one' was first processed by the my_synonym
dictionary, which replaces it with '1':

test=# select ts_debug('english','one');
                                    ts_debug
------------------------------------------------------------------------------
  (asciiword,"Word, all ASCII",one,"{my_synonym,english_stem}",my_synonym,{1})
(1 row)
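
The same sequence can be sketched in Python. This is a toy model under stated assumptions, not PostgreSQL's implementation: toy_english_stem is a hypothetical stand-in that passes every token through unchanged, and normalize and CHAIN are illustrative names. It relies on two points from the example above: the cast 'one'::tsvector stores the lexeme as written, and the query term goes through the dictionaries in the order given in the mapping.

```python
# Toy model of a dictionary chain; not PostgreSQL's implementation.
def my_synonym(token):
    """Mirror the one-line my_synonyms.syn file: 'one' becomes '1'."""
    return {"one": "1"}.get(token)  # None means "token not recognized"


def toy_english_stem(token):
    """Hypothetical stand-in for english_stem: accepts every token unchanged."""
    return token


def normalize(token, dictionaries):
    """Ask each dictionary in order; the first one that recognizes the token wins."""
    for dictionary in dictionaries:
        lexeme = dictionary(token)
        if lexeme is not None:
            return lexeme
    return token


CHAIN = [my_synonym, toy_english_stem]

# 'one'::tsvector is a plain cast, so the stored lexeme stays 'one'.
stored_lexemes = ["one"]
# to_tsquery('english', 'one:*') runs the prefix through the chain first.
prefix = normalize("one", CHAIN)
print(prefix)                                                  # 1
print(any(lex.startswith(prefix) for lex in stored_lexemes))   # False
```

In this model, normalizing the document side through the same chain as well would turn 'one' into '1' on both sides, and the prefix test would then succeed.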



On Tue, 31 Aug 2010, Erik Rijkers wrote:

> [docs from cvs HEAD]
>
> I found the text-search documentation a little unclear about 'prefix search'; specifically, the
> examples do not show that the so-called 'prefix' is first stemmed before it is used as a prefix.

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: prefix search in tsearch

From
Bruce Momjian
Date:
I applied a modified documentation patch (attached) that includes Oleg's
suggestions.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 2bf411d..10f0e59 100644
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT 'super:*'::tsquery;
*** 3847,3853 ****
   'super':*
  </programlisting>
       This query will match any word in a <type>tsvector</> that begins
!      with <quote>super</>.
      </para>

      <para>
--- 3847,3874 ----
   'super':*
  </programlisting>
       This query will match any word in a <type>tsvector</> that begins
!      with <quote>super</>.
!     </para>
!
!     <para>
!      Note that text search configuration processing happens before
!      comparisons, which means this comparison returns <literal>true</>:
! <programlisting>
! SELECT to_tsvector( 'postgraduate' ) @@ to_tsquery( 'postgres:*' );
!  ?column?
! ----------
!  t
! (1 row)
! </programlisting>
!      because <literal>postgres</> gets stemmed to <literal>postgr</>:
! <programlisting>
! SELECT to_tsquery('postgres:*');
!  to_tsquery
! ------------
!  'postgr':*
! (1 row)
! </programlisting>
!      which then matches <literal>postgraduate</>.
      </para>

      <para>

Re: prefix search in tsearch

From
Bruce Momjian
Date:
I came up with some better wording, which I have applied:

     This query will match any word in a <type>tsvector</> that begins
     with <quote>super</>.  Note that prefixes are first processed by
     text search configurations, which means this comparison returns
     true:


