Thread: Documentation bug in 8.3?

Documentation bug in 8.3?

From: Bruce Momjian

Reading through the text search data type docs:

    http://www.postgresql.org/docs/8.3/static/datatype-textsearch.html#DATATYPE-TSVECTOR

it says:

    Optionally, integer position(s) can be attached to any or all of the
    lexemes:

    SELECT 'a:1 fat:2 cat:3 sat:4 on:5 a:6 mat:7 and:8 ate:9 a:10 fat:11 rat:12'::tsvector;
                                      tsvector
    -------------------------------------------------------------------------------
     'a':1,6,10 'on':5 'and':8 'ate':9 'cat':3 'fat':2,11 'mat':7 'rat':12 'sat':4

    A position normally indicates the source word's location in the
    document. Positional information can be used for proximity ranking.
    Position values can range from 1 to 16383; larger numbers are silently
    clamped to 16383. Duplicate position entries are discarded.
                      ----------------------------------------

However, in my testing of 8.3, duplicate position entries are not
discarded:

    test=> SELECT 'a:1 b:1'::tsvector;
      tsvector
    -------------
     'a':1 'b':1
    (1 row)

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: Documentation bug in 8.3?

From: Tom Lane

Bruce Momjian <bruce@momjian.us> writes:
>     clamped to 16383. Duplicate position entries are discarded.
>                       ----------------------------------------

> However in my testing of 8.3 duplicate position entries are not
> discarded:

>     test=> SELECT 'a:1 b:1'::tsvector;
>       tsvector
>     -------------
>      'a':1 'b':1
>     (1 row)

Those aren't duplicates, because they're not attached to the same
lexeme.  The comment is talking about this behavior:

regression=# SELECT 'a:1 a:1'::tsvector;
 tsvector
----------
 'a':1
(1 row)

regression=# SELECT 'a:1,2,1'::tsvector;
 tsvector
----------
 'a':1,2
(1 row)

            regards, tom lane

Re: Documentation bug in 8.3?

From: Bruce Momjian

Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> >     clamped to 16383. Duplicate position entries are discarded.
> >                       ----------------------------------------
>
> > However in my testing of 8.3 duplicate position entries are not
> > discarded:
>
> >     test=> SELECT 'a:1 b:1'::tsvector;
> >       tsvector
> >     -------------
> >      'a':1 'b':1
> >     (1 row)
>
> Those aren't duplicates, because they're not attached to the same
> lexeme.  The comment is talking about this behavior:
>
> regression=# SELECT 'a:1 a:1'::tsvector;
>  tsvector
> ----------
>  'a':1
> (1 row)
>
> regression=# SELECT 'a:1,2,1'::tsvector;
>  tsvector
> ----------
>  'a':1,2
> (1 row)

OK, thanks.  I will clarify the documentation.  Patch attached and
applied.

Index: doc/src/sgml/datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.222
diff -c -c -r1.222 datatype.sgml
*** doc/src/sgml/datatype.sgml    2 Jan 2008 19:53:13 -0000    1.222
--- doc/src/sgml/datatype.sgml    12 Jan 2008 21:50:51 -0000
***************
*** 3330,3336 ****
       document.  Positional information can be used for
       <firstterm>proximity ranking</firstterm>.  Position values can
       range from 1 to 16383; larger numbers are silently clamped to 16383.
!      Duplicate position entries are discarded.
      </para>

      <para>
--- 3330,3336 ----
       document.  Positional information can be used for
       <firstterm>proximity ranking</firstterm>.  Position values can
       range from 1 to 16383; larger numbers are silently clamped to 16383.
!      Duplicate positions for the same lexeme are discarded.
      </para>

      <para>
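
For reference, the clarified wording can be checked directly in psql. This is
only a minimal sketch: the literals below are illustrative and do not come from
the thread, and the expected results in the comments follow from the two
documented rules (duplicates are dropped per lexeme, and positions above 16383
are clamped).

```sql
-- Duplicates are removed per lexeme: 'a' keeps positions 1 and 2 once each,
-- while 'b' keeps its own position 1 even though 'a' also has a 1.
SELECT 'a:1 a:2 a:1 b:1'::tsvector;
-- expected: 'a':1,2 'b':1

-- Positions above 16383 are silently clamped to 16383.
SELECT 'a:20000'::tsvector;
-- expected: 'a':16383
```

The first query combines both cases from the thread: the repeated a:1 is
discarded, but the 1 attached to b is kept because it belongs to a different
lexeme.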