Thread: tsearch2 questions

tsearch2 questions

From
Joshua N Pritikin
Date:
1. What is the advantage of the tsearch2() trigger? Why can't I write my
own trigger which does approximately:

  UPDATE manuscript set manuscript_vector =
    setweight(to_tsvector(manuscript_genre), 'A') ||
    setweight(to_tsvector(manuscript_title), 'B') ||
    to_tsvector(manuscript_abstract);

2. Is there a way to know in advance the maximum return value of the
rank function? I have lots of other information to include in the
goodness-of-match score besides the fulltext match rank so I would
prefer a tsearch2 rank score between 0 and 1. Do I need to write my own
rank function?

--
Make April 15 just another day, visit http://fairtax.org

Re: tsearch2 questions

From
Oleg Bartunov
Date:
On Wed, 4 Jul 2007, Joshua N Pritikin wrote:

> 1. What is the advantage of the tsearch2() trigger? Why can't I write my
> own trigger which does approximately:

No advantage; it's just an example.


>
>  UPDATE manuscript set manuscript_vector =
>    setweight(to_tsvector(manuscript_genre), 'A') ||
>    setweight(to_tsvector(manuscript_title), 'B') ||
>    to_tsvector(manuscript_abstract);
>
> 2. Is there a way to know in advance the maximum return value of the
> rank function? I have lots of other information to include in the
> goodness-of-match score besides the fulltext match rank so I would
> prefer a tsearch2 rank score between 0 and 1. Do I need to write my own
> rank function?

What about a simple normalization formula, like rank/(rank+1)?


     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: tsearch2 questions

From
Joshua N Pritikin
Date:
On Wed, Jul 04, 2007 at 10:59:46AM +0400, Oleg Bartunov wrote:
> On Wed, 4 Jul 2007, Joshua N Pritikin wrote:
> >1. What is the advantage of the tsearch2() trigger? Why can't I write my
> >own trigger which does approximately:
>
> no advantage, it's just an example.

Please mention that in the documentation:

tsearch2() trigger used to automatically update vector_column_name,
my_filter_name is the function name to preprocess text_column_name.
There are can be many functions and text columns specified in tsearch2()
trigger. The following rule used: function applied to all subsequent
text columns until next function occurs. Example, function dropatsymbol
replaces all entries of @ sign by space.

tsearch2() is an example. You are welcome to write your own trigger.

> >2. Is there a way to know in advance the maximum return value of the
> >rank function? I have lots of other information to include in the
> >goodness-of-match score besides the fulltext match rank so I would
> >prefer a tsearch2 rank score between 0 and 1. Do I need to write my own
> >rank function?
>
> what's about simple normalization formulae, like rank/(rank+1) ?

I think you are suggesting that I use the best rank as the denominator
for the rank column. Yes, I suppose that will work.

Thanks.


Re: tsearch2 questions

From
"hubert depesz lubaczewski"
Date:
On 7/4/07, Joshua N Pritikin <jpritikin@pobox.com> wrote:
> Please mention that in the documentation:

Don't you think this is perfectly clear?

"If you want to do something specific with columns, you may write your very own trigger function using plpgsql or other procedural languages (but not SQL, unfortunately) and use it instead of tsearch2 trigger."
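
For illustration, a minimal sketch of such a hand-written trigger in plpgsql, assuming the manuscript table and column names from the first message. The function and trigger names are made up here, and the coalesce() calls are an addition so that a NULL column does not turn the whole vector into NULL:

```sql
-- Hypothetical trigger function: rebuilds the weighted vector for the
-- row being inserted or updated.
CREATE FUNCTION manuscript_vector_update() RETURNS trigger AS $$
BEGIN
  -- Concatenating with NULL yields NULL, so empty strings stand in
  -- for missing columns.
  NEW.manuscript_vector :=
    setweight(to_tsvector(coalesce(NEW.manuscript_genre, '')), 'A') ||
    setweight(to_tsvector(coalesce(NEW.manuscript_title, '')), 'B') ||
    to_tsvector(coalesce(NEW.manuscript_abstract, ''));
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fire before the row is stored so the modified NEW row is what gets written.
CREATE TRIGGER manuscript_vector_trg
  BEFORE INSERT OR UPDATE ON manuscript
  FOR EACH ROW EXECUTE PROCEDURE manuscript_vector_update();
```

Unlike the bare UPDATE from the original question, this touches only the affected row rather than rewriting every row in the table.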


>> what's about simple normalization formulae, like rank/(rank+1) ?
> I think you are suggesting that I use the best rank as the denominator
> for the rank column. Yes, I suppose that will work.

Actually, Oleg suggested not using the best rank, but just using the formula as given, rank/(rank+1), to get a rank in the range 0 to 1.

depesz

Re: tsearch2 questions

From
Joshua N Pritikin
Date:
On Wed, Jul 04, 2007 at 10:40:11AM +0200, hubert depesz lubaczewski wrote:
> On 7/4/07, Joshua N Pritikin <jpritikin@pobox.com> wrote:
> >Please mention that in the documentation:
>
> dont you think this is perfeclty clear?
>
> "If you want to do something specific with columns, you may write your very
> own trigger function using plpgsql or other procedural languages (but not
> SQL, unfortunately) and use it instead of tsearch2 trigger."

From where are you quoting? I was quoting from:

http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html

> >> what's about simple normalization formulae, like rank/(rank+1) ?
> >I think you are suggesting that I use the best rank as the denominator
> >for the rank column. Yes, I suppose that will work.
>
> actually oleg supposed not to use best rank, but just use the formula as
> given - rank/(rank+1) to get rank in range of 0 to 1.

OK, then what does the +1 mean in your formulae? Consider these results
from [1]. rank/(rank+1): 0.19/.1 = 1.9, .1/.1 = 1, etc. That doesn't
make sense. The reciprocal also doesn't make sense. So what does Oleg
mean? I was guessing that Oleg meant to divide the rank column by the
first rank, that is, by 0.19 so you would get 1, .52, .52, etc.

 id |                       headline                        | rank
----+-------------------------------------------------------+------
  3 | <b>crawling</b> over cobbles in a low <b>passage</b>. | 0.19
  1 | <b>crawl</b> over cobbles leads inward to the west.   |  0.1
  4 | <b>passages</b> lead east, north, and south.          |  0.1
  5 | <b>crawl</b> slants up.                               |  0.1
  7 | <b>passage</b> here is blocked by a recent  cave-in.  |  0.1

Am I being stupid?

[1] http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-guide.html


Re: tsearch2 questions

From
"hubert depesz lubaczewski"
Date:
On 7/4/07, Joshua N Pritikin <jpritikin@pobox.com> wrote:
> From where are you quoting? I was quoting from:
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html

I was quoting the file http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html
or, actually, its copy provided with the PostgreSQL sources in the contrib/tsearch2/docs directory.

>> actually oleg supposed not to use best rank, but just use the formula as
>> given - rank/(rank+1) to get rank in range of 0 to 1.
> OK, then what does the +1 mean in your formulae? Consider these results
> from [1]. rank/(rank+1): 0.19/.1 = 1.9, .1/.1 = 1, etc. That doesn't
> make sense. The reciprocal also doesn't make sense. So what does Oleg
> mean? I was guessing that Oleg meant to divide the rank column by the
> first rank, that is, by 0.19 so you would get 1, .52, .52, etc.

+1 means: add one to the rank.
For example, for rank = 0.1 you get: 0.1/(0.1+1) = 0.1/1.1 = 0.0909
For rank = 0.5 you get: 0.5/(0.5+1) = 0.5/1.5 = 0.3333
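
The same arithmetic can be checked directly in SQL. This is only a sketch, assuming PostgreSQL 8.2 or later (for VALUES in FROM); the ranks 0.19 and 0.1 are taken from the guide output quoted earlier in the thread, and the aliases are made up for illustration:

```sql
-- Map each raw rank r to r / (r + 1). For r >= 0 the result lies in
-- [0, 1) and grows with r, so the ordering of matches is unchanged.
SELECT r           AS raw_rank,
       r / (r + 1) AS normalized_rank
FROM (VALUES (0.19), (0.1), (0.5)) AS t(r)
ORDER BY normalized_rank DESC;
```

Here 0.19 maps to roughly 0.16 and 0.1 to roughly 0.09. In a real query the literal values would be replaced by the tsearch2 rank() call, for example rank(manuscript_vector, q) / (rank(manuscript_vector, q) + 1).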

I think the notation rank+1 is pretty readable.

Additionally, sorry, but I don't understand your calculations. What is 0.19/.1? How did you get the .1?

depesz

Re: tsearch2 questions

From
Joshua N Pritikin
Date:
On Wed, Jul 04, 2007 at 11:08:21AM +0200, hubert depesz lubaczewski wrote:
> On 7/4/07, Joshua N Pritikin <jpritikin@pobox.com> wrote:
> >From where are you quoting? I was quoting from:
> >
> >http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch2-ref.html
>
> i was quoting file
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html

So that one is fine. Only the reference could use some clarification.

> >> actually oleg supposed not to use best rank, but just use the formula as
> >> given - rank/(rank+1) to get rank in range of 0 to 1.
> >OK, then what does the +1 mean in your formulae? Consider these results
> >from [1]. rank/(rank+1): 0.19/.1 = 1.9, .1/.1 = 1, etc. That doesn't
> >make sense. The reciprocal also doesn't make sense. So what does Oleg
> >mean? I was guessing that Oleg meant to divide the rank column by the
> >first rank, that is, by 0.19 so you would get 1, .52, .52, etc.
>
> +1 means: add one to.
> for example: for rank = 0.1 you get: 0.1/(0.1+1) = 0.1/1.1 = 0.0909
> for rank = 0.5 you get: 0.5/(0.5+1) = 0.5/1.5 = 0.3333

D'oh! I see.

> i think that notation: rank+1 is pretty readable.
>
> additionally - sorry but i dont understand your calculations. what is
> 0.19/.1
> ? how did you get the .1?

I was imagining that "rank+1" was the second row of the rank column.

Sorry for the confusion.
