Thread: TSearch2 Questions

From: Hannes Dorbath
A few stupid questions:

Where to get the latest version?

Is http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ a dead site
and the latest versions are always "silently" distributed with PG inside
the contrib dir?

How can I find out what version of TSearch2 I'm running?

Is there active development?

Are the patches provided on the site above for backup still needed, or
are they already included in the versions that ship with 8.0.x? If not,
why not? =)

Or the better question, are any of those patches listed under
"Development" included in the version that ships with recent PG versions?

I'm playing a bit with it ATM. Indexing one Gigabyte of plain text
worked well, with 10 GB I still have some performance problems. I read the
TSearch Tuning Guide and will start optimizing some things, but is it a
realistic goal to index ~90GB plain text and get sub-second response
times on hardware that ~4000 EUR can buy?

Thanks in advance

--
Regards,
Hannes Dorbath

Re: TSearch2 Questions

From: Oleg Bartunov
On Mon, 21 Nov 2005, Hannes Dorbath wrote:

> A few stupid questions:
>
> Where to get the latest version?
>
> Is http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ a dead site and
> the latest versions are always "silently" distributed with PG inside the
> contrib dir?

You should always use the tsearch2 distributed with PostgreSQL.
We keep our own version for testing purposes. Sometimes we publish backpatches
(from CVS HEAD) for stable releases.

>
> How can I find out what version of TSearch2 I'm running?
>
> Is there active development?

It's actively developed; see the CVS HEAD commits. The main problem being
attacked is full UTF-8 support. We also plan some other improvements.
See http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo

>
> Are the patches provided on the site above for backup still needed, or are
> they already included in the versions that ship with 8.0.x? If not, why not?
> =)

All patches are already applied.

>
> Or the better question, are any of those patches listed under "Development"
> included in the version that ships with recent PG versions?
>

Right now, there are no patches you should be aware of. We plan to release
a UTF-8 support patch for the 8.1 release.

> I'm playing a bit with it ATM. Indexing one Gigabyte of plain text worked
> well, with 10 GB I still have some performance problems. I read the TSearch
> Tuning Guide and will start optimizing some things, but is it a realistic
> goal to index ~90GB plain text and get sub-second response times on hardware
> that ~4000 EUR can buy?

What's ATM? As for sub-second response times, that would depend heavily on
your data and queries. It would certainly be possible with our tsearch daemon,
which we postponed because we are inclined to implement inverted indices first
and then build the FTS index on top of the inverted index. But this is a
long-term plan.

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
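
For readers unfamiliar with the term, an inverted index of the kind mentioned
above maps each word to the set of documents that contain it, so a query only
has to look at the entries for its own words. The following is a minimal
Python sketch of that idea, not tsearch2 code; the function names, the
tokenizer, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_all(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the smallest posting set so intermediate results stay small.
    postings = sorted((index.get(w, set()) for w in words), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical sample data, for illustration only.
docs = {
    1: "Indexing one gigabyte of plain text",
    2: "Full text search with an inverted index",
    3: "Plain text search on cheap hardware",
}
index = build_inverted_index(docs)
print(search_all(index, "plain text"))
```

Query cost here depends on how many documents contain the query words rather
than on the total size of the collection, which is consistent with the point
that response times depend on the data and the queries.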

Re: TSearch2 Questions

From: Bruno Wolff III
On Mon, Nov 21, 2005 at 16:50:00 +0300,
  Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Mon, 21 Nov 2005, Hannes Dorbath wrote:
>
> >I'm playing a bit with it ATM. Indexing one Gigabyte of plain text worked
> >well, with 10 GB I still have some performance problems. I read the TSearch
> >Tuning Guide and will start optimizing some things, but is it a realistic
> >goal to index ~90GB plain text and get sub-second response times on
> >hardware that ~4000 EUR can buy?
>
> What's ATM? As for sub-second response times, that would depend heavily on
> your data and queries. It would certainly be possible with our tsearch daemon,
> which we postponed because we are inclined to implement inverted indices first
> and then build the FTS index on top of the inverted index. But this is a
> long-term plan.

I believe that in this context 'ATM' is an acronym for 'at the moment', which
has little impact on the meaning of the paragraph.

Re: TSearch2 Questions

From: Hannes Dorbath
On 21.11.2005 18:24, Bruno Wolff III wrote:
> On Mon, Nov 21, 2005 at 16:50:00 +0300, Oleg Bartunov <oleg@sai.msu.su> wrote:
>> On Mon, 21 Nov 2005, Hannes Dorbath wrote:
>>
>>> I'm playing a bit with it ATM. Indexing one Gigabyte of plain text worked
>>> well, with 10 GB I still have some performance problems. I read the TSearch
>>> Tuning Guide and will start optimizing some things, but is it a realistic
>>> goal to index ~90GB plain text and get sub-second response times on
>>> hardware that ~4000 EUR can buy?
>> What's ATM? As for sub-second response times, that would depend heavily on
>> your data and queries. It would certainly be possible with our tsearch daemon,
>> which we postponed because we are inclined to implement inverted indices first
>> and then build the FTS index on top of the inverted index. But this is a
>> long-term plan.
>
> I believe that in this context 'ATM' is an acronym for 'at the moment', which
> has little impact on the meaning of the paragraph.

For whatever reason I cannot find Oleg's reply on this server, so I'm
replying to this post instead. Thanks for your time, Oleg; your answers
really helped me. I still have two questions about compound words and
UTF-8, but I'll create a new, specific post for those.

Thanks again.

--
Regards,
Hannes Dorbath