Thread: TSearch vs. Homebrew

From
Hannes Dorbath
Date:
http://www.symfony-project.com/askeet/21

How does this dead simple approach compare to TSearch performance /
scaling wise?

--
Regards,
Hannes Dorbath

Re: TSearch vs. Homebrew

From
Oleg Bartunov
Date:
On Tue, 27 Jun 2006, Hannes Dorbath wrote:

> http://www.symfony-project.com/askeet/21
>
> How does this dead simple approach compare to TSearch performance / scaling
> wise?

You miss the main point of tsearch2: full integration with the database, i.e.,
full access to metadata, ACID, and so on.
Lucene has none of these features, so it can use some well-known optimizations
and therefore scales better. If you don't need ACID or metadata access, why
do you need a database at all?

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: TSearch vs. Homebrew

From
Hannes Dorbath
Date:
On 27.06.2006 13:31, Oleg Bartunov wrote:
> On Tue, 27 Jun 2006, Hannes Dorbath wrote:
>
>> http://www.symfony-project.com/askeet/21
>>
>> How does this dead simple approach compare to TSearch performance /
>> scaling wise?
>
> You miss the main point of tsearch2: full integration with the database, i.e.,
> full access to metadata, ACID, and so on. Lucene has none of these features,
> so it can use some well-known optimizations
> and therefore scales better. If you don't need ACID or metadata access, why
> do you need a database at all?

Yes, I know the benefits of using TSearch :) (I'm using it on many
projects) I just found that article and wondered how well this simple
approach might scale. Sorry for wasting your time ;)


--
Regards,
Hannes Dorbath

Re: TSearch vs. Homebrew

From
Oleg Bartunov
Date:
On Tue, 27 Jun 2006, Hannes Dorbath wrote:

> On 27.06.2006 13:31, Oleg Bartunov wrote:
>> On Tue, 27 Jun 2006, Hannes Dorbath wrote:
>>
>>> http://www.symfony-project.com/askeet/21
>>>
>>> How does this dead simple approach compare to TSearch performance /
>>> scaling wise?
>>
>> You miss the main point of tsearch2: full integration with the database, i.e.,
>> full access to metadata, ACID, and so on. Lucene has none of these features,
>> so it can use some well-known optimizations
>> and therefore scales better. If you don't need ACID or metadata access, why
>> do you need a database at all?
>
> Yes, I know the benefits of using TSearch :) (I'm using it on many projects)
> I just found that article and wondered how well this simple approach might
> scale. Sorry for wasting your time ;)

Sorry, I was a bit off-topic. Lucene scales like any inverted-index-based
engine. In 8.2, tsearch2 also has inverted index support, but we follow the
relational approach and can't provide the whole set of optimizations
that file-based engines can.



     Regards,
         Oleg

Re: TSearch vs. Homebrew

From
Tim Allen
Date:
Oleg Bartunov wrote:
>>> On Tue, 27 Jun 2006, Hannes Dorbath wrote:
>>>
>>>> http://www.symfony-project.com/askeet/21
>>>>
>>>> How does this dead simple approach compare to TSearch performance /
>>>> scaling wise?
>
> Sorry, I was a bit off-topic. Lucene scales like any inverted-index-based
> engine. In 8.2, tsearch2 also has inverted index support, but we follow the
> relational approach and can't provide the whole set of optimizations
> that file-based engines can.

If you read further down the article, you see that what the fellow is
actually doing is not using Lucene, but setting up his own text
indexing, i.e. identifying words, stemming them, and building a table
that records which words appear in which record. Basically he seems to
have re-implemented tsearch2 in a mixture of PHP and MySQL. I can't
imagine how well (or badly...) that must perform for a large amount of
data. The comments at the end are amusing, one fellow quite touching in
his naivety, wondering how much effort it would be to turn the framework
as described into an open source competitor for Google.
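
For readers who have not seen the article, the kind of homebrew indexing
described above (split text into words, stem them, record which words
appear in which record) can be sketched roughly as follows. This is only
an illustration in Python, not the article's PHP/MySQL code: the names
naive_stem, tokenize, build_index and search are made up for this sketch,
the stemmer is a toy suffix stripper rather than a real stemming
algorithm, and the "table" is an in-memory dict instead of a database
table.

```python
import re
from collections import defaultdict


def naive_stem(word):
    """Toy stemmer (assumption for this sketch): strip a few common suffixes."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text, split it into words, and stem each word."""
    return [naive_stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def build_index(records):
    """Map each stemmed word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for term in tokenize(text):
            index[term].add(record_id)
    return index


def search(index, query):
    """Return ids of records that contain every word of the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data, only for demonstration.
records = {
    1: "Indexing words with stemming",
    2: "Full text search in the database",
    3: "Searching indexed text",
}
index = build_index(records)
matches = search(index, "search text")
```

Even in this small form, every insert touches one index entry per distinct
word, and a multi-word query has to intersect one set of record ids per
word, which gives a feel for why such an approach needs care once the data
grows.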

My best guess as an answer to the original question is that this
approach would not scale very well at all, and certainly not as well as
tsearch2 (even though tsearch2 doesn't scale quite as well as one might
hope either). And for that matter, it's not all that simple - it seems
to be of a similar order of complexity to tsearch2. However, my
performance estimate is completely unfounded in any actual experience,
so I could be wrong.

Tim

--
-----------------------------------------------
Tim Allen          tim@proximity.com.au
Proximity Pty Ltd  http://www.proximity.com.au/