Re: multi terabyte fulltext searching - Mailing list pgsql-general

From	Benjamin Arai
Subject	Re: multi terabyte fulltext searching
Date	March 21, 2007 15:58:03
Msg-id	E73355CC-EBC3-4B42-B634-3BB8511C559C@araisoft.com
In response to	Re: multi terabyte fulltext searching (Oleg Bartunov <oleg@sai.msu.su>)
Responses	Re: multi terabyte fulltext searching (Oleg Bartunov <oleg@sai.msu.su>) Re: multi terabyte fulltext searching (Teodor Sigaev <teodor@sigaev.ru>)
List	pgsql-general

Hi Oleg,

I am currently using GiST indexes because I receive about 10GB of new
data a week (then again, I am not deleting any information).  I do
not expect to be able to stop receiving text for about 5 years, so
the data is not going to become static any time soon.  The reason I
am concerned with performance is that I am providing a search system
for several newspapers, going back essentially to the beginning of
time.  Many bibliographers etc. would like to use this utility, but if
each search takes too long I am not going to be able to support many
concurrent users.

Benjamin

On Mar 21, 2007, at 8:42 AM, Oleg Bartunov wrote:

> Benjamin,
>
> as one of the authors of tsearch2 I'd like to know more about your
> setup. tsearch2 in 8.2 has GIN index support, which scales much
> better than the old GiST index.
>
> Oleg
>
> On Wed, 21 Mar 2007, Benjamin Arai wrote:
>
>> Hi,
>>
>> I have been struggling to get fulltext searching working for very
>> large databases.  I can fulltext index 10s of gigs without any
>> problem, but when I start getting to hundreds of gigs it becomes
>> slow.  My current system is a quad core with 8GB of memory.  I
>> have the resources to throw more hardware at it, but realistically
>> it is not cost effective to buy a system with 128GB of memory.  Are
>> there any solutions that people have come up with for indexing
>> very large text databases?
>>
>> Essentially I have several terabytes of text that I need to
>> index.  Each record is about 5 paragraphs of text.  I am currently
>> using TSearch2 (stemming, etc.) and getting sub-optimal
>> results.  Queries take more than a second to execute.  Has anybody
>> implemented such a database using multiple systems or some special
>> add-on to TSearch2 to make things faster?  I want to do something
>> like partitioning the data into multiple systems and merging the
>> ranked results at some master node.  Is something like this
>> possible for PostgreSQL or must it be a software solution?
>>
>> Benjamin
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
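The "partition the data and merge the ranked results at a master node" idea from the original question can be sketched in a few lines. This is only an illustration, not something from the thread or from TSearch2 itself: the function name merge_ranked, the (rank, doc_id) tuple layout, and the sample hits are all hypothetical, and it assumes each node returns its hits already sorted by descending rank with scores that are comparable across nodes.

```python
import heapq
from itertools import islice


def merge_ranked(shard_results, k):
    """Merge per-shard hit lists into one global top-k list.

    Each element of shard_results is a list of (rank, doc_id) tuples that
    the shard has already sorted by descending rank (assumption).
    """
    # heapq.merge lazily interleaves the pre-sorted inputs, so the master
    # node only consumes as many hits as it needs for the top k.
    merged = heapq.merge(*shard_results, key=lambda hit: hit[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical hits returned by three nodes for the same query.
shard_a = [(0.91, "a17"), (0.40, "a03")]
shard_b = [(0.75, "b22"), (0.10, "b08")]
shard_c = [(0.88, "c05")]

top = merge_ranked([shard_a, shard_b, shard_c], k=3)
print(top)
```

Each node only needs to return its own top k hits, so the master merges at most k rows per node no matter how large each partition grows.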
