Re: multi terabyte fulltext searching - Mailing list pgsql-general

From Benjamin Arai
Subject Re: multi terabyte fulltext searching
Date
Msg-id 29859D8F-A9DA-4371-A342-2D5B6715ED1C@araisoft.com
In response to Re: multi terabyte fulltext searching  ("Joshua D. Drake" <jd@commandprompt.com>)
Responses Re: multi terabyte fulltext searching  ("Joshua D. Drake" <jd@commandprompt.com>)
List pgsql-general
True, but what happens when my database reaches 100 terabytes? Is 5
seconds OK? How about 10? My problem is that I do not believe the
performance loss I am seeing as the data grows is logarithmic in the
number of records. This worries me because I could be doing
something wrong, or I might be able to do something better.
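As a rough sanity check on the "is 5 seconds OK?" question, here is a minimal sketch that assumes query cost grows as log(number of records) and compares it with linear growth. The record counts are hypothetical, not measured: 10**8 records taking 1 second, then 1000x more data (roughly hundreds of gigabytes growing to 100 terabytes). The base of the logarithm cancels out in the ratio.

```python
import math


def log_scaled_time(base_seconds: float, base_records: int, new_records: int) -> float:
    """Predicted query time if cost grows as log(number of records)."""
    return base_seconds * math.log(new_records) / math.log(base_records)


def linear_scaled_time(base_seconds: float, base_records: int, new_records: int) -> float:
    """Predicted query time if cost grows linearly with the number of records."""
    return base_seconds * new_records / base_records


# Hypothetical figures: 1 s at 10**8 records, then 1000x more data.
base_t, base_n, big_n = 1.0, 10**8, 10**11
print(f"log model:    {log_scaled_time(base_t, base_n, big_n):.3f} s")
print(f"linear model: {linear_scaled_time(base_t, base_n, big_n):.0f} s")
```

Under the logarithmic model, 1000x more data only takes the query from 1 s to about 1.4 s (11/8 of the original), so a slowdown to 5 or 10 seconds would suggest the cost is growing faster than log n.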

Benjamin

On Mar 21, 2007, at 8:49 AM, Joshua D. Drake wrote:

> Benjamin Arai wrote:
>> Hi,
>>
>> I have been struggling to get fulltext searching working for very large
>> databases. I can fulltext index 10s of gigs without any problem, but
>> when I start getting to hundreds of gigs it becomes slow. My current
>> system is a quad core with 8GB of memory. I have the resources to throw
>> more hardware at it, but realistically it is not cost effective to buy a
>> system with 128GB of memory. Are there any solutions that people have
>> come up with for indexing very large text databases?
>
> GiST indexes are very large.
>
>> Essentially, I have several terabytes of text that I need to index. Each
>> record is about 5 paragraphs of text. I am currently using TSearch2
>> (stemming, etc.) and getting sub-optimal results: queries take more
>> than a second to execute.
>
> You are complaining about more than a second with a terabyte of text?
>
>
>> Has anybody implemented such a database using multiple systems, or some
>> special add-on to TSearch2, to make things faster? I want to do something
>> like partitioning the data across multiple systems and merging the ranked
>> results at some master node. Is something like this possible within
>> PostgreSQL, or must it be a software solution?
>>
>> Benjamin
>>
>
>
> --
>
>       === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive  PostgreSQL solutions since 1997
>              http://www.commandprompt.com/
>
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> PostgreSQL Replication: http://www.commandprompt.com/products/
>
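The partition-and-merge idea raised above (split the data across several systems, then merge the ranked results at a master node) is usually called scatter/gather. Below is a minimal sketch of the master-node side, not a PostgreSQL feature or an existing add-on. It assumes each shard can return its own top-k hits as (score, doc_id) pairs and that scores are comparable across shards, i.e. the ranking function does not depend on per-shard corpus statistics. The shard names, `FAKE_INDEX`, and `fake_query_shard` are hypothetical stand-ins for real per-node queries.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

Hit = Tuple[float, str]  # (rank score, document id); higher score ranks first


def scatter_gather(shards: Sequence[str],
                   query_shard: Callable[[str, str, int], List[Hit]],
                   query: str, k: int) -> List[Hit]:
    """Fan a query out to every shard and merge the results into a global top-k."""
    # Scatter: ask every shard for its own local top-k, in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        per_shard = list(pool.map(lambda s: query_shard(s, query, k), shards))
    # Gather: the global top-k is contained in the union of the local top-k
    # lists, so the master only picks the k best of at most k * len(shards) hits.
    all_hits = (hit for hits in per_shard for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda h: h[0])


# Hypothetical shards: each node holds pre-scored hits for one query.
FAKE_INDEX = {
    "node1": [(0.9, "a1"), (0.4, "a2")],
    "node2": [(0.7, "b1"), (0.6, "b2")],
    "node3": [(0.8, "c1")],
}


def fake_query_shard(shard: str, query: str, k: int) -> List[Hit]:
    """Stand-in for a real per-node fulltext query; ignores `query`."""
    return heapq.nlargest(k, FAKE_INDEX[shard], key=lambda h: h[0])


print(scatter_gather(list(FAKE_INDEX), fake_query_shard, "terabyte", 3))
```

Because each shard only ships k hits, the master's merge cost stays small no matter how large each shard's index grows; the expensive index lookups run in parallel on the individual nodes.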

