Re: multi terabyte fulltext searching - Mailing list pgsql-general

From Joshua D. Drake
Subject Re: multi terabyte fulltext searching
Date
Msg-id 460160FC.1070609@commandprompt.com
Whole thread Raw
In response to Re: multi terabyte fulltext searching  (Benjamin Arai <benjamin@araisoft.com>)
List pgsql-general
Benjamin Arai wrote:
> 24.

I can think of a couple of things.

1. Increase your spindle count.
2. Push your gist indexes off to another array entirely (with separate
controllers)
3. Split your actual tables between other arrays

Or... by a SAN (but then again, I just replaced a million dollar SAN
with a split array/controller enviroment that cost 10% of the cost and
the multiple controller solution is kicking its butt.)


Joshua D. Drake


>
> Benjamin
>
> On Mar 21, 2007, at 9:09 AM, Joshua D. Drake wrote:
>
>> Benjamin Arai wrote:
>>> True, but what happens when my database reaches 100 terabytes? Is 5
>>> seconds ok? How about 10?  My problem is that I do not believe the
>>> performance loss I am experiencing as the data becomes large is (log the
>>> # of records).  This worries me because I could be doing something
>>> wrong.  Or I might be able to do something better.
>>
>> Well a couple of things you could do, especially if you have the ability
>> to throw hardware at it.
>>
>> How many spindles do you have?
>>
>> J
>>
>>
>>>
>>> Benjamin
>>>
>>> On Mar 21, 2007, at 8:49 AM, Joshua D. Drake wrote:
>>>
>>>> Benjamin Arai wrote:
>>>>> Hi,
>>>>>
>>>>> I have been struggling with getting fulltext searching for very large
>>>>> databases.  I can fulltext index 10s if gigs without any problem but
>>>>> when I start geting to hundreds of gigs it becomes slow.  My current
>>>>> system is a quad core with 8GB of memory.  I have the resource to
>>>>> throw
>>>>> more hardware at it but realistically it is not cost effective to
>>>>> buy a
>>>>> system with 128GB of memory.  Is there any solutions that people have
>>>>> come up with for indexing very large text databases?
>>>>
>>>> GIST indexes are very large.
>>>>
>>>>> Essentially I have several terabytes of text that I need to index.
>>>>> Each
>>>>> record is about 5 paragraphs of text.  I am currently using TSearch2
>>>>> (stemming and etc) and getting sub-optimal results.  Queries take more
>>>>> than a second to execute.
>>>>
>>>> you are complaining about more than a second with a terabyte of text?
>>>>
>>>>
>>>>>  Has anybody implemented such a database using
>>>>> multiple systems or some special add-on to TSearch2 to make things
>>>>> faster?  I want to do something like partitioning the data into
>>>>> multiple
>>>>> systems and merging the ranked results at some master node.  Is
>>>>> something like this possible for PostgreSQL or must it be a software
>>>>> solution?
>>>>>
>>>>> Benjamin
>>>>>
>>>>> ---------------------------(end of
>>>>> broadcast)---------------------------
>>>>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>>>       choose an index scan if your joining column's datatypes do not
>>>>>       match
>>>>>
>>>>
>>>>
>>>> --      === The PostgreSQL Company: Command Prompt, Inc. ===
>>>> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
>>>> Providing the most comprehensive  PostgreSQL solutions since 1997
>>>>              http://www.commandprompt.com/
>>>>
>>>> Donate to the PostgreSQL Project:
>>>> http://www.postgresql.org/about/donate
>>>> PostgreSQL Replication: http://www.commandprompt.com/products/
>>>>
>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 5: don't forget to increase your free space map settings
>>>
>>
>>
>> --
>>       === The PostgreSQL Company: Command Prompt, Inc. ===
>> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
>> Providing the most comprehensive  PostgreSQL solutions since 1997
>>              http://www.commandprompt.com/
>>
>> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
>> PostgreSQL Replication: http://www.commandprompt.com/products/
>>
>


--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


pgsql-general by date:

Previous
From: Oleg Bartunov
Date:
Subject: Re: multi terabyte fulltext searching
Next
From: Ron Johnson
Date:
Subject: Re: Anyone still using the sql_inheritance parameter?