Re: multi terabyte fulltext searching - Mailing list pgsql-general
From: Benjamin Arai
Subject: Re: multi terabyte fulltext searching
Msg-id: E73355CC-EBC3-4B42-B634-3BB8511C559C@araisoft.com
In response to: Re: multi terabyte fulltext searching (Oleg Bartunov <oleg@sai.msu.su>)
Responses: Re: multi terabyte fulltext searching (Oleg Bartunov <oleg@sai.msu.su>)
           Re: multi terabyte fulltext searching (Teodor Sigaev <teodor@sigaev.ru>)
List: pgsql-general
Hi Oleg,

I am currently using GiST indexes because I receive about 10GB of new data a week (then again, I am not deleting any information). We do not expect to stop receiving text for about 5 years, so the data is not going to become static any time soon.

The reason I am concerned with performance is that I am providing a search system for several newspapers going back essentially to the beginning of time. Many bibliographers and other researchers would like to use this utility, but if each search takes too long I am not going to be able to support many concurrent users.

Benjamin

On Mar 21, 2007, at 8:42 AM, Oleg Bartunov wrote:

> Benjamin,
>
> as one of the authors of tsearch2 I'd like to know more about your setup.
> tsearch2 in 8.2 has GIN index support, which scales much better than the old
> GiST index.
>
> Oleg
>
> On Wed, 21 Mar 2007, Benjamin Arai wrote:
>
>> Hi,
>>
>> I have been struggling with getting fulltext searching working for very
>> large databases. I can fulltext index 10s of gigs without any problem,
>> but when I start getting to hundreds of gigs it becomes slow. My current
>> system is a quad core with 8GB of memory. I have the resources to throw
>> more hardware at it, but realistically it is not cost effective to buy a
>> system with 128GB of memory. Are there any solutions that people have
>> come up with for indexing very large text databases?
>>
>> Essentially I have several terabytes of text that I need to index. Each
>> record is about 5 paragraphs of text. I am currently using TSearch2
>> (stemming etc.) and getting sub-optimal results: queries take more than
>> a second to execute. Has anybody implemented such a database using
>> multiple systems or some special add-on to TSearch2 to make things
>> faster? I want to do something like partitioning the data across
>> multiple systems and merging the ranked results at some master node.
>> Is something like this possible for PostgreSQL or must it be a custom
>> software solution?
>>
>> Benjamin
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>> choose an index scan if your joining column's datatypes do not match
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
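[Editor's note: the partition-and-merge scheme Benjamin asks about (shard the documents across nodes, run the ranked query on each, then combine at a master) can be sketched as a scatter/gather merge step. This is a minimal illustration, not a tsearch2 or PostgreSQL feature; `merge_ranked` and the `(rank, doc_id)` tuples are hypothetical stand-ins for per-node result sets already ordered by something like `ts_rank(...) DESC LIMIT n`.]

```python
import heapq
from itertools import islice

def merge_ranked(shard_results, limit=10):
    """Merge per-shard hit lists into a global top-`limit` ranking.

    Each shard list is assumed to be a list of (rank, doc_id) tuples,
    already sorted by descending rank -- e.g. what one node would return
    for "ORDER BY ts_rank(vector, query) DESC LIMIT n".
    """
    # heapq.merge lazily merges already-sorted inputs; negating the rank
    # via the key keeps the descending order without re-sorting anything.
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, limit))

# Example: two hypothetical shards, master keeps the global top 3.
shard_a = [(0.95, "doc-17"), (0.60, "doc-3")]
shard_b = [(0.80, "doc-42"), (0.50, "doc-9")]
print(merge_ranked([shard_a, shard_b], limit=3))
# → [(0.95, 'doc-17'), (0.80, 'doc-42'), (0.60, 'doc-3')]
```

Because each shard only has to rank its own slice of the corpus, the per-node index stays small; the master does O(limit · log shards) work regardless of total corpus size, which is the main appeal of the scheme Benjamin describes.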