Re: - Mailing list pgsql-performance
From | Ron Johnson |
---|---|
Subject | Re: |
Date | |
Msg-id | 1038887764.4660.49.camel@haggis Whole thread Raw |
In response to | (<typea@l-i-e.com>) |
Responses |
Is there any limitations
|
List | pgsql-performance |
On Mon, 2002-12-02 at 12:30, typea@l-i-e.com wrote: > [I hope job postings are kosher...] > > I need help optimizing a PostgreSQL application: > > Full-text search > ~17,000 records > Articles (text) are about 10K long on average, ranging from 0 to 278K. > > I don't know if we need to throw more RAM, more hard drive, more > comparison RAM in postmaster.conf or build a concordance or if this is > just not something that can be done within our budget. > > I can't even seem to get the PostgreSQL profiling output using "-s" in the > startup of postmaster and client to determine what the db engine is doing. > > I don't understand why PostgreSQL sometimes chooses not to use the > existing INDEXes to do an index scan instead of sequential scan -- Does it > really think sequential will be faster, or does it eliminate an index scan > because there won't be enough hard drive or swap space to do it? > > Currently, full text search queries take on the order of 2 minutes to > execute. > We need them to be happening in 5 seconds, if at all possible. > > Unfortunately, this needs to happen EARLY THIS WEEK, if at all possible. > > Contact me off-list with some idea of price/availability/references if you > are interested in taking on this task. After reading the thread to see that your box has what looks like 1GB RAM, and firing up bc(1) to see that 17K articles each of which is ~10KB == 166MB, it seems to this simple mind that given enough buffers, you could suck all of the articles into the buffers. Thus, no more disk IO, but boy would it burn up the CPU! Also, I think that I might write some sort of "book index pre-processor" to run against each article, to create, for each article, a list of words plus byte offsets. (Some tweaking would have to occur in order to handle capitalization vagaries. Probably capitalize all "index words".) (Yes, this method has the limitation of [sub-]word searches instead of arbitrary string searches, Then, insert all that data into a 3rd table (T_LOOKUP) whose structure is: val TEXT (primary key) article_name TEXT byte_offset INTEGER Then, 'EINSTEIN%' queries would go against T_LOOKUP instead of the articles table. -- +------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "they love our milk and honey, but preach about another | | way of living" | | Merle Haggard, "The Fighting Side Of Me" | +------------------------------------------------------------+
pgsql-performance by date: