Re: - Mailing list pgsql-performance

From Ron Johnson
Subject Re:
Date
Msg-id 1038887764.4660.49.camel@haggis
Whole thread Raw
In response to  (<typea@l-i-e.com>)
Responses Is there any limitations
List pgsql-performance
On Mon, 2002-12-02 at 12:30, typea@l-i-e.com wrote:
> [I hope job postings are kosher...]
>
> I need help optimizing a PostgreSQL application:
>
> Full-text search
> ~17,000 records
> Articles (text) are about 10K long on average, ranging from 0 to 278K.
>
> I don't know if we need to throw more RAM, more hard drive, more
> comparison RAM in postmaster.conf or build a concordance or if this is
> just not something that can be done within our budget.
>
> I can't even seem to get the PostgreSQL profiling output using "-s" in the
> startup of postmaster and client to determine what the db engine is doing.
>
> I don't understand why PostgreSQL sometimes chooses not to use the
> existing INDEXes to do an index scan instead of sequential scan -- Does it
> really think sequential will be faster, or does it eliminate an index scan
> because there won't be enough hard drive or swap space to do it?
>
> Currently, full text search queries take on the order of 2 minutes to
> execute.
> We need them to be happening in 5 seconds, if at all possible.
>
> Unfortunately, this needs to happen EARLY THIS WEEK, if at all possible.
>
> Contact me off-list with some idea of price/availability/references if you
> are interested in taking on this task.

After reading the thread to see that your box has what looks like
1GB RAM, and firing up bc(1) to see that 17K articles each of
which is ~10KB == 166MB, it seems to this simple mind that given
enough buffers, you could suck all of the articles into the
buffers.  Thus, no more disk IO, but boy would it burn up the CPU!

Also, I think that I might write some sort of "book index pre-processor"
to run against each article, to create, for each article, a list of
words plus byte offsets.  (Some tweaking would have to occur in order
to handle capitalization vagaries.  Probably capitalize all "index
words".)  (Yes, this method has the limitation of [sub-]word searches
instead of arbitrary string searches,

Then, insert all that data into a 3rd table (T_LOOKUP) whose structure
is:
  val        TEXT    (primary key)
  article_name    TEXT
  byte_offset    INTEGER

Then, 'EINSTEIN%' queries would go against T_LOOKUP instead of the
articles table.

--
+------------------------------------------------------------+
| Ron Johnson, Jr.     mailto:ron.l.johnson@cox.net          |
| Jefferson, LA  USA   http://members.cox.net/ron.l.johnson  |
|                                                            |
| "they love our milk and honey, but preach about another    |
|  way of living"                                            |
|    Merle Haggard, "The Fighting Side Of Me"                |
+------------------------------------------------------------+


pgsql-performance by date:

Previous
From: "ir. F.T.M. van Vugt bc."
Date:
Subject: Re: v7.2.3 versus v7.3 -> huge performance penalty for JOIN with UNION
Next
From: li li
Date:
Subject: Is there any limitations