Tsearch2 performance on big database - Mailing list pgsql-performance

From Rick Jansen
Subject Tsearch2 performance on big database
Date
Msg-id 42400F57.3040604@rockingstone.nl
Whole thread Raw
Responses Re: Tsearch2 performance on big database
List pgsql-performance
Hi,

I'm looking for a *fast* solution to search thru ~ 4 million records of
book descriptions. I've installed PostgreSQL 8.0.1 on a dual opteron
server with 8G of memory, running Linux 2.6. I haven't done a lot of
tuning on PostgreSQL itself, but here's the settings I have changed so far:

shared_buffers = 2000 (anything much bigger says the kernel doesnt allow
  it, still have to look into that)
effective_cache_size = 32768

Here's my table:

ilab=# \d books
                                        Table "public.books"
     Column     |          Type          |                        Modifiers
---------------+------------------------+----------------------------------------------------------
  recordnumber  | integer                | not null default
nextval('books_recordnumber_seq'::text)
  membernumber  | integer                | not null default 0
  booknumber    | character varying(20)  | not null default
''::character varying
  author        | character varying(60)  | not null default
''::character varying
  titel         | text                   | not null
  description   | character varying(100) | not null default
''::character varying
  descriprest   | text                   | not null
  price         | bigint                 | not null default 0::bigint
  keywords      | character varying(100) | not null default
''::character varying
  dollarprice   | bigint                 | not null default 0::bigint
  countrynumber | smallint               | not null default 0::smallint
  entrydate     | date                   | not null
  status        | smallint               | not null default 0::smallint
  recordtype    | smallint               | not null default 0::smallint
  bookflags     | smallint               | not null default 0::smallint
  year          | smallint               | not null default 0::smallint
  firstedition  | smallint               | not null default 0::smallint
  dustwrapper   | smallint               | not null default 0::smallint
  signed        | smallint               | not null default 0::smallint
  cover         | smallint               | not null default 0::smallint
  specialfield  | smallint               | not null default 0::smallint
  idxfti        | tsvector               |
Indexes:
     "recordnumber_idx" unique, btree (recordnumber)
     "idxfti_idx" gist (idxfti)

idxfti is a tsvector of concatenated description and descriprest.

ilab=# select
avg(character_length(description)),avg(character_length(descriprest))
from books;
          avg         |         avg
---------------------+----------------------
  89.1596992873947218 | 133.0468689304200538

Queries take forever to run. Right now we run a MySQL server, on which
we maintain our own indices (we split the description fields by word and
have different tables for words and the bookdescriptions they appear in).

For example, a query for the word 'terminology' on our MySQL search
takes 5.8 seconds and returns 375 results. The same query on postgresql
using the tsearch2 index takes 30802.105 ms and returns 298 results.

How do I speed this up? Should I change settings, add or change indexes
or.. what?

Rick Jansen
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once

pgsql-performance by date:

Previous
From: Richard Huxton
Date:
Subject: Re: What about utility to calculate planner cost constants?
Next
From: Oleg Bartunov
Date:
Subject: Re: Tsearch2 performance on big database