TSearch2 and optimisation ... - Mailing list pgsql-performance

From Hervé Piedvache
Subject TSearch2 and optimisation ...
Date
Msg-id 200408260048.32274.footcow@noos.fr
Whole thread Raw
Responses Re: TSearch2 and optimisation ...
List pgsql-performance
Hi,

I'm a little beginner with Tsearch2 ....

I have simples tables like this :

# \d article
                                         Table "public.article"
   Column   |            Type             |
Modifiers
------------+-----------------------------+-----------------------------------------------------------------
 id     | integer                     | not null default
nextval('public.article_rss_id_rss_seq'::text)
 id_site    | integer                     | not null
 title  | text                        |
 url    | text                        |
 desc   | text                        |
 r_date   | timestamp without time zone | default now()
 r_update | timestamp without time zone | default now()
 idxfti     | tsvector                    |
Indexes:
    "article_id_key" unique, btree (id)
    "idxfti_idx" gist (idxfti)
    "ix_article_update" btree (r_update)
    "ix_article_url" btree (url)
    "ix_id_site" btree (id_site)
Foreign-key constraints:
    "$1" FOREIGN KEY (id_site) REFERENCES site (id_site)
Triggers:
    tsvectorupdate BEFORE INSERT OR UPDATE ON article FOR EACH ROW EXECUTE
PROCEDURE tsearch2('idxfti', 'title', 'desc')

# \d site_rss
                                Table "public.site"
    Column    |  Type   |                           Modifiers
--------------+---------+---------------------------------------------------------------
 id_site      | integer | not null default
nextval('public.site_id_site_seq'::text)
 site_name    | text    |
 site_url     | text    |
url             | text    |
 language     | text    |
 datecrea     | date    | default now()
 id_category  | integer |
 time_refresh | integer |
 active       | integer |
 error        | integer |
Indexes:
    "site_id_site_key" unique, btree (id_site)
    "ix_site_id_category" btree (id_category)
    "ix_site_url" btree (url)

# \d user_choice
  Table "public.user_choice"
 Column  |  Type   | Modifiers
---------+---------+-----------
 id_user | integer |
 id_site | integer |
Indexes:
    "ix_user_choice_all" unique, btree (id_user, id_site)

I have done a simple request, looking for title or description having Postgres
inside order by rank and date,  like this :
SELECT a.title, a.id, a.url, to_char(a.r_date, 'DD/MM/YYYY HH24:MI:SS') as dt,
s.site_name, s.id_site, case when exists (select id_user from user_choice u
where u.id_site=s.id_site and u.id_user = 1) then 1 else 0 end as bookmarked
    FROM article a, site s
  WHERE s.id_site = a.id_site
       AND idxfti @@ to_tsquery('postgresql')
   ORDER BY rank(idxfti, to_tsquery('postgresql')) DESC, a.r_date DESC;

The request takes about 4 seconds ... I have about 1 400 000 records in
article and 36 000 records in site table ... it's a Bi-Pentium III 933 MHz
server with 1 Gb memory ... I'm using Postgresql 7.4.5
For me this result is very very slow I really need a quicker result with less
than 1 second ...
The next time I call the same request I have got the result in 439 ms ... but
If I replace "Postgresql" in my find with "Linux" for example I will get the
next result in 5 seconds ... :o(

Is it a bad use of Tsearch2 ... or a bad table structure ... or from my
request ... ? I have no idea how to optimise this ...

Explain gives me this result :
                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Sort  (cost=10720.91..10724.29 rows=1351 width=191)
   Sort Key: rank(a.idxfti, '\'postgresql\''::tsquery), a.r_date
   ->  Merge Join  (cost=4123.09..10650.66 rows=1351 width=191)
         Merge Cond: ("outer".id_site = "inner".id_site)
         ->  Index Scan using site_id_site_key on site s  (cost=0.00..2834.96
rows=35705 width=28)
         ->  Sort  (cost=4123.09..4126.47 rows=1351 width=167)
               Sort Key: a.id_site
               ->  Index Scan using idxfti_idx on article a
(cost=0.00..4052.84 rows=1351 width=167)
                     Index Cond: (idxfti @@ '\'postgresql\''::tsquery)
                     Filter: (idxfti @@ '\'postgresql\''::tsquery)
         SubPlan
           ->  Seq Scan on user_choice u  (cost=0.00..2.69 rows=1 width=4)
                 Filter: ((id_site = $0) AND (id_user = 1))
(13 rows)

Any idea are well done ;o)

Regards,
--
Bill Footcow


pgsql-performance by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Anyone familiar with Apple Xserve RAID
Next
From: Doug McNaught
Date:
Subject: Re: Anyone familiar with Apple Xserve RAID