Re: bitmap heap scan recheck on full text search with gin index - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: bitmap heap scan recheck on full text search with gin index
Date
Msg-id Pine.LNX.4.64.0902060720070.4158@sn.sai.msu.ru
Whole thread Raw
In response to bitmap heap scan recheck on full text search with gin index  (Hal Roberts <hroberts@cyber.law.harvard.edu>)
List pgsql-performance
Hal,

just create separate column with tsvector and create index on it.

Oleg
On Thu, 5 Feb 2009, Hal Roberts wrote:

> Hi All,
>
> I'm getting poor performance on full text searches that return lots of
> entries from a table with about 7 million rows.  I think the cause is
> rechecking the text match on all of the returned rows, even though I'm using
> a @@ query on a gin index, which the docs say should not require a recheck.
>
> Here's the table:
>
> ****
> mediacloud=> \d download_texts;
>                                   Table "public.download_texts"
>   Column       |  Type   |                                 Modifiers
> -------------------+---------+----------------------------------------------------------------------------
> download_texts_id | integer | not null default
> nextval('download_texts_download_texts_id_seq'::regclass)
> downloads_id      | integer | not null
> download_text     | text    | not null
> Indexes:
> "download_texts_pkey" PRIMARY KEY, btree (download_texts_id)
> "download_texts_downloads_id_index" UNIQUE, btree (downloads_id)
> "download_texts_textsearch_idx" gin (to_tsvector('english'::regconfig,
> download_text)), tablespace "large_table_space"
> Foreign-key constraints:
> "download_texts_downloads_id_fkey" FOREIGN KEY (downloads_id) REFERENCES
> downloads(downloads_id)
> Tablespace: "large_table_space"
> ****
>
> And here's the query:
>
> ****
> mediacloud=> explain analyze select count(dt.download_texts_id) from
> download_texts dt where to_tsvector('english', download_text) @@
> to_tsquery('english', 'stimulus');
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate  (cost=26161.16..26161.17 rows=1 width=4) (actual
> time=153640.083..153640.083 rows=1 loops=1)
> ->  Bitmap Heap Scan on download_texts dt  (cost=3937.41..26146.11 rows=6018
> width=4) (actual time=1957.074..153529.351 rows=72225 loops=1)
>      Recheck Cond: (to_tsvector('english'::regconfig, download_text) @@
> '''stimulus'''::tsquery)
>      ->  Bitmap Index Scan on download_texts_textsearch_idx
> (cost=0.00..3935.90 rows=6018 width=0) (actual time=1048.556..1048.556
> rows=72225 loops=1)
>            Index Cond: (to_tsvector('english'::regconfig, download_text) @@
> '''stimulus'''::tsquery)
> Total runtime: 153642.249 ms
> ****
>
> Nearly all of the time is being spent in the bitmap heap scan, I suspect
> because of the work of rereading and rechecking the text of all the matched
> entries.  Is this indeed what's going on here?  Is there any way to make
> postgres not do that recheck?
>
>
> Thanks!
>
> -hal
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

pgsql-performance by date:

Previous
From: Hal Roberts
Date:
Subject: bitmap heap scan recheck on full text search with gin index
Next
From: Mark Wong
Date:
Subject: Re: dbt-2 tuning results with postgresql-8.3.5