Re: Tsearch2 performance on big database - Mailing list pgsql-performance

From Rick Jansen
Subject Re: Tsearch2 performance on big database
Date
Msg-id 42404263.40604@rockingstone.nl
Whole thread Raw
In response to Re: Tsearch2 performance on big database  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Tsearch2 performance on big database
List pgsql-performance
Oleg Bartunov wrote:
> Mike,
>
> no comments before Rick post tsearch configs and increased buffers !
> Union shouldn't be faster than (term1|term2).
> tsearch2 internals description might help you understanding tsearch2
> limitations.
> See  http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
> Also, don't miss my notes:
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>
> Oleg

Thanks Oleg, i've seen those pages before :) I've set shared_buffers to
45000 now (yes thats probably very much, isn't it?) and it already seems
a lot quicker.

How do I find out what my tsearch config is? I followed the intro
(http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro) and
applied it to our books table, thats all, didnt change anything else
about configs.


> On Tue, 22 Mar 2005, Mike Rylander wrote:
>> Ahh... I should have qualified my claim.  I am creating a google-esqe
>> search interface and almost every query uses '&' as the term joiner.
>> 'AND' queries and one-term queries are orders of magnitude faster than
>> 'OR' queries, and fortunately are the expected default for most users.
>> (Think, "I typed in these words, therefore I want to match these
>> words"...)  An interesting test may be to time multiple queries
>> independently, one for each search term, and see if the combined cost
>> is less than a single 'OR' search.  If so, you could use UNION to join
>> the results.

Well I just asked my colleges and OR queries arent used by us anyway, so
I'll test for AND queries instead.

>> However, the example you originally gave ('terminology') should be
>> very fast.  On a comparable query ("select count(value) from
>> metabib.full_rec where index_vector @@ to_tsquery('default','jane');")
>> I get 12ms.

ilab=# select count(*) from books where idxfti @@ to_tsquery('default',
'jane');
  count
-------
   4093
(1 row)
Time: 217395.820 ms

:(

ilab=# explain analyze select count(*) from books where idxfti @@
to_tsquery('default', 'jane');
                                                              QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=16591.95..16591.95 rows=1 width=0) (actual
time=4634.931..4634.932 rows=1 loops=1)
    ->  Index Scan using idxfti_idx on books  (cost=0.00..16581.69
rows=4102 width=0) (actual time=0.395..4631.454 rows=4093 loops=1)
          Index Cond: (idxfti @@ '\'jane\''::tsquery)
  Total runtime: 4635.023 ms
(4 rows)

Time: 4636.028 ms
ilab=#

>> 16G of RAM on a dedicated machine.
>>
>>
>> shared_buffers = 15000          # min 16, at least max_connections*2,
>> 8KB each
>> work_mem = 10240                # min 64, size in KB
>> maintenance_work_mem = 1000000  # min 1024, size in KB
>> # big m_w_m for loading data...
>>
>> random_page_cost = 2.5          # units are one sequential page fetch
>> cost
>> # fast drives, and tons of RAM
>>

Right.. well I'll try copying these settings, see how that works out,
thanks :)

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once

pgsql-performance by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: What needs to be done for real Partitioning?
Next
From: Greg Stark
Date:
Subject: Re: What about utility to calculate planner cost constants?