Thread: tsearch2 poor performance

tsearch2 poor performance

From
Kris Kiger
Date:
Hi all.  I am doing some work with tsearch2 and am not sure what to
expect out of it, performance wise.  Here is my setup:

                          Table "public.product"
   Column    |   Type   |                    Modifiers
-------------+----------+-------------------------------------------------
 description | text     |
 product_id  | integer  | default nextval('product_product_id_seq'::text)
 vector      | tsvector |
Indexes:
    "vector_idx" gist (vector)
Triggers:
    tsvectorupdate BEFORE INSERT OR UPDATE ON product FOR EACH ROW EXECUTE PROCEDURE tsearch2('vector', 'description')

This table has 3,000,000 rows in it.  Each description field has roughly 50 characters.  There are fewer than ten
thousanddistinct words in my 3,000,000 rows.  The vector was filled using the description fields values.  I ran a
vacuumfull analyze before executing any of my queries.   

Here are a couple of tests I performed using the tsearch index and like;

search_test=# select count(*) from product where vector @@ to_tsquery('oil');
 count
--------
 226357
(1 row)

Time: 191056.230 ms

search_test=# select count(*) from product where vector @@ to_tsquery('hydrogen');
 count
--------
 226868
(1 row)

Time: 306411.957 ms

search_test=# select count(*) from product where description like '% oil %';
 count
--------
 226357
(1 row)

Time: 38426.851 ms

search_test=# select count(*) from product where description like '% hydrogen %';
 count
--------
 226868
(1 row)

Time: 38265.421 ms


Both of the likes are using a sequential scan and both of the tsearch queries use the gist index.  Did I miss a
configurationparameter, are these queries incorrectly using tsearch2,or is this tsearch2's average performance?  Thanks
inadvance for the input!   

Kris




Re: tsearch2 poor performance

From
Oleg Bartunov
Date:
Kris,

could you post 'explain analyze' output ?
Also, could you disable index usage (set enable_indexscan=off)
and rerun search using tsearch2 ?

also, could you run 'stat' function to see frequency distribution
of words. See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
for details.

Oleg
On Fri, 24 Sep 2004, Kris Kiger wrote:

> Hi all.  I am doing some work with tsearch2 and am not sure what to
> expect out of it, performance wise.  Here is my setup:
>
>                           Table "public.product"
>    Column    |   Type   |                    Modifiers
> -------------+----------+-------------------------------------------------
>  description | text     |
>  product_id  | integer  | default nextval('product_product_id_seq'::text)
>  vector      | tsvector |
> Indexes:
>     "vector_idx" gist (vector)
> Triggers:
>     tsvectorupdate BEFORE INSERT OR UPDATE ON product FOR EACH ROW EXECUTE PROCEDURE tsearch2('vector',
'description')
>
> This table has 3,000,000 rows in it.  Each description field has roughly 50 characters.  There are fewer than ten
thousanddistinct words in my 3,000,000 rows.  The vector was filled using the description fields values.  I ran a
vacuumfull analyze before executing any of my queries. 
>
> Here are a couple of tests I performed using the tsearch index and like;
>
> search_test=# select count(*) from product where vector @@ to_tsquery('oil');
>  count
> --------
>  226357
> (1 row)
>
> Time: 191056.230 ms
>
> search_test=# select count(*) from product where vector @@ to_tsquery('hydrogen');
>  count
> --------
>  226868
> (1 row)
>
> Time: 306411.957 ms
>
> search_test=# select count(*) from product where description like '% oil %';
>  count
> --------
>  226357
> (1 row)
>
> Time: 38426.851 ms
>
> search_test=# select count(*) from product where description like '% hydrogen %';
>  count
> --------
>  226868
> (1 row)
>
> Time: 38265.421 ms
>
>
> Both of the likes are using a sequential scan and both of the tsearch queries use the gist index.  Did I miss a
configurationparameter, are these queries incorrectly using tsearch2,or is this tsearch2's average performance?  Thanks
inadvance for the input! 
>
> Kris
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 poor performance

From
Kris Kiger
Date:
Here is the explain analyze output, funny thing, after I ran josh's
query, mine ran a lot faster....maybe it forced a caching?;

search_test=# explain analyze select count(*) from product where vector
@@ to_tsquery('oil');
 Aggregate  (cost=6113.09..6113.09 rows=1 width=0) (actual
time=19643.372..19643.376 rows=1 loops=1)
   ->  Index Scan using vector_idx on product  (cost=0.00..6105.58
rows=3001 width=0) (actual time=0.381..18145.917 rows=226357 loops=1)
         Index Cond: (vector @@ '\'oil\''::tsquery)
         Filter: (vector @@ '\'oil\''::tsquery)
 Total runtime: 19643.597 ms

search_test=# explain analyze select count(*) from product where vector
@@ to_tsquery('hydrogen');
 Aggregate  (cost=6113.09..6113.09 rows=1 width=0) (actual
time=19629.766..19629.769 rows=1 loops=1)
   ->  Index Scan using vector_idx on product  (cost=0.00..6105.58
rows=3001 width=0) (actual time=0.378..18127.573 rows=226868 loops=1)
         Index Cond: (vector @@ '\'hydrogen\''::tsquery)
         Filter: (vector @@ '\'hydrogen\''::tsquery)
 Total runtime: 19629.992 ms

Here is Josh's;

search_test=# explain analyze SELECT count(q) FROM product,
to_tsquery('oil') AS q  WHERE vector @@ q;
 Aggregate  (cost=6150597.03..6150597.03 rows=1 width=32) (actual
time=21769.526..21769.530 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..6143097.02 rows=3000001 width=32)
(actual time=0.424..20450.208 rows=226357 loops=1)
         ->  Function Scan on q  (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.023..0.031 rows=1 loops=1)
         ->  Index Scan using vector_idx on product  (cost=0.00..6105.58
rows=3000 width=32) (actual time=0.376..18165.415 rows=226357 loops=1)
               Index Cond: (product.vector @@ "outer".q)
               Filter: (product.vector @@ "outer".q)
 Total runtime: 21769.786 ms

Disabling Index usage slowed it down:

search_test=# explain analyze select count(*) from product where vector
@@ to_tsquery('hydrogen');
 Aggregate  (cost=347259.51..347259.51 rows=1 width=0) (actual
time=24675.933..24675.936 rows=1 loops=1)
   ->  Seq Scan on product  (cost=0.00..347252.00 rows=3001 width=0)
(actual time=0.320..23164.492 rows=226868 loops=1)
         Filter: (vector @@ '\'hydrogen\''::tsquery)
 Total runtime: 24676.091 ms

Time: 24678.842 ms

search_test=# explain analyze SELECT count(q) FROM product,
to_tsquery('oil') AS q  WHERE vector @@ q;
 Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
time=83631.201..83631.204 rows=1 loops=1)
   ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
(actual time=0.214..82294.710 rows=226357 loops=1)
         Join Filter: ("outer".vector @@ "inner".q)
         ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
width=32) (actual time=0.107..27563.952 rows=3000000 loops=1)
         ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
time=0.003..0.006 rows=1 loops=3000000)
               ->  Function Scan on q  (cost=0.00..12.50 rows=1000
width=32) (actual time=0.019..0.023 rows=1 loops=1)
 Total runtime: 83631.385 ms

Here are the results of stat:

search_test=# select * from stat('select vector from product') order by
ndoc desc, nentry;
     word     |  ndoc   | nentry
--------------+---------+---------
 anoth        | 1187386 | 1477442
 bear         |  696668 |  780963
 take         |  675319 |  736410
 relat        |  491469 |  528259
 toward       |  490653 |  528369
 defin        |  490572 |  527099
 live         |  490538 |  527401
 beyond       |  490124 |  527957
 behind       |  490087 |  527735
 insid        |  489530 |  527074
 near         |  489504 |  527721
 around       |  489244 |  526870
 mean         |  478201 |  512699
 complex      |  440339 |  468669
 light        |  438685 |  468140
 ball         |  438567 |  468168
 pit          |  438293 |  467807
 dress        |  438128 |  467260
 player       |  437633 |  466753
 secret       |  433279 |  457246
 love         |  423777 |  442694
 give         |  423691 |  441305
 need         |  423336 |  434409
 peopl        |  423336 |  434409
 believ       |  423336 |  434409
 rememb       |  423336 |  434409
 howev        |  421762 |  434194
 real         |  419906 |  435074
 furthermor   |  416672 |  434413
 indic        |  416508 |  434919
 exampl       |  416508 |  434919
 alway        |  415543 |  432861
 sometim      |  415543 |  432861
 see          |  410706 |  434586
 inde         |  408379 |  434283
 fruit        |  363203 |  381862
 cook         |  362674 |  381112
 graduat      |  362444 |  381284
 chees        |  362358 |  381040
 hesit        |  307431 |  317550
 self         |  301001 |  312312
 hard         |  300138 |  310167
 spirit       |  299310 |  312092
 know         |  298246 |  309010
 laugh        |  294136 |  302392
 make         |  287633 |  295003
 find         |  287550 |  294770
 goe          |  279336 |  287025
 team         |  228000 |  234703
 footbal      |  228000 |  234703
 void         |  227914 |  234681
 formless     |  227914 |  234681
 board        |  227907 |  234797
 chess        |  227907 |  234797
 submarin     |  227869 |  234727
 inferior     |  227858 |  234357
 viper        |  227855 |  234865
 cylind       |  227847 |  234505
 suit         |  227822 |  234376
 class        |  227822 |  234376
 action       |  227822 |  234376
 diskett      |  227802 |  234786
 roller       |  227792 |  234524
 coaster      |  227792 |  234524
 mate         |  227785 |  234431
 ritual       |  227785 |  234431
 engin        |  227784 |  234575
 steam        |  227784 |  234575
 industri     |  227780 |  234312
 fire         |  227775 |  234532
 hydrant      |  227775 |  234532
 briar        |  227769 |  234524
 patch        |  227769 |  234524
 mastadon     |  227677 |  234665
 defend       |  227617 |  234410
 blade        |  227603 |  234356
 razor        |  227603 |  234356
 cab          |  227578 |  234554
 driver       |  227578 |  234554
 cough        |  227570 |  234324
 syrup        |  227570 |  234324
 cowboy       |  227566 |  234663
 chop         |  227564 |  234437
 pork         |  227564 |  234437
 ceo          |  227557 |  234760
 rattlesnak   |  227554 |  234323
 hell         |  227540 |  234313
 flavor       |  227540 |  234313
 maelstrom    |  227537 |  234404
 mulch        |  227531 |  234311
 cyprus       |  227531 |  234311
 tack         |  227525 |  234462
 carpet       |  227525 |  234462
 movi         |  227505 |  234207
 theater      |  227505 |  234207
 spider       |  227466 |  234524
 cone         |  227463 |  234198
 pine         |  227463 |  234198
 garbag       |  227459 |  234207
 beer         |  227443 |  234077
 bottl        |  227443 |  234077
 polygon      |  227438 |  234267
 judg         |  227425 |  234565
 blith        |  227409 |  233979
 traffic      |  227403 |  234051
 paper        |  227397 |  234028
 napkin       |  227397 |  234028
 apart        |  227393 |  233911
 build        |  227393 |  233911
 cocker       |  227368 |  233926
 spaniel      |  227368 |  233926
 bay          |  227358 |  234261
 cargo        |  227358 |  234261
 order        |  227357 |  233885
 short        |  227357 |  233885
 polar        |  227326 |  234118
 demon        |  227324 |  234442
 minivan      |  227317 |  234292
 bulb         |  227314 |  234089
 fundrais     |  227308 |  234235
 eggplant     |  227306 |  234202
 cake         |  227299 |  234075
 bowl         |  227299 |  234110
 paycheck     |  227295 |  234224
 sheriff      |  227292 |  234313
 turkey       |  227271 |  234267
 turn         |  227265 |  234210
 signal       |  227265 |  234210
 chestnut     |  227250 |  234104
 hole         |  227239 |  233975
 puncher      |  227239 |  233975
 tabloid      |  227238 |  234341
 microscop    |  227236 |  234067
 reclin       |  227234 |  233946
 dolphin      |  227231 |  234080
 pen          |  227222 |  234269
 pig          |  227222 |  234269
 wed          |  227221 |  233860
 bullfrog     |  227211 |  234144
 truck        |  227208 |  233980
 pickup       |  227208 |  233980
 agent        |  227201 |  233840
 insur        |  227201 |  233840
 girl         |  227201 |  233934
 scout        |  227201 |  233934
 drill        |  227200 |  233986
 power        |  227200 |  233986
 ocean        |  227187 |  234211
 case         |  227173 |  233983
 crank        |  227173 |  233983
 squid        |  227169 |  234056
 senat        |  227167 |  234147
 fraction     |  227161 |  234065
 custom       |  227152 |  234128
 burglar      |  227148 |  234014
 grizzli      |  227133 |  233955
 wheel        |  227122 |  233813
 asteroid     |  227108 |  233928
 anomali      |  227106 |  234156
 acceler      |  227103 |  233428
 particl      |  227103 |  233428
 saw          |  227082 |  233934
 chain        |  227082 |  233934
 reactor      |  227035 |  234061
 wedg         |  227033 |  234143
 photon       |  227029 |  234017
 deficit      |  227029 |  234102
 vacuum       |  227021 |  233760
 cleaner      |  227021 |  233760
 cashier      |  227010 |  233858
 scyth        |  227001 |  233928
 cloud        |  226981 |  233569
 format       |  226981 |  233569
 tornado      |  226968 |  234058
 grand        |  226936 |  233730
 piano        |  226936 |  233730
 tripod       |  226930 |  233755
 tomato       |  226928 |  233915
 sandwich     |  226923 |  233786
 earring      |  226912 |  233665
 train        |  226912 |  233712
 freight      |  226912 |  233712
 skyscrap     |  226901 |  233755
 abstract     |  226890 |  233658
 mortician    |  226883 |  233781
 warranti     |  226876 |  233935
 atom         |  226868 |  233467
 hydrogen     |  226868 |  233467
 satellit     |  226866 |  233680
 corpor       |  226858 |  233818
 globul       |  226853 |  233980
 cow          |  226832 |  233808
 jersey       |  226832 |  233808
 salad        |  226830 |  233400
 buzzard      |  226804 |  233825
 lot          |  226794 |  233643
 park         |  226794 |  233643
 prime        |  226793 |  233325
 minist       |  226793 |  233325
 clot         |  226780 |  233380
 blood        |  226780 |  233380
 tuba         |  226765 |  233575
 tape         |  226749 |  233388
 record       |  226749 |  233388
 line         |  226747 |  233574
 dancer       |  226747 |  233574
 nation       |  226736 |  233796
 bartend      |  226653 |  233422
 hockey       |  226645 |  233178
 canyon       |  226617 |  233699
 ski          |  226610 |  233451
 lodg         |  226610 |  233451
 stovepip     |  226608 |  233489
 crane        |  226590 |  233578
 sand         |  226572 |  233270
 grain        |  226572 |  233270
 dust         |  226570 |  233391
 bunni        |  226570 |  233391
 lover        |  226564 |  233628
 fairi        |  226554 |  233743
 plaintiff    |  226537 |  233563
 wheelbarrow  |  226520 |  233206
 food         |  226445 |  233228
 stamp        |  226445 |  233228
 umbrella     |  226380 |  233273
 avocado      |  226375 |  232942
 oil          |  226357 |  233266
 filter       |  226357 |  233266
 financi      |  220105 |  225116
 complet      |  162829 |  164065
 ridicul      |  162346 |  163592
 handl        |  162200 |  163390
 singl        |  162200 |  163390
 single-handl |  162200 |  163390
 greedili     |  162123 |  163379
 careless     |  162009 |  163193
 somewhat     |  161979 |  163205
 accur        |  161975 |  163228
 overwhelm    |  161946 |  163107
 usual        |  161930 |  163158
 ostens       |  161826 |  163020
 lazili       |  161809 |  163133
 slyli        |  161803 |  163149
 underhand    |  161751 |  162955
 non          |  161585 |  162823
 chalant      |  161585 |  162823
 non-chal     |  161585 |  162823
 seldom       |  161525 |  162739
 accident     |  161511 |  162676
 almost       |  161508 |  162782
 often        |  161488 |  162733
 bare         |  161401 |  162659
 eager        |  161278 |  162513
 wise         |  161073 |  162341
 inexor       |  161042 |  162265
 feverish     |  160805 |  162020
 thorough     |  160611 |  161823
 home         |  154672 |  155766
 return       |  154672 |  155766
 lost         |  154655 |  155567
 glori        |  154655 |  155567
 start        |  154655 |  155567
 reminisc     |  154655 |  155567
 rumin        |  154577 |  155776
 read         |  154529 |  155642
 magazin      |  154529 |  155642
 pray         |  154478 |  155748
 floor        |  154396 |  155477
 sweep        |  154396 |  155477
 nag          |  154271 |  155259
 feel         |  154271 |  155259
 remors       |  154271 |  155259
 procrastin   |  154256 |  155371
 wake         |  154220 |  155397
 sleep        |  154217 |  155353
 panic        |  154189 |  155346
 get          |  154168 |  155253
 drunk        |  154168 |  155253
 stink        |  154168 |  155253
 hibern       |  154158 |  155358
 die          |  153973 |  155223
 fli          |  153943 |  155056
 rage         |  153943 |  155056
 flagel       |  153916 |  155067
 self-flagel  |  153916 |  155067
 daydream     |  153864 |  155043
 medit        |  153816 |  154935
 ceas         |  153735 |  154815
 exist        |  153735 |  154815
 joy          |  153672 |  154754
 beam         |  153672 |  154754
 trembl       |  153656 |  154799
 loud         |  153635 |  154665
 hide         |  153592 |  154797
 break        |  153559 |  154599
 coffe        |  153559 |  154599
 earn         |  153538 |  154540
 mile         |  153538 |  154540
 flier        |  153538 |  154540
 frequent     |  153538 |  154540
 leav         |  153535 |  154730
 rejoic       |  153226 |  154412
 sell         |  147231 |  148103
 plan         |  147046 |  147809
 escap        |  147046 |  147809
 throw        |  146973 |  147764
 negoti       |  146905 |  147704
 prenupti     |  146905 |  147704
 agreement    |  146905 |  147704
 card         |  146892 |  147731
 trade        |  146892 |  147731
 basebal      |  146892 |  147731
 oper         |  146888 |  147787
 small        |  146888 |  147787
 stand        |  146888 |  147787
 drink        |  146881 |  147727
 night        |  146881 |  147727
 steal        |  146835 |  147847
 pencil       |  146835 |  147847
 seek         |  146816 |  148029
 figur        |  146801 |  147908
 write        |  146736 |  147720
 letter       |  146736 |  147720
 recogn       |  146723 |  147823
 truce        |  146684 |  147630
 eat          |  146670 |  147874
 compet       |  146647 |  147760
 buy          |  146642 |  147522
 gift         |  146642 |  147522
 expens       |  146642 |  147522
 big          |  146626 |  147717
 fan          |  146626 |  147717
 fall         |  146597 |  147601
 assist       |  146587 |  147589
 requir       |  146587 |  147589
 chang        |  146542 |  147479
 heart        |  146542 |  147479
 conquer      |  146542 |  147695
 money        |  146481 |  147450
 borrow       |  146481 |  147450
 ignor        |  146475 |  147643
 share        |  146415 |  147404
 shower       |  146415 |  147404
 fault        |  146413 |  147361
 subtl        |  146413 |  147361
 kind         |  146402 |  147492
 great        |  146397 |  147367
 upon         |  146396 |  147366
 honor        |  146396 |  147366
 bestow       |  146396 |  147366
 pee          |  146394 |  147477
 avoid        |  146392 |  147388
 contact      |  146392 |  147388
 pink         |  146372 |  147347
 slip         |  146372 |  147347
 aid          |  146367 |  147225
 teach        |  146366 |  147516
 sanit        |  146361 |  147477
 lice         |  146360 |  147409
 buri         |  146360 |  147483
 cold         |  146357 |  147220
 pour         |  146357 |  147220
 freez        |  146357 |  147220
 water        |  146357 |  147220
 sea          |  146347 |  147217
 deep         |  146347 |  147217
 fish         |  146347 |  147217
 organ        |  146321 |  147476
 grit         |  146289 |  147227
 satiat       |  146251 |  147349
 assimil      |  146251 |  147377
 tri          |  146188 |  147200
 seduc        |  146188 |  147200
 reach        |  146132 |  147008
 understand   |  146132 |  147008
 brainwash    |  146068 |  147158
 admir        |  146050 |  147021
 caricatur    |  145989 |  147107
 deriv        |  145941 |  146790
 pervers      |  145941 |  146790
 satisfact    |  145941 |  146790
 moral        |  145854 |  146733
 lectur       |  145854 |  146733
 befriend     |  145799 |  146963
 learn        |  145758 |  146666
 lesson       |  145758 |  146666
 play         |  145738 |  146706
 pinochl      |  145738 |  146706
 peek         |  145698 |  146737
 danc         |  145555 |  146637
 fashion      |   78762 |   79203
 muddi        |   78750 |   79236
 hypnot       |   78747 |   79204
 childlik     |   78579 |   79002
 loyal        |   78575 |   79056
 mysteri      |   78554 |   79047
 annoy        |   78532 |   79032
 slow         |   78517 |   78996
 twist        |   78515 |   79016
 unstabl      |   78510 |   78945
 feder        |   78501 |   78967
 rever        |   78501 |   79008
 wrinkl       |   78495 |   78965
 rude         |   78495 |   78975
 boil         |   78493 |   78972
 high         |   78481 |   78940
 paid         |   78481 |   78940
 geosynchron  |   78478 |   78931
 greasi       |   78476 |   78961
 cosmopolitan |   78459 |   78903
 fat          |   78438 |   78935
 inciner      |   78429 |   78896
 dot          |   78426 |   78864
 polka        |   78426 |   78864
 polka-dot    |   78426 |   78864
 outer        |   78415 |   78910
 phoni        |   78411 |   78895
 pathet       |   78405 |   78869
 purpl        |   78405 |   78895
 frozen       |   78403 |   78886
 nearest      |   78396 |   78879
 statesmanlik |   78386 |   78830
 dirt         |   78376 |   78828
 encrust      |   78376 |   78828
 dirt-encrust |   78376 |   78828
 sur          |   78371 |   78895
 obsequi      |   78369 |   78805
 salti        |   78360 |   78834
 imagin       |   78356 |   78808
 south        |   78325 |   78787
 american     |   78325 |   78787
 load         |   78318 |   78832
 righteous    |   78282 |   78760
 fractur      |   78281 |   78737
 educ         |   78278 |   78682
 colleg       |   78278 |   78682
 college-educ |   78278 |   78682
 mitochondri  |   78269 |   78745
 treacher     |   78265 |   78697
 spartan      |   78252 |   78707
 felin        |   78244 |   78713
 ravish       |   78242 |   78765
 patern       |   78241 |   78701
 psychot      |   78238 |   78693
 shabbi       |   78228 |   78685
 dreamlik     |   78224 |   78642
 loath        |   78221 |   78653
 self-loath   |   78221 |   78653
 world        |   78203 |   78658
 call         |   78183 |   78610
 so-cal       |   78183 |   78610
 radioact     |   78182 |   78623
 alleg        |   78178 |   78664
 cantanker    |   78159 |   78620
 makeshift    |   78159 |   78648
 gentl        |   78156 |   78609
 fri          |   78143 |   78648
 linguist     |   78141 |   78586
 overrip      |   78134 |   78572
 varig        |   78132 |   78609
 vapor        |   78105 |   78548
 impromptu    |   78104 |   78569
 actual       |   78104 |   78592
 self-actu    |   78104 |   78592
 frighten     |   78100 |   78544
 molten       |   78100 |   78567
 gratifi      |   78098 |   78528
 bur          |   78094 |   78563
 hairi        |   78092 |   78563
 foreign      |   78083 |   78569
 tatter       |   78050 |   78518
 frustrat     |   78044 |   78474
 stoic        |   78036 |   78503
 eurasian     |   78033 |   78513
 proverbi     |   78031 |   78519
 green        |   78024 |   78450
 skinni       |   78023 |   78524
 familiar     |   78016 |   78477
 optim        |   78006 |   78483
 bohemian     |   78002 |   78500
 overpr       |   77983 |   78411
 pompous      |   77955 |   78460
 difficult    |   77938 |   78375
 raspi        |   77924 |   78461
 soggi        |   77912 |   78381
 resplend     |   77910 |   78351
 blotch       |   77910 |   78380
 fals         |   77908 |   78409
 infect       |   77907 |   78399
 magnific     |   77898 |   78350
 snooti       |   77897 |   78422
 moron        |   77886 |   78362
 moldi        |   77865 |   78370
 precis       |   77860 |   78331
 crispi       |   77856 |   78324
 smelli       |   77813 |   78279
 tempor       |   77810 |   78244
 alaskan      |   77808 |   78258
 elus         |   77775 |   78245
 miser        |   77772 |   78232
 flatul       |   77761 |   78201
 orbit        |   77723 |   78157
 mean-spirit  |   77660 |   78113
 flabbi       |   77649 |   78110
 nuclear      |   77609 |   78069
 go           |   15532 |   15545
 made         |       1 |       1
 america      |       1 |       1


If you need anything else, let me know!

Kris


Oleg Bartunov wrote:

>Kris,
>
>could you post 'explain analyze' output ?
>Also, could you disable index usage (set enable_indexscan=off)
>and rerun search using tsearch2 ?
>
>also, could you run 'stat' function to see frequency distribution
>of words. See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>for details.
>
>Oleg
>
>
>>Hi all.  I am doing some work with tsearch2 and am not sure what to
>>expect out of it, performance wise.  Here is my setup:
>>
>>                          Table "public.product"
>>   Column    |   Type   |                    Modifiers
>>-------------+----------+-------------------------------------------------
>> description | text     |
>> product_id  | integer  | default nextval('product_product_id_seq'::text)
>> vector      | tsvector |
>>Indexes:
>>    "vector_idx" gist (vector)
>>Triggers:
>>    tsvectorupdate BEFORE INSERT OR UPDATE ON product FOR EACH ROW EXECUTE PROCEDURE tsearch2('vector',
'description')
>>
>>This table has 3,000,000 rows in it.  Each description field has roughly 50 characters.  There are fewer than ten
thousanddistinct words in my 3,000,000 rows.  The vector was filled using the description fields values.  I ran a
vacuumfull analyze before executing any of my queries. 
>>
>>Here are a couple of tests I performed using the tsearch index and like;
>>
>>search_test=# select count(*) from product where vector @@ to_tsquery('oil');
>> count
>>--------
>> 226357
>>(1 row)
>>
>>Time: 191056.230 ms
>>
>>search_test=# select count(*) from product where vector @@ to_tsquery('hydrogen');
>> count
>>--------
>> 226868
>>(1 row)
>>
>>Time: 306411.957 ms
>>
>>search_test=# select count(*) from product where description like '% oil %';
>> count
>>--------
>> 226357
>>(1 row)
>>
>>Time: 38426.851 ms
>>
>>search_test=# select count(*) from product where description like '% hydrogen %';
>> count
>>--------
>> 226868
>>(1 row)
>>
>>Time: 38265.421 ms
>>
>>
>>Both of the likes are using a sequential scan and both of the tsearch queries use the gist index.  Did I miss a
configurationparameter, are these queries incorrectly using tsearch2,or is this tsearch2's average performance?  Thanks
inadvance for the input! 
>>
>>Kris
>>
>>




Re: tsearch2 poor performance

From
"Gregory S. Williamson"
Date:
Can't speak to tsearch2 in specific but I have learned to be very cautious -- caching does indeed make a noticible
differenceon this sort of thing, especially if you have enough RAM to hold a significant amount of the data. Either
keepchanging the query target or do something violent to wipe the cache(s). 

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From: Kris Kiger [mailto:kris@musicrebellion.com]
Sent: Friday, September 24, 2004 2:59 PM
To: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] tsearch2 poor performance


Here is the explain analyze output, funny thing, after I ran josh's
query, mine ran a lot faster....maybe it forced a caching?;

search_test=# explain analyze select count(*) from product where vector
@@ to_tsquery('oil');
 Aggregate  (cost=6113.09..6113.09 rows=1 width=0) (actual
time=19643.372..19643.376 rows=1 loops=1)
   ->  Index Scan using vector_idx on product  (cost=0.00..6105.58
rows=3001 width=0) (actual time=0.381..18145.917 rows=226357 loops=1)
         Index Cond: (vector @@ '\'oil\''::tsquery)
         Filter: (vector @@ '\'oil\''::tsquery)
 Total runtime: 19643.597 ms

search_test=# explain analyze select count(*) from product where vector
@@ to_tsquery('hydrogen');
 Aggregate  (cost=6113.09..6113.09 rows=1 width=0) (actual
time=19629.766..19629.769 rows=1 loops=1)
   ->  Index Scan using vector_idx on product  (cost=0.00..6105.58
rows=3001 width=0) (actual time=0.378..18127.573 rows=226868 loops=1)
         Index Cond: (vector @@ '\'hydrogen\''::tsquery)
         Filter: (vector @@ '\'hydrogen\''::tsquery)
 Total runtime: 19629.992 ms

Here is Josh's;

search_test=# explain analyze SELECT count(q) FROM product,
to_tsquery('oil') AS q  WHERE vector @@ q;
 Aggregate  (cost=6150597.03..6150597.03 rows=1 width=32) (actual
time=21769.526..21769.530 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..6143097.02 rows=3000001 width=32)
(actual time=0.424..20450.208 rows=226357 loops=1)
         ->  Function Scan on q  (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.023..0.031 rows=1 loops=1)
         ->  Index Scan using vector_idx on product  (cost=0.00..6105.58
rows=3000 width=32) (actual time=0.376..18165.415 rows=226357 loops=1)
               Index Cond: (product.vector @@ "outer".q)
               Filter: (product.vector @@ "outer".q)
 Total runtime: 21769.786 ms

Disabling Index usage slowed it down:

search_test=# explain analyze select count(*) from product where vector
@@ to_tsquery('hydrogen');
 Aggregate  (cost=347259.51..347259.51 rows=1 width=0) (actual
time=24675.933..24675.936 rows=1 loops=1)
   ->  Seq Scan on product  (cost=0.00..347252.00 rows=3001 width=0)
(actual time=0.320..23164.492 rows=226868 loops=1)
         Filter: (vector @@ '\'hydrogen\''::tsquery)
 Total runtime: 24676.091 ms

Time: 24678.842 ms

search_test=# explain analyze SELECT count(q) FROM product,
to_tsquery('oil') AS q  WHERE vector @@ q;
 Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
time=83631.201..83631.204 rows=1 loops=1)
   ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
(actual time=0.214..82294.710 rows=226357 loops=1)
         Join Filter: ("outer".vector @@ "inner".q)
         ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
width=32) (actual time=0.107..27563.952 rows=3000000 loops=1)
         ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
time=0.003..0.006 rows=1 loops=3000000)
               ->  Function Scan on q  (cost=0.00..12.50 rows=1000
width=32) (actual time=0.019..0.023 rows=1 loops=1)
 Total runtime: 83631.385 ms

Here are the results of stat:

search_test=# select * from stat('select vector from product') order by
ndoc desc, nentry;
     word     |  ndoc   | nentry
--------------+---------+---------
 anoth        | 1187386 | 1477442
 bear         |  696668 |  780963
 take         |  675319 |  736410
 relat        |  491469 |  528259
 toward       |  490653 |  528369
 defin        |  490572 |  527099
 live         |  490538 |  527401
 beyond       |  490124 |  527957
 behind       |  490087 |  527735
 insid        |  489530 |  527074
 near         |  489504 |  527721
 around       |  489244 |  526870
 mean         |  478201 |  512699
 complex      |  440339 |  468669
 light        |  438685 |  468140
 ball         |  438567 |  468168
 pit          |  438293 |  467807
 dress        |  438128 |  467260
 player       |  437633 |  466753
 secret       |  433279 |  457246
 love         |  423777 |  442694
 give         |  423691 |  441305
 need         |  423336 |  434409
 peopl        |  423336 |  434409
 believ       |  423336 |  434409
 rememb       |  423336 |  434409
 howev        |  421762 |  434194
 real         |  419906 |  435074
 furthermor   |  416672 |  434413
 indic        |  416508 |  434919
 exampl       |  416508 |  434919
 alway        |  415543 |  432861
 sometim      |  415543 |  432861
 see          |  410706 |  434586
 inde         |  408379 |  434283
 fruit        |  363203 |  381862
 cook         |  362674 |  381112
 graduat      |  362444 |  381284
 chees        |  362358 |  381040
 hesit        |  307431 |  317550
 self         |  301001 |  312312
 hard         |  300138 |  310167
 spirit       |  299310 |  312092
 know         |  298246 |  309010
 laugh        |  294136 |  302392
 make         |  287633 |  295003
 find         |  287550 |  294770
 goe          |  279336 |  287025
 team         |  228000 |  234703
 footbal      |  228000 |  234703
 void         |  227914 |  234681
 formless     |  227914 |  234681
 board        |  227907 |  234797
 chess        |  227907 |  234797
 submarin     |  227869 |  234727
 inferior     |  227858 |  234357
 viper        |  227855 |  234865
 cylind       |  227847 |  234505
 suit         |  227822 |  234376
 class        |  227822 |  234376
 action       |  227822 |  234376
 diskett      |  227802 |  234786
 roller       |  227792 |  234524
 coaster      |  227792 |  234524
 mate         |  227785 |  234431
 ritual       |  227785 |  234431
 engin        |  227784 |  234575
 steam        |  227784 |  234575
 industri     |  227780 |  234312
 fire         |  227775 |  234532
 hydrant      |  227775 |  234532
 briar        |  227769 |  234524
 patch        |  227769 |  234524
 mastadon     |  227677 |  234665
 defend       |  227617 |  234410
 blade        |  227603 |  234356
 razor        |  227603 |  234356
 cab          |  227578 |  234554
 driver       |  227578 |  234554
 cough        |  227570 |  234324
 syrup        |  227570 |  234324
 cowboy       |  227566 |  234663
 chop         |  227564 |  234437
 pork         |  227564 |  234437
 ceo          |  227557 |  234760
 rattlesnak   |  227554 |  234323
 hell         |  227540 |  234313
 flavor       |  227540 |  234313
 maelstrom    |  227537 |  234404
 mulch        |  227531 |  234311
 cyprus       |  227531 |  234311
 tack         |  227525 |  234462
 carpet       |  227525 |  234462
 movi         |  227505 |  234207
 theater      |  227505 |  234207
 spider       |  227466 |  234524
 cone         |  227463 |  234198
 pine         |  227463 |  234198
 garbag       |  227459 |  234207
 beer         |  227443 |  234077
 bottl        |  227443 |  234077
 polygon      |  227438 |  234267
 judg         |  227425 |  234565
 blith        |  227409 |  233979
 traffic      |  227403 |  234051
 paper        |  227397 |  234028
 napkin       |  227397 |  234028
 apart        |  227393 |  233911
 build        |  227393 |  233911
 cocker       |  227368 |  233926
 spaniel      |  227368 |  233926
 bay          |  227358 |  234261
 cargo        |  227358 |  234261
 order        |  227357 |  233885
 short        |  227357 |  233885
 polar        |  227326 |  234118
 demon        |  227324 |  234442
 minivan      |  227317 |  234292
 bulb         |  227314 |  234089
 fundrais     |  227308 |  234235
 eggplant     |  227306 |  234202
 cake         |  227299 |  234075
 bowl         |  227299 |  234110
 paycheck     |  227295 |  234224
 sheriff      |  227292 |  234313
 turkey       |  227271 |  234267
 turn         |  227265 |  234210
 signal       |  227265 |  234210
 chestnut     |  227250 |  234104
 hole         |  227239 |  233975
 puncher      |  227239 |  233975
 tabloid      |  227238 |  234341
 microscop    |  227236 |  234067
 reclin       |  227234 |  233946
 dolphin      |  227231 |  234080
 pen          |  227222 |  234269
 pig          |  227222 |  234269
 wed          |  227221 |  233860
 bullfrog     |  227211 |  234144
 truck        |  227208 |  233980
 pickup       |  227208 |  233980
 agent        |  227201 |  233840
 insur        |  227201 |  233840
 girl         |  227201 |  233934
 scout        |  227201 |  233934
 drill        |  227200 |  233986
 power        |  227200 |  233986
 ocean        |  227187 |  234211
 case         |  227173 |  233983
 crank        |  227173 |  233983
 squid        |  227169 |  234056
 senat        |  227167 |  234147
 fraction     |  227161 |  234065
 custom       |  227152 |  234128
 burglar      |  227148 |  234014
 grizzli      |  227133 |  233955
 wheel        |  227122 |  233813
 asteroid     |  227108 |  233928
 anomali      |  227106 |  234156
 acceler      |  227103 |  233428
 particl      |  227103 |  233428
 saw          |  227082 |  233934
 chain        |  227082 |  233934
 reactor      |  227035 |  234061
 wedg         |  227033 |  234143
 photon       |  227029 |  234017
 deficit      |  227029 |  234102
 vacuum       |  227021 |  233760
 cleaner      |  227021 |  233760
 cashier      |  227010 |  233858
 scyth        |  227001 |  233928
 cloud        |  226981 |  233569
 format       |  226981 |  233569
 tornado      |  226968 |  234058
 grand        |  226936 |  233730
 piano        |  226936 |  233730
 tripod       |  226930 |  233755
 tomato       |  226928 |  233915
 sandwich     |  226923 |  233786
 earring      |  226912 |  233665
 train        |  226912 |  233712
 freight      |  226912 |  233712
 skyscrap     |  226901 |  233755
 abstract     |  226890 |  233658
 mortician    |  226883 |  233781
 warranti     |  226876 |  233935
 atom         |  226868 |  233467
 hydrogen     |  226868 |  233467
 satellit     |  226866 |  233680
 corpor       |  226858 |  233818
 globul       |  226853 |  233980
 cow          |  226832 |  233808
 jersey       |  226832 |  233808
 salad        |  226830 |  233400
 buzzard      |  226804 |  233825
 lot          |  226794 |  233643
 park         |  226794 |  233643
 prime        |  226793 |  233325
 minist       |  226793 |  233325
 clot         |  226780 |  233380
 blood        |  226780 |  233380
 tuba         |  226765 |  233575
 tape         |  226749 |  233388
 record       |  226749 |  233388
 line         |  226747 |  233574
 dancer       |  226747 |  233574
 nation       |  226736 |  233796
 bartend      |  226653 |  233422
 hockey       |  226645 |  233178
 canyon       |  226617 |  233699
 ski          |  226610 |  233451
 lodg         |  226610 |  233451
 stovepip     |  226608 |  233489
 crane        |  226590 |  233578
 sand         |  226572 |  233270
 grain        |  226572 |  233270
 dust         |  226570 |  233391
 bunni        |  226570 |  233391
 lover        |  226564 |  233628
 fairi        |  226554 |  233743
 plaintiff    |  226537 |  233563
 wheelbarrow  |  226520 |  233206
 food         |  226445 |  233228
 stamp        |  226445 |  233228
 umbrella     |  226380 |  233273
 avocado      |  226375 |  232942
 oil          |  226357 |  233266
 filter       |  226357 |  233266
 financi      |  220105 |  225116
 complet      |  162829 |  164065
 ridicul      |  162346 |  163592
 handl        |  162200 |  163390
 singl        |  162200 |  163390
 single-handl |  162200 |  163390
 greedili     |  162123 |  163379
 careless     |  162009 |  163193
 somewhat     |  161979 |  163205
 accur        |  161975 |  163228
 overwhelm    |  161946 |  163107
 usual        |  161930 |  163158
 ostens       |  161826 |  163020
 lazili       |  161809 |  163133
 slyli        |  161803 |  163149
 underhand    |  161751 |  162955
 non          |  161585 |  162823
 chalant      |  161585 |  162823
 non-chal     |  161585 |  162823
 seldom       |  161525 |  162739
 accident     |  161511 |  162676
 almost       |  161508 |  162782
 often        |  161488 |  162733
 bare         |  161401 |  162659
 eager        |  161278 |  162513
 wise         |  161073 |  162341
 inexor       |  161042 |  162265
 feverish     |  160805 |  162020
 thorough     |  160611 |  161823
 home         |  154672 |  155766
 return       |  154672 |  155766
 lost         |  154655 |  155567
 glori        |  154655 |  155567
 start        |  154655 |  155567
 reminisc     |  154655 |  155567
 rumin        |  154577 |  155776
 read         |  154529 |  155642
 magazin      |  154529 |  155642
 pray         |  154478 |  155748
 floor        |  154396 |  155477
 sweep        |  154396 |  155477
 nag          |  154271 |  155259
 feel         |  154271 |  155259
 remors       |  154271 |  155259
 procrastin   |  154256 |  155371
 wake         |  154220 |  155397
 sleep        |  154217 |  155353
 panic        |  154189 |  155346
 get          |  154168 |  155253
 drunk        |  154168 |  155253
 stink        |  154168 |  155253
 hibern       |  154158 |  155358
 die          |  153973 |  155223
 fli          |  153943 |  155056
 rage         |  153943 |  155056
 flagel       |  153916 |  155067
 self-flagel  |  153916 |  155067
 daydream     |  153864 |  155043
 medit        |  153816 |  154935
 ceas         |  153735 |  154815
 exist        |  153735 |  154815
 joy          |  153672 |  154754
 beam         |  153672 |  154754
 trembl       |  153656 |  154799
 loud         |  153635 |  154665
 hide         |  153592 |  154797
 break        |  153559 |  154599
 coffe        |  153559 |  154599
 earn         |  153538 |  154540
 mile         |  153538 |  154540
 flier        |  153538 |  154540
 frequent     |  153538 |  154540
 leav         |  153535 |  154730
 rejoic       |  153226 |  154412
 sell         |  147231 |  148103
 plan         |  147046 |  147809
 escap        |  147046 |  147809
 throw        |  146973 |  147764
 negoti       |  146905 |  147704
 prenupti     |  146905 |  147704
 agreement    |  146905 |  147704
 card         |  146892 |  147731
 trade        |  146892 |  147731
 basebal      |  146892 |  147731
 oper         |  146888 |  147787
 small        |  146888 |  147787
 stand        |  146888 |  147787
 drink        |  146881 |  147727
 night        |  146881 |  147727
 steal        |  146835 |  147847
 pencil       |  146835 |  147847
 seek         |  146816 |  148029
 figur        |  146801 |  147908
 write        |  146736 |  147720
 letter       |  146736 |  147720
 recogn       |  146723 |  147823
 truce        |  146684 |  147630
 eat          |  146670 |  147874
 compet       |  146647 |  147760
 buy          |  146642 |  147522
 gift         |  146642 |  147522
 expens       |  146642 |  147522
 big          |  146626 |  147717
 fan          |  146626 |  147717
 fall         |  146597 |  147601
 assist       |  146587 |  147589
 requir       |  146587 |  147589
 chang        |  146542 |  147479
 heart        |  146542 |  147479
 conquer      |  146542 |  147695
 money        |  146481 |  147450
 borrow       |  146481 |  147450
 ignor        |  146475 |  147643
 share        |  146415 |  147404
 shower       |  146415 |  147404
 fault        |  146413 |  147361
 subtl        |  146413 |  147361
 kind         |  146402 |  147492
 great        |  146397 |  147367
 upon         |  146396 |  147366
 honor        |  146396 |  147366
 bestow       |  146396 |  147366
 pee          |  146394 |  147477
 avoid        |  146392 |  147388
 contact      |  146392 |  147388
 pink         |  146372 |  147347
 slip         |  146372 |  147347
 aid          |  146367 |  147225
 teach        |  146366 |  147516
 sanit        |  146361 |  147477
 lice         |  146360 |  147409
 buri         |  146360 |  147483
 cold         |  146357 |  147220
 pour         |  146357 |  147220
 freez        |  146357 |  147220
 water        |  146357 |  147220
 sea          |  146347 |  147217
 deep         |  146347 |  147217
 fish         |  146347 |  147217
 organ        |  146321 |  147476
 grit         |  146289 |  147227
 satiat       |  146251 |  147349
 assimil      |  146251 |  147377
 tri          |  146188 |  147200
 seduc        |  146188 |  147200
 reach        |  146132 |  147008
 understand   |  146132 |  147008
 brainwash    |  146068 |  147158
 admir        |  146050 |  147021
 caricatur    |  145989 |  147107
 deriv        |  145941 |  146790
 pervers      |  145941 |  146790
 satisfact    |  145941 |  146790
 moral        |  145854 |  146733
 lectur       |  145854 |  146733
 befriend     |  145799 |  146963
 learn        |  145758 |  146666
 lesson       |  145758 |  146666
 play         |  145738 |  146706
 pinochl      |  145738 |  146706
 peek         |  145698 |  146737
 danc         |  145555 |  146637
 fashion      |   78762 |   79203
 muddi        |   78750 |   79236
 hypnot       |   78747 |   79204
 childlik     |   78579 |   79002
 loyal        |   78575 |   79056
 mysteri      |   78554 |   79047
 annoy        |   78532 |   79032
 slow         |   78517 |   78996
 twist        |   78515 |   79016
 unstabl      |   78510 |   78945
 feder        |   78501 |   78967
 rever        |   78501 |   79008
 wrinkl       |   78495 |   78965
 rude         |   78495 |   78975
 boil         |   78493 |   78972
 high         |   78481 |   78940
 paid         |   78481 |   78940
 geosynchron  |   78478 |   78931
 greasi       |   78476 |   78961
 cosmopolitan |   78459 |   78903
 fat          |   78438 |   78935
 inciner      |   78429 |   78896
 dot          |   78426 |   78864
 polka        |   78426 |   78864
 polka-dot    |   78426 |   78864
 outer        |   78415 |   78910
 phoni        |   78411 |   78895
 pathet       |   78405 |   78869
 purpl        |   78405 |   78895
 frozen       |   78403 |   78886
 nearest      |   78396 |   78879
 statesmanlik |   78386 |   78830
 dirt         |   78376 |   78828
 encrust      |   78376 |   78828
 dirt-encrust |   78376 |   78828
 sur          |   78371 |   78895
 obsequi      |   78369 |   78805
 salti        |   78360 |   78834
 imagin       |   78356 |   78808
 south        |   78325 |   78787
 american     |   78325 |   78787
 load         |   78318 |   78832
 righteous    |   78282 |   78760
 fractur      |   78281 |   78737
 educ         |   78278 |   78682
 colleg       |   78278 |   78682
 college-educ |   78278 |   78682
 mitochondri  |   78269 |   78745
 treacher     |   78265 |   78697
 spartan      |   78252 |   78707
 felin        |   78244 |   78713
 ravish       |   78242 |   78765
 patern       |   78241 |   78701
 psychot      |   78238 |   78693
 shabbi       |   78228 |   78685
 dreamlik     |   78224 |   78642
 loath        |   78221 |   78653
 self-loath   |   78221 |   78653
 world        |   78203 |   78658
 call         |   78183 |   78610
 so-cal       |   78183 |   78610
 radioact     |   78182 |   78623
 alleg        |   78178 |   78664
 cantanker    |   78159 |   78620
 makeshift    |   78159 |   78648
 gentl        |   78156 |   78609
 fri          |   78143 |   78648
 linguist     |   78141 |   78586
 overrip      |   78134 |   78572
 varig        |   78132 |   78609
 vapor        |   78105 |   78548
 impromptu    |   78104 |   78569
 actual       |   78104 |   78592
 self-actu    |   78104 |   78592
 frighten     |   78100 |   78544
 molten       |   78100 |   78567
 gratifi      |   78098 |   78528
 bur          |   78094 |   78563
 hairi        |   78092 |   78563
 foreign      |   78083 |   78569
 tatter       |   78050 |   78518
 frustrat     |   78044 |   78474
 stoic        |   78036 |   78503
 eurasian     |   78033 |   78513
 proverbi     |   78031 |   78519
 green        |   78024 |   78450
 skinni       |   78023 |   78524
 familiar     |   78016 |   78477
 optim        |   78006 |   78483
 bohemian     |   78002 |   78500
 overpr       |   77983 |   78411
 pompous      |   77955 |   78460
 difficult    |   77938 |   78375
 raspi        |   77924 |   78461
 soggi        |   77912 |   78381
 resplend     |   77910 |   78351
 blotch       |   77910 |   78380
 fals         |   77908 |   78409
 infect       |   77907 |   78399
 magnific     |   77898 |   78350
 snooti       |   77897 |   78422
 moron        |   77886 |   78362
 moldi        |   77865 |   78370
 precis       |   77860 |   78331
 crispi       |   77856 |   78324
 smelli       |   77813 |   78279
 tempor       |   77810 |   78244
 alaskan      |   77808 |   78258
 elus         |   77775 |   78245
 miser        |   77772 |   78232
 flatul       |   77761 |   78201
 orbit        |   77723 |   78157
 mean-spirit  |   77660 |   78113
 flabbi       |   77649 |   78110
 nuclear      |   77609 |   78069
 go           |   15532 |   15545
 made         |       1 |       1
 america      |       1 |       1


If you need anything else, let me know!

Kris


Oleg Bartunov wrote:

>Kris,
>
>could you post 'explain analyze' output ?
>Also, could you disable index usage (set enable_indexscan=off)
>and rerun search using tsearch2 ?
>
>also, could you run 'stat' function to see frequency distribution
>of words. See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>for details.
>
>Oleg
>
>
>>Hi all.  I am doing some work with tsearch2 and am not sure what to
>>expect out of it, performance wise.  Here is my setup:
>>
>>                          Table "public.product"
>>   Column    |   Type   |                    Modifiers
>>-------------+----------+-------------------------------------------------
>> description | text     |
>> product_id  | integer  | default nextval('product_product_id_seq'::text)
>> vector      | tsvector |
>>Indexes:
>>    "vector_idx" gist (vector)
>>Triggers:
>>    tsvectorupdate BEFORE INSERT OR UPDATE ON product FOR EACH ROW EXECUTE PROCEDURE tsearch2('vector',
'description')
>>
>>This table has 3,000,000 rows in it.  Each description field has roughly 50 characters.  There are fewer than ten
thousanddistinct words in my 3,000,000 rows.  The vector was filled using the description fields values.  I ran a
vacuumfull analyze before executing any of my queries. 
>>
>>Here are a couple of tests I performed using the tsearch index and like;
>>
>>search_test=# select count(*) from product where vector @@ to_tsquery('oil');
>> count
>>--------
>> 226357
>>(1 row)
>>
>>Time: 191056.230 ms
>>
>>search_test=# select count(*) from product where vector @@ to_tsquery('hydrogen');
>> count
>>--------
>> 226868
>>(1 row)
>>
>>Time: 306411.957 ms
>>
>>search_test=# select count(*) from product where description like '% oil %';
>> count
>>--------
>> 226357
>>(1 row)
>>
>>Time: 38426.851 ms
>>
>>search_test=# select count(*) from product where description like '% hydrogen %';
>> count
>>--------
>> 226868
>>(1 row)
>>
>>Time: 38265.421 ms
>>
>>
>>Both of the likes are using a sequential scan and both of the tsearch queries use the gist index.  Did I miss a
configurationparameter, are these queries incorrectly using tsearch2,or is this tsearch2's average performance?  Thanks
inadvance for the input! 
>>
>>Kris
>>
>>




---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: tsearch2 poor performance

From
Kris Kiger
Date:
Regardless of caching, the queries are still taking 19~20 seconds to run
on the 3,000,000 rows.  I've played with performance tuning and nothing
seems to make much of a difference.  If I am reading that list from stat
correctly, then I am operating on  508 distinct words.  Is this the
performance I should expect from tsearch2?  Or is something still awry?
 I'm inclined to think  something else is wrong, after reading some
other people's tsearch performance stats.  Thanks!

Kris



Re: tsearch2 poor performance

From
Oleg Bartunov
Date:
Kris,

do you actually have only 508 disctinct words ? Could you try
more complex queries, say 2-3 words. Does these queries run faster ?


    Oleg
On Mon, 27 Sep 2004, Kris Kiger wrote:

> Regardless of caching, the queries are still taking 19~20 seconds to run
> on the 3,000,000 rows.  I've played with performance tuning and nothing
> seems to make much of a difference.  If I am reading that list from stat
> correctly, then I am operating on  508 distinct words.  Is this the
> performance I should expect from tsearch2?  Or is something still awry?
>  I'm inclined to think  something else is wrong, after reading some
> other people's tsearch performance stats.  Thanks!
>
> Kris
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
>       subscribe-nomail command to majordomo@postgresql.org so that your
>       message can get through to the mailing list cleanly
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 poor performance

From
Kris Kiger
Date:
Oleg,

    Thanks for the help on this.

    The query I used to return the 508 number is:
         SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
desc, word ;

    Testing says, the more words I use, the faster the query is.  My
original search word, 'oil', appears in 226,357 documents 233,266 times.
 As far as distinct words go, 'oil' is middle of the road for
occurences.  As it is set up now, the best search time I am getting on
this single word is roughly 22 seconds.

Kris

Oleg Bartunov wrote:

>Kris,
>
>do you actually have only 508 disctinct words ? Could you try
>more complex queries, say 2-3 words. Does these queries run faster ?
>
>
>    Oleg
>On Mon, 27 Sep 2004, Kris Kiger wrote:
>
>
>
>>Regardless of caching, the queries are still taking 19~20 seconds to run
>>on the 3,000,000 rows.  I've played with performance tuning and nothing
>>seems to make much of a difference.  If I am reading that list from stat
>>correctly, then I am operating on  508 distinct words.  Is this the
>>performance I should expect from tsearch2?  Or is something still awry?
>> I'm inclined to think  something else is wrong, after reading some
>>other people's tsearch performance stats.  Thanks!
>>
>>Kris
>>
>>
>>
>    Regards,
>        Oleg
>
>


Re: tsearch2 poor performance

From
"Joshua D. Drake"
Date:
Hello,

I might of missed this on a previous message, BUT what type of hardware
are we dealing with here? Is it possible that we just don't have enough
IO/RAM to push this?

J


Kris Kiger wrote:
> Oleg,
>
>    Thanks for the help on this.
>
>    The query I used to return the 508 number is:
>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
> desc, word ;
>      Testing says, the more words I use, the faster the query is.  My
> original search word, 'oil', appears in 226,357 documents 233,266 times.
> As far as distinct words go, 'oil' is middle of the road for
> occurences.  As it is set up now, the best search time I am getting on
> this single word is roughly 22 seconds.
> Kris
>
> Oleg Bartunov wrote:
>
>> Kris,
>>
>> do you actually have only 508 disctinct words ? Could you try
>> more complex queries, say 2-3 words. Does these queries run faster ?
>>
>>
>>     Oleg
>> On Mon, 27 Sep 2004, Kris Kiger wrote:
>>
>>
>>
>>> Regardless of caching, the queries are still taking 19~20 seconds to run
>>> on the 3,000,000 rows.  I've played with performance tuning and nothing
>>> seems to make much of a difference.  If I am reading that list from stat
>>> correctly, then I am operating on  508 distinct words.  Is this the
>>> performance I should expect from tsearch2?  Or is something still awry?
>>> I'm inclined to think  something else is wrong, after reading some
>>> other people's tsearch performance stats.  Thanks!
>>>
>>> Kris
>>>
>>>
>>
>>     Regards,
>>         Oleg
>>
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: tsearch2 poor performance

From
Kris Kiger
Date:
Josh,

    I am running my tests on a dual processor PIII 1 GHz machine with
2Gb of RAM.  There are four hard drives set up.  One for OS/logging and
three in a raid-5 configuration for the postgres data directory.

Kris

Joshua D. Drake wrote:

> Hello,
> I might of missed this on a previous message, BUT what type of
> hardware are we dealing with here? Is it possible that we just don't
> have enough IO/RAM to push this?
> J
> Kris Kiger wrote:
>
>> Oleg,
>>    Thanks for the help on this.
>>    The query I used to return the 508 number is:
>>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY
>> ndoc desc, word ;
>>      Testing says, the more words I use, the faster the query is.  My
>> original search word, 'oil', appears in 226,357 documents 233,266
>> times. As far as distinct words go, 'oil' is middle of the road for
>> occurences.  As it is set up now, the best search time I am getting
>> on this single word is roughly 22 seconds. Kris
>>
>> Oleg Bartunov wrote:
>>
>>> Kris,
>>>
>>> do you actually have only 508 disctinct words ? Could you try
>>> more complex queries, say 2-3 words. Does these queries run faster ?
>>>     Oleg
>>> On Mon, 27 Sep 2004, Kris Kiger wrote:
>>>
>>>> Regardless of caching, the queries are still taking 19~20 seconds
>>>> to run
>>>> on the 3,000,000 rows.  I've played with performance tuning and
>>>> nothing
>>>> seems to make much of a difference.  If I am reading that list from
>>>> stat
>>>> correctly, then I am operating on  508 distinct words.  Is this the
>>>> performance I should expect from tsearch2?  Or is something still
>>>> awry?
>>>> I'm inclined to think  something else is wrong, after reading some
>>>> other people's tsearch performance stats.  Thanks!
>>>>
>>>> Kris
>>>
>>>     Regards,
>>>         Oleg
>>


--
______________________________
Kris Kiger
Software Developer
Digonex Technologies, Inc.
317.638.4174 Fax


CONFIDENTIALITY NOTICE:  The information in this transmission is private, confidential, may be legally privileged, is
propertyof the sender and is intended solely for the use of the addressee.  If you are not the addressee, you should
notread, disclose, distribute, copy, use or rely upon the information contained in this transmission.  If you have
receivedthis transmission in error please delete or destroy it and notify DIGONEX TECHNOLOGIES, INC. immediately at
(317)638-4154. 





Re: tsearch2 poor performance

From
Oleg Bartunov
Date:
On Mon, 27 Sep 2004, Kris Kiger wrote:

> Oleg,
>
>     Thanks for the help on this.
>
>     The query I used to return the 508 number is:
>          SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
> desc, word ;
>
>     Testing says, the more words I use, the faster the query is.  My
> original search word, 'oil', appears in 226,357 documents 233,266 times.
>  As far as distinct words go, 'oil' is middle of the road for
> occurences.  As it is set up now, the best search time I am getting on
> this single word is roughly 22 seconds.

Does this time (22 seconds) is still better than seq. scan (no index)
or standard 'LIKE' ?

>
> Kris
>
> Oleg Bartunov wrote:
>
> >Kris,
> >
> >do you actually have only 508 disctinct words ? Could you try
> >more complex queries, say 2-3 words. Does these queries run faster ?
> >
> >
> >    Oleg
> >On Mon, 27 Sep 2004, Kris Kiger wrote:
> >
> >
> >
> >>Regardless of caching, the queries are still taking 19~20 seconds to run
> >>on the 3,000,000 rows.  I've played with performance tuning and nothing
> >>seems to make much of a difference.  If I am reading that list from stat
> >>correctly, then I am operating on  508 distinct words.  Is this the
> >>performance I should expect from tsearch2?  Or is something still awry?
> >> I'm inclined to think  something else is wrong, after reading some
> >>other people's tsearch performance stats.  Thanks!
> >>
> >>Kris
> >>
> >>
> >>
> >    Regards,
> >        Oleg
> >
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 poor performance

From
"Joshua D. Drake"
Date:
Kris Kiger wrote:
> Josh,
>
>    I am running my tests on a dual processor PIII 1 GHz machine with 2Gb
> of RAM.  There are four hard drives set up.  One for OS/logging and
> three in a raid-5 configuration for the postgres data directory.
>

What about other factors?

effective_cache_size
random_page_cost
sort_mem

FYI that example I gave was on a Dual P4-Xeon with 4 Gigs of ram and a
RAID5 over 6 drives.

Just curious.

J



> Kris
>
> Joshua D. Drake wrote:
>
>> Hello,
>> I might of missed this on a previous message, BUT what type of
>> hardware are we dealing with here? Is it possible that we just don't
>> have enough IO/RAM to push this?
>> J
>> Kris Kiger wrote:
>>
>>> Oleg,
>>>    Thanks for the help on this.
>>>    The query I used to return the 508 number is:
>>>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY
>>> ndoc desc, word ;
>>>      Testing says, the more words I use, the faster the query is.  My
>>> original search word, 'oil', appears in 226,357 documents 233,266
>>> times. As far as distinct words go, 'oil' is middle of the road for
>>> occurences.  As it is set up now, the best search time I am getting
>>> on this single word is roughly 22 seconds. Kris
>>>
>>> Oleg Bartunov wrote:
>>>
>>>> Kris,
>>>>
>>>> do you actually have only 508 disctinct words ? Could you try
>>>> more complex queries, say 2-3 words. Does these queries run faster ?
>>>>     Oleg
>>>> On Mon, 27 Sep 2004, Kris Kiger wrote:
>>>>
>>>>> Regardless of caching, the queries are still taking 19~20 seconds
>>>>> to run
>>>>> on the 3,000,000 rows.  I've played with performance tuning and
>>>>> nothing
>>>>> seems to make much of a difference.  If I am reading that list from
>>>>> stat
>>>>> correctly, then I am operating on  508 distinct words.  Is this the
>>>>> performance I should expect from tsearch2?  Or is something still
>>>>> awry?
>>>>> I'm inclined to think  something else is wrong, after reading some
>>>>> other people's tsearch performance stats.  Thanks!
>>>>>
>>>>> Kris
>>>>
>>>>
>>>>     Regards,
>>>>         Oleg
>>>
>>>
>
>


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-667-4564 - jd@commandprompt.com - http://www.commandprompt.com
Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL

Attachment

Re: tsearch2 poor performance

From
Kris Kiger
Date:
Yes, it is much better than no index of sequential scan.  We may just be
looking at the best performance tsearch2 can offer on my machine.

search_test=#  explain analyze SELECT count(q) FROM product,
to_tsquery('oil') AS q  WHERE vector @@ q;
                                                            QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
time=83311.552..83311.555 rows=1 loops=1)
   ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
(actual time=0.204..81960.198 rows=226357 loops=1)
         Join Filter: ("outer".vector @@ "inner".q)
         ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
width=32) (actual time=0.100..27415.795 rows=3000000 loops=1)
         ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
time=0.003..0.006 rows=1 loops=3000000)
               ->  Function Scan on q  (cost=0.00..12.50 rows=1000
width=32) (actual time=0.020..0.024 rows=1 loops=1)
 Total runtime: 83311.735 ms
(7 rows)

search_test=# explain analyze select count(*) from product where
description like '% oil %';
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=347264.01..347264.01 rows=1 width=0) (actual
time=39858.350..39858.353 rows=1 loops=1)
   ->  Seq Scan on product  (cost=0.00..347252.00 rows=4801 width=0)
(actual time=0.100..38320.293 rows=226357 loops=1)
         Filter: (description ~~ '% oil %'::text)
 Total runtime: 39858.491 ms


>>Oleg,
>>
>>    Thanks for the help on this.
>>
>>    The query I used to return the 508 number is:
>>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
>>desc, word ;
>>
>>    Testing says, the more words I use, the faster the query is.  My
>>original search word, 'oil', appears in 226,357 documents 233,266 times.
>> As far as distinct words go, 'oil' is middle of the road for
>>occurences.  As it is set up now, the best search time I am getting on
>>this single word is roughly 22 seconds.
>>
>>
>
>Does this time (22 seconds) is still better than seq. scan (no index)
>or standard 'LIKE' ?
>
>
>



Re: tsearch2 poor performance

From
Oleg Bartunov
Date:
On Mon, 27 Sep 2004, Kris Kiger wrote:

> Yes, it is much better than no index of sequential scan.  We may just be
> looking at the best performance tsearch2 can offer on my machine.

Hmm,

tsearch2 with no index should be faster than LIKE, because tsearch2
already has *parsed* and *sorted* list.

It's interesting to fetch just 226,357 documents from disk and see processing
time.

select count(*) from product limit 226357 offset 1;


>
> search_test=#  explain analyze SELECT count(q) FROM product,
> to_tsquery('oil') AS q  WHERE vector @@ q;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
> time=83311.552..83311.555 rows=1 loops=1)
>    ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
> (actual time=0.204..81960.198 rows=226357 loops=1)
>          Join Filter: ("outer".vector @@ "inner".q)
>          ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
> width=32) (actual time=0.100..27415.795 rows=3000000 loops=1)
>          ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
> time=0.003..0.006 rows=1 loops=3000000)
>                ->  Function Scan on q  (cost=0.00..12.50 rows=1000
> width=32) (actual time=0.020..0.024 rows=1 loops=1)
>  Total runtime: 83311.735 ms
> (7 rows)
>
> search_test=# explain analyze select count(*) from product where
> description like '% oil %';
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=347264.01..347264.01 rows=1 width=0) (actual
> time=39858.350..39858.353 rows=1 loops=1)
>    ->  Seq Scan on product  (cost=0.00..347252.00 rows=4801 width=0)
> (actual time=0.100..38320.293 rows=226357 loops=1)
>          Filter: (description ~~ '% oil %'::text)
>  Total runtime: 39858.491 ms
>
>
> >>Oleg,
> >>
> >>    Thanks for the help on this.
> >>
> >>    The query I used to return the 508 number is:
> >>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
> >>desc, word ;
> >>
> >>    Testing says, the more words I use, the faster the query is.  My
> >>original search word, 'oil', appears in 226,357 documents 233,266 times.
> >> As far as distinct words go, 'oil' is middle of the road for
> >>occurences.  As it is set up now, the best search time I am getting on
> >>this single word is roughly 22 seconds.
> >>
> >>
> >
> >Does this time (22 seconds) is still better than seq. scan (no index)
> >or standard 'LIKE' ?
> >
> >
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 poor performance

From
George Essig
Date:
Kris Kiger wrote:

> search_test=#  explain analyze SELECT count(q) FROM product,
> to_tsquery('oil') AS q  WHERE vector @@ q;
>                                                             QUERY PLAN
>

---------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
> time=83311.552..83311.555 rows=1 loops=1)
>    ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
> (actual time=0.204..81960.198 rows=226357 loops=1)
>          Join Filter: ("outer".vector @@ "inner".q)
>          ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
> width=32) (actual time=0.100..27415.795 rows=3000000 loops=1)
>          ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
> time=0.003..0.006 rows=1 loops=3000000)
>                ->  Function Scan on q  (cost=0.00..12.50 rows=1000
> width=32) (actual time=0.020..0.024 rows=1 loops=1)
>  Total runtime: 83311.735 ms
> (7 rows)

The explain analyze output doesn't show that a gist index on the vector column is being used.
This is because either you don't have an index defined and\or the query is causing a poor plan to
be chosen.  I've found that putting to_tsquery in the FROM clause does not execute fast.

Try rewriting the query as:

explain analyze SELECT count(to_tsquery('oil')) FROM product  WHERE vector @@ to_tsquery('oil');

or

explain analyze SELECT count(*) FROM product WHERE vector @@ to_tsquery('oil');

George Essig

Re: tsearch2 poor performance

From
Oleg Bartunov
Date:
Kris,

we're working on prototype of tsearchd - full text search daemon, which
maintain static inverted index outside of postgresql using the same
parser, dictionary tsearch2 does.  This approach could scale up
fts capability preserving access to metadata, so yo may have
"archive" part of your collection (tsearchd) and "online", which could be
searchable with tsearch2.

Here is what we have right now:

pages ( tid integer, fts_index  tsvector)

1. Create index
select count(tdindex(tid,fts_index)) from pages;
2. Flush index
select tdflush();
3. Search
select pages.tid, rank(fts_index,to_tsquery('supernovae & magellan')) as rank
from pages, tdsearch(to_tsquery('supernovae & magellan')) as idx where
tid=idx order by rank desc;

If it's possible, you could share your data, so we could test our
prototype on real data.


    Oleg

On Mon, 27 Sep 2004, Kris Kiger wrote:

> Yes, it is much better than no index of sequential scan.  We may just be
> looking at the best performance tsearch2 can offer on my machine.
>
> search_test=#  explain analyze SELECT count(q) FROM product,
> to_tsquery('oil') AS q  WHERE vector @@ q;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
> time=83311.552..83311.555 rows=1 loops=1)
>    ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
> (actual time=0.204..81960.198 rows=226357 loops=1)
>          Join Filter: ("outer".vector @@ "inner".q)
>          ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
> width=32) (actual time=0.100..27415.795 rows=3000000 loops=1)
>          ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
> time=0.003..0.006 rows=1 loops=3000000)
>                ->  Function Scan on q  (cost=0.00..12.50 rows=1000
> width=32) (actual time=0.020..0.024 rows=1 loops=1)
>  Total runtime: 83311.735 ms
> (7 rows)
>
> search_test=# explain analyze select count(*) from product where
> description like '% oil %';
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=347264.01..347264.01 rows=1 width=0) (actual
> time=39858.350..39858.353 rows=1 loops=1)
>    ->  Seq Scan on product  (cost=0.00..347252.00 rows=4801 width=0)
> (actual time=0.100..38320.293 rows=226357 loops=1)
>          Filter: (description ~~ '% oil %'::text)
>  Total runtime: 39858.491 ms
>
>
> >>Oleg,
> >>
> >>    Thanks for the help on this.
> >>
> >>    The query I used to return the 508 number is:
> >>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
> >>desc, word ;
> >>
> >>    Testing says, the more words I use, the faster the query is.  My
> >>original search word, 'oil', appears in 226,357 documents 233,266 times.
> >> As far as distinct words go, 'oil' is middle of the road for
> >>occurences.  As it is set up now, the best search time I am getting on
> >>this single word is roughly 22 seconds.
> >>
> >>
> >
> >Does this time (22 seconds) is still better than seq. scan (no index)
> >or standard 'LIKE' ?
> >
> >
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 poor performance

From
Kris Kiger
Date:
Oleg, the data I have right now was generated using a random paragraph
generator.  The words are real words, but there are only 508 distinct
keywords in the 3,000,000 records that tsearch2 will pick up, using
default settings.  I was using this data set for the purpose of testing
tsearch2's capabilities, so it's not real world data.  If you still want
it, let me know where to send it and I will send you a dump of the DB.

Kris



Oleg Bartunov wrote:

>Kris,
>
>we're working on prototype of tsearchd - full text search daemon, which
>maintain static inverted index outside of postgresql using the same
>parser, dictionary tsearch2 does.  This approach could scale up
>fts capability preserving access to metadata, so yo may have
>"archive" part of your collection (tsearchd) and "online", which could be
>searchable with tsearch2.
>
>Here is what we have right now:
>
>pages ( tid integer, fts_index  tsvector)
>
>1. Create index
>select count(tdindex(tid,fts_index)) from pages;
>2. Flush index
>select tdflush();
>3. Search
>select pages.tid, rank(fts_index,to_tsquery('supernovae & magellan')) as rank
>from pages, tdsearch(to_tsquery('supernovae & magellan')) as idx where
>tid=idx order by rank desc;
>
>If it's possible, you could share your data, so we could test our
>prototype on real data.
>
>
>    Oleg
>
>


Re: tsearch2 poor performance

From
Oleg Bartunov
Date:
Kris,

is't possible to share your "paragraph generator" ? It'd be better than
downloading the whole dump :)

On Wed, 29 Sep 2004, Kris Kiger wrote:

> Oleg, the data I have right now was generated using a random paragraph
> generator.  The words are real words, but there are only 508 distinct
> keywords in the 3,000,000 records that tsearch2 will pick up, using
> default settings.  I was using this data set for the purpose of testing
> tsearch2's capabilities, so it's not real world data.  If you still want
> it, let me know where to send it and I will send you a dump of the DB.
>
> Kris
>
>
>
> Oleg Bartunov wrote:
>
> >Kris,
> >
> >we're working on prototype of tsearchd - full text search daemon, which
> >maintain static inverted index outside of postgresql using the same
> >parser, dictionary tsearch2 does.  This approach could scale up
> >fts capability preserving access to metadata, so yo may have
> >"archive" part of your collection (tsearchd) and "online", which could be
> >searchable with tsearch2.
> >
> >Here is what we have right now:
> >
> >pages ( tid integer, fts_index  tsvector)
> >
> >1. Create index
> >select count(tdindex(tid,fts_index)) from pages;
> >2. Flush index
> >select tdflush();
> >3. Search
> >select pages.tid, rank(fts_index,to_tsquery('supernovae & magellan')) as rank
> >from pages, tdsearch(to_tsquery('supernovae & magellan')) as idx where
> >tid=idx order by rank desc;
> >
> >If it's possible, you could share your data, so we could test our
> >prototype on real data.
> >
> >
> >    Oleg
> >
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 poor performance

From
Kris Kiger
Date:
Hey all, its me again.  If I do not do a count(product_id) on my
tsearch2 queries, its actually really fast, for example;

explain analyze SELECT product_id FROM product, to_tsquery('bear') AS q
WHERE vector @@ q LIMIT 1000;
                                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2081.60 rows=1000 width=4) (actual
time=2.308..51.522 rows=1000 loops=1)
   ->  Nested Loop  (cost=0.00..6244798.55 rows=3000001 width=4) (actual
time=2.299..45.637 rows=1000 loops=1)
         ->  Function Scan on q  (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.023..0.023 rows=1 loops=1)
         ->  Index Scan using vector_idx on product  (cost=0.00..6207.29
rows=3000 width=36) (actual time=2.253..37.946 rows=1000 loops=1)
               Index Cond: (product.vector @@ "outer".q)
               Filter: (product.vector @@ "outer".q)
Total runtime: 122.487 ms

explain analyze SELECT product_id FROM product, to_tsquery('complex') AS
q  WHERE vector @@ q LIMIT 1000;
                                                                 QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2081.60 rows=1000 width=4) (actual
time=4.943..2325.949 rows=1000 loops=1)
   ->  Nested Loop  (cost=0.00..6244798.55 rows=3000001 width=4) (actual
time=4.933..2319.885 rows=1000 loops=1)
         ->  Function Scan on q  (cost=0.00..12.50 rows=1000 width=32)
(actual time=0.040..0.040 rows=1 loops=1)
         ->  Index Scan using vector_idx on product  (cost=0.00..6207.29
rows=3000 width=36) (actual time=4.868..2311.087 rows=1000 loops=1)
               Index Cond: (product.vector @@ "outer".q)
               Filter: (product.vector @@ "outer".q)
 Total runtime: 2329.389 ms

 From what I know, there is only one reason I can offer why a count
takes approximately 30~40 seconds longer on these same queries... that
is that count has to evaluate whether a value is null or not.  There
probably is a better reason, if anyone has any ideas, I would much
appreciate you sharing!  Also, why the big difference in query times in
the above?

bear appears 780963 times in 696668 documents
complex appears 468669 times in 440339 documents.

Again, thanks all!

Kris


slow count() was: tsearch2 poor performance

From
Oleg Bartunov
Date:
On Fri, 1 Oct 2004, Kris Kiger wrote:

> Hey all, its me again.  If I do not do a count(product_id) on my
> tsearch2 queries, its actually really fast, for example;
>

Hmm, I also really want to know  what's the difference ?
Postgresql 8.0beta3 on Linux 2.4.25

tsearchd=# explain analyze select body from txt where fts_index @@ to_tsquery('oil') limit 1000;
                                                           QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual time=0.053..14.662 rows=1000 loops=1)
   ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02 rows=3000 width=315) (actual time=0.049..12.552 rows=1000
loops=1)
         Index Cond: (fts_index @@ '\'oil\''::tsquery)
 Total runtime: 15.848 ms
(4 rows)

tsearchd=# explain analyze select count(body) from txt where fts_index @@ to_tsquery('oil') limit 1000;

Didn't get result after 10 minutes :(




    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83