Thread: Tsearch2 performance on big database
Hi,

I'm looking for a *fast* solution to search through ~4 million records of
book descriptions. I've installed PostgreSQL 8.0.1 on a dual Opteron server
with 8 GB of memory, running Linux 2.6. I haven't done a lot of tuning on
PostgreSQL itself, but here are the settings I have changed so far:

shared_buffers = 2000 (anything much bigger and the kernel doesn't allow it;
I still have to look into that)
effective_cache_size = 32768

Here's my table:

ilab=# \d books
                                      Table "public.books"
    Column     |          Type          |                        Modifiers
---------------+------------------------+-----------------------------------------------------------
 recordnumber  | integer                | not null default nextval('books_recordnumber_seq'::text)
 membernumber  | integer                | not null default 0
 booknumber    | character varying(20)  | not null default ''::character varying
 author        | character varying(60)  | not null default ''::character varying
 titel         | text                   | not null
 description   | character varying(100) | not null default ''::character varying
 descriprest   | text                   | not null
 price         | bigint                 | not null default 0::bigint
 keywords      | character varying(100) | not null default ''::character varying
 dollarprice   | bigint                 | not null default 0::bigint
 countrynumber | smallint               | not null default 0::smallint
 entrydate     | date                   | not null
 status        | smallint               | not null default 0::smallint
 recordtype    | smallint               | not null default 0::smallint
 bookflags     | smallint               | not null default 0::smallint
 year          | smallint               | not null default 0::smallint
 firstedition  | smallint               | not null default 0::smallint
 dustwrapper   | smallint               | not null default 0::smallint
 signed        | smallint               | not null default 0::smallint
 cover         | smallint               | not null default 0::smallint
 specialfield  | smallint               | not null default 0::smallint
 idxfti        | tsvector               |
Indexes:
    "recordnumber_idx" unique, btree (recordnumber)
    "idxfti_idx" gist (idxfti)

idxfti is a tsvector of the concatenated description and descriprest columns.

ilab=# select avg(character_length(description)), avg(character_length(descriprest)) from books;
         avg         |          avg
---------------------+----------------------
 89.1596992873947218 | 133.0468689304200538

Queries take forever to run. Right now we run a MySQL server on which we
maintain our own indices: we split the description fields by word and keep
separate tables for words and the book descriptions they appear in. For
example, a query for the word 'terminology' on our MySQL setup takes 5.8
seconds and returns 375 results. The same query on PostgreSQL using the
tsearch2 index takes 30802.105 ms and returns 298 results.

How do I speed this up? Should I change settings, add or change indexes,
or... what?

Rick Jansen
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
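[For context, a tsvector column like idxfti is typically built and maintained
along these lines in tsearch2. This is a minimal sketch: the configuration
name 'default' and the trigger setup are assumptions, not details given in
this thread.]

  -- Populate the column from the two description fields:
  UPDATE books
     SET idxfti = to_tsvector('default',
                              coalesce(description, '') || ' ' || coalesce(descriprest, ''));

  -- Index it with GiST:
  CREATE INDEX idxfti_idx ON books USING gist (idxfti);

  -- Keep it current with the stock tsearch2 trigger function:
  CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON books
      FOR EACH ROW EXECUTE PROCEDURE tsearch2(idxfti, description, descriprest);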
On Tue, 22 Mar 2005, Rick Jansen wrote:

> Hi,
>
> I'm looking for a *fast* solution to search through ~4 million records
> of book descriptions. I've installed PostgreSQL 8.0.1 on a dual Opteron
> server with 8 GB of memory, running Linux 2.6. I haven't done a lot of
> tuning on PostgreSQL itself, but here are the settings I have changed
> so far:
>
> shared_buffers = 2000 (anything much bigger and the kernel doesn't
> allow it; I still have to look into that)

Use something like

  echo "150000000" > /proc/sys/kernel/shmmax

to increase shared memory. In your case you could dedicate much more memory.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
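[Some back-of-the-envelope numbers, assuming the default 8 KB block size;
the arithmetic is illustration, not a measurement from this thread.]

  -- shared_buffers is counted in 8 KB blocks, so the original
  -- shared_buffers = 2000 is only ~16 MB -- tiny for an 8 GB machine.
  -- A shmmax of 150000000 bytes (~143 MB) leaves room for roughly
  -- 150000000 / 8192 = ~18000 buffers, minus some shared-memory overhead.
  SHOW shared_buffers;  -- verify the new value after restarting the postmaster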
On Tue, 22 Mar 2005 15:36:11 +0300 (MSK), Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Tue, 22 Mar 2005, Rick Jansen wrote:
>
> > Hi,
> >
> > I'm looking for a *fast* solution to search through ~4 million records
> > of book descriptions. I've installed PostgreSQL 8.0.1 on a dual Opteron
> > server with 8 GB of memory, running Linux 2.6. I haven't done a lot of
> > tuning on PostgreSQL itself, but here are the settings I have changed
> > so far:
> >
> > shared_buffers = 2000 (anything much bigger and the kernel doesn't
> > allow it; I still have to look into that)
>
> Use something like
>
>   echo "150000000" > /proc/sys/kernel/shmmax
>
> to increase shared memory. In your case you could dedicate much more
> memory.
>
> Regards,
> Oleg

And Oleg should know. Unless I'm mistaken, he (co)wrote tsearch2.

Other than shared buffers, I can't imagine what could be causing that kind
of slowness. EXPLAIN ANALYZE, please?

As an example of what I think you *should* be seeing, I have a similar box
(4 procs, but that doesn't matter for one query) and I can search a column
with tens of millions of rows in around a second.

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
On Tue, 22 Mar 2005, Mike Rylander wrote:

> And Oleg should know. Unless I'm mistaken, he (co)wrote tsearch2.

You're not mistaken :)

> Other than shared buffers, I can't imagine what could be causing that
> kind of slowness. EXPLAIN ANALYZE, please?

tsearch2 configs are also very important. I've seen a lot of mistakes in
configs!

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Mike Rylander wrote:
> On Tue, 22 Mar 2005 15:36:11 +0300 (MSK), Oleg Bartunov <oleg@sai.msu.su> wrote:
>>
>> Use something like
>>
>>   echo "150000000" > /proc/sys/kernel/shmmax
>>
>> to increase shared memory. In your case you could dedicate much more
>> memory.
>>
>> Regards,
>> Oleg

Thanks, I'll check that out.

> And Oleg should know. Unless I'm mistaken, he (co)wrote tsearch2.
> Other than shared buffers, I can't imagine what could be causing that
> kind of slowness. EXPLAIN ANALYZE, please?

ilab=# explain analyze select count(titel) from books where idxfti @@ to_tsquery('default', 'buckingham | palace');
                                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=35547.99..35547.99 rows=1 width=56) (actual time=125968.119..125968.120 rows=1 loops=1)
   ->  Index Scan using idxfti_idx on books  (cost=0.00..35525.81 rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1)
         Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery)
 Total runtime: 125968.212 ms
(4 rows)

Time: 125969.264 ms
ilab=#

> As an example of what I think you *should* be seeing, I have a similar
> box (4 procs, but that doesn't matter for one query) and I can search
> a column with tens of millions of rows in around a second.

That sounds very promising, I'd love to get those results. Could you tell
me what your settings are, how much memory you have, and so on? Thanks.

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
On Tue, 22 Mar 2005 14:25:19 +0100, Rick Jansen <rick@rockingstone.nl> wrote:
>
> ilab=# explain analyze select count(titel) from books where idxfti @@
> to_tsquery('default', 'buckingham | palace');
>                                                                 QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=35547.99..35547.99 rows=1 width=56) (actual
> time=125968.119..125968.120 rows=1 loops=1)
>    ->  Index Scan using idxfti_idx on books  (cost=0.00..35525.81
> rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1)
>          Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery)
>  Total runtime: 125968.212 ms
> (4 rows)
>
> Time: 125969.264 ms
> ilab=#

Ahh... I should have qualified my claim. I am creating a Google-esque
search interface, and almost every query uses '&' as the term joiner.
'AND' queries and one-term queries are orders of magnitude faster than
'OR' queries, and fortunately they are the expected default for most
users. (Think: "I typed in these words, therefore I want to match these
words"...) An interesting test may be to time multiple queries
independently, one for each search term, and see if the combined cost is
less than a single 'OR' search. If so, you could use UNION to join the
results.

However, the example you originally gave ('terminology') should be very
fast. On a comparable query ("select count(value) from metabib.full_rec
where index_vector @@ to_tsquery('default','jane');") I get 12 ms.

Oleg, do you see anything else on the surface here?

Try:

EXPLAIN ANALYZE
 SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham')
 UNION
 SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'palace');

and see if using '&' instead of '|' where you can helps out. I imagine
you'd be surprised by the speed of:

 SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham&palace');

> > As an example of what I think you *should* be seeing, I have a similar
> > box (4 procs, but that doesn't matter for one query) and I can search
> > a column with tens of millions of rows in around a second.
>
> That sounds very promising, I'd love to get those results. Could you
> tell me what your settings are, how much memory you have, and so on?

16 GB of RAM on a dedicated machine.

shared_buffers = 15000           # min 16, at least max_connections*2, 8KB each
work_mem = 10240                 # min 64, size in KB
maintenance_work_mem = 1000000   # min 1024, size in KB
                                 # big m_w_m for loading data...
random_page_cost = 2.5           # units are one sequential page fetch cost
                                 # fast drives, and tons of RAM

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
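[Scaled to the original poster's 8 GB box, a comparable starting point might
look like the fragment below. These are illustrative values, not measured
recommendations from this thread; shared_buffers much above 2000 requires
raising shmmax first, as Oleg described.]

shared_buffers = 15000          # ~120 MB of 8 KB buffers
work_mem = 10240                # 10 MB per sort/hash operation
maintenance_work_mem = 262144   # 256 MB, mainly for index builds
effective_cache_size = 786432   # ~6 GB of an 8 GB box, in 8 KB pages
random_page_cost = 2.5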
Mike,

no comments before Rick posts his tsearch configs and increases his
buffers! A UNION shouldn't be faster than (term1|term2). The tsearch2
internals description might help you understand tsearch2's limitations. See
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
Also, don't miss my notes:
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes

Oleg

On Tue, 22 Mar 2005, Mike Rylander wrote:

> On Tue, 22 Mar 2005 14:25:19 +0100, Rick Jansen <rick@rockingstone.nl> wrote:
>>
>> ilab=# explain analyze select count(titel) from books where idxfti @@
>> to_tsquery('default', 'buckingham | palace');
>>                                                                 QUERY PLAN
>> ----------------------------------------------------------------------------------------------------------------------------------------
>>  Aggregate  (cost=35547.99..35547.99 rows=1 width=56) (actual
>> time=125968.119..125968.120 rows=1 loops=1)
>>    ->  Index Scan using idxfti_idx on books  (cost=0.00..35525.81
>> rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1)
>>          Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery)
>>  Total runtime: 125968.212 ms
>> (4 rows)
>>
>> Time: 125969.264 ms
>> ilab=#
>
> Ahh... I should have qualified my claim. I am creating a Google-esque
> search interface, and almost every query uses '&' as the term joiner.
> 'AND' queries and one-term queries are orders of magnitude faster than
> 'OR' queries, and fortunately they are the expected default for most
> users. (Think: "I typed in these words, therefore I want to match these
> words"...) An interesting test may be to time multiple queries
> independently, one for each search term, and see if the combined cost
> is less than a single 'OR' search. If so, you could use UNION to join
> the results.
>
> However, the example you originally gave ('terminology') should be
> very fast. On a comparable query ("select count(value) from
> metabib.full_rec where index_vector @@ to_tsquery('default','jane');")
> I get 12 ms.
>
> Oleg, do you see anything else on the surface here?
>
> Try:
>
> EXPLAIN ANALYZE
>  SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham')
>  UNION
>  SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'palace');
>
> and see if using '&' instead of '|' where you can helps out. I
> imagine you'd be surprised by the speed of:
>
>  SELECT titel FROM books WHERE idxfti @@ to_tsquery('default', 'buckingham&palace');
>
>> > As an example of what I think you *should* be seeing, I have a similar
>> > box (4 procs, but that doesn't matter for one query) and I can search
>> > a column with tens of millions of rows in around a second.
>>
>> That sounds very promising, I'd love to get those results. Could you
>> tell me what your settings are, how much memory you have, and so on?
>
> 16 GB of RAM on a dedicated machine.
>
> shared_buffers = 15000           # min 16, at least max_connections*2, 8KB each
> work_mem = 10240                 # min 64, size in KB
> maintenance_work_mem = 1000000   # min 1024, size in KB
>                                  # big m_w_m for loading data...
> random_page_cost = 2.5           # units are one sequential page fetch cost
>                                  # fast drives, and tons of RAM

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote:
> Mike,
>
> no comments before Rick posts his tsearch configs and increases his
> buffers! A UNION shouldn't be faster than (term1|term2). The tsearch2
> internals description might help you understand tsearch2's limitations.
> See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
> Also, don't miss my notes:
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>
> Oleg

Thanks Oleg, I've seen those pages before :) I've set shared_buffers to
45000 now (yes, that's probably very much, isn't it?) and it already seems
a lot quicker.

How do I find out what my tsearch config is? I followed the intro
(http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro) and
applied it to our books table; that's all, I didn't change anything else
about the configs.

> On Tue, 22 Mar 2005, Mike Rylander wrote:
>> Ahh... I should have qualified my claim. I am creating a Google-esque
>> search interface, and almost every query uses '&' as the term joiner.
>> 'AND' queries and one-term queries are orders of magnitude faster than
>> 'OR' queries, and fortunately they are the expected default for most
>> users. (Think: "I typed in these words, therefore I want to match these
>> words"...) An interesting test may be to time multiple queries
>> independently, one for each search term, and see if the combined cost
>> is less than a single 'OR' search. If so, you could use UNION to join
>> the results.

Well, I just asked my colleagues, and OR queries aren't used by us anyway,
so I'll test AND queries instead.

>> However, the example you originally gave ('terminology') should be
>> very fast. On a comparable query ("select count(value) from
>> metabib.full_rec where index_vector @@ to_tsquery('default','jane');")
>> I get 12 ms.

ilab=# select count(*) from books where idxfti @@ to_tsquery('default', 'jane');
 count
-------
  4093
(1 row)

Time: 217395.820 ms

:(

ilab=# explain analyze select count(*) from books where idxfti @@ to_tsquery('default', 'jane');
                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16591.95..16591.95 rows=1 width=0) (actual time=4634.931..4634.932 rows=1 loops=1)
   ->  Index Scan using idxfti_idx on books  (cost=0.00..16581.69 rows=4102 width=0) (actual time=0.395..4631.454 rows=4093 loops=1)
         Index Cond: (idxfti @@ '\'jane\''::tsquery)
 Total runtime: 4635.023 ms
(4 rows)

Time: 4636.028 ms
ilab=#

>> 16 GB of RAM on a dedicated machine.
>>
>> shared_buffers = 15000           # min 16, at least max_connections*2, 8KB each
>> work_mem = 10240                 # min 64, size in KB
>> maintenance_work_mem = 1000000   # min 1024, size in KB
>>                                  # big m_w_m for loading data...
>> random_page_cost = 2.5           # units are one sequential page fetch cost
>>                                  # fast drives, and tons of RAM

Right... well, I'll try copying these settings and see how that works out,
thanks :)

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
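[One way to answer the "what is my tsearch config" question directly, as a
sketch: tsearch2 keeps its setup in ordinary tables, and show_curcfg() is
supposed to report the configuration the current session resolves to --
verify these names against your tsearch2 version before relying on them.]

  -- The active configuration for this session:
  SELECT * FROM pg_ts_cfg WHERE oid = show_curcfg();

  -- How each token type is mapped to dictionaries:
  SELECT ts_name, tok_alias, dict_name
    FROM pg_ts_cfgmap
   WHERE ts_name = 'default';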
On Tue, 22 Mar 2005, Rick Jansen wrote:

> Thanks Oleg, I've seen those pages before :) I've set shared_buffers to
> 45000 now (yes, that's probably very much, isn't it?) and it already
> seems a lot quicker.
>
> How do I find out what my tsearch config is? I followed the intro
> (http://www.sai.msu.su/~megera/oddmuse/index.cgi/tsearch-v2-intro) and
> applied it to our books table; that's all, I didn't change anything else
> about the configs.

Hmm, the default configuration is too eager: you index every lexeme using
the simple dictionary! That's probably too much. Here is what I have for my
Russian configuration in the dictionary database:

 default_russian | lword        | {en_ispell,en_stem}
 default_russian | lpart_hword  | {en_ispell,en_stem}
 default_russian | lhword       | {en_ispell,en_stem}
 default_russian | nlword       | {ru_ispell,ru_stem}
 default_russian | nlpart_hword | {ru_ispell,ru_stem}
 default_russian | nlhword      | {ru_ispell,ru_stem}

Notice that I index only Russian and English words -- no numbers, URLs, and
so on. You could just delete the unwanted rows in pg_ts_cfgmap for your
configuration, but I'd recommend updating them instead, setting dict_name
to NULL. For example, to stop indexing integers:

 update pg_ts_cfgmap set dict_name=NULL where ts_name='default_russian' and tok_alias='int';

voc=# select token,dict_name,tok_type,tsvector from ts_debug('Do you have +70000 bucks');
 token  |      dict_name      | tok_type | tsvector
--------+---------------------+----------+----------
 Do     | {en_ispell,en_stem} | lword    |
 you    | {en_ispell,en_stem} | lword    |
 have   | {en_ispell,en_stem} | lword    |
 +70000 |                     | int      |
 bucks  | {en_ispell,en_stem} | lword    | 'buck'

Only 'bucks' gets indexed :)
Hmm, I should probably add this to the documentation.

What about word statistics (the number of unique words, for example)?

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
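[Building on that advice, several non-word token types can be switched off
in one statement. A sketch only: the token names listed are common tsearch2
aliases -- check them against the output of token_type() for your parser
before running this.]

  UPDATE pg_ts_cfgmap
     SET dict_name = NULL
   WHERE ts_name = 'default'
     AND tok_alias IN ('int', 'uint', 'float', 'url', 'host', 'file', 'version');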
Oleg Bartunov wrote:
> On Tue, 22 Mar 2005, Rick Jansen wrote:
>
> Hmm, the default configuration is too eager: you index every lexeme
> using the simple dictionary! That's probably too much. Here is what I
> have for my Russian configuration in the dictionary database:
>
> default_russian | lword        | {en_ispell,en_stem}
> default_russian | lpart_hword  | {en_ispell,en_stem}
> default_russian | lhword       | {en_ispell,en_stem}
> default_russian | nlword       | {ru_ispell,ru_stem}
> default_russian | nlpart_hword | {ru_ispell,ru_stem}
> default_russian | nlhword      | {ru_ispell,ru_stem}
>
> Notice that I index only Russian and English words -- no numbers, URLs,
> and so on. You could just delete the unwanted rows in pg_ts_cfgmap for
> your configuration, but I'd recommend updating them instead, setting
> dict_name to NULL. For example, to stop indexing integers:
>
> update pg_ts_cfgmap set dict_name=NULL where ts_name='default_russian'
> and tok_alias='int';
>
> voc=# select token,dict_name,tok_type,tsvector from ts_debug('Do you
> have +70000 bucks');
>  token  |      dict_name      | tok_type | tsvector
> --------+---------------------+----------+----------
>  Do     | {en_ispell,en_stem} | lword    |
>  you    | {en_ispell,en_stem} | lword    |
>  have   | {en_ispell,en_stem} | lword    |
>  +70000 |                     | int      |
>  bucks  | {en_ispell,en_stem} | lword    | 'buck'
>
> Only 'bucks' gets indexed :)
> Hmm, I should probably add this to the documentation.
>
> What about word statistics (the number of unique words, for example)?

I'm now following the guide to add the ispell dictionary, and I've updated
most of the rows, setting dict_name to NULL:

     ts_name     |  tok_alias   | dict_name
-----------------+--------------+-----------
 default         | lword        | {en_stem}
 default         | nlword       | {simple}
 default         | word         | {simple}
 default         | part_hword   | {simple}
 default         | nlpart_hword | {simple}
 default         | lpart_hword  | {en_stem}
 default         | hword        | {simple}
 default         | lhword       | {en_stem}
 default         | nlhword      | {simple}

These are what's left, but I have no idea what an 'hword' or 'nlhword' or
any of these other tokens are.

Anyway, how do I find out the number of unique words or other word
statistics?

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
On Wed, 23 Mar 2005, Rick Jansen wrote:

> Oleg Bartunov wrote:
>> On Tue, 22 Mar 2005, Rick Jansen wrote:
>>
>> Hmm, the default configuration is too eager: you index every lexeme
>> using the simple dictionary! That's probably too much. Here is what I
>> have for my Russian configuration in the dictionary database:
>>
>> default_russian | lword        | {en_ispell,en_stem}
>> default_russian | lpart_hword  | {en_ispell,en_stem}
>> default_russian | lhword       | {en_ispell,en_stem}
>> default_russian | nlword       | {ru_ispell,ru_stem}
>> default_russian | nlpart_hword | {ru_ispell,ru_stem}
>> default_russian | nlhword      | {ru_ispell,ru_stem}
>>
>> Notice that I index only Russian and English words -- no numbers, URLs,
>> and so on. You could just delete the unwanted rows in pg_ts_cfgmap for
>> your configuration, but I'd recommend updating them instead, setting
>> dict_name to NULL. For example, to stop indexing integers:
>>
>> update pg_ts_cfgmap set dict_name=NULL where ts_name='default_russian'
>> and tok_alias='int';
>>
>> voc=# select token,dict_name,tok_type,tsvector from ts_debug('Do you
>> have +70000 bucks');
>>  token  |      dict_name      | tok_type | tsvector
>> --------+---------------------+----------+----------
>>  Do     | {en_ispell,en_stem} | lword    |
>>  you    | {en_ispell,en_stem} | lword    |
>>  have   | {en_ispell,en_stem} | lword    |
>>  +70000 |                     | int      |
>>  bucks  | {en_ispell,en_stem} | lword    | 'buck'
>>
>> Only 'bucks' gets indexed :)
>> Hmm, I should probably add this to the documentation.
>>
>> What about word statistics (the number of unique words, for example)?
>
> I'm now following the guide to add the ispell dictionary, and I've
> updated most of the rows, setting dict_name to NULL:
>
>      ts_name     |  tok_alias   | dict_name
> -----------------+--------------+-----------
>  default         | lword        | {en_stem}
>  default         | nlword       | {simple}
>  default         | word         | {simple}
>  default         | part_hword   | {simple}
>  default         | nlpart_hword | {simple}
>  default         | lpart_hword  | {en_stem}
>  default         | hword        | {simple}
>  default         | lhword       | {en_stem}
>  default         | nlhword      | {simple}
>
> These are what's left, but I have no idea what an 'hword' or 'nlhword'
> or any of these other tokens are.

from my notes
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes

I've been asked how to find out which token types the parser supports.
Actually, there is a function token_type(parser), so you can just use:

 select * from token_type();

> Anyway, how do I find out the number of unique words or other word
> statistics?

from my notes
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes

It's useful to see word statistics, for example, to check how well your
dictionaries work or how you configured pg_ts_cfgmap. Also, you may notice
probable stop words relevant to your collection. Tsearch provides the
stat() function:

.......................

Don't hesitate to read it, and if you find some bugs or know better
wording, I'd be glad to improve my notes.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote:
> from my notes
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>
> It's useful to see word statistics, for example, to check how well your
> dictionaries work or how you configured pg_ts_cfgmap. Also, you may
> notice probable stop words relevant to your collection. Tsearch provides
> the stat() function:
>
> .......................
>
> Don't hesitate to read it, and if you find some bugs or know better
> wording, I'd be glad to improve my notes.

Thanks, but that stat() query takes way too long... I let it run for
something like 4 hours and still nothing. The server I am testing tsearch2
on is also the production database (MySQL) server, so I have to be careful
not to use too many resources :o

Anyway, here's my pg_ts_cfgmap now (well, the relevant bits):

 default_english | lhword      | {en_ispell,en_stem}
 default_english | lpart_hword | {en_ispell,en_stem}
 default_english | lword       | {en_ispell,en_stem}

Is it normal that queries for single words (or perhaps these are common
words?) take a really long time? Like this:

ilab=# explain analyze select count(*) from books where description_fti @@ to_tsquery('default', 'hispanic');
                                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=20369.81..20369.81 rows=1 width=0) (actual time=261512.031..261512.031 rows=1 loops=1)
   ->  Index Scan using idxfti_idx on books  (cost=0.00..20349.70 rows=8041 width=0) (actual time=45777.760..261509.288 rows=674 loops=1)
         Index Cond: (description_fti @@ '\'hispan\''::tsquery)
 Total runtime: 261518.529 ms
(4 rows)

ilab=# explain analyze select titel from books where description_fti @@ to_tsquery('default', 'buckingham & palace');
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idxfti_idx on books  (cost=0.00..20349.70 rows=8041 width=57) (actual time=18992.045..48863.385 rows=185 loops=1)
   Index Cond: (description_fti @@ '\'buckingham\' & \'palac\''::tsquery)
 Total runtime: 48863.874 ms
(3 rows)

I don't know what happened; these queries were a lot faster two days ago...
what the feck is going on?!

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
On Thu, 24 Mar 2005, Rick Jansen wrote:

> Oleg Bartunov wrote:
>> from my notes
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>>
>> It's useful to see word statistics, for example, to check how well your
>> dictionaries work or how you configured pg_ts_cfgmap. Also, you may
>> notice probable stop words relevant to your collection. Tsearch
>> provides the stat() function:
>>
>> .......................
>>
>> Don't hesitate to read it, and if you find some bugs or know better
>> wording, I'd be glad to improve my notes.
>
> Thanks, but that stat() query takes way too long... I let it run for
> something like 4 hours and still nothing. The server I am testing
> tsearch2 on is also the production database (MySQL) server, so I have to
> be careful not to use too many resources :o

stat() is indeed a big dog: it was designed for developers' needs, so we
recommend saving its results into a table.

> Anyway, here's my pg_ts_cfgmap now (well, the relevant bits):
>
>  default_english | lhword      | {en_ispell,en_stem}
>  default_english | lpart_hword | {en_ispell,en_stem}
>  default_english | lword       | {en_ispell,en_stem}
>
> Is it normal that queries for single words (or perhaps these are common
> words?) take a really long time? Like this:

'hispanic' isn't common -- I see you get only 674 rows, and 'buckingham &
palace' returns 185 rows. Did you run 'vacuum analyze'? I see a big
discrepancy between the estimated rows (8041) and the actual rows.

> ilab=# explain analyze select count(*) from books where description_fti @@
> to_tsquery('default', 'hispanic');
>                                                                  QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=20369.81..20369.81 rows=1 width=0) (actual
> time=261512.031..261512.031 rows=1 loops=1)
>    ->  Index Scan using idxfti_idx on books  (cost=0.00..20349.70 rows=8041
> width=0) (actual time=45777.760..261509.288 rows=674 loops=1)
>          Index Cond: (description_fti @@ '\'hispan\''::tsquery)
>  Total runtime: 261518.529 ms
> (4 rows)
>
> ilab=# explain analyze select titel from books where description_fti @@
> to_tsquery('default', 'buckingham & palace');
>                                                              QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------------------
>  Index Scan using idxfti_idx on books  (cost=0.00..20349.70 rows=8041
> width=57) (actual time=18992.045..48863.385 rows=185 loops=1)
>    Index Cond: (description_fti @@ '\'buckingham\' & \'palac\''::tsquery)
>  Total runtime: 48863.874 ms
> (3 rows)
>
> I don't know what happened; these queries were a lot faster two days
> ago... what the feck is going on?!

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
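[Oleg's suggestion, spelled out as a sketch. stat() scans every tsvector in
the column, which is why it runs for hours on 4 million rows, so run it once
and keep the result. The table name books_word_stats is made up for
illustration; word, ndoc, and nentry are the columns tsearch2's stat()
returns -- verify against your version.]

  -- Run the expensive scan once and keep the result:
  SELECT * INTO books_word_stats
    FROM stat('SELECT idxfti FROM books');

  -- Most frequent lexemes: candidates for custom stop words.
  SELECT word, ndoc, nentry
    FROM books_word_stats
   ORDER BY ndoc DESC
   LIMIT 20;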
Oleg Bartunov wrote:
>
> stat() is indeed a big dog: it was designed for developers' needs, so we
> recommend saving its results into a table.
>
>> Anyway, here's my pg_ts_cfgmap now (well, the relevant bits):
>>
>>  default_english | lhword      | {en_ispell,en_stem}
>>  default_english | lpart_hword | {en_ispell,en_stem}
>>  default_english | lword       | {en_ispell,en_stem}
>>
>> Is it normal that queries for single words (or perhaps these are common
>> words?) take a really long time? Like this:
>
> 'hispanic' isn't common -- I see you get only 674 rows, and 'buckingham
> & palace' returns 185 rows. Did you run 'vacuum analyze'? I see a big
> discrepancy between the estimated rows (8041) and the actual rows.

Yes, I did a vacuum analyze right before executing these queries. I'm going
to recreate the GiST index now and do a vacuum full analyze after that, to
see if that makes a difference.

Rick
--
Systems Administrator for Rockingstone IT
http://www.rockingstone.com
http://www.megabooksearch.com - Search many book listing sites at once
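[For reference, that rebuild could look like the sketch below. REINDEX takes
an exclusive lock on the table, so it is best run off-peak; dropping and
recreating the index is the equivalent alternative.]

  REINDEX INDEX idxfti_idx;
  VACUUM FULL ANALYZE books;

  -- or, equivalently:
  -- DROP INDEX idxfti_idx;
  -- CREATE INDEX idxfti_idx ON books USING gist (idxfti);
  -- VACUUM FULL ANALYZE books;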