Re: Slow search.. quite clueless - Mailing list pgsql-general
| From | Yonatan Ben-Nes |
|---|---|
| Subject | Re: Slow search.. quite clueless |
| Date | |
| Msg-id | 4331C174.3090207@canaan.co.il |
| In response to | Re: Slow search.. quite clueless (Oleg Bartunov <oleg@sai.msu.su>) |
| List | pgsql-general |
Oleg Bartunov wrote:
> On Tue, 20 Sep 2005, Philip Hallstrom wrote:
>
>>> contrib/tsearch2 ( http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ )
>>> might work for you. It might, because performance depends on the
>>> cardinality of your keywords.
>>
>> Seconded. We use tsearch2 to search about 40,000 rows containing
>> manufacturer, brand, and product name, and it returns a result almost
>> instantly. Before, when we did normal SQL ("manufacturer LIKE ...", etc.)
>> it would take 20-30 seconds.
>>
>> One thing to check is the english.stop file, which contains words to
>> skip (i, a, the, etc.). In our case we removed almost all of them,
>> since one of our products is "7 up" (the drink) and it would remove
>> "up". Made it really hard to pull up 7 up in the results :)
>
> We have "rewriting query support (thesauri search)" in our todo
> (http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo).
>
>> -philip
>>
>>>     Oleg
>>>
>>> On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm building a site where users can search for products with up to 4
>>>> different keywords, all of which MUST match every product returned as
>>>> a search result.
>>>>
>>>> I have 2 tables (which are relevant to the issue :)): one is the
>>>> product table (5 million rows) and the other is the keyword table,
>>>> which holds the keywords of each product (60 million rows).
>>>>
>>>> The schema of the tables is as follows:
>>>>
>>>>           Table "public.product"
>>>>     Column     |     Type      | Modifiers
>>>> ---------------+---------------+-----------
>>>>  product_id    | text          | not null
>>>>  product_name  | text          | not null
>>>>  retail_price  | numeric(10,2) | not null
>>>>  etc...
>>>> Indexes:
>>>>     "product_product_id_key" UNIQUE, btree (product_id)
>>>>
>>>>       Table "public.keyword"
>>>>    Column    | Type | Modifiers
>>>> -------------+------+-----------
>>>>  product_id  | text | not null
>>>>  keyword     | text | not null
>>>> Indexes:
>>>>     "keyword_keyword" btree (keyword)
>>>>
>>>> The best query I have managed so far joins the keyword table once for
>>>> each keyword searched; for example, if someone searches for "belt" &
>>>> "black" & "pants" it will create the following query:
>>>>
>>>> poweraise.com=# EXPLAIN ANALYZE SELECT
>>>>     product_id, product_name, product_image_url, short_product_description,
>>>>     long_product_description, discount, discount_type, sale_price, retail_price
>>>> FROM product
>>>>     INNER JOIN keyword t1 USING (product_id)
>>>>     INNER JOIN keyword t2 USING (product_id)
>>>>     INNER JOIN keyword t3 USING (product_id)
>>>> WHERE t1.keyword = 'belt' AND t2.keyword = 'black' AND t3.keyword = 'pants'
>>>> LIMIT 13;
>>>>
>>>>                                                                 QUERY PLAN
>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>  Limit  (cost=37734.15..39957.20 rows=13 width=578) (actual time=969.798..1520.354 rows=6 loops=1)
>>>>    ->  Hash Join  (cost=37734.15..3754162.82 rows=21733 width=578) (actual time=969.794..1520.337 rows=6 loops=1)
>>>>          Hash Cond: ("outer".product_id = "inner".product_id)
>>>>          ->  Nested Loop  (cost=18867.07..2858707.34 rows=55309 width=612) (actual time=82.266..1474.018 rows=156 loops=1)
>>>>                ->  Hash Join  (cost=18867.07..2581181.09 rows=55309 width=34) (actual time=82.170..1462.104 rows=156 loops=1)
>>>>                      Hash Cond: ("outer".product_id = "inner".product_id)
>>>>                      ->  Index Scan using keyword_keyword on keyword t2  (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167 rows=109007 loops=1)
>>>>                            Index Cond: (keyword = 'black'::text)
>>>>                      ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual time=16.145..16.145 rows=0 loops=1)
>>>>                            ->  Index Scan using keyword_keyword on keyword t1  (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050 rows=3294 loops=1)
>>>>                                  Index Cond: (keyword = 'belt'::text)
>>>>                ->  Index Scan using product_product_id_key on product  (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1 loops=156)
>>>>                      Index Cond: (product.product_id = "outer".product_id)
>>>>          ->  Hash  (cost=18851.23..18851.23 rows=6337 width=17) (actual time=42.863..42.863 rows=0 loops=1)
>>>>                ->  Index Scan using keyword_keyword on keyword t3  (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120 rows=3932 loops=1)
>>>>                      Index Cond: (keyword = 'pants'::text)
>>>>  Total runtime: 1521.441 ms
>>>> (17 rows)
>>>>
>>>> Sometimes the query runs fast even for 3 keywords, but that doesn't
>>>> help me if at other times it takes ages....
>>>>
>>>> Now, finding a result for 1 keyword is really fast, so I also tried to
>>>> run 3 separate queries and INTERSECT them, but that turned out to be
>>>> extremely slow...
>>>>
>>>> What makes this query slow, as far as I understand, is all the merging
>>>> between the results of each table... I tried to split the keyword
>>>> table into many keyword tables, each holding only keywords that start
>>>> with a specific letter; it improved the speed, but not in a really
>>>> significant way. I also tried clusters, indexes, SET STATISTICS,
>>>> WITHOUT OIDS on the keyword table, and what not... I'm quite clueless...
>>>>
>>>> Actually, I even started to look at other solutions, and maybe you can
>>>> say something about them as well... maybe they can help me:
>>>> 1. Omega (from the Xapian project) - http://www.xapian.org/
>>>> 2. mnoGoSearch - http://www.mnogosearch.org/doc.html
>>>> 3. Swish-e - http://swish-e.org/index.html
>>>>
>>>> On top of everything, I want in the end to be able to ORDER BY the
>>>> results, e.g. order the products by price, but I'm less concerned
>>>> about that because I saw that with CLUSTER I can do it without any
>>>> extra overhead.
>>>>
>>>> Thanks a lot in advance,
>>>> Yonatan Ben-Nes
>>>
>>> Regards,
>>>     Oleg
>
> Regards,
>     Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

Hi all,

First of all, sorry for the delay; we had a problem with our mail server...

tsearch2 looks really promising. I'm starting to work with it now and I'll
report what I find (a rough sketch of what I plan to try is at the end of
this mail).

And to Alex: thanks, but I had already tried all of the things you
recommended and sadly it didn't help.

Thanks a lot for the help, everyone!
Yonatan Ben-Nes
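P.S. For the archives, this is roughly the tsearch2 setup I am going to try,
pieced together from the tsearch2 examples. The keyword_vector column and
index names are my own invention, and the 'default' configuration is the one
the tsearch2 contrib module documents (the newer built-in text search uses
names like 'simple' or 'english' instead), so please treat it as a sketch
rather than a tested recipe:

-- Assumes contrib/tsearch2 is already installed in this database.
-- keyword_vector is an illustrative column name, not part of the current schema.
ALTER TABLE product ADD COLUMN keyword_vector tsvector;

-- Collapse each product's rows from the keyword table into one tsvector.
-- (A trigger or a periodic batch job would be needed to keep it up to date.)
UPDATE product
   SET keyword_vector = to_tsvector('default',
           array_to_string(ARRAY(SELECT keyword
                                   FROM keyword
                                  WHERE keyword.product_id = product.product_id), ' '));

-- GiST index so the @@ match below can use an index scan.
CREATE INDEX product_keyword_vector_idx ON product USING gist (keyword_vector);

-- All search terms must match, so they are AND-ed together with '&'.
SELECT product_id, product_name, retail_price
  FROM product
 WHERE keyword_vector @@ to_tsquery('default', 'belt & black & pants')
 LIMIT 13;

If this works out, ordering by price should just be a matter of adding
ORDER BY retail_price before the LIMIT, though I haven't measured yet what
that does to the plan.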
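P.P.S. For completeness, the INTERSECT variant I mentioned in my original
mail looked roughly like this (each branch on its own is fast, but the
combination was far slower than the three-way join above):

-- Sketch of the INTERSECT approach; the product columns are left out here
-- for brevity, only the matching product_ids are returned.
SELECT product_id FROM keyword WHERE keyword = 'belt'
INTERSECT
SELECT product_id FROM keyword WHERE keyword = 'black'
INTERSECT
SELECT product_id FROM keyword WHERE keyword = 'pants'
LIMIT 13;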