Re: Slow search.. quite clueless - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Slow search.. quite clueless |
Date | |
Msg-id | Pine.GSO.4.63.0509202114360.20320@ra.sai.msu.su Whole thread Raw |
In response to | Slow search.. quite clueless (Yonatan Ben-Nes <da@canaan.co.il>) |
Responses |
Re: Slow search.. quite clueless
Re: Slow search.. quite clueless Re: Slow search.. quite clueless |
List | pgsql-general |
contrib/tsearch2 ( http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ ) might works for you. It might because performance depends on cardinality of your keywords. Oleg On Tue, 20 Sep 2005, Yonatan Ben-Nes wrote: > Hi all, > > Im building a site where the users can search for products with up to 4 > diffrent keywords which all MUST match to each product which found as a > result to the search. > > I got 2 tables (which are relevant to the issue :)), one is the product table > (5 million rows) and the other is the keyword table which hold the keywords > of each product (60 million rows). > > The scheme of the tables is as follows: > > Table "public.product" > Column | Type | Modifiers > ----------------------------+---------------+--------------------- > product_id | text | not null > product_name | text | not null > retail_price | numeric(10,2) | not null > etc... > Indexes: > "product_product_id_key" UNIQUE, btree (product_id) > > Table "public.keyword" > Column | Type | Modifiers > -------------+---------------+----------- > product_id | text | not null > keyword | text | not null > Indexes: > "keyword_keyword" btree (keyword) > > The best query which I succeded to do till now is adding the keyword table > for each keyword searched for example if someone search for "belt" & "black" > & "pants" it will create the following query: > > poweraise.com=# EXPLAIN ANALYZE SELECT > product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price > FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword t2 > USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE > t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13; > > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- > Limit (cost=37734.15..39957.20 rows=13 width=578) (actual > time=969.798..1520.354 rows=6 loops=1) > -> Hash Join (cost=37734.15..3754162.82 rows=21733 width=578) (actual > time=969.794..1520.337 rows=6 loops=1) > Hash Cond: ("outer".product_id = "inner".product_id) > -> Nested Loop (cost=18867.07..2858707.34 rows=55309 width=612) > (actual time=82.266..1474.018 rows=156 loops=1) > -> Hash Join (cost=18867.07..2581181.09 rows=55309 width=34) > (actual time=82.170..1462.104 rows=156 loops=1) > Hash Cond: ("outer".product_id = "inner".product_id) > -> Index Scan using keyword_keyword on keyword t2 > (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167 > rows=109007 loops=1) > Index Cond: (keyword = 'black'::text) > -> Hash (cost=18851.23..18851.23 rows=6337 width=17) > (actual time=16.145..16.145 rows=0 loops=1) > -> Index Scan using keyword_keyword on keyword t1 > (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050 rows=3294 > loops=1) > Index Cond: (keyword = 'belt'::text) > -> Index Scan using product_product_id_key on product > (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1 > loops=156) > Index Cond: (product.product_id = "outer".product_id) > -> Hash (cost=18851.23..18851.23 rows=6337 width=17) (actual > time=42.863..42.863 rows=0 loops=1) > -> Index Scan using keyword_keyword on keyword t3 > (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120 rows=3932 > loops=1) > Index Cond: (keyword = 'pants'::text) > Total runtime: 1521.441 ms > (17 rows) > > Sometimes the query work fast even for 3 keywords but that doesnt help me if > at other times it take ages.... > > Now to find a result for 1 keyword its really flying so I also tried to make > 3 queries and do INTERSECT between them but it was found out to be extremly > slow... > > Whats make this query slow as far as I understand is all the merging between > the results of each table... I tried to divide the keyword table into lots of > keywords table which each hold keywords which start only with a specific > letter, it did improve the speeds but not in a real significant way.. tried > clusters,indexes,SET STATISTICS,WITHOUT OIDS on the keyword table and what > not.. im quite clueless... > > Actually I even started to look on other solutions and maybe you can say > something about them also.. maybe they can help me: > 1. Omega (From the Xapian project) - http://www.xapian.org/ > 2. mnoGoSearch - http://www.mnogosearch.org/doc.html > 3. Swish-e - http://swish-e.org/index.html > > To add on everything I want at the end to be able to ORDER BY the results > like order the product by price, but im less concerned about that cause I saw > that with cluster I can do it without any extra overhead. > > Thanks alot in advance, > Yonatan Ben-Nes > > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-general by date: