Slow search.. quite clueless - Mailing list pgsql-general
From | Yonatan Ben-Nes |
---|---|
Subject | Slow search.. quite clueless |
Date | |
Msg-id | 43304A17.3020400@canaan.co.il Whole thread Raw |
Responses |
Re: Slow search.. quite clueless
Re: Slow search.. quite clueless |
List | pgsql-general |
Hi all, Im building a site where the users can search for products with up to 4 diffrent keywords which all MUST match to each product which found as a result to the search. I got 2 tables (which are relevant to the issue :)), one is the product table (5 million rows) and the other is the keyword table which hold the keywords of each product (60 million rows). The scheme of the tables is as follows: Table "public.product" Column | Type | Modifiers ----------------------------+---------------+--------------------- product_id | text | not null product_name | text | not null retail_price | numeric(10,2) | not null etc... Indexes: "product_product_id_key" UNIQUE, btree (product_id) Table "public.keyword" Column | Type | Modifiers -------------+---------------+----------- product_id | text | not null keyword | text | not null Indexes: "keyword_keyword" btree (keyword) The best query which I succeded to do till now is adding the keyword table for each keyword searched for example if someone search for "belt" & "black" & "pants" it will create the following query: poweraise.com=# EXPLAIN ANALYZE SELECT product_id,product_name,product_image_url,short_product_description,long_product_description,discount,discount_type,sale_price,retail_price FROM product INNER JOIN keyword t1 USING(product_id) INNER JOIN keyword t2 USING(product_id) INNER JOIN keyword t3 USING(product_id) WHERE t1.keyword='belt' AND t2.keyword='black' AND t3.keyword='pants' LIMIT 13; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=37734.15..39957.20 rows=13 width=578) (actual time=969.798..1520.354 rows=6 loops=1) -> Hash Join (cost=37734.15..3754162.82 rows=21733 width=578) (actual time=969.794..1520.337 rows=6 loops=1) Hash Cond: ("outer".product_id = "inner".product_id) -> Nested Loop (cost=18867.07..2858707.34 rows=55309 width=612) (actual time=82.266..1474.018 rows=156 loops=1) -> Hash Join (cost=18867.07..2581181.09 rows=55309 width=34) (actual time=82.170..1462.104 rows=156 loops=1) Hash Cond: ("outer".product_id = "inner".product_id) -> Index Scan using keyword_keyword on keyword t2 (cost=0.00..331244.43 rows=140771 width=17) (actual time=0.033..1307.167 rows=109007 loops=1) Index Cond: (keyword = 'black'::text) -> Hash (cost=18851.23..18851.23 rows=6337 width=17) (actual time=16.145..16.145 rows=0 loops=1) -> Index Scan using keyword_keyword on keyword t1 (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.067..11.050 rows=3294 loops=1) Index Cond: (keyword = 'belt'::text) -> Index Scan using product_product_id_key on product (cost=0.00..5.01 rows=1 width=578) (actual time=0.058..0.060 rows=1 loops=156) Index Cond: (product.product_id = "outer".product_id) -> Hash (cost=18851.23..18851.23 rows=6337 width=17) (actual time=42.863..42.863 rows=0 loops=1) -> Index Scan using keyword_keyword on keyword t3 (cost=0.00..18851.23 rows=6337 width=17) (actual time=0.073..36.120 rows=3932 loops=1) Index Cond: (keyword = 'pants'::text) Total runtime: 1521.441 ms (17 rows) Sometimes the query work fast even for 3 keywords but that doesnt help me if at other times it take ages.... Now to find a result for 1 keyword its really flying so I also tried to make 3 queries and do INTERSECT between them but it was found out to be extremly slow... Whats make this query slow as far as I understand is all the merging between the results of each table... I tried to divide the keyword table into lots of keywords table which each hold keywords which start only with a specific letter, it did improve the speeds but not in a real significant way.. tried clusters,indexes,SET STATISTICS,WITHOUT OIDS on the keyword table and what not.. im quite clueless... Actually I even started to look on other solutions and maybe you can say something about them also.. maybe they can help me: 1. Omega (From the Xapian project) - http://www.xapian.org/ 2. mnoGoSearch - http://www.mnogosearch.org/doc.html 3. Swish-e - http://swish-e.org/index.html To add on everything I want at the end to be able to ORDER BY the results like order the product by price, but im less concerned about that cause I saw that with cluster I can do it without any extra overhead. Thanks alot in advance, Yonatan Ben-Nes
pgsql-general by date: