Thread: Performance issue using Tsearch2
Hi, I'm having a performance problem on a request using Tsearch2: the request strangely takes several minutes. I've tried to follow Tsearch tuning recommendations, I've searched through the archives, but I can't seem to find a solution to solve my problem. The ts_vector field was created using dictionnary fr_ispell only on types lword, lpart_hword and lhword. An index was created on this field. According to the stat() function, there are only 42,590 word stems indexed. I also did a VACUUM FULL ANALYZE. Here's the result of EXPLAIN ANALYZE on a filtered version of my request (the non-filtered version takes so long I usually cancel it): ************************************************************************** explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct = tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test, q) DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=2345.54..2345.58 rows=16 width=308) (actual time=270638.774..270643.142 rows=7106 loops=1) Sort Key: rank(tab_ocr.zoneindex_test, q.q) -> Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual time=40886.553..270619.730 rows=7106 loops=1) -> Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual time=38209.193..173932.313 rows=272414 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1) -> Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=276) (actual time=38209.180..173507.052 rows=272414 loops=1) Filter: (tab_ocr.zoneindex_test @@ q.q) -> Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38204.261..38204.261 rows=283606 loops=1) Index Cond: (tab_ocr.zoneindex_test @@ q.q) -> Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 loops=272414) Index Cond: (tab_ocr.idstruct = tab_chemin.label) Filter: ((chemin)::text ~~ '%;2;%'::text) Total runtime: 270647.946 ms ************************************************************************** Could someone help me analyze this problem? I don't manage to see if the problem comes from bad tsearch tuning, postgresql configuration, or something else... Thanks.
On 2008-02-05 Viviane Lestic wrote: > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------- > Sort (cost=2345.54..2345.58 rows=16 width=308) (actual > time=270638.774..270643.142 rows=7106 loops=1) > Sort Key: rank(tab_ocr.zoneindex_test, q.q) > -> Nested Loop (cost=80.04..2345.22 rows=16 width=308) (actual > time=40886.553..270619.730 rows=7106 loops=1) > -> Nested Loop (cost=80.04..1465.76 rows=392 width=308) (actual > time=38209.193..173932.313 rows=272414 loops=1) > -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual > time=0.006..0.007 rows=1 loops=1) > -> Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 > width=276) (actual time=38209.180..173507.052 rows=272414 loops=1) > Filter: (tab_ocr.zoneindex_test @@ q.q) > -> Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 > width=0) (actual time=38204.261..38204.261 rows=283606 loops=1) > Index Cond: (tab_ocr.zoneindex_test @@ q.q) > -> Index Scan using tab_chemin_label_index on tab_chemin > (cost=0.00..2.23 rows=1 width=4) (actual time=0.036..0.036 rows=0 > loops=272414) > Index Cond: (tab_ocr.idstruct = tab_chemin.label) > Filter: ((chemin)::text ~~ '%;2;%'::text) > Total runtime: 270647.946 ms > ************************************************************************** > > Could someone help me analyze this problem? Your planner estimates are way off. Try increasing the statistics target for the columns used in this query and re-analyze the tables after doing so. Regards Ansgar Wiechers -- "The Mac OS X kernel should never panic because, when it does, it seriously inconveniences the user." --http://developer.apple.com/technotes/tn2004/tn2118.html
On Feb 5, 2008 12:47 PM, Viviane Lestic <vlestic@gmail.com> wrote: > Could someone help me analyze this problem? > I don't manage to see if the problem comes from bad tsearch tuning, > postgresql configuration, or something else... Can you try to replace zoneindex_test @@ q with zoneindex_test @@ to_tsquery('partir')? Increasing the statistics for zoneindex_test may be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I'm surprised you have the word "partir" in so many documents? Do you use real data? -- Guillaume
2008/2/5, Ansgar -59cobalt- Wiechers wrote: > Your planner estimates are way off. Try increasing the statistics target > for the columns used in this query and re-analyze the tables after doing > so. I first set STATISTICS to 1000 for column zoneindex_test and saw no significant improvement (with a vacuum full analyze in between). Then I set default_statistics_target to 1000: there is now an improvement, but the overall time is still way too long... (and the estimated costs didn't change...) Here are the results with default_statistics_target set to 1000: explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct = tab_chemin.label AND zoneindex_test @@ q ORDER BY rank(zoneindex_test, q) DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=2345.30..2345.32 rows=8 width=327) (actual time=229913.715..229918.172 rows=7106 loops=1) Sort Key: rank(tab_ocr.zoneindex_test, q.q) -> Nested Loop (cost=80.04..2345.18 rows=8 width=327) (actual time=28159.626..229892.957 rows=7106 loops=1) -> Nested Loop (cost=80.04..1465.76 rows=392 width=327) (actual time=26084.558..130979.395 rows=272414 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.006..0.007 rows=1 loops=1) -> Bitmap Heap Scan on tab_ocr (cost=80.04..1460.85 rows=392 width=295) (actual time=26084.544..130562.220 rows=272414 loops=1) Filter: (tab_ocr.zoneindex_test @@ q.q) -> Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=26073.315..26073.315 rows=283606 loops=1) Index Cond: (tab_ocr.zoneindex_test @@ q.q) -> Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..2.23 rows=1 width=4) (actual time=0.040..0.040 rows=0 loops=272414) Index Cond: (tab_ocr.idstruct = tab_chemin.label) Filter: ((chemin)::text ~~ '%;2;%'::text) Total runtime: 229922.864 ms 2008/2/5, Guillaume Smet wrote: > Can you try to replace zoneindex_test @@ q with zoneindex_test @@ > to_tsquery('partir')? The improvement seems negligible (with default_statistics_target back to 10, its default value): explain analyze SELECT idstruct, headline(zonetext, q), rank(zoneindex_test, q) FROM tab_ocr, tab_chemin, to_tsquery('partir') AS q WHERE tab_chemin.chemin like '%;2;%' AND tab_ocr.idstruct = tab_chemin.label AND zoneindex_test @@ to_tsquery('partir') ORDER BY rank(zoneindex_test, q) DESC; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=4358.91..4358.95 rows=16 width=308) (actual time=266489.667..266494.132 rows=7106 loops=1) Sort Key: rank(tab_ocr.zoneindex_test, q.q) -> Nested Loop (cost=80.04..4358.59 rows=16 width=308) (actual time=42245.881..266469.644 rows=7106 loops=1) -> Function Scan on q (cost=0.00..0.01 rows=1 width=32) (actual time=0.007..0.008 rows=1 loops=1) -> Nested Loop (cost=80.04..4358.34 rows=16 width=276) (actual time=42239.570..178496.761 rows=7106 loops=1) -> Bitmap Heap Scan on tab_ocr (cost=80.04..1461.83 rows=392 width=276) (actual time=38317.423..174188.779 rows=272414 loops=1) Filter: (zoneindex_test @@ '''partir'''::tsquery) -> Bitmap Index Scan on zoneindex_test_idx (cost=0.00..79.94 rows=392 width=0) (actual time=38289.289..38289.289 rows=283606 loops=1) Index Cond: (zoneindex_test @@ '''partir'''::tsquery) -> Index Scan using tab_chemin_label_index on tab_chemin (cost=0.00..7.38 rows=1 width=4) (actual time=0.014..0.014 rows=0 loops=272414) Index Cond: (tab_ocr.idstruct = tab_chemin.label) Filter: ((chemin)::text ~~ '%;2;%'::text) Total runtime: 266498.704 ms > Increasing the statistics for zoneindex_test may > be a good idea too (see ALTER TABLE ... ALTER COLUMN doc). I posted the results above. > I'm surprised you have the word "partir" in so many documents? Do you > use real data? I'm using real data. The indexed documents are extracted from newspapers, and "partir" (and its derivates) is quite a common verb in the French language, so I'm not that surprised to see it show up in many documents.