Re: Performance regression between 8.3 and 8.4 on heavy text indexing - Mailing list pgsql-performance

From gael@pilotsystems.net (Gaël Le Mignot)
Subject Re: Performance regression between 8.3 and 8.4 on heavy text indexing
Date
Msg-id plop87ocq1rpmp.fsf@aoskar.kilobug.org
Whole thread Raw
In response to Re: Performance regression between 8.3 and 8.4 on heavy text indexing  (Guillaume Smet <guillaume.smet@gmail.com>)
Responses Re: Performance regression between 8.3 and 8.4 on heavy text indexing
List pgsql-performance
Hello Guillaume!

Wed, 26 Aug 2009 23:59:25 +0200, you wrote:

 > On Wed, Aug 26, 2009 at 6:29 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote:
 >> gael@pilotsystems.net (=?iso-8859-1?Q?Ga=EBl?= Le Mignot) writes:
 >>> So it seems it was quite wrong about estimated matching rows (192 predicted, 10222 reals).
 >>
 >> Yup.  What's even more interesting is that it seems the real win would
 >> have been to use just the 'claude & duviau' condition (which apparently
 >> matched only 14 rows).  8.3 had no hope whatever of understanding that,
 >> it just got lucky.  8.4 should have figured it out, I'm thinking.
 >> Does it help if you increase the statistics target for fulltext_tsv?
 >> (Don't forget to re-ANALYZE after doing so.)

 > It could be interesting to run the query without the condition
 > (keywords_tsv @@ '''assassinat'''::tsquery) to see the estimate of
 > (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery) in 8.4.

Here it is ::

libebench=>  explain analyze SELECT classname, id FROM libeindex WHERE (classname = 'article' AND (source IN
('methode','nica')AND fulltext_tsv @@ to_tsquery('french', 'claude & duviau'))) ORDER BY publicationDate
DESC,pageNumberASC LIMIT 50; 
                                                                QUERY PLAN
                  

------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12264.98..12265.11 rows=50 width=24) (actual time=3.799..3.825 rows=10 loops=1)
   ->  Sort  (cost=12264.98..12271.03 rows=2421 width=24) (actual time=3.794..3.802 rows=10 loops=1)
         Sort Key: publicationdate, pagenumber
         Sort Method:  quicksort  Memory: 25kB
         ->  Bitmap Heap Scan on libeindex  (cost=2363.10..12184.56 rows=2421 width=24) (actual time=3.579..3.693
rows=10loops=1) 
               Recheck Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
               Filter: (((source)::text = ANY ('{methode,nica}'::text[])) AND ((classname)::text = 'article'::text))
               ->  Bitmap Index Scan on fulltext_index  (cost=0.00..2362.49 rows=2877 width=0) (actual
time=3.499..3.499rows=14 loops=1) 
                     Index Cond: (fulltext_tsv @@ '''claud'' & ''duviau'''::tsquery)
 Total runtime: 166.772 ms
(10 rows)

So it estimates 2877 rows for that, while in reality it's 14.

 > Btw, what Tom means by increasing the statistics is executing the
 > following queries:
 > ALTER TABLE libeindex ALTER COLUMN fulltext_tsv SET STATISTICS 500;

Ok, I  did it for 500 also  on the keywords_tsv column,  which was the
other  contestor. Here  we have  a  clear improvement:  the search  in
keyword_tsv is now  estimated at 10398 (real being  10222) and the one
on fulltext_tsv at 1 (real being 14).

I did it at 1000 too, it's almost the same result.

By re-running  our sampling of 7334  queries on the  database with the
statistics at 1000  on both fulltext_tsv and keywords_tsv,  we do have
overall better results than with 8.3 ! So a greeat thanks to everyone.

The  weird thing  was  that with  the  default of  100 for  statistics
target, it was  worse than when we  moved back to 10. So  I didn't try
with 1000, but I should have.

I'll do  more tests and  keep the  list informed if  it can be  of any
help.

--
Gaël Le Mignot - gael@pilotsystems.net
Pilot Systems - 9, rue Desargues - 75011 Paris
Tel : +33 1 44 53 05 55 - www.pilotsystems.net
Gérez vos contacts et vos newsletters : www.cockpit-mailing.com

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: PostgreSQL does CAST implicitely between int and a domain derived from int
Next
From: Guillaume Smet
Date:
Subject: Re: Performance regression between 8.3 and 8.4 on heavy text indexing