Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general
From | Marc G. Fournier |
---|---|
Subject | Re: Mnogosearch (Was: Re: website doc search is ... ) |
Date | |
Msg-id | 20040101152310.E913@ganymede.hub.org Whole thread Raw |
In response to | Re: Mnogosearch (Was: Re: website doc search is ... ) (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Mnogosearch (Was: Re: website doc search is ... )
(Tom Lane <tgl@sss.pgh.pa.us>)
|
List | pgsql-general |
On Thu, 1 Jan 2004, Tom Lane wrote: > "Marc G. Fournier" <scrappy@postgresql.org> writes: > > I don't see a difference between the two, other then time changes, but > > that could just be that runA had a server a bit more idle then runB ... > > something I'm not seeing here? > > Well, the difference I was hoping for was a more accurate rows estimate > for the indexscan, which indeed we got (estimate went from 3210 to > 16075, vs reality of 15533). But it didn't change the plan :-(. > > Looking more closely, I see the rows estimate for the seqscan on "url" > is pretty awful too (1968 vs reality of 304811). I think it would get > better if you were doing just > AND (url.url LIKE 'http://archives.postgresql.org/%%'); > without the concatenation of an empty string. Is there a reason for the > concatenation part of the expression? Believe it or not, the concatenation was based on a discussion *way* back (2 years, maybe?) when we first started using Mnogosearch, in which you suggested going that route ... in fact, at the time (bear in mind, this is back in 7.2 days), it actually sped things up ... Ok, with statistics set to 10, we now have: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..31672.49 rows=1927 width=8) (actual time=117.064..54476.806 rows=13415 loops=1) -> Index Scan using n8_word on ndict8 (cost=0.00..12567.73 rows=3210 width=8) (actual time=80.230..47844.752 rows=15533loops=1) Index Cond: (word_id = 417851441) -> Index Scan using url_rec_id on url (cost=0.00..5.94 rows=1 width=4) (actual time=0.392..0.398 rows=1 loops=15533) Index Cond: (url.rec_id = "outer".url_id) Filter: (url ~~ 'http://archives.postgresql.org/%%'::text) Total runtime: 54555.011 ms (7 rows) And, at 1000 (and appropriate vacuum analyze on ndict8): QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Merge Join (cost=91613.33..92959.41 rows=9026 width=8) (actual time=12834.316..16726.018 rows=13415 loops=1) Merge Cond: ("outer".url_id = "inner".rec_id) -> Sort (cost=59770.57..59808.18 rows=15043 width=8) (actual time=776.823..849.798 rows=15533 loops=1) Sort Key: ndict8.url_id -> Index Scan using n8_word on ndict8 (cost=0.00..58726.82 rows=15043 width=8) (actual time=0.296..680.139 rows=15533loops=1) Index Cond: (word_id = 417851441) -> Sort (cost=31842.76..32433.09 rows=236133 width=4) (actual time=12056.594..14159.852 rows=311731 loops=1) Sort Key: url.rec_id -> Index Scan using url_url on url (cost=0.00..10768.79 rows=236133 width=4) (actual time=225.243..8353.024 rows=304811loops=1) Index Cond: ((url >= 'http://archives.postgresql.org/'::text) AND (url < 'http://archives.postgresql.org0'::text)) Filter: (url ~~ 'http://archives.postgresql.org/%%'::text) Total runtime: 16796.932 ms (12 rows) Closer to what you were looking/hoping for? Second run, @1000, shows: Total runtime: 12194.016 ms (12 rows) Second run, after knocking her back down to 10, shows: Total runtime: 58119.150 ms (7 rows) so we're definitely improved ... if this is the kinda results you were hoping to see, then I guess next step would be to increase/reanalyze all the word_id columns ... what about the url.url column? should that be done as well? what does that setting affect, *just* the time it takes to analyze the table? from the verbose output, it looks like it is scanning more rows on an analyze then @ 10 ... is this something that can be set database wide, before loading data? and/or something that the default is currently just too low? ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
pgsql-general by date: