Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Mnogosearch (Was: Re: website doc search is ... ) |
Date | |
Msg-id | Pine.GSO.4.58.0401031751460.11643@ra.sai.msu.su Whole thread Raw |
In response to | Re: Mnogosearch (Was: Re: website doc search is ... ) ("Marc G. Fournier" <scrappy@postgresql.org>) |
Responses |
Re: Mnogosearch (Was: Re: website doc search is ... )
|
List | pgsql-general |
On Thu, 1 Jan 2004, Marc G. Fournier wrote: > On Thu, 1 Jan 2004, Bruce Momjian wrote: > > > Marc G. Fournier wrote: > > > 186_archives=# \d ndict7 > > > Table "public.ndict7" > > > Column | Type | Modifiers > > > ---------+---------+-------------------- > > > url_id | integer | not null default 0 > > > word_id | integer | not null default 0 > > > intag | integer | not null default 0 > > > Indexes: > > > "n7_url" btree (url_id) > > > "n7_word" btree (word_id) > > > > > > > > > The slowdown is the LIKE condition, as the ndict[78] word_id conditions > > > return near instantly when run individually, and when I run the 'url/LIKE' > > > condition, it takes "forever" ... > > > > Does it help to CLUSTER url.url? Is your data being loaded in so > > identical values used by LIKE are next to each other? > > Just tried CLUSTER, and no difference, but ... chat'd with Dave on ICQ > this evening, and was thinking of something ... and it comes back to > something that I mentioned awhile back ... > > Taking the ndict8 query that I originally presented, now post CLUSTER, and > an explain analyze looks like: > > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------------- > Hash Join (cost=13918.23..26550.58 rows=17 width=8) (actual time=4053.403..83481.769 rows=13415 loops=1) > Hash Cond: ("outer".url_id = "inner".rec_id) > -> Index Scan using n8_word on ndict8 (cost=0.00..12616.09 rows=3219 width=8) (actual time=113.645..79163.431 rows=15533loops=1) > Index Cond: (word_id = 417851441) > -> Hash (cost=13913.31..13913.31 rows=1968 width=4) (actual time=3920.597..3920.597 rows=0 loops=1) > -> Seq Scan on url (cost=0.00..13913.31 rows=1968 width=4) (actual time=3.837..2377.853 rows=304811 loops=1) > Filter: ((url || ''::text) ~~ 'http://archives.postgresql.org/%%'::text) > Total runtime: 83578.572 ms > (8 rows) > > Now, if I knock off the LIKE, so that I'm returning all rows from ndict8, > join'd to all the URLs that contain them, you get: > > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------- > Nested Loop (cost=0.00..30183.13 rows=3219 width=8) (actual time=0.299..1217.116 rows=15533 loops=1) > -> Index Scan using n8_word on ndict8 (cost=0.00..12616.09 rows=3219 width=8) (actual time=0.144..458.891 rows=15533loops=1) > Index Cond: (word_id = 417851441) > -> Index Scan using url_rec_id on url (cost=0.00..5.44 rows=1 width=4) (actual time=0.024..0.029 rows=1 loops=15533) > Index Cond: (url.rec_id = "outer".url_id) > Total runtime: 1286.647 ms > (6 rows) > > So, there are 15333 URLs that contain that word ... now, what I want to > find out is how many of those 15333 URLs contain > 'http://archives.postgresql.org/%%', which is 13415 ... what's the need for such query ? Are you trying to restrict search to archives ? Why not just have site attribute for document and use simple join ? > > The problem is that right now, we look at the LIKE first, giving us ~300k > rows, and then search through those for those who have the word matching > ... is there some way of reducing the priority of the LIKE part of the > query, as far as the planner is concerned, so that it will "resolve" the = > first, and then work the LIKE on the resultant set, instead of the other > way around? So that the query is only checking 15k records for the 13k > that match, instead of searching through 300k? > > I'm guessing that the reason that the LIKE is taking precidence(sp?) is > because the URL table has less rows in it then ndict8? > > ---- > Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) > Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664 > > ---------------------------(end of broadcast)--------------------------- > TIP 8: explain analyze is your friend > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-general by date: