Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general
From | Arjen van der Meijden |
---|---|
Subject | Re: Mnogosearch (Was: Re: website doc search is ... ) |
Date | |
Msg-id | 3FF471A9.8010806@vulcanus.its.tudelft.nl Whole thread Raw |
In response to | Re: Mnogosearch (Was: Re: website doc search is ... ) ("Marc G. Fournier" <scrappy@postgresql.org>) |
Responses |
Re: Mnogosearch (Was: Re: website doc search is ... )
|
List | pgsql-general |
Marc G. Fournier wrote: > On Thu, 1 Jan 2004, Arjen van der Meijden wrote: > > >>Marc G. Fournier wrote: >> >>>Now, if I knock off the LIKE, so that I'm returning all rows from ndict8, >>>join'd to all the URLs that contain them, you get: >> >>Can't you build seperate databases for each domain you want to index? >>Than you wouldn't need the like operator at all. > > > First off, that would make searching across multiple domains difficult, > no? If mnogosearch would allow searching in multiple databases; no. But it doesn't seem to feature that and indeed; yes that might become a bit difficult. It was something I thought of because our solution allows it, but that is no solution for you, I checked the mnogosearch features after sending that email, instead of before. Perhaps I should've turned that around. > Second, the LIKE is still required ... the LIKE allows the search to > "group" URLs ... for instance, if I wanted to just search on the docs, the > LIKE would look for all URLs that contain: > > http://www.postgresql.org/docs/%% > > whereas searching the whole site would be: > > http://www.postgresql.org/%% That depends. If it were possible, you could decide from the search usage stats to split /docs from the "the rest" of www.postgresql.org and by that avoiding quite a bit of like's. >>Anyway, that doesn't help you much, perhaps decreasing the size of the >>index-tables can help, are they with OIDs ? If so, wouldn't it help to >>recreate them without, so you save yourselves 4 bytes per word-document >>couple, therefore allowing it to fit in less pages and by that speeding >>up the seqscans. > > > This one I hadn't thought about ... for some reason, I thought that > WITHOUT OIDs was now the default ... looking at that one now ... No, it's still the default to do it with oids. >>By the way, can a construction like (tablefield || '') ever use an index >>in postgresql? > > > again, as shown in a previous email, the index is being used for the LIKE > query ... the big problem as I see it is that the result set from the LIKE > is ~20x larger then the result set for the the = ... if there was some way > to telling the planner that going the LIKE route was the more expensive of > the two (even though table size seems to indicate the other way around), I > suspect that that would improve things also ... Yeah, I noticed. Hopefully Tom's suggestion will work to achieve that. I can imagine how you feel about all this, I had to do a similar job a year ago, but was less restricted by a preference like the "it'd be a nice postgresql showcase". But then again, our search engine is loaded with an average of 24 queries per minute (peaking to over 100/m in the afternoon and evenings) and we didn't have any working solution (not even a slow one). Good luck, Arjen van der Meijden
pgsql-general by date: