Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general

From Arjen van der Meijden
Subject Re: Mnogosearch (Was: Re: website doc search is ... )
Date
Msg-id 3FF471A9.8010806@vulcanus.its.tudelft.nl
Whole thread Raw
In response to Re: Mnogosearch (Was: Re: website doc search is ... )  ("Marc G. Fournier" <scrappy@postgresql.org>)
Responses Re: Mnogosearch (Was: Re: website doc search is ... )  ("Marc G. Fournier" <scrappy@postgresql.org>)
List pgsql-general
Marc G. Fournier wrote:

> On Thu, 1 Jan 2004, Arjen van der Meijden wrote:
>
>
>>Marc G. Fournier wrote:
>>
>>>Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
>>>join'd to all the URLs that contain them, you get:
>>
>>Can't you build seperate databases for each domain you want to index?
>>Than you wouldn't need the like operator at all.
>
>
> First off, that would make searching across multiple domains difficult,
> no?
If mnogosearch would allow searching in multiple databases; no. But it
doesn't seem to feature that and indeed; yes that might become a bit
difficult.
It was something I thought of because our solution allows it, but that
is no solution for you, I checked the mnogosearch features after sending
that email, instead of before. Perhaps I should've turned that around.

> Second, the LIKE is still required ... the LIKE allows the search to
> "group" URLs ... for instance, if I wanted to just search on the docs, the
> LIKE would look for all URLs that contain:
>
> http://www.postgresql.org/docs/%%
>
> whereas searching the whole site would be:
>
> http://www.postgresql.org/%%
That depends. If it were possible, you could decide from the search
usage stats to split /docs from the "the rest" of www.postgresql.org and
by that avoiding quite a bit of like's.

>>Anyway, that doesn't help you much, perhaps decreasing the size of the
>>index-tables can help, are they with OIDs ? If so, wouldn't it help to
>>recreate them without, so you save yourselves 4 bytes per word-document
>>couple, therefore allowing it to fit in less pages and by that speeding
>>up the seqscans.
>
>
> This one I hadn't thought about ... for some reason, I thought that
> WITHOUT OIDs was now the default ... looking at that one now ...
No, it's still the default to do it with oids.

>>By the way, can a construction like (tablefield || '') ever use an index
>>in postgresql?
>
>
> again, as shown in a previous email, the index is being used for the LIKE
> query ... the big problem as I see it is that the result set from the LIKE
> is ~20x larger then the result set for the the = ... if there was some way
> to telling the planner that going the LIKE route was the more expensive of
> the two (even though table size seems to indicate the other way around), I
> suspect that that would improve things also ...
Yeah, I noticed. Hopefully Tom's suggestion will work to achieve that.

I can imagine how you feel about all this, I had to do a similar job a
year ago, but was less restricted by a preference like the "it'd be a
nice postgresql showcase". But then again, our search engine is loaded
with an average of 24 queries per minute (peaking to over 100/m in the
afternoon and evenings) and we didn't have any working solution (not
even a slow one).

Good luck,

Arjen van der Meijden



pgsql-general by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Mnogosearch: Comparing PgSQL 7.4 to MySQL 4.1
Next
From: Tom Lane
Date:
Subject: Re: Mnogosearch (Was: Re: website doc search is ... )