Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general

From Arjen van der Meijden
Subject Re: Mnogosearch (Was: Re: website doc search is ... )
Date
Msg-id 3FF41B99.6080301@vulcanus.its.tudelft.nl
Whole thread Raw
In response to Re: Mnogosearch (Was: Re: website doc search is ... )  ("Marc G. Fournier" <scrappy@postgresql.org>)
Responses Re: Mnogosearch (Was: Re: website doc search is ... )  ("Marc G. Fournier" <scrappy@postgresql.org>)
List pgsql-general
Marc G. Fournier wrote:
>
> Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
> join'd to all the URLs that contain them, you get:

Can't you build seperate databases for each domain you want to index?
Than you wouldn't need the like operator at all.

The like-operator doesn't seem to allow a very scalable production
environment. And besides that point, I don't really believe a "record
per word/document-couple" is very scalable (not in SQL, not anywhere).

Anyway, that doesn't help you much, perhaps decreasing the size of the
index-tables can help, are they with OIDs ? If so, wouldn't it help to
recreate them without, so you save yourselves 4 bytes per word-document
couple, therefore allowing it to fit in less pages and by that speeding
up the seqscans.

Are _all_ your queries with the like on the url? Wouldn't it help to
create an index on both the wordid and the urlid for ndict8?

Perhaps you can create your own 'host table' (which could be filled
using a trigger or a slightly adjusted indexer), and a foreign key from
your url table to that, so you can search on url.hostid = X (or a join
with that host table) instead of the like that is used now?

By the way, can a construction like (tablefield || '') ever use an index
in postgresql?

Best regards and good luck,

Arjen van der Meijden



pgsql-general by date:

Previous
From: Együd Csaba
Date:
Subject: Re: What does count(*) count?
Next
From: "Lee Harr"
Date:
Subject: Re: finding my schema (for sequences)