Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general

From Marc G. Fournier
Subject Re: Mnogosearch (Was: Re: website doc search is ... )
Date
Msg-id 20040101133802.H913@ganymede.hub.org
Whole thread Raw
In response to Re: Mnogosearch (Was: Re: website doc search is ... )  (Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl>)
Responses Re: Mnogosearch (Was: Re: website doc search is ... )
List pgsql-general
On Thu, 1 Jan 2004, Arjen van der Meijden wrote:

> Marc G. Fournier wrote:
> >
> > Now, if I knock off the LIKE, so that I'm returning all rows from ndict8,
> > join'd to all the URLs that contain them, you get:
>
> Can't you build seperate databases for each domain you want to index?
> Than you wouldn't need the like operator at all.

First off, that would make searching across multiple domains difficult,
no?

Second, the LIKE is still required ... the LIKE allows the search to
"group" URLs ... for instance, if I wanted to just search on the docs, the
LIKE would look for all URLs that contain:

http://www.postgresql.org/docs/%%

whereas searching the whole site would be:

http://www.postgresql.org/%%

> Anyway, that doesn't help you much, perhaps decreasing the size of the
> index-tables can help, are they with OIDs ? If so, wouldn't it help to
> recreate them without, so you save yourselves 4 bytes per word-document
> couple, therefore allowing it to fit in less pages and by that speeding
> up the seqscans.

This one I hadn't thought about ... for some reason, I thought that
WITHOUT OIDs was now the default ... looking at that one now ...

> Are _all_ your queries with the like on the url? Wouldn't it help to
> create an index on both the wordid and the urlid for ndict8?

as mentioned in a previous email, the schema for ndict8 is:

186_archives=# \d ndict8
         Table "public.ndict8"
 Column  |  Type   |     Modifiers
---------+---------+--------------------
 url_id  | integer | not null default 0
 word_id | integer | not null default 0
 intag   | integer | not null default 0
Indexes:
    "n8_url" btree (url_id)
    "n8_word" btree (word_id)

> By the way, can a construction like (tablefield || '') ever use an index
> in postgresql?

again, as shown in a previous email, the index is being used for the LIKE
query ... the big problem as I see it is that the result set from the LIKE
is ~20x larger then the result set for the the = ... if there was some way
to telling the planner that going the LIKE route was the more expensive of
the two (even though table size seems to indicate the other way around), I
suspect that that would improve things also ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: speeding up inserts
Next
From: Casey Allen Shobe
Date:
Subject: Re: Is my MySQL Gaining ?