Re: Mnogosearch (Was: Re: website doc search is ... ) - Mailing list pgsql-general

From Tom Lane
Subject Re: Mnogosearch (Was: Re: website doc search is ... )
Msg-id 691.1072996030@sss.pgh.pa.us
In response to Re: Mnogosearch (Was: Re: website doc search is ... )  ("Marc G. Fournier" <scrappy@postgresql.org>)
Responses Re: Mnogosearch (Was: Re: website doc search is ... )
Re: Mnogosearch (Was: Re: website doc search is ... )
List pgsql-general
"Marc G. Fournier" <scrappy@postgresql.org> writes:
> On Thu, 1 Jan 2004, Tom Lane wrote:
>> "Marc G. Fournier" <scrappy@postgresql.org> writes:
>>> what sort of impact does CLUSTER have on the system?  For instance, an
>>> index happens nightly, so I'm guessing that I'll have to CLUSTER each
>>> right after?
>>
>> Depends; what does the "index" process do --- are ndict8 and friends
>> rebuilt from scratch?

> nope, but heavily updated ... basically, the indexer looks at url for what
> urls need to be 're-indexed' ... if one does, it removes all words from the
> ndict# tables that belong to that url, and re-adds them accordingly ...

Hmm, but in practice only a small fraction of the pages on the site
change in any given day, no?  I'd think the typical nightly run changes
only a small fraction of the entries in the tables, if it is smart
enough not to re-index pages that did not change.

My guess is that it'd be enough to re-cluster once a week or so.

But this is pointless speculation until we find out whether clustering
helps enough to make it worth maintaining clustered-ness at all.  Did
you get any results yet?

            regards, tom lane
