On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote:
> This is indeed information on individual terms from the statistics that
> enable this.
My mistake, I didn't know it was that smart about it.
> > In effect, what you want are words that aren't searched (or stored) in
> > the index, but are included in the tsvector (so the RECHECK still
> > works). That sounds like it would solve your problem and it would reduce
> > index size, improve update performance, etc. I don't know how difficult
> > it would be to implement, but it sounds reasonable to me.
> That sounds like it could require an index rebuild if the distribution
> changes?
My thought was that the common words could be declared to be common the
same way stop words are. As long as words are only added to this list,
it should be OK.
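
To make the idea concrete, here is a rough sketch in Python rather than anything resembling the actual GIN code. The class and method names (CommonWordIndex, declare_common, and so on) are made up for illustration, and a plain set of words stands in for the tsvector. Words on the common list get no index entries; a search uses the index only for the remaining terms and then rechecks every candidate against the stored words.

```python
class CommonWordIndex:
    """Toy inverted index that skips words declared common (hypothetical)."""

    def __init__(self, common_words=()):
        self.common = set(common_words)
        self.docs = {}      # doc_id -> set of words (stands in for the tsvector)
        self.postings = {}  # word -> set of doc_ids (stands in for the index)

    def add_document(self, doc_id, text):
        words = set(text.lower().split())
        self.docs[doc_id] = words
        # Common words stay in the stored words but get no index entry.
        for w in words - self.common:
            self.postings.setdefault(w, set()).add(doc_id)

    def declare_common(self, word):
        # Append-only: existing postings for the word are simply ignored
        # from now on, so no rebuild is needed.
        self.common.add(word.lower())

    def search(self, *terms):
        terms = [t.lower() for t in terms]
        indexed = [t for t in terms if t not in self.common]
        if indexed:
            # Narrow the candidates using only the indexed terms.
            candidates = set.intersection(
                *(self.postings.get(t, set()) for t in indexed)
            )
        else:
            # Only common words were given: every document is a candidate.
            candidates = set(self.docs)
        # Recheck each candidate against its stored words.
        return sorted(
            d for d in candidates if all(t in self.docs[d] for t in terms)
        )
```

Adding a word to the list is safe because its old index entries are just left unused. Removing a word would not be safe: documents indexed while the word was on the list have no entries for it, so the index would wrongly exclude them until it was rebuilt.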
> That would be another plan to pursue, but the MCV is already there
The problem with MCVs is that the index search could never eliminate a
document just because the index shows no match: the document might
contain a word that was an MCV when it was indexed (and so was left out
of the index), but is no longer an MCV.
Also, MCVs are relatively few -- you only get ~1000 or so. There might
be a lot of common words you'd like to track.
Perhaps ANALYZE can automatically add the common words above some
frequency threshold to the list?
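
Roughly what I have in mind, as a Python sketch only. The function name extend_common_words and the 0.5 default threshold are invented for illustration; this is not what ANALYZE does today. It counts the fraction of sampled documents containing each word and adds the words at or above the threshold to the existing list, never removing anything.

```python
from collections import Counter


def extend_common_words(common, documents, threshold=0.5):
    """Return the common-word list grown by frequent words (hypothetical).

    `documents` is a sample of document texts; a word qualifies when the
    fraction of sampled documents containing it is >= `threshold`.
    """
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document (document frequency).
        doc_freq.update(set(text.lower().split()))

    n = len(documents)
    newly_common = {
        word for word, count in doc_freq.items() if n and count / n >= threshold
    }
    # Append-only: words already on the list are kept even if they have
    # since become rare.
    return set(common) | newly_common
```

Because the result is always a superset of the old list, a word that stops being frequent stays on the list, which is what keeps the index valid without a rebuild.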
Regards,
Jeff Davis