Re: Queryplan within FTS/GIN index -search. - Mailing list pgsql-performance

From Jeff Davis
Subject Re: Queryplan within FTS/GIN index -search.
Date
Msg-id 1256276358.2580.794.camel@jdavis
In response to Re: Queryplan within FTS/GIN index -search.  (Jesper Krogh <jesper@krogh.cc>)
Responses Re: Queryplan within FTS/GIN index -search.  (jesper@krogh.cc)
List pgsql-performance
On Fri, 2009-10-23 at 07:18 +0200, Jesper Krogh wrote:
> This is indeed information on individual terms from the statistics that
> enable this.

My mistake, I didn't know it was that smart about it.

> > In effect, what you want are words that aren't searched (or stored) in
> > the index, but are included in the tsvector (so the RECHECK still
> > works). That sounds like it would solve your problem and it would reduce
> > index size, improve update performance, etc. I don't know how difficult
> > it would be to implement, but it sounds reasonable to me.


> That sounds like it could require an index rebuild if the distribution
> changes?

My thought was that the common words could be declared to be common the
same way stop words are. As long as words are only added to this list,
it should be OK.

> That would be another plan to pursue, but the MCV is already there

The problem with MCVs is that the index search could never eliminate a
document for lacking a match: the document might contain a word that was
an MCV when it was indexed (and so was left out of the index), but is no
longer one.
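
To make that failure mode concrete, here is a toy Python sketch. It is not
PostgreSQL code: the in-memory index, the function names and the sample
documents are all made up for illustration. It models an index that skips
"common" words and shows that results stay correct while the list only
grows, but documents get wrongly eliminated once a word drops off the list.

```python
# Toy model, not PostgreSQL code: an inverted index that skips "common" words.
# All names and data here are hypothetical.


def build_index(docs, common):
    """Map each non-common word to the set of ids of documents containing it."""
    index = {}
    for doc_id, words in docs.items():
        for word in words - common:
            index.setdefault(word, set()).add(doc_id)
    return index


def search(index, docs, common_now, word):
    """Return ids of documents containing `word`.

    A word currently treated as common is assumed to be absent from the
    index, so every document is a candidate and the recheck against the
    stored words does the filtering.
    """
    if word in common_now:
        candidates = set(docs)
    else:
        candidates = index.get(word, set())
    # Recheck step: confirm each candidate against the stored document.
    return {doc_id for doc_id in candidates if word in docs[doc_id]}


docs = {1: {"the", "gin", "index"}, 2: {"the", "planner"}, 3: {"gin"}}
common_at_build = {"the"}
index = build_index(docs, common_at_build)

# The list only grew since the index was built: results stay correct.
grown = search(index, docs, common_at_build | {"gin"}, "gin")

# "the" dropped off the list (as an MCV can): the index has no entry for it,
# so documents 1 and 2 are wrongly eliminated.
shrunk = search(index, docs, set(), "the")
```

In this model, adding a word to the list only costs extra rechecks, while
removing one silently loses matches unless the index is rebuilt.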

Also, MCVs are relatively few -- you only get ~1000 or so. There might
be a lot of common words you'd like to track.

Perhaps ANALYZE can automatically add the common words above some
frequency threshold to the list?
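
A rough Python sketch of that idea, under stated assumptions: the function
name, the `threshold` parameter (a fraction of sampled documents) and the
sample data are hypothetical, and nothing here reflects what ANALYZE
actually does. The only point is that words are added and never removed.

```python
from collections import Counter


def update_common_words(common, docs, threshold):
    """Return `common` plus every word found in at least `threshold` of docs.

    `threshold` is a fraction of documents (a hypothetical knob). Words are
    only ever added to the list, never removed.
    """
    doc_freq = Counter()
    for words in docs:
        doc_freq.update(set(words))  # count each word once per document
    n = len(docs)
    newly_common = {w for w, c in doc_freq.items() if n and c / n >= threshold}
    return common | newly_common


sample = [["the", "gin", "index"], ["the", "planner"], ["the", "gin"], ["vacuum"]]
common = update_common_words({"a"}, sample, threshold=0.5)
```

Because the result is always a superset of the previous list, a later
sample with a different word distribution cannot shrink it.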

Regards,
    Jeff Davis

