On Wed, Oct 07, 2015 at 03:06:50PM -0400, Stephen Frost wrote:
> * Nathan Wagner (nw+pg@hydaspes.if.org) wrote:
> > I have added full text searching to my tracker. I only index the first
> > 50 KB of each message. There's apparently a one MB limit on that
> > anyway, which a few messages exceed. I figure anything important is
> > probably in the first 50KB. I could be wrong. I could re-index fairly
> > easily. It seems to work pretty well.
We have a patch that eliminates the 1MB limit; it will be published soon.
> > Note that we have FTS for the -bugs, and all the other, mailing lists..
True, but that finds emails. The search I have finds bugs (well, bug reports anyway). Specifically, I have the following function:
create or replace function bugvector(bugid bigint)
returns tsvector language 'sql' as $$
select tsvagg(
    setweight(to_tsvector(substr(body(msg), 1, 50*1024)), 'D')
    ||
    setweight(to_tsvector(header_value(msg, 'Subject')), 'C')
)
from emails where bug = $1
$$ strict;
which, as you can see, collects into one tsvector all the emails associated with that particular bug. So a search hit is for the whole bug. There are probably some search artifacts here. I suspect a bug with a long email thread will be ranked higher than one with a short thread. Perhaps that's ok though.
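If the thread-length bias turns out to matter, one possible way to damp it is the normalization argument of ts_rank. The query below is only a sketch: it assumes the per-bug vectors are kept in a hypothetical table bug_search(bug bigint, tsv tsvector) filled from bugvector(), and the search terms are made up. Neither the table nor its columns come from the tracker's actual schema.

```sql
-- Hypothetical table: bug_search(bug bigint, tsv tsvector),
-- populated with bugvector(bug) for each bug.
select bug,
       -- Third argument is a normalization bitmask; 1 divides the rank
       -- by 1 + the logarithm of the document length, so a long thread
       -- does not outrank a short one on length alone.
       ts_rank(tsv, q, 1) as rank
from bug_search,
     plainto_tsquery('index corruption') as q  -- made-up search terms
where tsv @@ q
order by rank desc
limit 20;
```

With the default weights, Subject matches (weight 'C') already count for more than body matches (weight 'D'); the normalization flag only changes how much the total length of the thread influences the score.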
It's possible to write a bug-specific parser for FTS. Also, results could be ordered by date submitted, so that the originating message always comes first.