Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit - Mailing list pgsql-patches

From Tom Lane
Subject Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit
Msg-id 7543.1204872744@sss.pgh.pa.us
In response to Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit  (Euler Taveira de Oliveira <euler@timbira.com>)
Responses Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit  (Euler Taveira de Oliveira <euler@timbira.com>)
List pgsql-patches
Euler Taveira de Oliveira <euler@timbira.com> writes:
> The problem with this approach is how to select the part of the document
> to index. How will you ensure you're not ignoring the more important
> words of the document?

That's *always* a risk, anytime you do any sort of processing or
normalization on the text.  The question here is not whether or not
we will make tradeoffs, only which ones to make.

> IMHO Postgres shouldn't decide it; it would be good if a user could set
> it at runtime and/or in postgresql.conf.

Well, there is exactly zero chance of that happening in 8.3.x, because
the bit allocations for on-disk tsvector representation are already
determined.  It's fairly hard to see a way of doing it in future
releases that would have acceptable costs, either.
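
Here is a rough Python model of why the limit is baked into the on-disk format. It assumes the 8.3 `WordEntry` layout: a 1-bit `haspos` flag, an 11-bit `len`, and a 20-bit `pos` offset into the lexeme storage area. Under that assumption, the 20-bit offset would account for the roughly 1 MB ceiling. The helper names and the bit order are hypothetical and illustrative only. Real C bitfield ordering is compiler-dependent, and this is not PostgreSQL's code.

```python
from typing import Tuple

# Hypothetical model of one 32-bit tsvector entry word, ASSUMING the 8.3
# WordEntry field widths: haspos (1 bit), len (11 bits), pos (20 bits).
# The bit order used here is illustrative, not the real C bitfield layout.
HASPOS_BITS = 1
LEN_BITS = 11
POS_BITS = 20

MAX_LEN = (1 << LEN_BITS) - 1  # 2047: longest lexeme, in bytes
MAX_POS = (1 << POS_BITS) - 1  # 1048575: last addressable offset (~1 MB)


def pack_entry(haspos: bool, length: int, pos: int) -> int:
    """Pack one entry into a 32-bit integer, refusing values that do not fit."""
    if not 0 <= length <= MAX_LEN:
        raise ValueError(f"lexeme length {length} exceeds {MAX_LEN}")
    if not 0 <= pos <= MAX_POS:
        raise ValueError(f"lexeme offset {pos} exceeds {MAX_POS}")
    return (int(haspos) << (LEN_BITS + POS_BITS)) | (length << POS_BITS) | pos


def unpack_entry(word: int) -> Tuple[bool, int, int]:
    """Inverse of pack_entry: recover (haspos, length, pos)."""
    haspos = bool(word >> (LEN_BITS + POS_BITS))
    length = (word >> POS_BITS) & MAX_LEN
    pos = word & MAX_POS
    return haspos, length, pos
```

An offset one byte past `MAX_POS` simply has no representation. Raising the limit would therefore mean reallocating bits, and so changing the stored format, rather than flipping a runtime setting.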

But more to the point: no matter what the document length limit is,
why should it be a hard error to exceed it?  The downside of not
indexing words beyond the length limit is that searches won't find
documents in which the search terms occur only very far into the
document.  The downside of throwing an error is that we can't store such
documents at all, which surely guarantees that searches won't find
them.  How can you possibly argue that that option is better?
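
The two policies can be contrasted in a short Python sketch. `build_index`, its `budget` parameter, and the `on_overflow` switch are hypothetical names, and the byte budget is only a stand-in for the real lexeme-storage limit. In "error" mode the whole document is rejected. In "truncate" mode, indexing simply stops at the limit, so only terms that first appear late in the document become unsearchable.

```python
from typing import Dict, List


def build_index(tokens: List[str], budget: int,
                on_overflow: str = "truncate") -> Dict[str, List[int]]:
    """Map each distinct lexeme to its 1-based word positions.

    Each new distinct lexeme is charged its UTF-8 byte length against
    `budget` (a hypothetical stand-in for the real storage limit).
      on_overflow="error":    raise, so the document cannot be stored at all.
      on_overflow="truncate": stop indexing at the limit; keep what fits.
    """
    if on_overflow not in ("error", "truncate"):
        raise ValueError("on_overflow must be 'error' or 'truncate'")
    index: Dict[str, List[int]] = {}
    used = 0
    for position, lexeme in enumerate(tokens, start=1):
        if lexeme in index:
            # Already stored: only a new position is recorded.
            index[lexeme].append(position)
            continue
        cost = len(lexeme.encode("utf-8"))
        if used + cost > budget:
            if on_overflow == "error":
                raise ValueError(f"lexeme storage would exceed {budget} bytes")
            # Truncate: ignore everything from this point on.
            break
        index[lexeme] = [position]
        used += cost
    return index
```

For example, `build_index(["alpha", "beta", "alpha", "gamma"], budget=9)` returns `{"alpha": [1, 3], "beta": [2]}`. The document is still stored and searchable for "alpha" and "beta", and only "gamma" is missed. With `on_overflow="error"`, the same call raises and nothing is searchable.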

            regards, tom lane
