On Fri, Oct 23, 2009 at 2:32 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> Tom Lane wrote:
>> Jesper Krogh <jesper@krogh.cc> writes:
>>> Tom Lane wrote:
>>>> ... There's something strange about your tsvector index. Maybe
>>>> it's really huge because the documents are huge?
>>
>>> huge is a relative term, but length(ts_vector(body)) is about 200 for
>>> each document. Is that huge?
>>
>> It's bigger than the toy example I was trying, but not *that* much
>> bigger. I think maybe your index is bloated. Try dropping and
>> recreating it and see if the estimates change any.
>
> I'm a bit reluctant to drop and re-create it. It'll take a couple of
> days to regenerate, so this should hopefully not be a common situation
> for the system.
Note that if it is bloated, you can build the replacement index
concurrently (CREATE INDEX CONCURRENTLY), then drop the old one once the
new one finishes. So, no time spent without an index.
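Something along these lines should do it (the index and table names here
are just placeholders, substitute your own):

  -- build a replacement GIN index without blocking writes
  CREATE INDEX CONCURRENTLY docs_body_tsv_idx_new
      ON docs USING gin (body_tsv);
  -- once it has finished, drop the old bloated index and take over its name
  DROP INDEX docs_body_tsv_idx;
  ALTER INDEX docs_body_tsv_idx_new RENAME TO docs_body_tsv_idx;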
> I have set the statistics target to 1000 for the tsvector, the
> documentation didn't specify any heavy negative sides of doing that and
> since that I haven't seen row estimates that are orders of magnitude off.
It mostly increases planning time. It also increases ANALYZE time, but
not by much.
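For reference, that's the per-column setting, i.e. something like this
(table and column names are placeholders):

  -- raise the statistics target for the tsvector column, then re-analyze
  ALTER TABLE docs ALTER COLUMN body_tsv SET STATISTICS 1000;
  ANALYZE docs;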
> It was built from scratch using inserts all the way up to around 10M
> rows now; should that result in index bloat? Can I inspect the amount of
> bloat without rebuilding (or a similarly locking operation)?
Depends on how many failed (rolled-back) inserts there were. If 95% of
all your inserts failed, then yeah, it would be bloated.
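You can at least check the on-disk sizes without taking any heavy locks,
e.g. (placeholder names again):

  -- compare index size to table size; a GIN index far larger than you'd
  -- expect for ~200-lexeme tsvectors can be a hint that it's bloated
  SELECT pg_size_pretty(pg_relation_size('docs_body_tsv_idx')) AS index_size,
         pg_size_pretty(pg_relation_size('docs')) AS table_size;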