Re: tsvector pg_stats seems quite a bit off. - Mailing list pgsql-hackers

From Jan Urbański
Subject Re: tsvector pg_stats seems quite a bit off.
Msg-id 4C02A81D.30209@wulczer.org
In response to Re: tsvector pg_stats seems quite a bit off.  (Jesper Krogh <jesper@krogh.cc>)
Responses Re: tsvector pg_stats seems quite a bit off.
Re: tsvector pg_stats seems quite a bit off.
List pgsql-hackers
On 30/05/10 09:08, Jesper Krogh wrote:
> On 2010-05-29 15:56, Jan Urbański wrote:
>> On 29/05/10 12:34, Jesper Krogh wrote:
>>> I can "fairly easy" try out patches or do other kind of testing.
>>>
>> I'll try to come up with a patch for you to try and fiddle with these
>> values before Monday.

Here's a patch against recent git; it should apply to the 8.4 sources as
well. It would be interesting to measure the memory and time needed to
analyse the table after applying it, because we will now be using a much
bigger bucket size and I haven't done any performance testing on it yet.
I updated the initial comment block in compute_tsvector_stats, but the
prose could probably be improved.
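One way to take the timing half of that measurement from psql is sketched below. Several things here are assumptions. The table and column names come from the query quoted below. The statistics target of 1000 is an arbitrary example value; raising it is expected to enlarge the set of lexemes the analyser tracks. `\timing` only reports elapsed time, so backend memory use would have to be watched from outside, for example with the operating system's process monitor.

```sql
-- psql meta-command: print elapsed time after each statement
\timing on

-- Example value only: a larger per-column statistics target
-- makes ANALYZE keep more most-common lexemes for this column
ALTER TABLE testdb.reference ALTER COLUMN document_tsvector SET STATISTICS 1000;

-- Recompute statistics for the table and report progress
ANALYZE VERBOSE testdb.reference;
```

Running the same script against an unpatched build gives a baseline to compare the patched numbers with.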

> testdb=# explain select id from testdb.reference where document_tsvector
> @@ plainto_tsquery('where') order by id limit 50;
> NOTICE:  text-search query contains only stop words or doesn't contain
> lexemes, ignored

That's orthogonal to the issue with statistics collection; you just
need to modify your stopword list (for instance, make it empty).
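The following is a minimal sketch of making the stopword list empty. It assumes the stock `english` configuration and its `english_stem` Snowball dictionary are in use. The names `english_stem_nostop` and `english_nostop` are hypothetical. Leaving out the optional StopWords parameter of the Snowball template means no words are discarded as stop words.

```sql
-- Hypothetical dictionary name: Snowball stemmer with no StopWords file,
-- so nothing is dropped as a stop word
CREATE TEXT SEARCH DICTIONARY english_stem_nostop (
    TEMPLATE = snowball,
    Language = english
);

-- Hypothetical configuration name: a copy of the stock english configuration
CREATE TEXT SEARCH CONFIGURATION english_nostop (COPY = english);

-- Use the stopword-free dictionary wherever english_stem was mapped
ALTER TEXT SEARCH CONFIGURATION english_nostop
    ALTER MAPPING REPLACE english_stem WITH english_stem_nostop;

-- The query from above, now using the new configuration
SELECT plainto_tsquery('english_nostop', 'where');
```

Existing tsvector values, such as those in document_tsvector, were built with the old configuration. They would have to be regenerated with english_nostop before a query for a word like "where" could match them.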

Cheers,
Jan

Attachment
