Re: tsvector pg_stats seems quite a bit off. - Mailing list pgsql-hackers

From	Jan Urbański
Subject	Re: tsvector pg_stats seems quite a bit off.
Date	May 28, 2010 05:27:22
Msg-id	4BFF7EAB.6040706@wulczer.org
In response to	Re: tsvector pg_stats seems quite a bit off. (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

On 28/05/10 04:47, Tom Lane wrote:
> Jan Urbański <wulczer@wulczer.org> writes:
>> On 19/05/10 21:01, Jesper Krogh wrote:
>>> In practice, just cranking the statistics estimate up high enough seems
>>> to solve the problem, but doesn't
>>> there seem to be something wrong in how the statistics are collected?
> 
>> The algorithm to determine most common vals does not do it accurately.
>> That would require keeping all lexemes from the analysed tsvectors in
>> memory, which would be impractical. If you want to learn more about the
>> algorithm being used, try reading
>> http://www.vldb.org/conf/2002/S10P03.pdf and corresponding comments in
>> ts_typanalyze.c
> 
> I re-scanned that paper and realized that there is indeed something
> wrong with the way we are doing it.

> So I think we have to fix this. 

Hm, I'll try to take another look this evening (CEST).

Cheers,
Jan
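For readers following the reference in the quoted text: the paper at http://www.vldb.org/conf/2002/S10P03.pdf describes Manku and Motwani's Lossy Counting algorithm, which is the algorithm the ts_typanalyze.c comments refer to. It keeps memory bounded by periodically pruning rarely seen items, and that pruning is why the collected most common values are approximate. The following is a minimal Python sketch of the generic algorithm as described in the paper. It is NOT PostgreSQL's ts_typanalyze.c code. The function names `lossy_count` and `frequent_items` and the `epsilon`/`support` values in the example are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, Hashable, Iterable, List, Tuple

# Generic Lossy Counting sketch (Manku & Motwani); NOT PostgreSQL's
# ts_typanalyze.c code. Function names are hypothetical.


def lossy_count(
    stream: Iterable[Hashable], epsilon: float
) -> Tuple[Dict[Hashable, Tuple[int, int]], int]:
    """Return ({item: (count, max_undercount)}, items_seen)."""
    width = math.ceil(1 / epsilon)  # bucket width w = ceil(1/epsilon)
    table: Dict[Hashable, Tuple[int, int]] = {}
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)  # id of the current bucket
        if item in table:
            count, delta = table[item]
            table[item] = (count + 1, delta)
        else:
            # A new item may have occurred up to bucket-1 times before
            # and been pruned, so record that as its maximum undercount.
            table[item] = (1, bucket - 1)
        if n % width == 0:
            # At each bucket boundary, drop items whose count plus maximum
            # possible undercount does not exceed the current bucket id.
            table = {k: (c, d) for k, (c, d) in table.items() if c + d > bucket}
    return table, n


def frequent_items(
    stream: Iterable[Hashable], support: float, epsilon: float
) -> List[Hashable]:
    """Items with estimated count >= (support - epsilon) * N, most common first."""
    table, n = lossy_count(stream, epsilon)
    threshold = (support - epsilon) * n
    ranked = sorted(table.items(), key=lambda kv: kv[1][0], reverse=True)
    return [item for item, (count, _) in ranked if count >= threshold]


if __name__ == "__main__":
    # Toy stream of lexemes: two common ones plus 20 that appear once.
    lexemes = ["a"] * 50 + ["b"] * 30 + [f"x{i}" for i in range(20)]
    print(lossy_count(lexemes, 0.1))          # ({'a': (50, 0), 'b': (30, 5)}, 100)
    print(frequent_items(lexemes, 0.2, 0.1))  # ['a', 'b']
```

In the paper's analysis, stored counts underestimate true frequencies by at most epsilon*N, and the table holds at most (1/epsilon)*log(epsilon*N) entries. No item whose true frequency is at least support*N is missed, but items with frequencies between (support - epsilon)*N and support*N may also be reported. In the toy run, the 20 singleton lexemes are pruned at bucket boundaries. "b" survives with a recorded undercount of 5 because it first appeared in bucket 6.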