Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit - Mailing list pgsql-patches

From Teodor Sigaev
Subject Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit
Msg-id 47D14998.3080304@sigaev.ru
In response to Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit  (Bruce Momjian <bruce@momjian.us>)
List pgsql-patches
To be precise about tsvector:

1) A GiST index is lossy for all kinds of tsearch queries. A GIN index is not
lossy for the @@ operation, but it is lossy for @@@.

2) The number of positions per word is limited to 256. A bigger number of
positions is not helpful for ranking, but it produces a big tsvector. If a word
has that many positions in a document, it is close to being a stopword. We could
easily increase this limit to 65536 positions.

3) A position must be less than 2^14 (16384), because each position is stored
in a uint16 and 2 bits of that integer are reserved for the position's weight,
leaving 14 bits for the position itself. It's possible to widen int16 to int32,
but that would double the tsvector size, which I suppose is impractical. So the
part of a document used for ranking is its first 16384 words, that is, roughly
the first 50-100 kilobytes.

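4) The limit on the total size of a tsvector comes from the WordEntry->pos field
(ts_type.h). It contains the number of bytes between the first lexeme in the
tsvector and the lexeme in question. So the limit applies to the total length of
the lexemes plus their positional information. Since pos is a 20-bit field, that
offset cannot exceed 2^20 - 1 bytes, which is where the 1 MB limit comes from.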


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
