Hello, hackers!
Historically tsvector type can't hold more than 1MB data.
I want to propose a patch that removes that limit.
That limit is created by 'pos' field from WordEntry, which have only
20 bits for storage.
In the proposed patch I removed this field and instead of it I keep
offsets only at each Nth item in WordEntry's array. Now I set N as 4,
because it gave best results in my benchmarks. It can be increased in
the future without affecting already saved data in database. Also
removing the field improves compression of tsvectors.
I simplified the code by creating functions that can be used to
build tsvectors. There were duplicated code fragments in places where
tsvector was built.
Also new patch frees some space in WordEntry that can be used to
save some additional information about saved words.
-
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers