[HACKERS] Remove 1MB size limit in tsvector - Mailing list pgsql-hackers

From Ildus Kurbangaliev
Subject [HACKERS] Remove 1MB size limit in tsvector
Date
Msg-id 20170801170846.66e3ab06@wp.localdomain
Responses Re: [HACKERS] Remove 1MB size limit in tsvector  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hello, hackers!

Historically, the tsvector type has not been able to hold more than
1MB of data. I want to propose a patch that removes that limit.

That limit comes from the 'pos' field of WordEntry, which has only
20 bits of storage.
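
To illustrate why 20 bits cap the size at 1MB, here is a minimal
sketch of a WordEntry-like header. The Sketch* names and the helper
are hypothetical, and the 1-bit and 11-bit field widths are assumed
only to fill out a 32-bit word; the 20-bit 'pos' is the point.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of a WordEntry-like header packed into one 32-bit word. */
typedef struct
{
    uint32_t haspos:1;   /* assumed: flag for position info */
    uint32_t len:11;     /* assumed: lexeme length */
    uint32_t pos:20;     /* byte offset of the lexeme in the data area */
} SketchWordEntry;

/* Largest offset a 20-bit field can represent: 2^20 - 1 bytes. */
#define SKETCH_MAXSTRPOS ((1u << 20) - 1)

/* Returns 1 if the offset survives a round trip through the bitfield. */
int
sketch_offset_fits(uint32_t offset)
{
    SketchWordEntry e = {0};

    e.pos = offset;      /* values >= 2^20 wrap modulo 2^20 */
    return e.pos == offset;
}
```

Any lexeme that would start at byte 1048576 or later cannot be
addressed, so the whole data area is bounded by 1MB.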

In the proposed patch I removed this field; instead, I store offsets
only for every Nth item in the WordEntry array. For now N is set to 4,
because it gave the best results in my benchmarks. It can be increased
in the future without affecting data already stored in the database.
Removing the field also improves compression of tsvectors.
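
The idea of keeping an offset only at every Nth item can be sketched
roughly as follows. This is not the patch's actual layout: the
separate 'anchors' array, the function names, and the omission of
alignment padding are all assumptions made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_STRIDE 4   /* N: one stored offset per 4 entries */

/* Fill anchors[k] with the byte offset of entry k * SKETCH_STRIDE. */
void
sketch_build_anchors(const uint32_t *lens, size_t n, uint32_t *anchors)
{
    uint32_t off = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (i % SKETCH_STRIDE == 0)
            anchors[i / SKETCH_STRIDE] = off;
        off += lens[i];
    }
}

/*
 * Recover the offset of entry i: start at the nearest preceding
 * anchor and add the lengths of the entries in between.
 */
uint32_t
sketch_offset(const uint32_t *lens, const uint32_t *anchors, size_t i)
{
    size_t   base = i - (i % SKETCH_STRIDE);
    uint32_t off = anchors[base / SKETCH_STRIDE];

    for (size_t j = base; j < i; j++)
        off += lens[j];
    return off;
}
```

With this scheme a lookup costs at most N - 1 additions, and a full
32-bit offset is needed only once per N entries, which is the
trade-off that the choice of N tunes.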

I also simplified the code by adding functions that can be used to
build tsvectors. Previously, code fragments were duplicated in the
places where a tsvector was built.

The new patch also frees some space in WordEntry that can be used to
store additional information about the saved words.

---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

