Thread: Re: [GENERAL] Feature Request: bigtsvector
On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:
> Hi all;
>
> We are running a multi-TB bioinformatics system on PostgreSQL and use a
> denormalized schema in places with a lot of tsvectors aggregated
> together for centralized searching. This is very important to the
> performance of the system. These aggregate many documents (sometimes
> tens of thousands), many of which contain large numbers of references
> to other documents. It isn't uncommon to have tens of thousands of
> lexemes. The tsvectors hold mixed document id and natural language
> search information (all of which comes in from the same documents).
>
> Recently we have started hitting the 1MB limit on tsvector size. We
> have found it possible to patch PostgreSQL to make the tsvector
> larger, but this changes the on-disk layout. How likely is it that the
> tsvector size could be increased in future versions to allow for
> vectors up to toastable size (1GB logical)? I can't imagine we are the
> only ones with such a problem. Since, I think, changing the on-disk
> layout might not be such a good idea, maybe it would be worth
> considering a new bigtsvector type?
>
> Btw, we've been very impressed with the extent to which PostgreSQL has
> tolerated all kinds of loads we have thrown at it.

Can anyone on hackers answer this question from June?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
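For readers following along, here is a minimal sketch of the pattern CPT
describes. All table, column, and aggregate names are hypothetical. Core
PostgreSQL ships no built-in tsvector aggregate, so one is defined over
tsvector_concat(), the catalog function behind the || operator:

    -- Hypothetical denormalized schema: one row per document group, with
    -- the tsvectors of all member documents folded into a single column.
    CREATE TABLE documents (
        doc_id     bigint PRIMARY KEY,
        group_id   bigint NOT NULL,
        doc_vector tsvector              -- per-document search vector
    );

    CREATE TABLE doc_search (
        group_id      bigint PRIMARY KEY,
        search_vector tsvector           -- aggregate of the group's vectors
    );

    -- Build an aggregate from tsvector_concat(), the function backing ||.
    CREATE AGGREGATE tsvector_agg (tsvector) (
        SFUNC = tsvector_concat,
        STYPE = tsvector
    );

    -- Refresh the centralized search column.  With tens of thousands of
    -- documents per group, the aggregated vector can blow past tsvector's
    -- ~1MB internal limit, which is the failure CPT reports.
    INSERT INTO doc_search (group_id, search_vector)
    SELECT group_id, tsvector_agg(doc_vector)
    FROM documents
    GROUP BY group_id;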
On Wed, 9 Sep 2015 10:52:02 -0400 Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:
> > [...]
> > Recently we have started hitting the 1MB limit on tsvector size.
> > [...] maybe it would be worth considering a new bigtsvector type?
>
> Can anyone on hackers answer this question from June?

Hi, I'm working on a patch now that removes this limit with no (or only
small) changes to the on-disk layout. I think it'll be ready during this
month.

----
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
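Until a patch like that lands, a rough way to see which rows are closest
to the ceiling, again using the hypothetical doc_search table from the
sketch above. Note that pg_column_size() reports the stored (possibly
TOAST-compressed) size, so it understates the logical size that the 1MB
limit actually applies to:

    -- length() = number of lexemes; pg_column_size() = bytes as stored.
    SELECT group_id,
           length(search_vector)         AS lexemes,
           pg_column_size(search_vector) AS stored_bytes
    FROM doc_search
    ORDER BY pg_column_size(search_vector) DESC
    LIMIT 10;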
On Wed, Sep 9, 2015 at 06:14:28PM +0300, Ildus Kurbangaliev wrote:
> On Wed, 9 Sep 2015 10:52:02 -0400 Bruce Momjian <bruce@momjian.us>
> wrote:
> > On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:
> > > [...]
> >
> > Can anyone on hackers answer this question from June?
>
> Hi, I'm working on a patch now that removes this limit with no (or
> only small) changes to the on-disk layout. I think it'll be ready
> during this month.

Oh, great, thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +