Thread: Re: [GENERAL] Feature Request: bigtsvector
On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:
> Hi all;
>
> We are running a multi-TB bioinformatics system on PostgreSQL and use a
> denormalized schema in places with a lot of tsvectors aggregated
> together for centralized searching. This is very important to the
> performance of the system. These aggregate many documents (sometimes
> tens of thousands), many of which contain large numbers of references
> to other documents. It isn't uncommon to have tens of thousands of
> lexemes. The tsvectors hold mixed document id and natural language
> search information (all of which comes in from the same documents).
>
> Recently we have started hitting the 1MB limit on tsvector size. We
> have found it possible to patch PostgreSQL to make the tsvector
> larger, but this changes the on-disk layout. How likely is it that the
> tsvector size could be increased in future versions to allow for
> vectors up to toastable size (1GB logical)? I can't imagine we are the
> only ones with such a problem. Since, I think, changing the on-disk
> layout might not be such a good idea, maybe it would be worth
> considering a new bigtsvector type?
>
> Btw, we've been very impressed with the extent to which PostgreSQL has
> tolerated all kinds of loads we have thrown at it.

Can anyone on hackers answer this question from June?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
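For readers following along, here is a minimal sketch of the pattern CPT
describes. All table, column, and aggregate names are hypothetical. Core
PostgreSQL ships no built-in tsvector aggregate, so one is defined over
tsvector_concat(), the catalog function behind the || operator:

    -- Hypothetical denormalized schema: one row per document group, with
    -- the tsvectors of all member documents folded into a single column.
    CREATE TABLE documents (
        doc_id     bigint PRIMARY KEY,
        group_id   bigint NOT NULL,
        doc_vector tsvector              -- per-document search vector
    );

    CREATE TABLE doc_search (
        group_id      bigint PRIMARY KEY,
        search_vector tsvector           -- aggregate of the group's vectors
    );

    -- Build an aggregate from tsvector_concat(), the function backing ||.
    CREATE AGGREGATE tsvector_agg (tsvector) (
        SFUNC = tsvector_concat,
        STYPE = tsvector
    );

    -- Refresh the centralized search column.  With tens of thousands of
    -- documents per group, the aggregated vector can blow past tsvector's
    -- ~1MB internal limit, which is the failure CPT reports.
    INSERT INTO doc_search (group_id, search_vector)
    SELECT group_id, tsvector_agg(doc_vector)
    FROM documents
    GROUP BY group_id;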
On Wed, 9 Sep 2015 10:52:02 -0400 Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:
> > [...]
> > Recently we have started hitting the 1MB limit on tsvector size.
> > [...] maybe it would be worth considering a new bigtsvector type?
>
> Can anyone on hackers answer this question from June?

Hi, I'm working on a patch now that removes this limit with no (or only
small) changes to the on-disk layout. I think it'll be ready during this
month.

----
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
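Until a patch like that lands, a rough way to see which rows are closest
to the ceiling, again using the hypothetical doc_search table from the
sketch above. Note that pg_column_size() reports the stored (possibly
TOAST-compressed) size, so it understates the logical size that the 1MB
limit actually applies to:

    -- length() = number of lexemes; pg_column_size() = bytes as stored.
    SELECT group_id,
           length(search_vector)         AS lexemes,
           pg_column_size(search_vector) AS stored_bytes
    FROM doc_search
    ORDER BY pg_column_size(search_vector) DESC
    LIMIT 10;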
On Wed, Sep 9, 2015 at 06:14:28PM +0300, Ildus Kurbangaliev wrote:
> On Wed, 9 Sep 2015 10:52:02 -0400 Bruce Momjian <bruce@momjian.us>
> wrote:
> > On Wed, Jun 17, 2015 at 07:58:21AM +0200, CPT wrote:
> > > [...]
> >
> > Can anyone on hackers answer this question from June?
>
> Hi, I'm working on a patch now that removes this limit with no (or
> only small) changes to the on-disk layout. I think it'll be ready
> during this month.

Oh, great, thanks.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +