Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit
Msg-id 200803051553.m25Frct09843@momjian.us
Responses Re: [BUGS] BUG #3975: tsearch2 index should not bomb out of 1Mb limit  (Edwin Groothuis <postgresql@mavetju.org>)
List pgsql-patches
Euler Taveira de Oliveira wrote:
> Edwin Groothuis wrote:
>
> > Ouch. But... since very long words are already not indexed (is the length
> > configurable anywhere because I don't mind setting it to 50 characters), I
> > don't think that it should bomb out of this but print a similar warning like
> > "String only partly indexed".
> >
> This is not a bug. I would say it's a limitation. Look at
> src/include/tsearch/ts_type.h. You could decrease len in WordEntry to 9
> (512 characters) and increase pos to 22 (4 Mb). Don't forget to update
> MAXSTRLEN and MAXSTRPOS accordingly.
>
> > I'm still trying to determine how big the message it failed on was...
> >
> Maybe we should change the "string is too long for tsvector" to "string
> is too long (%ld bytes, max %ld bytes) for tsvector".

Good idea.  I have applied the following patch to report in the error
message the string length and maximum, like we already do for long
words:

Existing error for an over-long word:
    test=> select repeat('a', 3000)::tsvector;
    ERROR:  word is too long (3000 bytes, max 2046 bytes)

New error for an over-long string:
    test=> select repeat('a ', 3000000)::tsvector;
    ERROR:  string is too long for tsvector (1048576 bytes, max 1048575 bytes)

I did not backpatch this to 8.3 because it would require translation
string updates.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: src/backend/tsearch/to_tsany.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/tsearch/to_tsany.c,v
retrieving revision 1.8
diff -c -c -r1.8 to_tsany.c
*** src/backend/tsearch/to_tsany.c    1 Jan 2008 19:45:52 -0000    1.8
--- src/backend/tsearch/to_tsany.c    5 Mar 2008 15:41:36 -0000
***************
*** 163,169 ****
      if (lenstr > MAXSTRPOS)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                  errmsg("string is too long for tsvector")));

      totallen = CALCDATASIZE(prs->curwords, lenstr);
      in = (TSVector) palloc0(totallen);
--- 163,169 ----
      if (lenstr > MAXSTRPOS)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                  errmsg("string is too long for tsvector (%d bytes, max %d bytes)", lenstr, MAXSTRPOS)));

      totallen = CALCDATASIZE(prs->curwords, lenstr);
      in = (TSVector) palloc0(totallen);
Index: src/backend/utils/adt/tsvector.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/tsvector.c,v
retrieving revision 1.11
diff -c -c -r1.11 tsvector.c
*** src/backend/utils/adt/tsvector.c    1 Jan 2008 19:45:53 -0000    1.11
--- src/backend/utils/adt/tsvector.c    5 Mar 2008 15:41:36 -0000
***************
*** 224,230 ****
          if (cur - tmpbuf > MAXSTRPOS)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("string is too long for tsvector")));

          /*
           * Enlarge buffers if needed
--- 224,230 ----
          if (cur - tmpbuf > MAXSTRPOS)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("string is too long for tsvector (%ld bytes, max %ld bytes)", (long) (cur - tmpbuf), (long) MAXSTRPOS)));

          /*
           * Enlarge buffers if needed
***************
*** 273,279 ****
      if (buflen > MAXSTRPOS)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                  errmsg("string is too long for tsvector")));

      totallen = CALCDATASIZE(len, buflen);
      in = (TSVector) palloc0(totallen);
--- 273,279 ----
      if (buflen > MAXSTRPOS)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                  errmsg("string is too long for tsvector (%d bytes, max %d bytes)", buflen, MAXSTRPOS)));

      totallen = CALCDATASIZE(len, buflen);
      in = (TSVector) palloc0(totallen);
Index: src/backend/utils/adt/tsvector_op.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/tsvector_op.c,v
retrieving revision 1.12
diff -c -c -r1.12 tsvector_op.c
*** src/backend/utils/adt/tsvector_op.c    1 Jan 2008 19:45:53 -0000    1.12
--- src/backend/utils/adt/tsvector_op.c    5 Mar 2008 15:41:36 -0000
***************
*** 488,494 ****
      if (dataoff > MAXSTRPOS)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                  errmsg("string is too long for tsvector")));

      out->size = ptr - ARRPTR(out);
      SET_VARSIZE(out, CALCDATASIZE(out->size, dataoff));
--- 488,494 ----
      if (dataoff > MAXSTRPOS)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                  errmsg("string is too long for tsvector (%d bytes, max %d bytes)", dataoff, MAXSTRPOS)));

      out->size = ptr - ARRPTR(out);
      SET_VARSIZE(out, CALCDATASIZE(out->size, dataoff));
