Re: BUG #18080: to_tsvector fails for long text input

From Tom Lane
Subject Re: BUG #18080: to_tsvector fails for long text input
Date
Msg-id 1146921.1695411070@sss.pgh.pa.us
In response to Re: BUG #18080: to_tsvector fails for long text input  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
I wrote:
> BTW, the field order in ParsedWord is such that there's a fair
> amount of wasted pad space on 64-bit builds.  I doubt we can
> get away with rearranging it in released branches; but maybe
> it's worth doing something about that in HEAD, to push out
> the point at which you hit the 1GB limit.

I poked at that a little bit.  We can reduce 64-bit sizeof(ParsedWord)
from 40 bytes to 24 bytes with the attached patch.  The main thing
needed to make this pack tightly is to reduce the "alen" field from
uint32 to uint16.  While it's not immediately obvious that that's
a good thing to do, a look at the one place where alen is increased
(uniqueWORD() in to_tsany.c) shows that it cannot get to more than
twice MAXNUMPOS:

            if (res->pos.apos[0] < MAXNUMPOS - 1 && ...)
            {
                if (res->pos.apos[0] + 1 >= res->alen)
                {
                    res->alen *= 2;
                    res->pos.apos = (uint16 *) repalloc(res->pos.apos, sizeof(uint16) * res->alen);
                }

MAXNUMPOS is currently 256, and even if it's possible to increase
that, it seems unlikely that we'd want to make it more than 32k,
which is the point beyond which alen could overflow a uint16.
So this limitation seems OK to me.

            regards, tom lane

diff --git a/src/include/tsearch/ts_utils.h b/src/include/tsearch/ts_utils.h
index d3dc8bae47..d2aae0c337 100644
--- a/src/include/tsearch/ts_utils.h
+++ b/src/include/tsearch/ts_utils.h
@@ -81,8 +81,10 @@ extern void pushOperator(TSQueryParserState state, int8 oper, int16 distance);
  */
 typedef struct
 {
+    uint16        flags;            /* currently, only TSL_PREFIX */
     uint16        len;
     uint16        nvariant;
+    uint16        alen;
     union
     {
         uint16        pos;
@@ -90,13 +92,11 @@ typedef struct
         /*
          * When apos array is used, apos[0] is the number of elements in the
          * array (excluding apos[0]), and alen is the allocated size of the
-         * array.
+         * array.  We do not allow more than MAXNUMPOS array elements.
          */
         uint16       *apos;
     }            pos;
-    uint16        flags;            /* currently, only TSL_PREFIX */
     char       *word;
-    uint32        alen;
 } ParsedWord;

 typedef struct
