Thread: SP-GiST bug.

SP-GiST bug.

From

Teodor Sigaev

Date:

20 May 2014, 23:13:58

Hi!

create table xxx ( t text);

insert into xxx select 'x' || v::text from generate_series(1, 291) as v;
insert into xxx  values ('');

create index xxxidx on xxx using spgist (t);

And postgres will eat memory forever. It seems to me that checkAllTheSame
wrongly decides that all tuples are the same. All tuples except tuple to be
inserted have the same character 'x' and only new tuple has '\0'. But
checkAllTheSame() removes new tuple because set of tuples is too big to fit page
and founds that all tuples are the same. Patch attached, suppose, all branches
with spgist should be fixed.

Original data is too big to send in e-mail list or even send link, and the bug
caused not in root page as in example above.




--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

spgist_allthesame.patch.gz

Re: SP-GiST bug.

From

Tom Lane

Date:

28 May 2014, 22:32:29

Teodor Sigaev <teodor@sigaev.ru> writes:
> create table xxx ( t text);

> insert into xxx select 'x' || v::text from generate_series(1, 291) as v;
> insert into xxx  values ('');

> create index xxxidx on xxx using spgist (t);

Fun!

> And postgres will eat memory forever. It seems to me that checkAllTheSame 
> wrongly decides that all tuples are the same. All tuples except tuple to be 
> inserted have the same character 'x' and only new tuple has '\0'.

I don't think that checkAllTheSame is broken, but it probably would be
after this patch --- ISTM you're breaking the corner case described in the
header comment for checkAllTheSame, where we need to force splitting of a
set of existing tuples that the opclass can't distinguish.

What actually is broken, I think, is the spgist text opclass, because it's
using a zero node label for two different situations.  The scenario we're
looking at here is:

1. We call picksplit with all 292 of these entries.  Not surprisingly,
it puts the first 291 into a single node with label 'x' and the last one
into another node with label '\0'.

2. Because this is too many to fit on one index page, checkAllTheSame
decides it had better create an allTheSame tuple for the first 291 and
then try again to insert the empty string into that tuple.  While you
could argue whether this is *necessary*, it is not *incorrect*, so ISTM
the opclass had better cope with it.

3. We call spg_text_choose with the allTheSame tuple and the empty-string
value to be inserted.  It finds the empty-string value doesn't match
(since all the nodes have label 'x') so it requests a split:
    out->resultType = spgSplitTuple;    out->result.splitTuple.prefixHasPrefix = in->hasPrefix;
out->result.splitTuple.prefixPrefixDatum= in->prefixDatum;    out->result.splitTuple.nodeLabel = UInt8GetDatum('\0');
out->result.splitTuple.postfixHasPrefix = false;

After splitting, we have a new upper tuple with a single node with label
'\0', pointing to a new allTheSame tuple containing the original 291
entries.

4. We call spg_text_choose with the new upper tuple and the empty-string
value to be inserted.  It looks for a node labeled '\0', and finds it, so
it reports that we should descend into that node --- ie, down to the new
allTheSame tuple.

5. We call spg_text_choose with the new allTheSame tuple and the
empty-string value to be inserted.  This is exactly like the situation
at step 3, so we're in an infinite loop.

It doesn't look to me like the core code has done anything wrong here;
it just did what the opclass requested.  Rather, the problem is we're
using '\0' node label both to represent a "no op" label descent and
to represent "we're past the end of the string".

I'm afraid there may not be any way to fix this without breaking on-disk
compatibility for spgist text indexes :-(.  It seems like we have to
distinguish the case of zero-length string from that of a dummy label
for a pushed-down allTheSame tuple.
        regards, tom lane

Re: SP-GiST bug.

From

Teodor Sigaev

Date:

29 May 2014, 14:52:15

> I don't think that checkAllTheSame is broken, but it probably would be
> after this patch --- ISTM you're breaking the corner case described in the
> header comment for checkAllTheSame, where we need to force splitting of a
> set of existing tuples that the opclass can't distinguish.
INHO, this patch will not break - because it forces (in corner case) to create a 
node for new tuple but exclude it from list to insert. On next iteration we will 
find a needed node with empty list.

Comment: * If we know that the leaf tuples wouldn't all fit on one page, then we * exclude the last tuple (which is the
incomingnew tuple that forced a split) * from the check to see if more than one node is used.  The reason for this * is
thatif the existing tuples are put into only one chain, then even if * we move them all to an empty page, there would
stillnot be room for the * new tuple, so we'd get into an infinite loop of picksplit attempts.
 

If we reserve a node for new tuple then on next iteration we will not fall into 
picksplit again - we will have an empty list. So new tuple will fall into 
another page.

BTW, look for following example:
create table xxx (p point);

insert into xxx select '(1,1)'::point from generate_series(1, 226) as v;
insert into xxx values ('(3,3)'::point);

create index xxxidx on xxx using spgist (p quad_point_ops);

easy to see that the single node is allTheSame although underlying leafs are 
different. Due to allTheSame search logic scans are correct but inefficient. And 
patch fixes it too.


> What actually is broken, I think, is the spgist text opclass, because it's
> using a zero node label for two different situations.  The scenario we're
> looking at here is:

Agree for different situations, but disagree that's broken :)

> It seems like we have to> distinguish the case of zero-length string from that of a dummy label> for a pushed-down
allTheSametuple.
 

Actually, we do this: if allTheSame is true then nodes have a dummy labels.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/

Re: SP-GiST bug.

From

Tom Lane

Date:

29 May 2014, 19:51:44

Teodor Sigaev <teodor@sigaev.ru> writes:
>> I don't think that checkAllTheSame is broken, but it probably would be
>> after this patch --- ISTM you're breaking the corner case described in the
>> header comment for checkAllTheSame, where we need to force splitting of a
>> set of existing tuples that the opclass can't distinguish.

> INHO, this patch will not break - because it forces (in corner case) to create a 
> node for new tuple but exclude it from list to insert. On next iteration we will 
> find a needed node with empty list.

> Comment:
>   * If we know that the leaf tuples wouldn't all fit on one page, then we
>   * exclude the last tuple (which is the incoming new tuple that forced a split)
>   * from the check to see if more than one node is used.  The reason for this
>   * is that if the existing tuples are put into only one chain, then even if
>   * we move them all to an empty page, there would still not be room for the
>   * new tuple, so we'd get into an infinite loop of picksplit attempts.

> If we reserve a node for new tuple then on next iteration we will not fall into 
> picksplit again - we will have an empty list. So new tuple will fall into 
> another page.

The point I'm making is that the scenario your test case exposes is not
an infinite loop of picksplits, but an infinite loop of choose calls.
There are certainly ways to get into that loop that don't involve this
specific checkAllTheSame logic.  Here's an example:

regression=# create table xxx ( t text);
CREATE TABLE
regression=# create index xxxidx on xxx using spgist (t);
CREATE INDEX
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',4000);
INSERT 0 1
regression=# insert into xxx select repeat('x',3996);

In this example, we get a picksplit at the third insert, and here we
*must* take the allTheSame path because the values are, in fact, all the
same (the SPGIST_MAX_PREFIX_LENGTH constraint means we can only put 3996
characters into the prefix, and then the 3997'th character of each value
is 'x').  Then the fourth insert triggers this same infinite loop, because
spg_text_choose is asked how to insert the slightly-shorter value into the
allTheSame tuple, and it does the wrong thing.

It's certainly possible that we could/should change what checkAllTheSame
is doing --- on re-reading the comment, I'm not really sure that the
scenario it's concerned about can happen.  However, that will not fix
the bug in spgtextproc.c; it will only block one avenue for reaching
the bug.
        regards, tom lane

Re: SP-GiST bug.

From

Teodor Sigaev

Date:

30 May 2014, 14:55:31

> The point I'm making is that the scenario your test case exposes is not
> an infinite loop of picksplits, but an infinite loop of choose calls.

Thank you, now I see what I missed before. After some brainstorming, it's
possible to use '\0' only as end of string marker.  The idea is: when we split
allthesame tuple to upper/lower we copy node label to upper tuple instead of set
it to '\0'. Node labels of lower tuple will be unchanged but should be ignored
for reconstructionValue - and they are ignored because of allTheSame flag.  See
attached patch.

Unfortunately, it will break on-disk compatibility. Although it will not cause a
panic/error/coredump allTheSame approach will not find tuples. Users should
recreate spgist indexes over text column.


> It's certainly possible that we could/should change what checkAllTheSame
> is doing --- on re-reading the comment, I'm not really sure that the
> scenario it's concerned about can happen.  However, that will not fix

I rewrited a patch to fix missed way - allTheSame result of picksplit and tooBig
is set. I believe this patch is still needed because it could make a tree more
efficient as it was demonstrated for quad tree.

this patch doesn't break on-disk compatibility, although index recreation is
recommended.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Attachment

Re: SP-GiST bug.

From

Bruce Momjian

Date:

30 May 2014, 15:11:58

On Fri, May 30, 2014 at 06:55:18PM +0400, Teodor Sigaev wrote:
> >The point I'm making is that the scenario your test case exposes is not
> >an infinite loop of picksplits, but an infinite loop of choose calls.
> 
> Thank you, now I see what I missed before. After some brainstorming,
> it's possible to use '\0' only as end of string marker.  The idea
> is: when we split allthesame tuple to upper/lower we copy node label
> to upper tuple instead of set it to '\0'. Node labels of lower tuple
> will be unchanged but should be ignored for reconstructionValue -
> and they are ignored because of allTheSame flag.  See attached
> patch.
> 
> Unfortunately, it will break on-disk compatibility. Although it will
> not cause a panic/error/coredump allTheSame approach will not find
> tuples. Users should recreate spgist indexes over text column.

If we bump the system catalog version, pg_upgrade can mark those indexes
as invalid and output a script to reindex them.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: SP-GiST bug.

From

Teodor Sigaev

Date:

30 May 2014, 16:19:17

> If we bump the system catalog version, pg_upgrade can mark those indexes
> as invalid and output a script to reindex them.
>

Between 9.3 - 9.4 catalog version is changed anyway.
-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru                                      WWW:
http://www.sigaev.ru/

Re: SP-GiST bug.

From

Bruce Momjian

Date:

30 May 2014, 19:22:43

On Fri, May 30, 2014 at 08:19:07PM +0400, Teodor Sigaev wrote:
> >If we bump the system catalog version, pg_upgrade can mark those indexes
> >as invalid and output a script to reindex them.
> >
> 
> Between 9.3 - 9.4 catalog version is changed anyway.

Right.  I was thinking of beta users between beta1 and beta2 as well. 
We would need to trigger the index invalidation on the catalog version
used for the commit that changes the index format, not the major version
like 9.4.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: SP-GiST bug.

From

Tom Lane

Date:

03 June 2014, 17:41:50

Teodor Sigaev <teodor@sigaev.ru> writes:
>> The point I'm making is that the scenario your test case exposes is not
>> an infinite loop of picksplits, but an infinite loop of choose calls.

> Thank you, now I see what I missed before. After some brainstorming, it's 
> possible to use '\0' only as end of string marker.  The idea is: when we split 
> allthesame tuple to upper/lower we copy node label to upper tuple instead of set 
> it to '\0'. Node labels of lower tuple will be unchanged but should be ignored 
> for reconstructionValue - and they are ignored because of allTheSame flag.  See 
> attached patch.

I don't believe this patch works at all.  In particular, for an allTheSame
node occurring at the root level, omitting the nodeChar from the
reconstructed value is certainly wrong since there was no parent that
could have had a label containing the same character.  More generally, the
patch is assuming that any allTheSame tuple is one that is underneath a
split node; but that's surely wrong (where did such a tuple come from
before the split happened?).

If the root case were the only possible one then we could fix the problem
by adding a test on whether level == 0, but AFAICS, allTheSame tuples can
also get created at lower tree levels.  All you need is a bunch of strings
that are duplicates for more than the maximum prefix length.

I think what we have to do is use a different dummy value for the node
label of a pushed-down allTheSame tuple than we do for end-of-string.
This requires widening the node labels from uint8 to (at least) int16.
However, I think that's essentially free because pass-by-value node
labels are stored as Datums anyway.  In fact, it might even be upward
compatible, at least for cases that aren't broken today.

> I rewrited a patch to fix missed way - allTheSame result of picksplit and tooBig 
> is set. I believe this patch is still needed because it could make a tree more 
> efficient as it was demonstrated for quad tree.

The question here continues to be how sure we are that the case described
in checkAllTheSame's header comment can't actually occur.  We certainly
thought it could when we originally wrote this stuff.  I think possibly
we were wrong, but I'd like to see a clear proof of that before we discuss
dropping the logic that's meant to cope with the situation.
        regards, tom lane

Re: SP-GiST bug.

From

Tom Lane

Date:

05 June 2014, 22:39:21

I wrote:
> I think what we have to do is use a different dummy value for the node
> label of a pushed-down allTheSame tuple than we do for end-of-string.
> This requires widening the node labels from uint8 to (at least) int16.
> However, I think that's essentially free because pass-by-value node
> labels are stored as Datums anyway.  In fact, it might even be upward
> compatible, at least for cases that aren't broken today.

Here's a draft patch along these lines.  AFAICT it is fully upward
compatible with existing indexes, although in cases where the current code
creates a zero node label for strings ending at that level, this code will
choose to push new strings into a new child node with label -1 instead, so
that there's some very small added inefficiency.  I don't think we need to
tell people to reindex if we use this fix, though.  I went with new
dummy labels of -1 and -2 partially so that an existing zero label would
not be a hazard for future insertions, and partly with the thought that if
we ever allow embedded \0 in strings, this coding would be able to handle
it with minimal adjustment.

Note that this fix slightly reduces the maximum safe length of a prefix
string (SPGIST_MAX_PREFIX_LENGTH).  In an existing index, there might be
prefixes longer than that.  The code will still work, but it's
theoretically possible that addition of nodes to such a tuple could result
in "SPGiST inner tuple size exceeds maximum" errors, if you managed to
populate all possible next-byte values at that tuple.  I think the odds of
this being a problem in the field are negligible (and if it did happen,
reindexing would fix the problem).

            regards, tom lane

diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index 5b7a5a0..b278916 100644
*** a/src/backend/access/spgist/spgtextproc.c
--- b/src/backend/access/spgist/spgtextproc.c
***************
*** 3,8 ****
--- 3,34 ----
   * spgtextproc.c
   *      implementation of radix tree (compressed trie) over text
   *
+  * In a text_ops SPGiST index, inner tuples can have a prefix which is the
+  * common prefix of all strings indexed under that tuple.  The node labels
+  * represent the next byte of the string(s) after the prefix.  Assuming we
+  * always use the longest possible prefix, we will get more than one node
+  * label unless the prefix length is restricted by SPGIST_MAX_PREFIX_LENGTH.
+  *
+  * To reconstruct the indexed string for any index entry, concatenate the
+  * inner-tuple prefixes and node labels starting at the root and working
+  * down to the leaf entry, then append the datum in the leaf entry.
+  * (While descending the tree, "level" is the number of bytes reconstructed
+  * so far.)
+  *
+  * However, there are two special cases for node labels: -1 indicates that
+  * there are no more bytes after the prefix-so-far, and -2 indicates that we
+  * had to split an existing prefix-less allTheSame tuple (in such a case we
+  * have to create a node label that doesn't correspond to any string byte).
+  * In either case, the node label does not contribute anything to the
+  * reconstructed string.
+  *
+  * Previously, we used a node label of zero for both special cases, but
+  * this was problematic because one can't tell whether a string ending at
+  * the current level can be pushed down into such a child node.  For
+  * backwards compatibility, we still support such node labels for reading;
+  * but no new entries will ever be pushed down into a zero-labeled child.
+  * No new entries ever get pushed into a -2-labeled child, either.
+  *
   *
   * Portions Copyright (c) 1996-2014, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
***************
*** 24,51 ****

  /*
   * In the worst case, an inner tuple in a text radix tree could have as many
!  * as 256 nodes (one for each possible byte value).  Each node can take 16
!  * bytes on MAXALIGN=8 machines.  The inner tuple must fit on an index page
!  * of size BLCKSZ.  Rather than assuming we know the exact amount of overhead
   * imposed by page headers, tuple headers, etc, we leave 100 bytes for that
   * (the actual overhead should be no more than 56 bytes at this writing, so
   * there is slop in this number).  So we can safely create prefixes up to
!  * BLCKSZ - 256 * 16 - 100 bytes long.  Unfortunately, because 256 * 16 is
!  * already 4K, there is no safe prefix length when BLCKSZ is less than 8K;
   * it is always possible to get "SPGiST inner tuple size exceeds maximum"
   * if there are too many distinct next-byte values at a given place in the
   * tree.  Since use of nonstandard block sizes appears to be negligible in
   * the field, we just live with that fact for now, choosing a max prefix
   * size of 32 bytes when BLCKSZ is configured smaller than default.
   */
! #define SPGIST_MAX_PREFIX_LENGTH    Max((int) (BLCKSZ - 256 * 16 - 100), 32)

  /* Struct for sorting values in picksplit */
  typedef struct spgNodePtr
  {
      Datum        d;
      int            i;
!     uint8        c;
  } spgNodePtr;


--- 50,78 ----

  /*
   * In the worst case, an inner tuple in a text radix tree could have as many
!  * as 258 nodes (one for each possible byte value, plus the two special
!  * cases).  Each node can take 16 bytes on MAXALIGN=8 machines.
!  * The inner tuple must fit on an index page of size BLCKSZ.
!  * Rather than assuming we know the exact amount of overhead
   * imposed by page headers, tuple headers, etc, we leave 100 bytes for that
   * (the actual overhead should be no more than 56 bytes at this writing, so
   * there is slop in this number).  So we can safely create prefixes up to
!  * BLCKSZ - 258 * 16 - 100 bytes long.  Unfortunately, because 258 * 16 is
!  * over 4K, there is no safe prefix length when BLCKSZ is less than 8K;
   * it is always possible to get "SPGiST inner tuple size exceeds maximum"
   * if there are too many distinct next-byte values at a given place in the
   * tree.  Since use of nonstandard block sizes appears to be negligible in
   * the field, we just live with that fact for now, choosing a max prefix
   * size of 32 bytes when BLCKSZ is configured smaller than default.
   */
! #define SPGIST_MAX_PREFIX_LENGTH    Max((int) (BLCKSZ - 258 * 16 - 100), 32)

  /* Struct for sorting values in picksplit */
  typedef struct spgNodePtr
  {
      Datum        d;
      int            i;
!     int16        c;
  } spgNodePtr;


*************** spg_text_config(PG_FUNCTION_ARGS)
*** 56,62 ****
      spgConfigOut *cfg = (spgConfigOut *) PG_GETARG_POINTER(1);

      cfg->prefixType = TEXTOID;
!     cfg->labelType = CHAROID;
      cfg->canReturnData = true;
      cfg->longValuesOK = true;    /* suffixing will shorten long values */
      PG_RETURN_VOID();
--- 83,89 ----
      spgConfigOut *cfg = (spgConfigOut *) PG_GETARG_POINTER(1);

      cfg->prefixType = TEXTOID;
!     cfg->labelType = INT2OID;
      cfg->canReturnData = true;
      cfg->longValuesOK = true;    /* suffixing will shorten long values */
      PG_RETURN_VOID();
*************** commonPrefix(const char *a, const char *
*** 107,118 ****
  }

  /*
!  * Binary search an array of uint8 datums for a match to c
   *
   * On success, *i gets the match location; on failure, it gets where to insert
   */
  static bool
! searchChar(Datum *nodeLabels, int nNodes, uint8 c, int *i)
  {
      int            StopLow = 0,
                  StopHigh = nNodes;
--- 134,145 ----
  }

  /*
!  * Binary search an array of int16 datums for a match to c
   *
   * On success, *i gets the match location; on failure, it gets where to insert
   */
  static bool
! searchChar(Datum *nodeLabels, int nNodes, int16 c, int *i)
  {
      int            StopLow = 0,
                  StopHigh = nNodes;
*************** searchChar(Datum *nodeLabels, int nNodes
*** 120,126 ****
      while (StopLow < StopHigh)
      {
          int            StopMiddle = (StopLow + StopHigh) >> 1;
!         uint8        middle = DatumGetUInt8(nodeLabels[StopMiddle]);

          if (c < middle)
              StopHigh = StopMiddle;
--- 147,153 ----
      while (StopLow < StopHigh)
      {
          int            StopMiddle = (StopLow + StopHigh) >> 1;
!         int16        middle = DatumGetInt16(nodeLabels[StopMiddle]);

          if (c < middle)
              StopHigh = StopMiddle;
*************** spg_text_choose(PG_FUNCTION_ARGS)
*** 145,160 ****
      text       *inText = DatumGetTextPP(in->datum);
      char       *inStr = VARDATA_ANY(inText);
      int            inSize = VARSIZE_ANY_EXHDR(inText);
!     uint8        nodeChar = '\0';
!     int            i = 0;
      int            commonLen = 0;

      /* Check for prefix match, set nodeChar to first byte after prefix */
      if (in->hasPrefix)
      {
          text       *prefixText = DatumGetTextPP(in->prefixDatum);
!         char       *prefixStr = VARDATA_ANY(prefixText);
!         int            prefixSize = VARSIZE_ANY_EXHDR(prefixText);

          commonLen = commonPrefix(inStr + in->level,
                                   prefixStr,
--- 172,190 ----
      text       *inText = DatumGetTextPP(in->datum);
      char       *inStr = VARDATA_ANY(inText);
      int            inSize = VARSIZE_ANY_EXHDR(inText);
!     char       *prefixStr = NULL;
!     int            prefixSize = 0;
      int            commonLen = 0;
+     int16        nodeChar = 0;
+     int            i = 0;

      /* Check for prefix match, set nodeChar to first byte after prefix */
      if (in->hasPrefix)
      {
          text       *prefixText = DatumGetTextPP(in->prefixDatum);
!
!         prefixStr = VARDATA_ANY(prefixText);
!         prefixSize = VARSIZE_ANY_EXHDR(prefixText);

          commonLen = commonPrefix(inStr + in->level,
                                   prefixStr,
*************** spg_text_choose(PG_FUNCTION_ARGS)
*** 164,172 ****
          if (commonLen == prefixSize)
          {
              if (inSize - in->level > commonLen)
!                 nodeChar = *(uint8 *) (inStr + in->level + commonLen);
              else
!                 nodeChar = '\0';
          }
          else
          {
--- 194,202 ----
          if (commonLen == prefixSize)
          {
              if (inSize - in->level > commonLen)
!                 nodeChar = *(unsigned char *) (inStr + in->level + commonLen);
              else
!                 nodeChar = -1;
          }
          else
          {
*************** spg_text_choose(PG_FUNCTION_ARGS)
*** 184,190 ****
                      formTextDatum(prefixStr, commonLen);
              }
              out->result.splitTuple.nodeLabel =
!                 UInt8GetDatum(*(prefixStr + commonLen));

              if (prefixSize - commonLen == 1)
              {
--- 214,220 ----
                      formTextDatum(prefixStr, commonLen);
              }
              out->result.splitTuple.nodeLabel =
!                 Int16GetDatum(*(unsigned char *) (prefixStr + commonLen));

              if (prefixSize - commonLen == 1)
              {
*************** spg_text_choose(PG_FUNCTION_ARGS)
*** 203,213 ****
      }
      else if (inSize > in->level)
      {
!         nodeChar = *(uint8 *) (inStr + in->level);
      }
      else
      {
!         nodeChar = '\0';
      }

      /* Look up nodeChar in the node label array */
--- 233,243 ----
      }
      else if (inSize > in->level)
      {
!         nodeChar = *(unsigned char *) (inStr + in->level);
      }
      else
      {
!         nodeChar = -1;
      }

      /* Look up nodeChar in the node label array */
*************** spg_text_choose(PG_FUNCTION_ARGS)
*** 233,254 ****
      else if (in->allTheSame)
      {
          /*
!          * Can't use AddNode action, so split the tuple.  The upper tuple has
!          * the same prefix as before and uses an empty node label for the
!          * lower tuple.  The lower tuple has no prefix and the same node
!          * labels as the original tuple.
           */
          out->resultType = spgSplitTuple;
!         out->result.splitTuple.prefixHasPrefix = in->hasPrefix;
!         out->result.splitTuple.prefixPrefixDatum = in->prefixDatum;
!         out->result.splitTuple.nodeLabel = UInt8GetDatum('\0');
          out->result.splitTuple.postfixHasPrefix = false;
      }
      else
      {
          /* Add a node for the not-previously-seen nodeChar value */
          out->resultType = spgAddNode;
!         out->result.addNode.nodeLabel = UInt8GetDatum(nodeChar);
          out->result.addNode.nodeN = i;
      }

--- 263,303 ----
      else if (in->allTheSame)
      {
          /*
!          * Can't use AddNode action, so split the tuple.  There are two cases:
!          * If the existing tuple has a prefix, the new upper tuple has a
!          * prefix one byte shorter (possibly empty) and uses the last byte of
!          * the old prefix as node label for the lower tuple.  If the existing
!          * tuple has no prefix, the new upper tuple doesn't either, and it
!          * uses the dummy node label -2 for the lower tuple.  In either case,
!          * the new lower tuple has no prefix and the same node labels as the
!          * original tuple.
           */
          out->resultType = spgSplitTuple;
!         if (prefixSize > 0)
!         {
!             if (prefixSize > 1)
!             {
!                 out->result.splitTuple.prefixHasPrefix = true;
!                 out->result.splitTuple.prefixPrefixDatum =
!                     formTextDatum(prefixStr, prefixSize - 1);
!             }
!             else
!                 out->result.splitTuple.prefixHasPrefix = false;
!             out->result.splitTuple.nodeLabel =
!                 Int16GetDatum(*(unsigned char *) (prefixStr + prefixSize - 1));
!         }
!         else
!         {
!             out->result.splitTuple.prefixHasPrefix = false;
!             out->result.splitTuple.nodeLabel = Int16GetDatum(-2);
!         }
          out->result.splitTuple.postfixHasPrefix = false;
      }
      else
      {
          /* Add a node for the not-previously-seen nodeChar value */
          out->resultType = spgAddNode;
!         out->result.addNode.nodeLabel = Int16GetDatum(nodeChar);
          out->result.addNode.nodeN = i;
      }

*************** cmpNodePtr(const void *a, const void *b)
*** 262,273 ****
      const spgNodePtr *aa = (const spgNodePtr *) a;
      const spgNodePtr *bb = (const spgNodePtr *) b;

!     if (aa->c == bb->c)
!         return 0;
!     else if (aa->c > bb->c)
!         return 1;
!     else
!         return -1;
  }

  Datum
--- 311,317 ----
      const spgNodePtr *aa = (const spgNodePtr *) a;
      const spgNodePtr *bb = (const spgNodePtr *) b;

!     return aa->c - bb->c;
  }

  Datum
*************** spg_text_picksplit(PG_FUNCTION_ARGS)
*** 319,333 ****
          text       *texti = DatumGetTextPP(in->datums[i]);

          if (commonLen < VARSIZE_ANY_EXHDR(texti))
!             nodes[i].c = *(uint8 *) (VARDATA_ANY(texti) + commonLen);
          else
!             nodes[i].c = '\0';    /* use \0 if string is all common */
          nodes[i].i = i;
          nodes[i].d = in->datums[i];
      }

      /*
!      * Sort by label bytes so that we can group the values into nodes.  This
       * also ensures that the nodes are ordered by label value, allowing the
       * use of binary search in searchChar.
       */
--- 363,377 ----
          text       *texti = DatumGetTextPP(in->datums[i]);

          if (commonLen < VARSIZE_ANY_EXHDR(texti))
!             nodes[i].c = *(unsigned char *) (VARDATA_ANY(texti) + commonLen);
          else
!             nodes[i].c = -1;    /* use -1 if string is all common */
          nodes[i].i = i;
          nodes[i].d = in->datums[i];
      }

      /*
!      * Sort by label values so that we can group the values into nodes.  This
       * also ensures that the nodes are ordered by label value, allowing the
       * use of binary search in searchChar.
       */
*************** spg_text_picksplit(PG_FUNCTION_ARGS)
*** 346,352 ****

          if (i == 0 || nodes[i].c != nodes[i - 1].c)
          {
!             out->nodeLabels[out->nNodes] = UInt8GetDatum(nodes[i].c);
              out->nNodes++;
          }

--- 390,396 ----

          if (i == 0 || nodes[i].c != nodes[i - 1].c)
          {
!             out->nodeLabels[out->nNodes] = Int16GetDatum(nodes[i].c);
              out->nNodes++;
          }

*************** spg_text_inner_consistent(PG_FUNCTION_AR
*** 377,385 ****

      /*
       * Reconstruct values represented at this tuple, including parent data,
!      * prefix of this tuple if any, and the node label if any.  in->level
!      * should be the length of the previously reconstructed value, and the
!      * number of bytes added here is prefixSize or prefixSize + 1.
       *
       * Note: we assume that in->reconstructedValue isn't toasted and doesn't
       * have a short varlena header.  This is okay because it must have been
--- 421,429 ----

      /*
       * Reconstruct values represented at this tuple, including parent data,
!      * prefix of this tuple if any, and the node label if it's non-dummy.
!      * in->level should be the length of the previously reconstructed value,
!      * and the number of bytes added here is prefixSize or prefixSize + 1.
       *
       * Note: we assume that in->reconstructedValue isn't toasted and doesn't
       * have a short varlena header.  This is okay because it must have been
*************** spg_text_inner_consistent(PG_FUNCTION_AR
*** 422,438 ****

      for (i = 0; i < in->nNodes; i++)
      {
!         uint8        nodeChar = DatumGetUInt8(in->nodeLabels[i]);
          int            thisLen;
          bool        res = true;
          int            j;

!         /* If nodeChar is zero, don't include it in data */
!         if (nodeChar == '\0')
              thisLen = maxReconstrLen - 1;
          else
          {
!             ((char *) VARDATA(reconstrText))[maxReconstrLen - 1] = nodeChar;
              thisLen = maxReconstrLen;
          }

--- 466,482 ----

      for (i = 0; i < in->nNodes; i++)
      {
!         int16        nodeChar = DatumGetInt16(in->nodeLabels[i]);
          int            thisLen;
          bool        res = true;
          int            j;

!         /* If nodeChar is a dummy value, don't include it in data */
!         if (nodeChar <= 0)
              thisLen = maxReconstrLen - 1;
          else
          {
!             ((unsigned char *) VARDATA(reconstrText))[maxReconstrLen - 1] = nodeChar;
              thisLen = maxReconstrLen;
          }

*************** spg_text_inner_consistent(PG_FUNCTION_AR
*** 447,453 ****
               * If it's a collation-aware operator, but the collation is C, we
               * can treat it as non-collation-aware.  With non-C collation we
               * need to traverse whole tree :-( so there's no point in making
!              * any check here.
               */
              if (strategy > 10)
              {
--- 491,499 ----
               * If it's a collation-aware operator, but the collation is C, we
               * can treat it as non-collation-aware.  With non-C collation we
               * need to traverse whole tree :-( so there's no point in making
!              * any check here.  (Note also that our reconstructed value may
!              * well end with a partial multibyte character, so that applying
!              * any encoding-sensitive test to it would be risky anyhow.)
               */
              if (strategy > 10)
              {