Re: jsonb format is pessimal for toast compression - Mailing list pgsql-hackers

From Tom Lane
Subject Re: jsonb format is pessimal for toast compression
Date
Msg-id 11291.1409091093@sss.pgh.pa.us
Whole thread Raw
In response to Re: jsonb format is pessimal for toast compression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: jsonb format is pessimal for toast compression
List pgsql-hackers
I wrote:
> I wish it were cache-friendly too, per the upthread tangent about having
> to fetch keys from all over the place within a large JSON object.

> ... and while I was typing that sentence, lightning struck.  The existing
> arrangement of object subfields with keys and values interleaved is just
> plain dumb.  We should rearrange that as all the keys in order, then all
> the values in the same order.  Then the keys are naturally adjacent in
> memory and object-key searches become much more cache-friendly: you
> probably touch most of the key portion of the object, but none of the
> values portion, until you know exactly what part of the latter to fetch.
> This approach might complicate the lookup logic marginally but I bet not
> very much; and it will be a huge help if we ever want to do smart access
> to EXTERNAL (non-compressed) JSON values.

> I will go prototype that just to see how much code rearrangement is
> required.

This looks pretty good from a coding point of view.  I have not had time
yet to see if it affects the speed of the benchmark cases we've been
trying.  I suspect that it won't make much difference in them.  I think
if we do decide to make an on-disk format change, we should seriously
consider including this change.

The same concept could be applied to offset-based storage of course,
although I rather doubt that we'd make that combination of choices since
it would be giving up on-disk compatibility for benefits that are mostly
in the future.

Attached are two patches: one is a "delta" against the last jsonb-lengths
patch I posted, and the other is a "merged" patch showing the total change
from HEAD, for ease of application.

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index e47eaea..4e7fe67 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***************
*** 26,33 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_LENMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
--- 26,33 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_LENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))
*************** findJsonbValueFromContainer(JsonbContain
*** 294,303 ****
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result = palloc(sizeof(JsonbValue));

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
--- 294,309 ----
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result;

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

+     /* Quick out without a palloc cycle if object/array is empty */
+     if (count <= 0)
+         return NULL;
+
+     result = palloc(sizeof(JsonbValue));
+
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
*************** findJsonbValueFromContainer(JsonbContain
*** 323,329 ****
          char       *base_addr = (char *) (children + count * 2);
          uint32       *offsets;
          uint32        lastoff;
!         int            lastoffpos;
          uint32        stopLow = 0,
                      stopHigh = count;

--- 329,335 ----
          char       *base_addr = (char *) (children + count * 2);
          uint32       *offsets;
          uint32        lastoff;
!         int            i;
          uint32        stopLow = 0,
                      stopHigh = count;

*************** findJsonbValueFromContainer(JsonbContain
*** 332,379 ****

          /*
           * We use a cache to avoid redundant getJsonbOffset() computations
!          * inside the search loop.  Note that count may well be zero at this
!          * point; to avoid an ugly special case for initializing lastoff and
!          * lastoffpos, we allocate one extra array element.
           */
!         offsets = (uint32 *) palloc((count * 2 + 1) * sizeof(uint32));
!         offsets[0] = lastoff = 0;
!         lastoffpos = 0;

          /* Binary search on object/pair keys *only* */
          while (stopLow < stopHigh)
          {
              uint32        stopMiddle;
-             int            index;
              int            difference;
              JsonbValue    candidate;

              stopMiddle = stopLow + (stopHigh - stopLow) / 2;

-             /*
-              * Compensate for the fact that we're searching through pairs (not
-              * entries).
-              */
-             index = stopMiddle * 2;
-
-             /* Update the offsets cache through at least index+1 */
-             while (lastoffpos <= index)
-             {
-                 lastoff += JBE_LEN(children, lastoffpos);
-                 offsets[++lastoffpos] = lastoff;
-             }
-
              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[index];
!             candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return value */
!                 fillJsonbValue(children, index + 1,
!                                base_addr, offsets[index + 1],
                                 result);

                  pfree(offsets);
--- 338,383 ----

          /*
           * We use a cache to avoid redundant getJsonbOffset() computations
!          * inside the search loop.  The entire cache can be filled immediately
!          * since we expect to need the last offset for value access.  (This
!          * choice could lose if the key is not present, but avoiding extra
!          * logic inside the search loop probably makes up for that.)
           */
!         offsets = (uint32 *) palloc(count * sizeof(uint32));
!         lastoff = 0;
!         for (i = 0; i < count; i++)
!         {
!             offsets[i] = lastoff;
!             lastoff += JBE_LEN(children, i);
!         }
!         /* lastoff now has the offset of the first value item */

          /* Binary search on object/pair keys *only* */
          while (stopLow < stopHigh)
          {
              uint32        stopMiddle;
              int            difference;
              JsonbValue    candidate;

              stopMiddle = stopLow + (stopHigh - stopLow) / 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[stopMiddle];
!             candidate.val.string.len = JBE_LEN(children, stopMiddle);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return corresponding value */
!                 int            index = stopMiddle + count;
!
!                 /* navigate to appropriate offset */
!                 for (i = count; i < index; i++)
!                     lastoff += JBE_LEN(children, i);
!
!                 fillJsonbValue(children, index,
!                                base_addr, lastoff,
                                 result);

                  pfree(offsets);
*************** recurse:
*** 730,735 ****
--- 734,740 ----
              val->val.array.rawScalar = (*it)->isScalar;
              (*it)->curIndex = 0;
              (*it)->curDataOffset = 0;
+             (*it)->curValueOffset = 0;    /* not actually used */
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;
*************** recurse:
*** 780,785 ****
--- 785,792 ----
               */
              (*it)->curIndex = 0;
              (*it)->curDataOffset = 0;
+             (*it)->curValueOffset = getJsonbOffset((*it)->children,
+                                                    (*it)->nElems);
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;
*************** recurse:
*** 799,805 ****
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex * 2,
                                 (*it)->dataProper, (*it)->curDataOffset,
                                 val);
                  if (val->type != jbvString)
--- 806,812 ----
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex,
                                 (*it)->dataProper, (*it)->curDataOffset,
                                 val);
                  if (val->type != jbvString)
*************** recurse:
*** 814,828 ****
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2);
!
!             fillJsonbValue((*it)->children, (*it)->curIndex * 2 + 1,
!                            (*it)->dataProper, (*it)->curDataOffset,
                             val);

              (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex * 2 + 1);
              (*it)->curIndex++;

              /*
--- 821,834 ----
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, (*it)->curIndex + (*it)->nElems,
!                            (*it)->dataProper, (*it)->curValueOffset,
                             val);

              (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex);
!             (*it)->curValueOffset += JBE_LEN((*it)->children,
!                                              (*it)->curIndex + (*it)->nElems);
              (*it)->curIndex++;

              /*
*************** convertJsonbObject(StringInfo buffer, JE
*** 1509,1514 ****
--- 1515,1524 ----
      /* Reserve space for the JEntries of the keys and values. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

+     /*
+      * Iterate over the keys, then over the values, since that is the ordering
+      * we want in the on-disk representation.
+      */
      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
*************** convertJsonbObject(StringInfo buffer, JE
*** 1529,1534 ****
--- 1539,1561 ----
          metaoffset += sizeof(JEntry);

          /*
+          * Bail out if total variable-length data exceeds what will fit in a
+          * JEntry length field.  We check this in each iteration, not just
+          * once at the end, to forestall possible integer overflow.
+          */
+         if (totallen > JENTRY_LENMASK)
+             ereport(ERROR,
+                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                             JENTRY_LENMASK)));
+     }
+     for (i = 0; i < val->val.object.nPairs; i++)
+     {
+         JsonbPair  *pair = &val->val.object.pairs[i];
+         int            len;
+         JEntry        meta;
+
+         /*
           * Convert value, producing a JEntry and appending its variable-length
           * data to buffer
           */
*************** convertJsonbObject(StringInfo buffer, JE
*** 1543,1551 ****
          /*
           * Bail out if total variable-length data exceeds what will fit in a
           * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.  But it
!          * should be sufficient to check once per iteration, since
!          * JENTRY_LENMASK is several bits narrower than int.
           */
          if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
--- 1570,1576 ----
          /*
           * Bail out if total variable-length data exceeds what will fit in a
           * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
           */
          if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index b9a4314..f9472af 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef uint32 JEntry;
*** 149,156 ****
  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element. An object has two children for
!  * each key/value pair.
   */
  typedef struct JsonbContainer
  {
--- 149,160 ----
  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element, stored in array order.
!  *
!  * An object has two children for each key/value pair.  The keys all appear
!  * first, in key sort order; then the values appear, in an order matching the
!  * key order.  This arrangement keeps the keys compact in memory, making a
!  * search for a particular key more cache-friendly.
   */
  typedef struct JsonbContainer
  {
*************** typedef struct JsonbContainer
*** 162,169 ****
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF
! #define JB_FSCALAR                0x10000000
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

--- 166,173 ----
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF        /* mask for count field */
! #define JB_FSCALAR                0x10000000        /* flag bits */
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

*************** struct JsonbValue
*** 238,255 ****
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Pair within an Object.
   *
!  * Pairs with duplicate keys are de-duplicated.  We store the order for the
!  * benefit of doing so in a well-defined way with respect to the original
!  * observed order (which is "last observed wins").  This is only used briefly
!  * when originally constructing a Jsonb.
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* preserves order of pairs with equal keys */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
--- 242,261 ----
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Key/value pair within an Object.
   *
!  * This struct type is only used briefly while constructing a Jsonb; it is
!  * *not* the on-disk representation.
!  *
!  * Pairs with duplicate keys are de-duplicated.  We store the originally
!  * observed pair ordering for the purpose of removing duplicates in a
!  * well-defined way (which is "last observed wins").
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* Pair's index in original sequence */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
*************** typedef struct JsonbIterator
*** 284,295 ****
      /* Data proper.  This points just past end of children array */
      char       *dataProper;

!     /* Current item in buffer (up to nElems, but must * 2 for objects) */
      int            curIndex;

      /* Data offset corresponding to current item */
      uint32        curDataOffset;

      /* Private state */
      JsonbIterState state;

--- 290,308 ----
      /* Data proper.  This points just past end of children array */
      char       *dataProper;

!     /* Current item in buffer (up to nElems) */
      int            curIndex;

      /* Data offset corresponding to current item */
      uint32        curDataOffset;

+     /*
+      * If the container is an object, we want to return keys and values
+      * alternately; so curDataOffset points to the current key, and
+      * curValueOffset points to the current value.
+      */
+     uint32        curValueOffset;
+
      /* Private state */
      JsonbIterState state;

diff --git a/src/backend/utils/adt/jsonb.c b/src/backend/utils/adt/jsonb.c
index 2fd87fc..456011a 100644
*** a/src/backend/utils/adt/jsonb.c
--- b/src/backend/utils/adt/jsonb.c
*************** jsonb_from_cstring(char *json, int len)
*** 196,207 ****
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_POSMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_POSMASK)));

      return len;
  }
--- 196,207 ----
  static size_t
  checkStringLen(size_t len)
  {
!     if (len > JENTRY_LENMASK)
          ereport(ERROR,
                  (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                   errmsg("string too long to represent as jsonb string"),
                   errdetail("Due to an implementation restriction, jsonb strings cannot exceed %d bytes.",
!                            JENTRY_LENMASK)));

      return len;
  }
diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..4e7fe67 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
***************
*** 26,40 ****
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (the total size of an array's elements is also limited by JENTRY_POSMASK,
!  * but we're not concerned about that here)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *array, int index, char *base_addr,
                 JsonbValue *result);
! static bool    equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
--- 26,41 ----
   * in MaxAllocSize, and the number of elements (or pairs) must fit in the bits
   * reserved for that in the JsonbContainer.header field.
   *
!  * (The total size of an array's or object's elements is also limited by
!  * JENTRY_LENMASK, but we're not concerned about that here.)
   */
  #define JSONB_MAX_ELEMS (Min(MaxAllocSize / sizeof(JsonbValue), JB_CMASK))
  #define JSONB_MAX_PAIRS (Min(MaxAllocSize / sizeof(JsonbPair), JB_CMASK))

! static void fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
                 JsonbValue *result);
! static bool equalsJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static int    compareJsonbScalarValue(JsonbValue *a, JsonbValue *b);
  static Jsonb *convertToJsonb(JsonbValue *val);
  static void convertJsonbValue(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
*************** static void convertJsonbArray(StringInfo
*** 42,48 ****
  static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
  static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);

! static int reserveFromBuffer(StringInfo buffer, int len);
  static void appendToBuffer(StringInfo buffer, const char *data, int len);
  static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
  static short padBufferToInt(StringInfo buffer);
--- 43,49 ----
  static void convertJsonbObject(StringInfo buffer, JEntry *header, JsonbValue *val, int level);
  static void convertJsonbScalar(StringInfo buffer, JEntry *header, JsonbValue *scalarVal);

! static int    reserveFromBuffer(StringInfo buffer, int len);
  static void appendToBuffer(StringInfo buffer, const char *data, int len);
  static void copyToBuffer(StringInfo buffer, int offset, const char *data, int len);
  static short padBufferToInt(StringInfo buffer);
*************** JsonbValueToJsonb(JsonbValue *val)
*** 108,113 ****
--- 109,135 ----
  }

  /*
+  * Get the offset of the variable-length portion of a Jsonb node within
+  * the variable-length-data part of its container.  The node is identified
+  * by index within the container's JEntry array.
+  *
+  * We do this by adding up the lengths of all the previous nodes'
+  * variable-length portions.  It's best to avoid using this function when
+  * iterating through all the nodes in a container, since that would result
+  * in O(N^2) work.
+  */
+ uint32
+ getJsonbOffset(const JEntry *ja, int index)
+ {
+     uint32        off = 0;
+     int            i;
+
+     for (i = 0; i < index; i++)
+         off += JBE_LEN(ja, i);
+     return off;
+ }
+
+ /*
   * BT comparator worker function.  Returns an integer less than, equal to, or
   * greater than zero, indicating whether a is less than, equal to, or greater
   * than b.  Consistent with the requirements for a B-Tree operator class
*************** compareJsonbContainers(JsonbContainer *a
*** 201,207 ****
               *
               * If the two values were of the same container type, then there'd
               * have been a chance to observe the variation in the number of
!              * elements/pairs (when processing WJB_BEGIN_OBJECT, say).  They're
               * either two heterogeneously-typed containers, or a container and
               * some scalar type.
               *
--- 223,229 ----
               *
               * If the two values were of the same container type, then there'd
               * have been a chance to observe the variation in the number of
!              * elements/pairs (when processing WJB_BEGIN_OBJECT, say). They're
               * either two heterogeneously-typed containers, or a container and
               * some scalar type.
               *
*************** findJsonbValueFromContainer(JsonbContain
*** 272,333 ****
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result = palloc(sizeof(JsonbValue));

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
          uint32        stopLow = 0,
!                     stopMiddle;

!         /* Object key past by caller must be a string */
          Assert(key->type == jbvString);

          /* Binary search on object/pair keys *only* */
!         while (stopLow < count)
          {
!             int            index;
              int            difference;
              JsonbValue    candidate;

!             /*
!              * Note how we compensate for the fact that we're iterating
!              * through pairs (not entries) throughout.
!              */
!             stopMiddle = stopLow + (count - stopLow) / 2;
!
!             index = stopMiddle * 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + JBE_OFF(children, index);
!             candidate.val.string.len = JBE_LEN(children, index);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return value */
!                 fillJsonbValue(children, index + 1, base_addr, result);

                  return result;
              }
              else
--- 294,386 ----
  {
      JEntry       *children = container->children;
      int            count = (container->header & JB_CMASK);
!     JsonbValue *result;

      Assert((flags & ~(JB_FARRAY | JB_FOBJECT)) == 0);

+     /* Quick out without a palloc cycle if object/array is empty */
+     if (count <= 0)
+         return NULL;
+
+     result = palloc(sizeof(JsonbValue));
+
      if (flags & JB_FARRAY & container->header)
      {
          char       *base_addr = (char *) (children + count);
+         uint32        offset = 0;
          int            i;

          for (i = 0; i < count; i++)
          {
!             fillJsonbValue(children, i, base_addr, offset, result);

              if (key->type == result->type)
              {
                  if (equalsJsonbScalarValue(key, result))
                      return result;
              }
+
+             offset += JBE_LEN(children, i);
          }
      }
      else if (flags & JB_FOBJECT & container->header)
      {
          /* Since this is an object, account for *Pairs* of Jentrys */
          char       *base_addr = (char *) (children + count * 2);
+         uint32       *offsets;
+         uint32        lastoff;
+         int            i;
          uint32        stopLow = 0,
!                     stopHigh = count;

!         /* Object key passed by caller must be a string */
          Assert(key->type == jbvString);

+         /*
+          * We use a cache to avoid redundant getJsonbOffset() computations
+          * inside the search loop.  The entire cache can be filled immediately
+          * since we expect to need the last offset for value access.  (This
+          * choice could lose if the key is not present, but avoiding extra
+          * logic inside the search loop probably makes up for that.)
+          */
+         offsets = (uint32 *) palloc(count * sizeof(uint32));
+         lastoff = 0;
+         for (i = 0; i < count; i++)
+         {
+             offsets[i] = lastoff;
+             lastoff += JBE_LEN(children, i);
+         }
+         /* lastoff now has the offset of the first value item */
+
          /* Binary search on object/pair keys *only* */
!         while (stopLow < stopHigh)
          {
!             uint32        stopMiddle;
              int            difference;
              JsonbValue    candidate;

!             stopMiddle = stopLow + (stopHigh - stopLow) / 2;

              candidate.type = jbvString;
!             candidate.val.string.val = base_addr + offsets[stopMiddle];
!             candidate.val.string.len = JBE_LEN(children, stopMiddle);

              difference = lengthCompareJsonbStringValue(&candidate, key);

              if (difference == 0)
              {
!                 /* Found our key, return corresponding value */
!                 int            index = stopMiddle + count;
!
!                 /* navigate to appropriate offset */
!                 for (i = count; i < index; i++)
!                     lastoff += JBE_LEN(children, i);
!
!                 fillJsonbValue(children, index,
!                                base_addr, lastoff,
!                                result);

+                 pfree(offsets);
                  return result;
              }
              else
*************** findJsonbValueFromContainer(JsonbContain
*** 335,343 ****
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     count = stopMiddle;
              }
          }
      }

      /* Not found */
--- 388,398 ----
                  if (difference < 0)
                      stopLow = stopMiddle + 1;
                  else
!                     stopHigh = stopMiddle;
              }
          }
+
+         pfree(offsets);
      }

      /* Not found */
*************** getIthJsonbValueFromContainer(JsonbConta
*** 368,374 ****

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr, result);

      return result;
  }
--- 423,431 ----

      result = palloc(sizeof(JsonbValue));

!     fillJsonbValue(container->children, i, base_addr,
!                    getJsonbOffset(container->children, i),
!                    result);

      return result;
  }
*************** getIthJsonbValueFromContainer(JsonbConta
*** 377,387 ****
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index, char *base_addr, JsonbValue *result)
  {
      JEntry        entry = children[index];

--- 434,450 ----
   * A helper function to fill in a JsonbValue to represent an element of an
   * array, or a key or value of an object.
   *
+  * The node's JEntry is at children[index], and its variable-length data
+  * is at base_addr + offset.  We make the caller determine the offset since
+  * in many cases the caller can amortize the work across multiple children.
+  *
   * A nested array or object will be returned as jbvBinary, ie. it won't be
   * expanded.
   */
  static void
! fillJsonbValue(JEntry *children, int index,
!                char *base_addr, uint32 offset,
!                JsonbValue *result)
  {
      JEntry        entry = children[index];

*************** fillJsonbValue(JEntry *children, int ind
*** 392,405 ****
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + JBE_OFF(children, index);
!         result->val.string.len = JBE_LEN(children, index);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(JBE_OFF(children, index)));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
--- 455,468 ----
      else if (JBE_ISSTRING(entry))
      {
          result->type = jbvString;
!         result->val.string.val = base_addr + offset;
!         result->val.string.len = JBE_LENFLD(entry);
          Assert(result->val.string.len >= 0);
      }
      else if (JBE_ISNUMERIC(entry))
      {
          result->type = jbvNumeric;
!         result->val.numeric = (Numeric) (base_addr + INTALIGN(offset));
      }
      else if (JBE_ISBOOL_TRUE(entry))
      {
*************** fillJsonbValue(JEntry *children, int ind
*** 415,422 ****
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(JBE_OFF(children, index)));
!         result->val.binary.len = JBE_LEN(children, index) - (INTALIGN(JBE_OFF(children, index)) - JBE_OFF(children,
index));
      }
  }

--- 478,486 ----
      {
          Assert(JBE_ISCONTAINER(entry));
          result->type = jbvBinary;
!         /* Remove alignment padding from data pointer and len */
!         result->val.binary.data = (JsonbContainer *) (base_addr + INTALIGN(offset));
!         result->val.binary.len = JBE_LENFLD(entry) - (INTALIGN(offset) - offset);
      }
  }

*************** recurse:
*** 668,680 ****
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
--- 732,746 ----
               * a full conversion
               */
              val->val.array.rawScalar = (*it)->isScalar;
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
!             (*it)->curValueOffset = 0;    /* not actually used */
              /* Set state for next call */
              (*it)->state = JBI_ARRAY_ELEM;
              return WJB_BEGIN_ARRAY;

          case JBI_ARRAY_ELEM:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All elements within array already processed.  Report this
*************** recurse:
*** 686,692 ****
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->i++, (*it)->dataProper, val);

              if (!IsAJsonbScalar(val) && !skipNested)
              {
--- 752,763 ----
                  return WJB_END_ARRAY;
              }

!             fillJsonbValue((*it)->children, (*it)->curIndex,
!                            (*it)->dataProper, (*it)->curDataOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children, (*it)->curIndex);
!             (*it)->curIndex++;

              if (!IsAJsonbScalar(val) && !skipNested)
              {
*************** recurse:
*** 697,704 ****
              else
              {
                  /*
!                  * Scalar item in array, or a container and caller didn't
!                  * want us to recurse into it.
                   */
                  return WJB_ELEM;
              }
--- 768,775 ----
              else
              {
                  /*
!                  * Scalar item in array, or a container and caller didn't want
!                  * us to recurse into it.
                   */
                  return WJB_ELEM;
              }
*************** recurse:
*** 712,724 ****
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->i = 0;
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->i >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
--- 783,798 ----
               * v->val.object.pairs is not actually set, because we aren't
               * doing a full conversion
               */
!             (*it)->curIndex = 0;
!             (*it)->curDataOffset = 0;
!             (*it)->curValueOffset = getJsonbOffset((*it)->children,
!                                                    (*it)->nElems);
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;
              return WJB_BEGIN_OBJECT;

          case JBI_OBJECT_KEY:
!             if ((*it)->curIndex >= (*it)->nElems)
              {
                  /*
                   * All pairs within object already processed.  Report this to
*************** recurse:
*** 732,738 ****
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->i * 2, (*it)->dataProper, val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

--- 806,814 ----
              else
              {
                  /* Return key of a key/value pair.  */
!                 fillJsonbValue((*it)->children, (*it)->curIndex,
!                                (*it)->dataProper, (*it)->curDataOffset,
!                                val);
                  if (val->type != jbvString)
                      elog(ERROR, "unexpected jsonb type as object key");

*************** recurse:
*** 745,752 ****
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, ((*it)->i++) * 2 + 1,
!                            (*it)->dataProper, val);

              /*
               * Value may be a container, in which case we recurse with new,
--- 821,835 ----
              /* Set state for next call */
              (*it)->state = JBI_OBJECT_KEY;

!             fillJsonbValue((*it)->children, (*it)->curIndex + (*it)->nElems,
!                            (*it)->dataProper, (*it)->curValueOffset,
!                            val);
!
!             (*it)->curDataOffset += JBE_LEN((*it)->children,
!                                             (*it)->curIndex);
!             (*it)->curValueOffset += JBE_LEN((*it)->children,
!                                              (*it)->curIndex + (*it)->nElems);
!             (*it)->curIndex++;

              /*
               * Value may be a container, in which case we recurse with new,
*************** reserveFromBuffer(StringInfo buffer, int
*** 1209,1216 ****
      buffer->len += len;

      /*
!      * Keep a trailing null in place, even though it's not useful for us;
!      * it seems best to preserve the invariants of StringInfos.
       */
      buffer->data[buffer->len] = '\0';

--- 1292,1299 ----
      buffer->len += len;

      /*
!      * Keep a trailing null in place, even though it's not useful for us; it
!      * seems best to preserve the invariants of StringInfos.
       */
      buffer->data[buffer->len] = '\0';

*************** convertToJsonb(JsonbValue *val)
*** 1284,1291 ****

      /*
       * Note: the JEntry of the root is discarded. Therefore the root
!      * JsonbContainer struct must contain enough information to tell what
!      * kind of value it is.
       */

      res = (Jsonb *) buffer.data;
--- 1367,1374 ----

      /*
       * Note: the JEntry of the root is discarded. Therefore the root
!      * JsonbContainer struct must contain enough information to tell what kind
!      * of value it is.
       */

      res = (Jsonb *) buffer.data;
*************** convertJsonbValue(StringInfo buffer, JEn
*** 1315,1324 ****
          return;

      /*
!      * A JsonbValue passed as val should never have a type of jbvBinary,
!      * and neither should any of its sub-components. Those values will be
!      * produced by convertJsonbArray and convertJsonbObject, the results of
!      * which will not be passed back to this function as an argument.
       */

      if (IsAJsonbScalar(val))
--- 1398,1407 ----
          return;

      /*
!      * A JsonbValue passed as val should never have a type of jbvBinary, and
!      * neither should any of its sub-components. Those values will be produced
!      * by convertJsonbArray and convertJsonbObject, the results of which will
!      * not be passed back to this function as an argument.
       */

      if (IsAJsonbScalar(val))
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1340,1353 ****
      int            totallen;
      uint32        header;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry, stored in the beginning of the variable-
!      * length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
--- 1423,1437 ----
      int            totallen;
      uint32        header;

!     /* Remember where variable-length data starts for this array */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

      /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
       */
      header = val->val.array.nElems | JB_FARRAY;
      if (val->val.array.rawScalar)
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1358,1364 ****
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!     /* reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
--- 1442,1449 ----
      }

      appendToBuffer(buffer, (char *) &header, sizeof(uint32));
!
!     /* Reserve space for the JEntries of the elements. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.array.nElems);

      totallen = 0;
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1368,1391 ****
          int            len;
          JEntry        meta;

          convertJsonbValue(buffer, &meta, elem, level + 1);
-         len = meta & JENTRY_POSMASK;
-         totallen += len;

!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
--- 1453,1491 ----
          int            len;
          JEntry        meta;

+         /*
+          * Convert element, producing a JEntry and appending its
+          * variable-length data to buffer
+          */
          convertJsonbValue(buffer, &meta, elem, level + 1);

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         len = JBE_LENFLD(meta);
!         totallen += len;
!         if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                       errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_LENMASK)));

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
      /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }
*************** convertJsonbArray(StringInfo buffer, JEn
*** 1393,1457 ****
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
-     uint32        header;
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;

!     /* Initialize pointer into conversion buffer at this level */
      offset = buffer->len;

      padBufferToInt(buffer);

!     /* Initialize header */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* reserve space for the JEntries of the keys and values */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair *pair = &val->val.object.pairs[i];
!         int len;
!         JEntry meta;

!         /* put key */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = meta & JENTRY_POSMASK;
          totallen += len;

-         if (totallen > JENTRY_POSMASK)
-             ereport(ERROR,
-                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
-                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
-                             JENTRY_POSMASK)));
-
-         if (i > 0)
-             meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         convertJsonbValue(buffer, &meta, &pair->value, level);
!         len = meta & JENTRY_POSMASK;
!         totallen += len;
!
!         if (totallen > JENTRY_POSMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb array elements exceeds the maximum of %u bytes",
!                             JENTRY_POSMASK)));

-         meta = (meta & ~JENTRY_POSMASK) | totallen;
          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
      }

      totallen = buffer->len - offset;

      *pheader = JENTRY_ISCONTAINER | totallen;
  }

--- 1493,1595 ----
  static void
  convertJsonbObject(StringInfo buffer, JEntry *pheader, JsonbValue *val, int level)
  {
      int            offset;
      int            metaoffset;
      int            i;
      int            totallen;
+     uint32        header;

!     /* Remember where variable-length data starts for this object */
      offset = buffer->len;

+     /* Align to 4-byte boundary (any padding counts as part of my data) */
      padBufferToInt(buffer);

!     /*
!      * Construct the header Jentry and store it in the beginning of the
!      * variable-length payload.
!      */
      header = val->val.object.nPairs | JB_FOBJECT;
      appendToBuffer(buffer, (char *) &header, sizeof(uint32));

!     /* Reserve space for the JEntries of the keys and values. */
      metaoffset = reserveFromBuffer(buffer, sizeof(JEntry) * val->val.object.nPairs * 2);

+     /*
+      * Iterate over the keys, then over the values, since that is the ordering
+      * we want in the on-disk representation.
+      */
      totallen = 0;
      for (i = 0; i < val->val.object.nPairs; i++)
      {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;

!         /*
!          * Convert key, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
          convertJsonbScalar(buffer, &meta, &pair->key);

!         len = JBE_LENFLD(meta);
          totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);

!         /*
!          * Bail out if total variable-length data exceeds what will fit in a
!          * JEntry length field.  We check this in each iteration, not just
!          * once at the end, to forestall possible integer overflow.
!          */
!         if (totallen > JENTRY_LENMASK)
              ereport(ERROR,
                      (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
!                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
!                             JENTRY_LENMASK)));
!     }
!     for (i = 0; i < val->val.object.nPairs; i++)
!     {
!         JsonbPair  *pair = &val->val.object.pairs[i];
!         int            len;
!         JEntry        meta;
!
!         /*
!          * Convert value, producing a JEntry and appending its variable-length
!          * data to buffer
!          */
!         convertJsonbValue(buffer, &meta, &pair->value, level + 1);
!
!         len = JBE_LENFLD(meta);
!         totallen += len;

          copyToBuffer(buffer, metaoffset, (char *) &meta, sizeof(JEntry));
          metaoffset += sizeof(JEntry);
+
+         /*
+          * Bail out if total variable-length data exceeds what will fit in a
+          * JEntry length field.  We check this in each iteration, not just
+          * once at the end, to forestall possible integer overflow.
+          */
+         if (totallen > JENTRY_LENMASK)
+             ereport(ERROR,
+                     (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                      errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                             JENTRY_LENMASK)));
      }

+     /* Total data size is everything we've appended to buffer */
      totallen = buffer->len - offset;

+     /* Check length again, since we didn't include the metadata above */
+     if (totallen > JENTRY_LENMASK)
+         ereport(ERROR,
+                 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+                  errmsg("total size of jsonb object elements exceeds the maximum of %u bytes",
+                         JENTRY_LENMASK)));
+
+     /* Initialize the header of this node, in the container's JEntry array */
      *pheader = JENTRY_ISCONTAINER | totallen;
  }

diff --git a/src/include/utils/jsonb.h b/src/include/utils/jsonb.h
index 91e3e14..f9472af 100644
*** a/src/include/utils/jsonb.h
--- b/src/include/utils/jsonb.h
*************** typedef struct JsonbValue JsonbValue;
*** 83,91 ****
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header, and a variable-length content.  The JEntry header indicates what
!  * kind of a node it is, e.g. a string or an array, and the offset and length
!  * of its variable-length portion within the container.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
--- 83,91 ----
   * buffer is accessed, but they can also be deep copied and passed around.
   *
   * Jsonb is a tree structure. Each node in the tree consists of a JEntry
!  * header and a variable-length content (possibly of zero size).  The JEntry
!  * header indicates what kind of a node it is, e.g. a string or an array,
!  * and includes the length of its variable-length portion.
   *
   * The JEntry and the content of a node are not stored physically together.
   * Instead, the container array or object has an array that holds the JEntrys
*************** typedef struct JsonbValue JsonbValue;
*** 95,133 ****
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begins with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
-  * To encode the length and offset of the variable-length portion of each
-  * node in a compact way, the JEntry stores only the end offset within the
-  * variable-length portion of the container node. For the first JEntry in the
-  * container's JEntry array, that equals to the length of the node data.  The
-  * begin offset and length of the rest of the entries can be calculated using
-  * the end offset of the previous JEntry in the array.
-  *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the end offset of the entry (see
!  * JBE_ENDPOS, JBE_OFF, JBE_LEN macros below). The next three bits
!  * are used to store the type of the entry. The most significant bit
!  * is unused, and should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_POSMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
--- 95,126 ----
   * hold its JEntry. Hence, no JEntry header is stored for the root node.  It
   * is implicitly known that the root node must be an array or an object,
   * so we can get away without the type indicator as long as we can distinguish
!  * the two.  For that purpose, both an array and an object begin with a uint32
   * header field, which contains an JB_FOBJECT or JB_FARRAY flag.  When a naked
   * scalar value needs to be stored as a Jsonb value, what we actually store is
   * an array with one element, with the flags in the array's header field set
   * to JB_FSCALAR | JB_FARRAY.
   *
   * Overall, the Jsonb struct requires 4-bytes alignment. Within the struct,
   * the variable-length portion of some node types is aligned to a 4-byte
   * boundary, while others are not. When alignment is needed, the padding is
   * in the beginning of the node that requires it. For example, if a numeric
   * node is stored after a string node, so that the numeric node begins at
   * offset 3, the variable-length portion of the numeric node will begin with
!  * one padding byte so that the actual numeric data is 4-byte aligned.
   */

  /*
   * Jentry format.
   *
!  * The least significant 28 bits store the data length of the entry (see
!  * JBE_LENFLD and JBE_LEN macros below). The next three bits store the type
!  * of the entry. The most significant bit is reserved for future use, and
!  * should be set to zero.
   */
  typedef uint32 JEntry;

! #define JENTRY_LENMASK            0x0FFFFFFF
  #define JENTRY_TYPEMASK            0x70000000

  /* values stored in the type bits */
*************** typedef uint32 JEntry;
*** 148,166 ****
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the offset and length of an element. Note multiple
!  * evaluations and access to prior array element.
   */
! #define JBE_ENDPOS(je_)            ((je_) & JENTRY_POSMASK)
! #define JBE_OFF(ja, i)            ((i) == 0 ? 0 : JBE_ENDPOS((ja)[i - 1]))
! #define JBE_LEN(ja, i)            ((i) == 0 ? JBE_ENDPOS((ja)[i]) \
!                                  : JBE_ENDPOS((ja)[i]) - JBE_ENDPOS((ja)[i - 1]))

  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element. An object has two children for
!  * each key/value pair.
   */
  typedef struct JsonbContainer
  {
--- 141,160 ----
  #define JBE_ISBOOL(je_)            (JBE_ISBOOL_TRUE(je_) || JBE_ISBOOL_FALSE(je_))

  /*
!  * Macros for getting the data length of a JEntry.
   */
! #define JBE_LENFLD(je_)            ((je_) & JENTRY_LENMASK)
! #define JBE_LEN(ja, i)            JBE_LENFLD((ja)[i])

  /*
   * A jsonb array or object node, within a Jsonb Datum.
   *
!  * An array has one child for each element, stored in array order.
!  *
!  * An object has two children for each key/value pair.  The keys all appear
!  * first, in key sort order; then the values appear, in an order matching the
!  * key order.  This arrangement keeps the keys compact in memory, making a
!  * search for a particular key more cache-friendly.
   */
  typedef struct JsonbContainer
  {
*************** typedef struct JsonbContainer
*** 172,179 ****
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF
! #define JB_FSCALAR                0x10000000
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

--- 166,173 ----
  } JsonbContainer;

  /* flags for the header-field in JsonbContainer */
! #define JB_CMASK                0x0FFFFFFF        /* mask for count field */
! #define JB_FSCALAR                0x10000000        /* flag bits */
  #define JB_FOBJECT                0x20000000
  #define JB_FARRAY                0x40000000

*************** struct JsonbValue
*** 248,265 ****
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Pair within an Object.
   *
!  * Pairs with duplicate keys are de-duplicated.  We store the order for the
!  * benefit of doing so in a well-defined way with respect to the original
!  * observed order (which is "last observed wins").  This is only used briefly
!  * when originally constructing a Jsonb.
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* preserves order of pairs with equal keys */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
--- 242,261 ----
                                       (jsonbval)->type <= jbvBool)

  /*
!  * Key/value pair within an Object.
   *
!  * This struct type is only used briefly while constructing a Jsonb; it is
!  * *not* the on-disk representation.
!  *
!  * Pairs with duplicate keys are de-duplicated.  We store the originally
!  * observed pair ordering for the purpose of removing duplicates in a
!  * well-defined way (which is "last observed wins").
   */
  struct JsonbPair
  {
      JsonbValue    key;            /* Must be a jbvString */
      JsonbValue    value;            /* May be of any type */
!     uint32        order;            /* Pair's index in original sequence */
  };

  /* Conversion state used when parsing Jsonb from text, or for type coercion */
*************** typedef struct JsonbIterator
*** 287,306 ****
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will be
!                                  * nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;

!     /* Current item in buffer (up to nElems, but must * 2 for objects) */
!     int            i;

      /*
!      * Data proper.  This points just past end of children array.
!      * We use the JBE_OFF() macro on the Jentrys to find offsets of each
!      * child in this area.
       */
!     char       *dataProper;

      /* Private state */
      JsonbIterState state;
--- 283,307 ----
  {
      /* Container being iterated */
      JsonbContainer *container;
!     uint32        nElems;            /* Number of elements in children array (will
!                                  * be nPairs for objects) */
      bool        isScalar;        /* Pseudo-array scalar value? */
!     JEntry       *children;        /* JEntrys for child nodes */
!     /* Data proper.  This points just past end of children array */
!     char       *dataProper;

!     /* Current item in buffer (up to nElems) */
!     int            curIndex;
!
!     /* Data offset corresponding to current item */
!     uint32        curDataOffset;

      /*
!      * If the container is an object, we want to return keys and values
!      * alternately; so curDataOffset points to the current key, and
!      * curValueOffset points to the current value.
       */
!     uint32        curValueOffset;

      /* Private state */
      JsonbIterState state;
*************** extern Datum gin_consistent_jsonb_path(P
*** 344,349 ****
--- 345,351 ----
  extern Datum gin_triconsistent_jsonb_path(PG_FUNCTION_ARGS);

  /* Support functions */
+ extern uint32 getJsonbOffset(const JEntry *ja, int index);
  extern int    compareJsonbContainers(JsonbContainer *a, JsonbContainer *b);
  extern JsonbValue *findJsonbValueFromContainer(JsonbContainer *sheader,
                              uint32 flags,

pgsql-hackers by date:

Previous
From: David G Johnston
Date:
Subject: Re: proposal: rounding up time value less than its unit.
Next
From: Alvaro Herrera
Date:
Subject: Re: Per table autovacuum vacuum cost limit behaviour strange