Re: Brain fade in gin_extract_jsonb_path() - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Brain fade in gin_extract_jsonb_path()
Date
Msg-id 3346.1446759312@sss.pgh.pa.us
Whole thread Raw
In response to Re: Brain fade in gin_extract_jsonb_path()  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Brain fade in gin_extract_jsonb_path()
List pgsql-hackers
Peter Geoghegan <pg@heroku.com> writes:
> On Thu, Nov 5, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The attached one-liner fixes it, but of course this means that on-disk
>> jsonb_path_ops indexes are possibly broken and will need to be reindexed.
>> I see no way around that ... does anybody else?

> I think it's impossible to avoid having to recommend a reindex here.

Good, because that's not the only bug here :-(

Extending the example with

insert into t (j) values ('[[14,2,3]]');
insert into t (j) values ('[1,[14,2,3]]');

we get this with seqscan on:

regression=# select * from t where j @> '[[14]]';
        j
-----------------
 [[14, 2, 3]]
 [1, [14, 2, 3]]
(2 rows)

which is the right answer, and this with seqscan off:

regression=# select * from t where j @> '[[14]]';
      j
--------------
 [[14, 2, 3]]
(1 row)

which is not so much.

The reason for that is that when processing an outer-level WJB_BEGIN_XXX,
we initialize stack->hash to something that is different from
stack->parent->hash.  This is a waste of time in unnested cases because
when we hit the first WJB_KEY or WJB_ELEM, we'll reset stack->hash to
stack->parent->hash.  However, if we have a nested array, then processing
of elements inside that array treats stack->parent->hash as the reference
value, so that we compute different hash values than we would have at
outer level.  What's worse, it matters whether the nested array is the
first subobject or not, as shown by the above example; that's because once
we reset stack->hash to equal stack->parent->hash, hashes for later
elements will be inconsistent with the special-case initialization.
(Note: that explains why this is broken with my previous patch.  It fails
in much the same way with unmodified HEAD, but I think for slightly
different reasons, which I've not bothered to understand in detail.)

I think the easiest way to fix this is to forget about the special
initialization at outer level and just always initialize a new stack level
to have the same hash as its parent.  That leads to the first patch below
--- but once you look at that, you realize that we've got unnecessarily
many copies of stack->parent->hash into stack->hash, so what I actually
propose now is the second patch below.

I think that this only changes the hashes calculated for nested-array
cases, but it's still a larger change footprint than the previous patch.
Not that that affects much; a reindex is a reindex, and a bug is a bug,
so we don't have much wiggle room.

            regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_gin.c b/src/backend/utils/adt/jsonb_gin.c
index 204fb8b..9172268 100644
*** a/src/backend/utils/adt/jsonb_gin.c
--- b/src/backend/utils/adt/jsonb_gin.c
*************** gin_extract_jsonb_path(PG_FUNCTION_ARGS)
*** 375,406 ****
                  parent = stack;
                  stack = (PathHashStack *) palloc(sizeof(PathHashStack));

!                 if (parent->parent)
!                 {
!                     /*
!                      * We pass forward hashes from previous container nesting
!                      * levels so that nested arrays with an outermost nested
!                      * object will have element hashes mixed with the
!                      * outermost key.  It's also somewhat useful to have
!                      * nested objects' innermost values have hashes that are a
!                      * function of not just their own key, but outer keys too.
!                      *
!                      * Nesting an array within another array will not alter
!                      * innermost scalar element hash values, but that seems
!                      * inconsequential.
!                      */
!                     stack->hash = parent->hash;
!                 }
!                 else
!                 {
!                     /*
!                      * At the outermost level, initialize hash with container
!                      * type proxy value.  Note that this makes JB_FARRAY and
!                      * JB_FOBJECT part of the on-disk representation, but they
!                      * are that in the base jsonb object storage already.
!                      */
!                     stack->hash = (r == WJB_BEGIN_ARRAY) ? JB_FARRAY : JB_FOBJECT;
!                 }
                  stack->parent = parent;
                  break;
              case WJB_KEY:
--- 375,390 ----
                  parent = stack;
                  stack = (PathHashStack *) palloc(sizeof(PathHashStack));

!                 /*
!                  * We pass forward hashes from outer nesting levels so that
!                  * the hashes for nested values will include outer keys as
!                  * well as their own keys.
!                  *
!                  * Nesting an array within another array will not alter
!                  * innermost scalar element hash values, but that seems
!                  * inconsequential.
!                  */
!                 stack->hash = parent->hash;
                  stack->parent = parent;
                  break;
              case WJB_KEY:
*************** gin_extract_jsonb_path(PG_FUNCTION_ARGS)
*** 419,425 ****
                  JsonbHashScalarValue(&v, &stack->hash);
                  /* and emit an index entry */
                  entries[i++] = UInt32GetDatum(stack->hash);
!                 /* Note: we assume we'll see KEY before another VALUE */
                  break;
              case WJB_END_ARRAY:
              case WJB_END_OBJECT:
--- 403,410 ----
                  JsonbHashScalarValue(&v, &stack->hash);
                  /* and emit an index entry */
                  entries[i++] = UInt32GetDatum(stack->hash);
!                 /* reset hash for next sub-object */
!                 stack->hash = stack->parent->hash;
                  break;
              case WJB_END_ARRAY:
              case WJB_END_OBJECT:
diff --git a/src/backend/utils/adt/jsonb_gin.c b/src/backend/utils/adt/jsonb_gin.c
index 204fb8b..04d2318 100644
*** a/src/backend/utils/adt/jsonb_gin.c
--- b/src/backend/utils/adt/jsonb_gin.c
*************** gin_extract_jsonb_path(PG_FUNCTION_ARGS)
*** 375,425 ****
                  parent = stack;
                  stack = (PathHashStack *) palloc(sizeof(PathHashStack));

!                 if (parent->parent)
!                 {
!                     /*
!                      * We pass forward hashes from previous container nesting
!                      * levels so that nested arrays with an outermost nested
!                      * object will have element hashes mixed with the
!                      * outermost key.  It's also somewhat useful to have
!                      * nested objects' innermost values have hashes that are a
!                      * function of not just their own key, but outer keys too.
!                      *
!                      * Nesting an array within another array will not alter
!                      * innermost scalar element hash values, but that seems
!                      * inconsequential.
!                      */
!                     stack->hash = parent->hash;
!                 }
!                 else
!                 {
!                     /*
!                      * At the outermost level, initialize hash with container
!                      * type proxy value.  Note that this makes JB_FARRAY and
!                      * JB_FOBJECT part of the on-disk representation, but they
!                      * are that in the base jsonb object storage already.
!                      */
!                     stack->hash = (r == WJB_BEGIN_ARRAY) ? JB_FARRAY : JB_FOBJECT;
!                 }
                  stack->parent = parent;
                  break;
              case WJB_KEY:
!                 /* initialize hash from parent */
!                 stack->hash = stack->parent->hash;
!                 /* and mix in this key */
                  JsonbHashScalarValue(&v, &stack->hash);
                  /* hash is now ready to incorporate the value */
                  break;
              case WJB_ELEM:
-                 /* array elements use parent hash mixed with element's hash */
-                 stack->hash = stack->parent->hash;
-                 /* FALL THRU */
              case WJB_VALUE:
                  /* mix the element or value's hash into the prepared hash */
                  JsonbHashScalarValue(&v, &stack->hash);
                  /* and emit an index entry */
                  entries[i++] = UInt32GetDatum(stack->hash);
!                 /* Note: we assume we'll see KEY before another VALUE */
                  break;
              case WJB_END_ARRAY:
              case WJB_END_OBJECT:
--- 375,405 ----
                  parent = stack;
                  stack = (PathHashStack *) palloc(sizeof(PathHashStack));

!                 /*
!                  * We pass forward hashes from outer nesting levels so that
!                  * the hashes for nested values will include outer keys as
!                  * well as their own keys.
!                  *
!                  * Nesting an array within another array will not alter
!                  * innermost scalar element hash values, but that seems
!                  * inconsequential.
!                  */
!                 stack->hash = parent->hash;
                  stack->parent = parent;
                  break;
              case WJB_KEY:
!                 /* mix this key into the current outer hash */
                  JsonbHashScalarValue(&v, &stack->hash);
                  /* hash is now ready to incorporate the value */
                  break;
              case WJB_ELEM:
              case WJB_VALUE:
                  /* mix the element or value's hash into the prepared hash */
                  JsonbHashScalarValue(&v, &stack->hash);
                  /* and emit an index entry */
                  entries[i++] = UInt32GetDatum(stack->hash);
!                 /* reset hash for next key, value, or sub-object */
!                 stack->hash = stack->parent->hash;
                  break;
              case WJB_END_ARRAY:
              case WJB_END_OBJECT:

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: proposal: multiple psql option -c
Next
From: "David G. Johnston"
Date:
Subject: Re: [PATCH] Skip ALTER x SET SCHEMA if the schema didn't change