Re: Brain fade in gin_extract_jsonb_path() - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Brain fade in gin_extract_jsonb_path() |
Date | |
Msg-id | 3346.1446759312@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Brain fade in gin_extract_jsonb_path() (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: Brain fade in gin_extract_jsonb_path()
|
List | pgsql-hackers |
Peter Geoghegan <pg@heroku.com> writes: > On Thu, Nov 5, 2015 at 12:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The attached one-liner fixes it, but of course this means that on-disk >> jsonb_path_ops indexes are possibly broken and will need to be reindexed. >> I see no way around that ... does anybody else? > I think it's impossible to avoid having to recommend a reindex here. Good, because that's not the only bug here :-( Extending the example with insert into t (j) values ('[[14,2,3]]'); insert into t (j) values ('[1,[14,2,3]]'); we get this with seqscan on: regression=# select * from t where j @> '[[14]]'; j ----------------- [[14, 2, 3]] [1, [14, 2, 3]] (2 rows) which is the right answer, and this with seqscan off: regression=# select * from t where j @> '[[14]]'; j -------------- [[14, 2, 3]] (1 row) which is not so much. The reason for that is that when processing an outer-level WJB_BEGIN_XXX, we initialize stack->hash to something that is different from stack->parent->hash. This is a waste of time in unnested cases because when we hit the first WJB_KEY or WJB_ELEM, we'll reset stack->hash to stack->parent->hash. However, if we have a nested array, then processing of elements inside that array treats stack->parent->hash as the reference value, so that we compute different hash values than we would have at outer level. What's worse, it matters whether the nested array is the first subobject or not, as shown by the above example; that's because once we reset stack->hash to equal stack->parent->hash, hashes for later elements will be inconsistent with the special-case initialization. (Note: that explains why this is broken with my previous patch. It fails in much the same way with unmodified HEAD, but I think for slightly different reasons, which I've not bothered to understand in detail.) I think the easiest way to fix this is to forget about the special initialization at outer level and just always initialize a new stack level to have the same hash as its parent. That leads to the first patch below --- but once you look at that, you realize that we've got unnecessarily many copies of stack->parent->hash into stack->hash, so what I actually propose now is the second patch below. I think that this only changes the hashes calculated for nested-array cases, but it's still a larger change footprint than the previous patch. Not that that affects much; a reindex is a reindex, and a bug is a bug, so we don't have much wiggle room. regards, tom lane diff --git a/src/backend/utils/adt/jsonb_gin.c b/src/backend/utils/adt/jsonb_gin.c index 204fb8b..9172268 100644 *** a/src/backend/utils/adt/jsonb_gin.c --- b/src/backend/utils/adt/jsonb_gin.c *************** gin_extract_jsonb_path(PG_FUNCTION_ARGS) *** 375,406 **** parent = stack; stack = (PathHashStack *) palloc(sizeof(PathHashStack)); ! if (parent->parent) ! { ! /* ! * We pass forward hashes from previous container nesting ! * levels so that nested arrays with an outermost nested ! * object will have element hashes mixed with the ! * outermost key. It's also somewhat useful to have ! * nested objects' innermost values have hashes that are a ! * function of not just their own key, but outer keys too. ! * ! * Nesting an array within another array will not alter ! * innermost scalar element hash values, but that seems ! * inconsequential. ! */ ! stack->hash = parent->hash; ! } ! else ! { ! /* ! * At the outermost level, initialize hash with container ! * type proxy value. Note that this makes JB_FARRAY and ! * JB_FOBJECT part of the on-disk representation, but they ! * are that in the base jsonb object storage already. ! */ ! stack->hash = (r == WJB_BEGIN_ARRAY) ? JB_FARRAY : JB_FOBJECT; ! } stack->parent = parent; break; case WJB_KEY: --- 375,390 ---- parent = stack; stack = (PathHashStack *) palloc(sizeof(PathHashStack)); ! /* ! * We pass forward hashes from outer nesting levels so that ! * the hashes for nested values will include outer keys as ! * well as their own keys. ! * ! * Nesting an array within another array will not alter ! * innermost scalar element hash values, but that seems ! * inconsequential. ! */ ! stack->hash = parent->hash; stack->parent = parent; break; case WJB_KEY: *************** gin_extract_jsonb_path(PG_FUNCTION_ARGS) *** 419,425 **** JsonbHashScalarValue(&v, &stack->hash); /* and emit an index entry */ entries[i++] = UInt32GetDatum(stack->hash); ! /* Note: we assume we'll see KEY before another VALUE */ break; case WJB_END_ARRAY: case WJB_END_OBJECT: --- 403,410 ---- JsonbHashScalarValue(&v, &stack->hash); /* and emit an index entry */ entries[i++] = UInt32GetDatum(stack->hash); ! /* reset hash for next sub-object */ ! stack->hash = stack->parent->hash; break; case WJB_END_ARRAY: case WJB_END_OBJECT: diff --git a/src/backend/utils/adt/jsonb_gin.c b/src/backend/utils/adt/jsonb_gin.c index 204fb8b..04d2318 100644 *** a/src/backend/utils/adt/jsonb_gin.c --- b/src/backend/utils/adt/jsonb_gin.c *************** gin_extract_jsonb_path(PG_FUNCTION_ARGS) *** 375,425 **** parent = stack; stack = (PathHashStack *) palloc(sizeof(PathHashStack)); ! if (parent->parent) ! { ! /* ! * We pass forward hashes from previous container nesting ! * levels so that nested arrays with an outermost nested ! * object will have element hashes mixed with the ! * outermost key. It's also somewhat useful to have ! * nested objects' innermost values have hashes that are a ! * function of not just their own key, but outer keys too. ! * ! * Nesting an array within another array will not alter ! * innermost scalar element hash values, but that seems ! * inconsequential. ! */ ! stack->hash = parent->hash; ! } ! else ! { ! /* ! * At the outermost level, initialize hash with container ! * type proxy value. Note that this makes JB_FARRAY and ! * JB_FOBJECT part of the on-disk representation, but they ! * are that in the base jsonb object storage already. ! */ ! stack->hash = (r == WJB_BEGIN_ARRAY) ? JB_FARRAY : JB_FOBJECT; ! } stack->parent = parent; break; case WJB_KEY: ! /* initialize hash from parent */ ! stack->hash = stack->parent->hash; ! /* and mix in this key */ JsonbHashScalarValue(&v, &stack->hash); /* hash is now ready to incorporate the value */ break; case WJB_ELEM: - /* array elements use parent hash mixed with element's hash */ - stack->hash = stack->parent->hash; - /* FALL THRU */ case WJB_VALUE: /* mix the element or value's hash into the prepared hash */ JsonbHashScalarValue(&v, &stack->hash); /* and emit an index entry */ entries[i++] = UInt32GetDatum(stack->hash); ! /* Note: we assume we'll see KEY before another VALUE */ break; case WJB_END_ARRAY: case WJB_END_OBJECT: --- 375,405 ---- parent = stack; stack = (PathHashStack *) palloc(sizeof(PathHashStack)); ! /* ! * We pass forward hashes from outer nesting levels so that ! * the hashes for nested values will include outer keys as ! * well as their own keys. ! * ! * Nesting an array within another array will not alter ! * innermost scalar element hash values, but that seems ! * inconsequential. ! */ ! stack->hash = parent->hash; stack->parent = parent; break; case WJB_KEY: ! /* mix this key into the current outer hash */ JsonbHashScalarValue(&v, &stack->hash); /* hash is now ready to incorporate the value */ break; case WJB_ELEM: case WJB_VALUE: /* mix the element or value's hash into the prepared hash */ JsonbHashScalarValue(&v, &stack->hash); /* and emit an index entry */ entries[i++] = UInt32GetDatum(stack->hash); ! /* reset hash for next key, value, or sub-object */ ! stack->hash = stack->parent->hash; break; case WJB_END_ARRAY: case WJB_END_OBJECT:
pgsql-hackers by date: