Re: jsonb and nested hstore - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: jsonb and nested hstore
Date
Msg-id 5310BE2C.90402@sigaev.ru
Whole thread Raw
In response to Re: jsonb and nested hstore  (Andrew Dunstan <andrew@dunslane.net>)
List pgsql-hackers
>>> +            v.size += VARSIZE_ANY(v.numeric) +sizeof(JEntry) /* alignment */ ;
>> Why does + sizeof(JEntry) change anything about alignment? If it was
>> aligned before, adding a statically sized value doesn't give any new
>> guarantees about alignment?
> Teodor, please comment.

Because numeric type will be copied into jsonb value. And we need to keep 
alignment inside jsonb value. The same is true for  nested jsonb (array or object).


>>> +                type = JsonbIteratorGet(&it, &v, false);
>>> +                if (type == WJB_VALUE)
>>> +                {
>>> +                    first = false;
>>> +                    putEscapedValue(out, &v);
>>> +                }
>>> +                else
>>> +                {
>>> +                    Assert(type == WJB_BEGIN_OBJECT || type ==
>>> WJB_BEGIN_ARRAY);
>>> +                    /*
>>> +                     * We need to rerun current switch() due to put
>>> +                     * in current place object which we just got
>>> +                     * from iterator.
>>> +                     */
>> "due to put"?
>
>
> I think that's due to the author not being a native English speaker. I've tried
> to improve it a bit.
>
> Teodor, please comment if you like.

Pls, fix my English. I mean if we got first element of array/object and it isn't 
a scalar value we should do actions pointed by case 
WJB_BEGIN_OBJECT/WJB_BEGIN_ARRAY in the same switch without calling iterator.


> Teodor, please examine and comment on all comments below this point.

>>> +JsonbValue *
>>> +findUncompressedJsonbValueByValue(char *buffer, uint32 flags,
>>> +                                  uint32 *lowbound, JsonbValue *key)
>>> +{
>> Functions like this *REALLY* need documentation for their
>> parameters. And of their actual purpose.>>>> What's actually the uncompressed bit here? Isn't it actually the>>
contrary?This is navigating the compressed, non-tree form, no?
 

Functions returns value in  JsonbValue form (uncompressed, not just a pointer). 
For object it performs search by key and returns corresponding value, for array 
- returns matched value. If lowbound is not null then it will be set into 
array/object position of found value. And search will be started from *lowbound 
position in array/object. That allows some optimizations.

Buffer is a pointer to header of jsonb value. After pointer there is an array of 
JEntry and following list of values of array or key and values of object. This 
is internal representation for jsonb or hstore without varlena header.
Nested array/object have the same representation. Flags points desired search - 
in array or object. For example, If buffer contains array and flags has only 
JB_FLAG_OBJECT then function returns NULL.



>>> +    else if (flags & JB_FLAG_OBJECT & header)
>>> +    {
>>> +        JEntry       *array = (JEntry *) (buffer + sizeof(header));
>>> +        char       *data = (char *) (array + (header & JB_COUNT_MASK) * 2);
>>> +        uint32        stopLow = lowbound ? *lowbound : 0,
>>> +                    stopHigh = (header & JB_COUNT_MASK),
>>> +                    stopMiddle;
>> I don't understand what the point of the lowbound logic could be here?
>> If a key hasn't been found, it hasn't been found? Maybe the idea is to
>> use it when testing containedness or somesuch? Wouldn't iterating over
>> the keyspace be a better idea for that case?
If we has keys (a,b,c,d,e,f,g) and need to search keys e and f, then for second 
search we could do in in subset of keys (f,g), we don't need to search in full 
set of keys. The idea was introduced in hstoreFindKey() in hstore V2.


>>
>>> +        if (key->type != jbvString)
>>> +            return NULL;
>> That's not allowed, right?

Right. it should be an Assert or ERROR.

>>
>>> +/*
>>> + * Get i-th value of array or hash. if i < 0 then it counts from
>>> + * the end of array/hash. Note: returns pointer to statically
>>> + * allocated JsonbValue.
>>> + */
>>> +JsonbValue *
>>> +getJsonbValue(char *buffer, uint32 flags, int32 i)
>>> +{
>>> +    uint32        header = *(uint32 *) buffer;
>>> +    static JsonbValue r;
>> Really? And why on earth would static allocation be a good idea? Specify
>> it on the caller's stack if need be. Or even return by value, today's
>> calling convention will just allocate that on the caller's stack without
>> copying.
>> Accessing static data isn't even faster.

Just to prevent multiple palloc(). Could be changed, I don't insist. I saw 
problems with a lot of small allocations but didn't see such problems with 
static allocations.

>>
>>> +    if (JBE_ISSTRING(*e))
>>> +    {
>>> +        r.type = jbvString;
>>> +        r.string.val = data + JBE_OFF(*e);
>>> +        r.string.len = JBE_LEN(*e);
>>> +        r.size = sizeof(JEntry) + r.string.len;
>>> +    }
>>> +    else if (JBE_ISBOOL(*e))
>>> +    {
>>> +        r.type = jbvBool;
>>> +        r.boolean = (JBE_ISBOOL_TRUE(*e)) ? true : false;
>>> +        r.size = sizeof(JEntry);
>>> +    }
>>> +    else if (JBE_ISNUMERIC(*e))
>>> +    {
>>> +        r.type = jbvNumeric;
>>> +        r.numeric = (Numeric) (data + INTALIGN(JBE_OFF(*e)));
>>> +
>>> +        r.size = 2 * sizeof(JEntry) + VARSIZE_ANY(r.numeric);
>>> +    }
>>> +    else if (JBE_ISNULL(*e))
>>> +    {
>>> +        r.type = jbvNull;
>>> +        r.size = sizeof(JEntry);
>>> +    }
>>> +    else
>>> +    {
>>> +        r.type = jbvBinary;
>>> +        r.binary.data = data + INTALIGN(JBE_OFF(*e));
>>> +        r.binary.len = JBE_LEN(*e) - (INTALIGN(JBE_OFF(*e)) - JBE_OFF(*e));
>>> +        r.size = r.binary.len + 2 * sizeof(JEntry);
>>> +    }
>> This bit of code exists pretty similarly in several places, maybe consolitate?
findUncompressedJsonbValueByValue(), getJsonbValue() and formAnswer(). But one 
has inconvenient difference with skipNested flag. Ok, will fix.



>>
>>> +/****************************************************************************
>>> + *                      Walk on tree representation of
>>> jsonb                    *
>>> + ****************************************************************************/
>>> +static void
>>> +walkUncompressedJsonbDo(JsonbValue *v, walk_jsonb_cb cb, void *cb_arg,
>>> uint32 level)
>>> +{
>>> +    int            i;
>> check stack limit.
>>
>>> +void
>>> +walkUncompressedJsonb(JsonbValue *v, walk_jsonb_cb cb, void *cb_arg)
>>> +{
>>> +    if (v)
>>> +        walkUncompressedJsonbDo(v, cb, cb_arg, 0);
>>> +}
>>> +
>>> +/****************************************************************************
>>> + *                           Iteration over binary
>>> jsonb                        *
>>> + ****************************************************************************/
>> This needs docs.
>>
>>> +static void
>>> +parseBuffer(JsonbIterator *it, char *buffer)
>>> +{
>> Why invent completely independent naming conventions to the previous
>> functions here?
Suggest it.

>>
>>> +static bool
>>> +formAnswer(JsonbIterator **it, JsonbValue *v, JEntry * e, bool skipNested)
>>> +{
>> Imaginatively undescriptive name. But if it were slightly more more
>> abstracted away from JsonbIterator it could be the answer to my prayers
>> above about removing redundant code.
formAnswerOfJsonbIteratorGet()?

>>
>>> +static JsonbIterator *
>>> +up(JsonbIterator *it)
>>> +{
>> Not a good name.
...

>>
>>> +int
>>> +JsonbIteratorGet(JsonbIterator **it, JsonbValue *v, bool skipNested)
>>> +{
>>> +    int            res;
>> recursive, stack depth check.
fixed.

>>
>>> +    switch ((*it)->type | (*it)->state)
>>> +    {
>>> +        case JB_FLAG_ARRAY | jbi_start:
>> I don't know, but I don't see the point in avoid if (), else if()
>> ... constructs if it requires such dirty tricks.
Will be:
if ((*it)->type == JB_FLAG_ARRAY && (*it)->state == jbi_start)

A bit slower and I don't feel that switch is more worse. But I don't insist.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/ 



pgsql-hackers by date:

Previous
From: Emre Hasegeli
Date:
Subject: Re: GiST support for inet datatypes
Next
From: David Johnston
Date:
Subject: Re: Equivalence Rules