Re: jsonb and nested hstore - Mailing list pgsql-hackers

From Merlin Moncure
Subject Re: jsonb and nested hstore
Date
Msg-id CAHyXU0xGMcUkTv7WPpqeiZXSmBvY29C5JovgVM8EpnhZ+LhVqQ@mail.gmail.com
Whole thread Raw
In response to Re: jsonb and nested hstore  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: jsonb and nested hstore  (Andres Freund <andres@2ndquadrant.com>)
Re: jsonb and nested hstore  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Sat, Feb 1, 2014 at 4:20 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote:
>> +  <para id="functions-json-table">
>> +   <xref linkend="functions-json-creation-table"> shows the functions that are
>> +   available for creating <type>json</type> values.
>> +   (see <xref linkend="datatype-json">)
>>    </para>
>>
>> -  <table id="functions-json-table">
>> -    <title>JSON Support Functions</title>
>> +  <indexterm>
>> +   <primary>array_to_json</primary>
>> +  </indexterm>
>> +  <indexterm>
>> +   <primary>row_to_json</primary>
>> +  </indexterm>
>> +  <indexterm>
>> +   <primary>to_json</primary>
>> +  </indexterm>
>> +  <indexterm>
>> +   <primary>json_build_array</primary>
>> +  </indexterm>
>> +  <indexterm>
>> +   <primary>json_build_object</primary>
>> +  </indexterm>
>> +  <indexterm>
>> +   <primary>json_object</primary>
>> +  </indexterm>
>
> Hm, why are you collecting the indexterms at the top in the contrast to
> the previous way of collecting them at the point of documentation?
>
>> diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
>> index 1ae9fa0..fd93d9b 100644
>> --- a/src/backend/utils/adt/Makefile
>> +++ b/src/backend/utils/adt/Makefile
>> @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \
>>       tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \
>>       tsvector.o tsvector_op.o tsvector_parser.o \
>>       txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \
>> -     rangetypes_typanalyze.o rangetypes_selfuncs.o
>> +     rangetypes_typanalyze.o rangetypes_selfuncs.o \
>> +     jsonb.o jsonb_support.o
>
> Odd, most OBJS lines are kept in alphabetical order, but that doesn't
> seem to be the case here.
>
>> +/*
>> + * for jsonb we always want the de-escaped value - that's what's in token
>> + */
>> +
>
> strange newline.
>
>> +static void
>> +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype)
>> +{
>> +     JsonbInState *_state = (JsonbInState *) state;
>> +     JsonbValue      v;
>> +
>> +     v.size = sizeof(JEntry);
>> +
>> +     switch (tokentype)
>> +     {
>> +
> ...
>
>> +             default:                                /* nothing else should be here in fact */
>> +                     break;
>
> Shouldn't this at least Assert(false) or something?
>
>> +static void
>> +recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c)
>> +{
>> +     uint32          hentry = c & JENTRY_TYPEMASK;
>> +
>> +     if (hentry == JENTRY_ISNULL)
>> +     {
>> +             v->type = jbvNull;
>> +             v->size = sizeof(JEntry);
>> +     }
>> +     else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR)
>> +     {
>> +             recvJsonb(buf, v, level + 1, (uint32) c);
>> +     }
>> +     else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE)
>> +     {
>> +             v->type = jbvBool;
>> +             v->size = sizeof(JEntry);
>> +             v->boolean = (hentry == JENTRY_ISFALSE) ? false : true;
>> +     }
>> +     else if (hentry == JENTRY_ISNUMERIC)
>> +     {
>> +             v->type = jbvNumeric;
>> +             v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf),
>> +                                                                        Int32GetDatum(0), Int32GetDatum(-1)));
>> +
>> +             v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric);
>
> What's the *2 here?
>
>> +static void
>> +recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header)
>> +{
>> +     uint32          hentry;
>> +     uint32          i;
>
> This function and recvJsonbValue call each other recursively, afaics
> without any limit, shouldn't they check for the stack depth?
>
>> +     hentry = header & JENTRY_TYPEMASK;
>> +
>> +     v->size = 3 * sizeof(JEntry);
>
> *3?
>
>> +     if (hentry == JENTRY_ISOBJECT)
>> +     {
>> +             v->type = jbvHash;
>> +             v->hash.npairs = header & JB_COUNT_MASK;
>> +             if (v->hash.npairs > 0)
>> +             {
>> +                     v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs);
>> +
>
> Hm, if I understand correctly, we're just allocating JB_COUNT_MASK
> (which is 0x0FFFFFFF) * sizeof(*v->hash.pairs) bytes here, without any
> crosschecks about the actual length of the data? So with a few bytes the
> server can be coaxed to allocate a gigabyte of data?
> Since this immediately calls another input routine, this can be done in
> a nested fashion, quickly OOMing the server.
>
> I think this and several other places really need a bit more input
> sanity checking.
>
>> +                     for (i = 0; i < v->hash.npairs; i++)
>> +                     {
>> +                             recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4));
>> +                             if (v->hash.pairs[i].key.type != jbvString)
>> +                                     elog(ERROR, "jsonb's key could be only a string");
>
> Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a
> few other places.
>
>> +char *
>> +JsonbToCString(StringInfo out, char *in, int estimated_len)
>> +{
>> +     bool            first = true;
>> +     JsonbIterator *it;
>> +     int                     type;
>> +     JsonbValue      v;
>> +     int                     level = 0;
>> +
>> +     if (out == NULL)
>> +             out = makeStringInfo();
>
> Such a behaviour certainly deserves a documentary comment. Generally
> some more functions could use that.
>
>> +     while ((type = JsonbIteratorGet(&it, &v, false)) != 0)
>> +     {
>> +reout:
>> +             switch (type)
>> +             {
> ...
>> +                             {
>> +                                     Assert(type == WJB_BEGIN_OBJECT || type == WJB_BEGIN_ARRAY);
>> +                                     goto reout;
>
> Hrmpf.
>
>> +Datum
>> +jsonb_typeof(PG_FUNCTION_ARGS)
>> +{
> ...
>> +}
>
> Hm, shouldn't that be in jsonfuncs.c?
>
>> diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
>> index a19b222..f1eacc6 100644
>> --- a/src/backend/utils/adt/jsonfuncs.c
>> +++ b/src/backend/utils/adt/jsonfuncs.c
>> @@ -27,6 +27,7 @@
>>  #include "utils/builtins.h"
>>  #include "utils/hsearch.h"
>>  #include "utils/json.h"
>> +#include "utils/jsonb.h"
>>  #include "utils/jsonapi.h"
>>  #include "utils/lsyscache.h"
>>  #include "utils/memutils.h"
>> @@ -51,6 +52,7 @@ static inline Datum get_path_all(PG_FUNCTION_ARGS, bool as_text);
>>  static inline text *get_worker(text *json, char *field, int elem_index,
>>                  char **tpath, int *ipath, int npath,
>>                  bool normalize_results);
>> +static inline Datum get_jsonb_path_all(PG_FUNCTION_ARGS, bool as_text);
>
> I don't see the point of using PG_FUNCTION_ARGS if you're manually
> calling it like
> +       return get_jsonb_path_all(fcinfo, false);
>
> That just makes it harder if someday PG_FUNCTION_ARGS grows a second
> argument or something.
>
>
>> +Datum
>> +jsonb_object_keys(PG_FUNCTION_ARGS)
>> +{
>> +     FuncCallContext *funcctx;
>> +     OkeysState *state;
>> +     int                     i;
>> +
>> +     if (SRF_IS_FIRSTCALL())
>> +     {
>> +             MemoryContext oldcontext;
>> +             Jsonb      *jb = PG_GETARG_JSONB(0);
>> +             bool            skipNested = false;
>> +             JsonbIterator *it;
>> +             JsonbValue      v;
>> +             int                     r = 0;
>> +
>> +             if (JB_ROOT_IS_SCALAR(jb))
>> +                     ereport(ERROR,
>> +                                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                                      errmsg("cannot call jsonb_object_keys on a scalar")));
>> +             else if (JB_ROOT_IS_ARRAY(jb))
>> +                     ereport(ERROR,
>> +                                     (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +                                      errmsg("cannot call jsonb_object_keys on an array")));
>> +
>> +             funcctx = SRF_FIRSTCALL_INIT();
>> +             oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
>
> This will detoast 'jb' into the expression context, since
> PG_GETARG_JSONB() is called before the MemoryContextSwitchTo. But that's
> ok since the percall code only deals with ->result, right?
>
>> -     /* make these in a sufficiently long-lived memory context */
>>       old_cxt = MemoryContextSwitchTo(rsi->econtext->ecxt_per_query_memory);
>
> wh remove that comment?
>
>> +#define JENTRY_ISCALAR               (0x10000000 | 0x40000000)
>
> Isn't there an S missing here?
>
>> --- a/contrib/hstore/hstore_compat.c
>> +++ b/contrib/hstore/hstore_compat.c
>> +/*
>> + * New Old version (new not-nested version of hstore, v2 version)
>> + * V2 and v3 (nested) are upward binary compatible. But
>> + * framework was fully changed. Keep here old definitions (v2)
>> + */
>
> That's an, err, interesting sentence. I think referring to old new
> version and stuff is less than helpful. I realize lots of that is
> baggage from existing code, but yet another version doesn't make it
> easier.
>
> I lost my stomach (or maybe it was the glass of red) somewhere in the
> middle, but I think this needs a lot of work. Especially the io code
> doesn't seem ready to me. I'd consider ripping out the send/recv code
> for 9.4, that seems the biggest can of worms. It will still be usable
> without.

Not having type send/recv functions is somewhat dangerous; it can
cause problems for libraries that run everything through the binary
wire format.  I'd give jsonb a pass on that, being a new type, but
would be concerned if hstore had that ability revoked.

offhand note: hstore_send seems pretty simply written and clean; it's
a simple nonrecursive iterator...

merlin



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: bgworker crashed or not?
Next
From: "Tomas Vondra"
Date:
Subject: Re: GIN improvements part2: fast scan