Re: jsonb and nested hstore - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: jsonb and nested hstore |
Date | |
Msg-id | 20140201222045.GC5930@awork2.anarazel.de Whole thread Raw |
In response to | Re: jsonb and nested hstore (Andrew Dunstan <andrew@dunslane.net>) |
Responses |
Re: jsonb and nested hstore
Re: jsonb and nested hstore Re: jsonb and nested hstore adt Makefile, was Re: jsonb and nested hstore |
List | pgsql-hackers |
On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote: > + <para id="functions-json-table"> > + <xref linkend="functions-json-creation-table"> shows the functions that are > + available for creating <type>json</type> values. > + (see <xref linkend="datatype-json">) > </para> > > - <table id="functions-json-table"> > - <title>JSON Support Functions</title> > + <indexterm> > + <primary>array_to_json</primary> > + </indexterm> > + <indexterm> > + <primary>row_to_json</primary> > + </indexterm> > + <indexterm> > + <primary>to_json</primary> > + </indexterm> > + <indexterm> > + <primary>json_build_array</primary> > + </indexterm> > + <indexterm> > + <primary>json_build_object</primary> > + </indexterm> > + <indexterm> > + <primary>json_object</primary> > + </indexterm> Hm, why are you collecting the indexterms at the top in the contrast to the previous way of collecting them at the point of documentation? > diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile > index 1ae9fa0..fd93d9b 100644 > --- a/src/backend/utils/adt/Makefile > +++ b/src/backend/utils/adt/Makefile > @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ > tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \ > tsvector.o tsvector_op.o tsvector_parser.o \ > txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \ > - rangetypes_typanalyze.o rangetypes_selfuncs.o > + rangetypes_typanalyze.o rangetypes_selfuncs.o \ > + jsonb.o jsonb_support.o Odd, most OBJS lines are kept in alphabetical order, but that doesn't seem to be the case here. > +/* > + * for jsonb we always want the de-escaped value - that's what's in token > + */ > + strange newline. > +static void > +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype) > +{ > + JsonbInState *_state = (JsonbInState *) state; > + JsonbValue v; > + > + v.size = sizeof(JEntry); > + > + switch (tokentype) > + { > + ... > + default: /* nothing else should be here in fact */ > + break; Shouldn't this at least Assert(false) or something? > +static void > +recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c) > +{ > + uint32 hentry = c & JENTRY_TYPEMASK; > + > + if (hentry == JENTRY_ISNULL) > + { > + v->type = jbvNull; > + v->size = sizeof(JEntry); > + } > + else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR) > + { > + recvJsonb(buf, v, level + 1, (uint32) c); > + } > + else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE) > + { > + v->type = jbvBool; > + v->size = sizeof(JEntry); > + v->boolean = (hentry == JENTRY_ISFALSE) ? false : true; > + } > + else if (hentry == JENTRY_ISNUMERIC) > + { > + v->type = jbvNumeric; > + v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf), > + Int32GetDatum(0), Int32GetDatum(-1))); > + > + v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric); What's the *2 here? > +static void > +recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header) > +{ > + uint32 hentry; > + uint32 i; This function and recvJsonbValue call each other recursively, afaics without any limit, shouldn't they check for the stack depth? > + hentry = header & JENTRY_TYPEMASK; > + > + v->size = 3 * sizeof(JEntry); *3? > + if (hentry == JENTRY_ISOBJECT) > + { > + v->type = jbvHash; > + v->hash.npairs = header & JB_COUNT_MASK; > + if (v->hash.npairs > 0) > + { > + v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs); > + Hm, if I understand correctly, we're just allocating JB_COUNT_MASK (which is 0x0FFFFFFF) * sizeof(*v->hash.pairs) bytes here, without any crosschecks about the actual length of the data? So with a few bytes the server can be coaxed to allocate a gigabyte of data? Since this immediately calls another input routine, this can be done in a nested fashion, quickly OOMing the server. I think this and several other places really need a bit more input sanity checking. > + for (i = 0; i < v->hash.npairs; i++) > + { > + recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4)); > + if (v->hash.pairs[i].key.type != jbvString) > + elog(ERROR, "jsonb's key could be only a string"); Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a few other places. > +char * > +JsonbToCString(StringInfo out, char *in, int estimated_len) > +{ > + bool first = true; > + JsonbIterator *it; > + int type; > + JsonbValue v; > + int level = 0; > + > + if (out == NULL) > + out = makeStringInfo(); Such a behaviour certainly deserves a documentary comment. Generally some more functions could use that. > + while ((type = JsonbIteratorGet(&it, &v, false)) != 0) > + { > +reout: > + switch (type) > + { ... > + { > + Assert(type == WJB_BEGIN_OBJECT || type == WJB_BEGIN_ARRAY); > + goto reout; Hrmpf. > +Datum > +jsonb_typeof(PG_FUNCTION_ARGS) > +{ ... > +} Hm, shouldn't that be in jsonfuncs.c? > diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c > index a19b222..f1eacc6 100644 > --- a/src/backend/utils/adt/jsonfuncs.c > +++ b/src/backend/utils/adt/jsonfuncs.c > @@ -27,6 +27,7 @@ > #include "utils/builtins.h" > #include "utils/hsearch.h" > #include "utils/json.h" > +#include "utils/jsonb.h" > #include "utils/jsonapi.h" > #include "utils/lsyscache.h" > #include "utils/memutils.h" > @@ -51,6 +52,7 @@ static inline Datum get_path_all(PG_FUNCTION_ARGS, bool as_text); > static inline text *get_worker(text *json, char *field, int elem_index, > char **tpath, int *ipath, int npath, > bool normalize_results); > +static inline Datum get_jsonb_path_all(PG_FUNCTION_ARGS, bool as_text); I don't see the point of using PG_FUNCTION_ARGS if you're manually calling it like + return get_jsonb_path_all(fcinfo, false); That just makes it harder if someday PG_FUNCTION_ARGS grows a second argument or something. > +Datum > +jsonb_object_keys(PG_FUNCTION_ARGS) > +{ > + FuncCallContext *funcctx; > + OkeysState *state; > + int i; > + > + if (SRF_IS_FIRSTCALL()) > + { > + MemoryContext oldcontext; > + Jsonb *jb = PG_GETARG_JSONB(0); > + bool skipNested = false; > + JsonbIterator *it; > + JsonbValue v; > + int r = 0; > + > + if (JB_ROOT_IS_SCALAR(jb)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("cannot call jsonb_object_keys on a scalar"))); > + else if (JB_ROOT_IS_ARRAY(jb)) > + ereport(ERROR, > + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > + errmsg("cannot call jsonb_object_keys on an array"))); > + > + funcctx = SRF_FIRSTCALL_INIT(); > + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); This will detoast 'jb' into the expression context, since PG_GETARG_JSONB() is called before the MemoryContextSwitchTo. But that's ok since the percall code only deals with ->result, right? > - /* make these in a sufficiently long-lived memory context */ > old_cxt = MemoryContextSwitchTo(rsi->econtext->ecxt_per_query_memory); wh remove that comment? > +#define JENTRY_ISCALAR (0x10000000 | 0x40000000) Isn't there an S missing here? > --- a/contrib/hstore/hstore_compat.c > +++ b/contrib/hstore/hstore_compat.c > +/* > + * New Old version (new not-nested version of hstore, v2 version) > + * V2 and v3 (nested) are upward binary compatible. But > + * framework was fully changed. Keep here old definitions (v2) > + */ That's an, err, interesting sentence. I think referring to old new version and stuff is less than helpful. I realize lots of that is baggage from existing code, but yet another version doesn't make it easier. I lost my stomach (or maybe it was the glass of red) somewhere in the middle, but I think this needs a lot of work. Especially the io code doesn't seem ready to me. I'd consider ripping out the send/recv code for 9.4, that seems the biggest can of worms. It will still be usable without. There's just about no comments in large and relevant parts of the code. There's not much documentation about the binary layout of a definitely not trivial type with convoluted interdependencies with hstore... Greetings, Andres Freund
pgsql-hackers by date: