Re: jsonb and nested hstore - Mailing list pgsql-hackers
From | Merlin Moncure |
---|---|
Subject | Re: jsonb and nested hstore |
Date | |
Msg-id | CAHyXU0xGMcUkTv7WPpqeiZXSmBvY29C5JovgVM8EpnhZ+LhVqQ@mail.gmail.com Whole thread Raw |
In response to | Re: jsonb and nested hstore (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: jsonb and nested hstore
Re: jsonb and nested hstore |
List | pgsql-hackers |
On Sat, Feb 1, 2014 at 4:20 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-01-30 14:07:42 -0500, Andrew Dunstan wrote: >> + <para id="functions-json-table"> >> + <xref linkend="functions-json-creation-table"> shows the functions that are >> + available for creating <type>json</type> values. >> + (see <xref linkend="datatype-json">) >> </para> >> >> - <table id="functions-json-table"> >> - <title>JSON Support Functions</title> >> + <indexterm> >> + <primary>array_to_json</primary> >> + </indexterm> >> + <indexterm> >> + <primary>row_to_json</primary> >> + </indexterm> >> + <indexterm> >> + <primary>to_json</primary> >> + </indexterm> >> + <indexterm> >> + <primary>json_build_array</primary> >> + </indexterm> >> + <indexterm> >> + <primary>json_build_object</primary> >> + </indexterm> >> + <indexterm> >> + <primary>json_object</primary> >> + </indexterm> > > Hm, why are you collecting the indexterms at the top in the contrast to > the previous way of collecting them at the point of documentation? > >> diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile >> index 1ae9fa0..fd93d9b 100644 >> --- a/src/backend/utils/adt/Makefile >> +++ b/src/backend/utils/adt/Makefile >> @@ -32,7 +32,8 @@ OBJS = acl.o arrayfuncs.o array_selfuncs.o array_typanalyze.o \ >> tsquery_op.o tsquery_rewrite.o tsquery_util.o tsrank.o \ >> tsvector.o tsvector_op.o tsvector_parser.o \ >> txid.o uuid.o windowfuncs.o xml.o rangetypes_spgist.o \ >> - rangetypes_typanalyze.o rangetypes_selfuncs.o >> + rangetypes_typanalyze.o rangetypes_selfuncs.o \ >> + jsonb.o jsonb_support.o > > Odd, most OBJS lines are kept in alphabetical order, but that doesn't > seem to be the case here. > >> +/* >> + * for jsonb we always want the de-escaped value - that's what's in token >> + */ >> + > > strange newline. > >> +static void >> +jsonb_in_scalar(void *state, char *token, JsonTokenType tokentype) >> +{ >> + JsonbInState *_state = (JsonbInState *) state; >> + JsonbValue v; >> + >> + v.size = sizeof(JEntry); >> + >> + switch (tokentype) >> + { >> + > ... > >> + default: /* nothing else should be here in fact */ >> + break; > > Shouldn't this at least Assert(false) or something? > >> +static void >> +recvJsonbValue(StringInfo buf, JsonbValue *v, uint32 level, int c) >> +{ >> + uint32 hentry = c & JENTRY_TYPEMASK; >> + >> + if (hentry == JENTRY_ISNULL) >> + { >> + v->type = jbvNull; >> + v->size = sizeof(JEntry); >> + } >> + else if (hentry == JENTRY_ISOBJECT || hentry == JENTRY_ISARRAY || hentry == JENTRY_ISCALAR) >> + { >> + recvJsonb(buf, v, level + 1, (uint32) c); >> + } >> + else if (hentry == JENTRY_ISFALSE || hentry == JENTRY_ISTRUE) >> + { >> + v->type = jbvBool; >> + v->size = sizeof(JEntry); >> + v->boolean = (hentry == JENTRY_ISFALSE) ? false : true; >> + } >> + else if (hentry == JENTRY_ISNUMERIC) >> + { >> + v->type = jbvNumeric; >> + v->numeric = DatumGetNumeric(DirectFunctionCall3(numeric_recv, PointerGetDatum(buf), >> + Int32GetDatum(0), Int32GetDatum(-1))); >> + >> + v->size = sizeof(JEntry) * 2 + VARSIZE_ANY(v->numeric); > > What's the *2 here? > >> +static void >> +recvJsonb(StringInfo buf, JsonbValue *v, uint32 level, uint32 header) >> +{ >> + uint32 hentry; >> + uint32 i; > > This function and recvJsonbValue call each other recursively, afaics > without any limit, shouldn't they check for the stack depth? > >> + hentry = header & JENTRY_TYPEMASK; >> + >> + v->size = 3 * sizeof(JEntry); > > *3? > >> + if (hentry == JENTRY_ISOBJECT) >> + { >> + v->type = jbvHash; >> + v->hash.npairs = header & JB_COUNT_MASK; >> + if (v->hash.npairs > 0) >> + { >> + v->hash.pairs = palloc(sizeof(*v->hash.pairs) * v->hash.npairs); >> + > > Hm, if I understand correctly, we're just allocating JB_COUNT_MASK > (which is 0x0FFFFFFF) * sizeof(*v->hash.pairs) bytes here, without any > crosschecks about the actual length of the data? So with a few bytes the > server can be coaxed to allocate a gigabyte of data? > Since this immediately calls another input routine, this can be done in > a nested fashion, quickly OOMing the server. > > I think this and several other places really need a bit more input > sanity checking. > >> + for (i = 0; i < v->hash.npairs; i++) >> + { >> + recvJsonbValue(buf, &v->hash.pairs[i].key, level, pq_getmsgint(buf, 4)); >> + if (v->hash.pairs[i].key.type != jbvString) >> + elog(ERROR, "jsonb's key could be only a string"); > > Shouldn't that be an ereport(ERRCODE_DATATYPE_MISMATCH)? Similar in a > few other places. > >> +char * >> +JsonbToCString(StringInfo out, char *in, int estimated_len) >> +{ >> + bool first = true; >> + JsonbIterator *it; >> + int type; >> + JsonbValue v; >> + int level = 0; >> + >> + if (out == NULL) >> + out = makeStringInfo(); > > Such a behaviour certainly deserves a documentary comment. Generally > some more functions could use that. > >> + while ((type = JsonbIteratorGet(&it, &v, false)) != 0) >> + { >> +reout: >> + switch (type) >> + { > ... >> + { >> + Assert(type == WJB_BEGIN_OBJECT || type == WJB_BEGIN_ARRAY); >> + goto reout; > > Hrmpf. > >> +Datum >> +jsonb_typeof(PG_FUNCTION_ARGS) >> +{ > ... >> +} > > Hm, shouldn't that be in jsonfuncs.c? > >> diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c >> index a19b222..f1eacc6 100644 >> --- a/src/backend/utils/adt/jsonfuncs.c >> +++ b/src/backend/utils/adt/jsonfuncs.c >> @@ -27,6 +27,7 @@ >> #include "utils/builtins.h" >> #include "utils/hsearch.h" >> #include "utils/json.h" >> +#include "utils/jsonb.h" >> #include "utils/jsonapi.h" >> #include "utils/lsyscache.h" >> #include "utils/memutils.h" >> @@ -51,6 +52,7 @@ static inline Datum get_path_all(PG_FUNCTION_ARGS, bool as_text); >> static inline text *get_worker(text *json, char *field, int elem_index, >> char **tpath, int *ipath, int npath, >> bool normalize_results); >> +static inline Datum get_jsonb_path_all(PG_FUNCTION_ARGS, bool as_text); > > I don't see the point of using PG_FUNCTION_ARGS if you're manually > calling it like > + return get_jsonb_path_all(fcinfo, false); > > That just makes it harder if someday PG_FUNCTION_ARGS grows a second > argument or something. > > >> +Datum >> +jsonb_object_keys(PG_FUNCTION_ARGS) >> +{ >> + FuncCallContext *funcctx; >> + OkeysState *state; >> + int i; >> + >> + if (SRF_IS_FIRSTCALL()) >> + { >> + MemoryContext oldcontext; >> + Jsonb *jb = PG_GETARG_JSONB(0); >> + bool skipNested = false; >> + JsonbIterator *it; >> + JsonbValue v; >> + int r = 0; >> + >> + if (JB_ROOT_IS_SCALAR(jb)) >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + errmsg("cannot call jsonb_object_keys on a scalar"))); >> + else if (JB_ROOT_IS_ARRAY(jb)) >> + ereport(ERROR, >> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> + errmsg("cannot call jsonb_object_keys on an array"))); >> + >> + funcctx = SRF_FIRSTCALL_INIT(); >> + oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx); > > This will detoast 'jb' into the expression context, since > PG_GETARG_JSONB() is called before the MemoryContextSwitchTo. But that's > ok since the percall code only deals with ->result, right? > >> - /* make these in a sufficiently long-lived memory context */ >> old_cxt = MemoryContextSwitchTo(rsi->econtext->ecxt_per_query_memory); > > wh remove that comment? > >> +#define JENTRY_ISCALAR (0x10000000 | 0x40000000) > > Isn't there an S missing here? > >> --- a/contrib/hstore/hstore_compat.c >> +++ b/contrib/hstore/hstore_compat.c >> +/* >> + * New Old version (new not-nested version of hstore, v2 version) >> + * V2 and v3 (nested) are upward binary compatible. But >> + * framework was fully changed. Keep here old definitions (v2) >> + */ > > That's an, err, interesting sentence. I think referring to old new > version and stuff is less than helpful. I realize lots of that is > baggage from existing code, but yet another version doesn't make it > easier. > > I lost my stomach (or maybe it was the glass of red) somewhere in the > middle, but I think this needs a lot of work. Especially the io code > doesn't seem ready to me. I'd consider ripping out the send/recv code > for 9.4, that seems the biggest can of worms. It will still be usable > without. Not having type send/recv functions is somewhat dangerous; it can cause problems for libraries that run everything through the binary wire format. I'd give jsonb a pass on that, being a new type, but would be concerned if hstore had that ability revoked. offhand note: hstore_send seems pretty simply written and clean; it's a simple nonrecursive iterator... merlin
pgsql-hackers by date: