Re: jsonb and nested hstore - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: jsonb and nested hstore |
Date | |
Msg-id | CAM3SWZRase2g5Yyhg+vt1dzv85jmfyauE7A7yFBdToEULwojag@mail.gmail.com Whole thread Raw |
In response to | Re: jsonb and nested hstore (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: jsonb and nested hstore
Re: jsonb and nested hstore Re: jsonb and nested hstore Re: jsonb and nested hstore |
List | pgsql-hackers |
On Fri, Mar 7, 2014 at 9:00 AM, Bruce Momjian <bruce@momjian.us> wrote: > OK, it sounds like the adjustments are minimal, like not using the > high-order bit. Attached patch is a refinement of the work of Oleg, Teodor and Andrew. Revisions are mostly my own, although Andrew contributed too. Changes include: * Extensive relocation, and moderate restructuring of code. Many comments added, while many existing comments were copy-edited. Nothing remains in contrib. jsonb is a distinct, in-core type with no user-visible relationship to hstore. There is no code dependency between the two. The amount of code redundancy this turned out to create (between jsonb and an unchanged hstore) is, in my estimation, quite acceptable. * B-Tree and hash operator classes for the core type are included. A GiST operator class, and two GIN operator classes are also included. Obviously this is where I spent most time by far. * Everything else that was in hstore in the last revision (the complement of the hstore2 opclasses) is removed entirely. The patch is much smaller. If we just consider code (excluding tests and documentation), the diffstat seems far more manageable: ***SNIP*** src/backend/catalog/system_views.sql | 8 + src/backend/utils/adt/Makefile | 10 +- src/backend/utils/adt/json.c | 42 +- src/backend/utils/adt/jsonb.c | 461 +++++++++++ src/backend/utils/adt/jsonb_gin.c | 471 +++++++++++ src/backend/utils/adt/jsonb_gist.c | 699 ++++++++++++++++ src/backend/utils/adt/jsonb_op.c | 565 +++++++++++++ src/backend/utils/adt/jsonb_support.c | 1350 +++++++++++++++++++++++++++++++ src/backend/utils/adt/jsonfuncs.c | 1155 +++++++++++++++++++++++--- src/backend/utils/adt/numeric.c | 39 + src/include/catalog/pg_amop.h | 36 + src/include/catalog/pg_amproc.h | 18 +- src/include/catalog/pg_cast.h | 4 + src/include/catalog/pg_opclass.h | 5 + src/include/catalog/pg_operator.h | 37 +- src/include/catalog/pg_opfamily.h | 5 + src/include/catalog/pg_proc.h | 103 ++- src/include/catalog/pg_type.h | 8 + src/include/funcapi.h | 9 + src/include/utils/json.h | 15 + src/include/utils/jsonapi.h | 8 +- src/include/utils/jsonb.h | 245 ++++++ src/include/utils/numeric.h | 1 + ***SNIP*** That's less than 5,000 lines of code, which is actually not terribly much for a new datatype with voluminous though simple code to implement each of the new operators. I would like to emphasize that 100% of changes to C files occur at src/backend/utils/adt/* . On reflection I realize that I didn't like the "type aware" operators that existed for hstore in prior revisions, and I don't have a sense that I've made cuts to the patch to reduce the code footprint that I'd not have made with no constraints on time. While there are one or two exceptions to this, surprisingly, that's all. Frequently, it's sufficient to interact with jsonb using the jsonb shadow type system, and where it isn't then I think that shadow-type-specific operators are the wrong way to go. * Extensive additional documentation. References to the very new JSON RFC. I think that this revision is in general a lot more coherent, and I found that reflecting on what idiomatic usage should look like while writing the documentation brought clarity to my thoughts on how the code should be structured. The documentation is worth a read if you want to get a better sense of what the patch is about relatively quickly. * Operators for jsonb that comprise the default B-Tree operator class no longer look strange (i.e., they're no longer deliberately ugly to discourage their use, because unlike with hstore this doesn't make sense, particularly when sorting jsonb that only the shadow type system knows to be "raw strings"). * Numerous bug fixes. * Since preserving on-disk compatibility with hstore1 is no longer an objective, I was able to remove some code there. * The jsonb type has test coverage more or less equivalent to the previous revision where the now-expunged hstore2 type's output was tested (though there are some more tests that I myself added towards the end of this past week as part of the process of fixing bugs). I thank my Heroku colleague Maciek Sakrejda for working on this. This was largely a mechanical process. Concerns: * I would like to have a lot more comments on the GiST code from its authors, because it's currently quite difficult to follow. A couple of the bigger GiST support routines currently have zero useful comments. I would deem it acceptable to lose GiST support entirely for now, if the 700 line saving made it possible to squeeze this into 9.4. ISTM that the two GIN operator classes are far more useful, and their code is quite a bit simpler (apparently the state of the art here is a system that can query JSON-like structures reasonably well, but has a global write lock, which to me suggests that GIN represents a better trade-off). * That GiST code has bugs, too. I ran out of steam for fixing it before posting. The basic symptom is I saw is that when creating a large GiST index, sometimes you'll hit an infinite loop here: warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffc70eb000 0x0000000000829de5 in hemdistsign (a=0x7fd6c833b260 "*\342\060\331\351b\201\250\002\060\212)\304\337\a*", b=0x17084f0 "") at jsonb_gist.c:655 655 if (GETBIT(a, i) != GETBIT(b, i)) (gdb) (gdb) (gdb) n 656 dist++; (gdb) 653 LOOPBIT (gdb) 655 if (GETBIT(a, i) != GETBIT(b, i)) (gdb) 653 LOOPBIT It's possible that I regress things in a way not captured by the regression test, and that this is my fault, but whatever the case let's either get it fixed or cut it for now. * The jsonb_hash_ops non-default GIN opclass is broken. It has its own idiosyncratic notion of what constitutes containment, that sees it only return, say, jsonb arrays that have a matching string as their "leftmost" element (if we ask it if it contains within it another array with the same string). Because of the limited number of indexable operators (only @>), I'd put this opclass in the same category as GiST in terms of my willingness to forgo it for a release, even if it did receive a loud applause at pgConf.EU. Again, it might be some disparity between the opertors as they existed in hstore2 at one time, and as they exist in the core code now, but I doubt it, not least since the regression tests didn't pick this up, and it's such a basic thing. Perhaps Oleg and Teodor just need to explain this to me. * More generally, I do not understand GiST and GIN to the same extent as several other people (even after excluding the authors of this patch). Feedback on those aspects from others would be particularly useful at this point. Is the text-based storage format (i.e. the serialization logic for the GiST and GIN default operator classes) appropriate? I tweaked the serialization logic here a bit, in order to fix a bug. Overall, most bugs fixed were along the lines of "index scans are not in agreement with sequential scans", and most fixes were simple enough. Roughly speaking, on a few occasions opclass support code needed to be reconciled with jsonb code (or existing core code that it's implemented in terms of). None of these bugs were difficult to fix, or in any way scary. Still, as I mentioned there are some open items here. * I haven't given the I/O code much attention, nor much of the code written by Andrew. In general I haven't done all that much with the parts of the patch that were previously proposed for core inclusion. * I could have spent more time on testing and breaking things some more, but didn't want to block on that. The more scrutiny this gets the better. Thoughts? -- Peter Geoghegan
Attachment
pgsql-hackers by date: