Re: JSON, JSONB questions - Mailing list pgsql-novice

From Merlin Moncure
Subject Re: JSON, JSONB questions
Date
Msg-id CAHyXU0wnCWb3Ltj8xMAFH4Pdivb3OTF8Ap6KKi6ScCDE3Z8+sA@mail.gmail.com
Whole thread Raw
In response to JSON, JSONB questions  (Tom C <barteri@gmail.com>)
List pgsql-novice
On Wed, Sep 28, 2016 at 9:07 PM, Tom C <barteri@gmail.com> wrote:
> I'm new to Postgres so forgive me if these are dumb question.
>
> I'm trying to understanding why you would use JSON over JSONB. From what I
> have seen so far, JSONB doesn't take that much more space than JSON and
> offers better indexing and search capabilities than plain JSON data type. Is
> it really just about preserving whitespace and structure? If that was the
> case then why bother having the JSON data type at all? Why not just stick
> with varchar

This was broadly discussed in the archives.  The reasons are:
*) pure varchar would drop validation and break code
*) json came out first and dropping support would break some edge cases
*) yes, whitespace and structure are in important in some cases.
*) jsonb is pessimal for pure serialization strategies.  If all I do
is run various variants of to_json, marshaling the json to a hash
table only to immediately re-serialize it to a string is pessimal.
There are a lot of uses for json in postgres besides document storage
*) some operations are faster textually.

The basic rule of thumb is that if you are actually storing data,
jsonb is a better choice.  At the time the json functionality was
being built out, I argued (and still think) the real mistakes made
were:

*) not packaging functionality into extensions
*) using type prefixed function names

> Second question is around indexing and searching on data types of JSONB. I
> understand that you can create indexes for specific fields within the JSON.
> Is there a performance penalty when adding indexes to specific fields in
> large JSON payload? For example, let's say I have a very complex nested JSON
> payload stored in a JSONB column. All I'm interested in indexing is a
> collection array within the JSON payload. Will it be more efficient if I
> break out the array into a separate JSONB column?

Basically, no. The index lookup will run the same speed either way
since (at least for btree) the indexed data is present in the index
itself.  During the scan you still have to verify the visibility
information in the heap but this has to be done regardless.  I think
(but did not measure) certain special cases (like BRIN index) might
benefit from breaking out the data since you always have to refer back
to the heap.

> Third and final question is how do you create an index into for an array
> element within the JSONB structure?

postgres=# create table foo (j jsonb);
CREATE TABLE

postgres=# insert into  foo values('[{"a": "b"}]');
INSERT 0 1

postgres=# create index on foo((j->0->>'a'));
CREATE INDEX

-- disincline the server from correctly preferring seq scans for 1 record tables
postgres=# set enable_seqscan to false;
SET

postgres=# explain select * from foo where j->0->>'a' = 'b';
                               QUERY PLAN
─────────────────────────────────────────────────────────────────────────
 Index Scan using foo_expr_idx on foo  (cost=0.12..8.14 rows=1 width=32)
   Index Cond: (((j -> 0) ->> 'a'::text) = 'b'::text)
(2 rows)

postgres=# select * from foo where j->0->>'a' = 'b';
      j
──────────────
 [{"a": "b"}]

merlin


pgsql-novice by date:

Previous
From: Tom C
Date:
Subject: JSON, JSONB questions
Next
From: "曾广宇"
Date:
Subject: pgadmin4 restore command keep running