Re: legacy assumptions - Mailing list pgsql-docs

From Jonathan S. Katz
Subject Re: legacy assumptions
Msg-id cba4b70f-2ae8-83f8-0655-47f3bebea755@postgresql.org
In response to legacy assumptions  (PG Doc comments form <noreply@postgresql.org>)
Responses Re: legacy assumptions  (Jonathan Buhacoff <jonathan@buhacoff.net>)
List pgsql-docs
Hi,

On 11/25/19 12:47 PM, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/12/datatype-json.html
> Description:
>
> I'm wondering if this one line of section 8.14 JSON Types
> (https://www.postgresql.org/docs/current/datatype-json.html) can be edited
> to remove the word "legacy":
>
> "In general, most applications should prefer to store JSON data as jsonb,
> unless there are quite specialized needs, such as legacy assumptions about
> ordering of object keys."
>
> I'm concerned that with the word "legacy" there, someone might come along
> eventually and decide the json column type isn't needed anymore because it's
> "legacy", where in fact there are modern and legitimate uses for a field
> that allows you to retrieve the data exactly as it was stored and allows
> JSON queries on that data (even if they are slower).

While I'm certainly sensitive to this need, as I once had a similar,
slightly less strict requirement, I made sure not to rely on the
PostgreSQL JSON type itself to ensure that ordering was preserved (in my
case, I was able to rely on a solution external to PostgreSQL).

The JSON RFC states that objects should be considered "unordered", and
mentions that while different parsing libraries may preserve key
ordering, "implementations whose behavior does not depend on member
ordering will be interoperable in the sense that they will not be
affected by these differences."[1]
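
To make the difference concrete, here is a minimal sketch of how the two
types treat the same input. The literal is made up for illustration; the
behavior is the one described on the JSON Types documentation page:

```sql
-- Cast the same text to both types and compare what comes back.
SELECT '{"b": 1,   "a": 2, "a": 3}'::json  AS as_json,
       '{"b": 1,   "a": 2, "a": 3}'::jsonb AS as_jsonb;

-- as_json:  {"b": 1,   "a": 2, "a": 3}
-- as_jsonb: {"a": 3, "b": 1}
```

json returns the text exactly as entered, whereas jsonb drops the
insignificant whitespace, keeps only the last value for a duplicate key,
and does not preserve the original key order.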

> An alternative would be to store the
> plaintext as binary data for the integrity check and have a separate jsonb
> column with a second copy of the same data. Since different applications
> have different time/space tradeoffs, it's good to have the choice.

Another approach is to leverage PostgreSQL's expression index
capabilities, which would allow you to limit the data duplication. For
example:

CREATE TABLE docs (doc bytea);

-- populating some test data
INSERT INTO docs
SELECT ('{"id": ' || x || ', "data": [1,2,3] }')::bytea
FROM generate_series(1, 100000) x;

-- create an expression index that maps to the operators supported by GIN
CREATE INDEX docs_doc_json_idx ON docs
    USING gin(jsonb(encode(doc, 'escape')));

and in one test run:

EXPLAIN
SELECT doc
FROM docs WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}';

I got a plan similar to:

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Bitmap Heap Scan on docs  (cost=28.77..306.00 rows=100 width=31)
   Recheck Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb)
   ->  Bitmap Index Scan on docs_doc_json_idx  (cost=0.00..28.75 rows=100 width=0)
         Index Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb)

In this way, you can:

- Keep the key ordering preserved and perform any integrity checks, etc.
  that your application requires
- Limit your data duplication to that of the index
- Still get the benefits of the JSONB lookup functions that work with
  the indexing
- Still perform JSON validation, since the index expression casts each
  inserted value to jsonb:

INSERT INTO docs VALUES ('{]'::bytea);

ERROR:  invalid input syntax for type json
DETAIL:  Expected string or "}", but found "]".
CONTEXT:  JSON data, line 1: {]
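
As a rough sketch of the first point above, assuming PostgreSQL 11 or
later (where the built-in sha256(bytea) function is available), a digest
can be computed over the bytes exactly as they were stored, while the
lookup still goes through the jsonb expression that the index covers.
The column aliases are only illustrative:

```sql
-- The WHERE clause repeats the indexed expression so that the planner
-- can consider docs_doc_json_idx.
SELECT doc,                                            -- original bytes, key order intact
       sha256(doc) AS doc_digest,                      -- digest of the stored bytes
       encode(doc, 'escape')::jsonb -> 'data' AS data  -- jsonb operator applied on the fly
FROM docs
WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}';
```

The digest is taken over the bytea column itself, so it is unaffected by
the key reordering and whitespace normalization that the jsonb cast
performs.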

> My suggestion for that sentence:
>
> "In general, most applications should prefer to store JSON data as jsonb,
> unless there are quite specialized needs, such as assumptions about ordering
> of object keys or the need to retrieve the data exactly as it was stored."

My preference would be for the documentation to offer guidance on what
to do if an application is sensitive to key ordering. I'm not opposed to
the proposed wording, but I'd prefer that we encourage people to use
JSONB for storage and retrieval.

Thanks!

Jonathan

[1] https://tools.ietf.org/html/rfc7159#section-4

