| Storing arbitrary PostgreSQL data types in JSONB

By Glukhov Nikita

Last month Alvaro Hernandez-Tortosa published a blog post discussing how the existing JSON features in PostgreSQL can be improved. The main request was about increasing the number of data types supported in JSONB. We addressed this concern and are ready to share our development team’s commentary.

Storing data of custom arbitrary data types in JSONB is impossible as you cannot keep track of dependencies between data and data types used within this data. Storing composite record types with a pre-known structure is possible because the record type is dependent on all its member types. If there is no dependency tracking, the old data type can be removed. After that, a new type with the same name or oid can be created. In this case, PostgreSQL will obviously not work correctly when trying to use the old binary data with the new type and may even crash. Therefore, custom data types have to be stored in text format to prevent PostgreSQL from crashes.

Some built-in types like bytea, point, int8, float8, timestamp can be stored in binary format to reduce storage space and speed up processing.

Support for storage of datetime types (date, time, timetz, timestamp, timestamptz) in JSONB would be very useful for JSONPATH and SQL/JSON, since JSONPATH can already work with datetime types. However, to work with them, it needs to convert datetime data from a string format.

Support for storage of multiple numeric data types (int2, int4, int8, float4, float8) in JSONB would also become a useful feature for JSONPATH. To comply with the new standard, it should be able to work with them like SQL. However, our current implementation supports only the numeric data type inherited from JSONB. All numeric data is pre-converted into this data type.

Possible JSONPATH Extensions

Along with support for new data types in JSONB, later on, it will be possible to extend JSONPATH so that it could use operators from the directory. 

This would make possible the execution of queries with geometric operators in JSONPATH, for example: jsonb_path_query (street,'$. houses [*] ? (@. location <-> "(1200,2300)"::point < 100)') (search for houses in the street located within the specified distance from the defined point). However, we should also take into account the problem with the tracking of dependencies between JSONPATH queries and types/operators in the system catalog. The same query may work differently for different users. Functional indexes cannot be built on such JSONPATH expressions, as they are not immutable.

Indexing

It is not difficult to add support for a limited set of data types in the existing opclasses for GIN indexes. However, you won't be able to index any of these data types, because the opclass needs to be aware of the internal structure of types. It shouldn't depend on changes in the system catalog either.

The implementation of index search on nested geometric data is of great interest. For example, in a single JSONPATH query employing an index, you could have combined operators of different types: shop @@ '$. location <-> "(1200,2300)" < 100 && $.name starts with "foo"'. However, the existing opclasses and even indexes (GIN, GiST, SP-GiST) require significant enhancements to support this solution.

 

A new data type or another JSONB extension?

 

Introducing a new data type instead of extending the JSONB type is undesirable. For the new data type you will need to duplicate the entire set of functions, operators, and opclasses already created for JSONB to keep its full functionality. Of course, you can't simply modify the behavior of functions like json_build_object() so that they start storing information about argument types to ensure compatibility. Alternatively, you can either add a set of new constructor functions, or introduce just one to_typed_jsonb () function that would wrap any data in JSONB, saving information about their type.

Then the old constructors will work as they have worked previously:

jsonb_build_object('i', 123, 'd', '2020-08-12'::date) =>{"d": "2020-08-12", "i": 123}

The new constructor functions will start to generate data in a new incompatible format:

jsonb_build_object_typed('i', 123, 
                         'd', '2020-08-12'::date), 

jsonb_build_object('i', to_typed_jsonb(123), 
                   'd', to_typed_jsonb('2020-08- 12'::date)) =>

  {"d": "2020-08-12"::date, "i": 123::int4}

Please expect further updates on the topic. We also understand that the community has more questions regarding the roadmap for JSON in PostgreSQL and are ready to answer many of them at our upcoming “Roadmap for JSON in PostgreSQL: What’s Next?” webinar with Oleg Bartunov starting at 7 PM CEST on Thursday, September 17, 2020.