Thread: Using JSONB directly from application
Hello,
It would be nice if application connected to a Postrgesql database could send and receive JSONB in binary. It could save some useless text conversion. All works could be done on application side which are often more scalable than database itself.Regards,
Anthony Communier
On 26 February 2018 at 04:05, Anthony Communier <anthony.communier@gmail.com> wrote:
Hello,It would be nice if application connected to a Postrgesql database could send and receive JSONB in binary. It could save some useless text conversion. All works could be done on application side which are often more scalable than database itself.
To support this, you'd need to extract PostgreSQL's jsonb support into a C library that could be used independently of backend server infrastructure like 'palloc' and memory contexts, ereport(), etc. Or write a parallel implementation.
It's one of the reasons some people questioned the wisdom of doing jsonb as a bespoke format not using protobufs or whatever. I'm not one of them, since I wasn't the one doing the work, and I also know how hard it can be to neatly fit general purpose library code into the DB server where we expect it to do little things like not abort() on malloc() failure.
If you're interested in writing such a library, I suggest proposing a broad design for how you intend to do it here. I suspect that duplicating enough server backend infrastructure to make the existing jsonb implementation friendly to frontend code would be frustrating, but maybe it's not as bad as I think. Certainly if you did such a thing, many people would thank you, because the inability to use ereport() and elog(), PG_TRY, the List API, etc, in FRONTEND code is a constant if minor source of irritation in PostgreSQL development.
(continuing an old thread from https://www.postgresql.org/message-id/CAMsr%2BYEtamQYZ5EocsuthQCvyvmRnQrucDP6GZynPtf0gsMbuw%40mail.gmail.com )
Craig Ringer <craig(at)2ndquadrant(dot)com>, 2018-02-26:
> On 26 February 2018 at 04:05, Anthony Communier <anthony(dot)communier(at)gmail(dot)com> wrote:
>
> > It would be nice if application connected to a Postrgesql database could
> > send and receive JSONB in binary. It could save some useless text
> > conversion. All works could be done on application side which are often
> > more scalable than database itself.
>
> To support this, you'd need to extract PostgreSQL's jsonb support into a C
> library that could be used independently of backend server infrastructure
> like 'palloc' and memory contexts, ereport(), etc. Or write a parallel
> implementation.
At Shift, we also have use cases that would likely be sped up quite a bit if we could avoid the conversion from JSONB to JSON, and instead pass binary JSONB to the application side and parse it there (in Go). I doubt we'd want to reuse any of Postgres's C code, and would instead go with your "parallel implementation" idea; I can't imagine it being particularly difficult to implement a JSONB parser from scratch.
All we need, I think, is a Postgres function raw_jsonb(jsonb) that returns bytea but is the identity function at the byte level. (Or allow a cast from jsonb to bytea.)
Our Go code would then send queries like SELECT col1, col2, raw_jsonb(col3) FROM table1 WHERE ...; I haven't thought in depth about how we'd parse the JSONB in Go, but perhaps we can synthesize a stream of JSON tokens from the binary JSONB (one token at a time, to avoid copies and allocations) and adapt the streaming parser https://github.com/json-iterator/go to turn it into Go values.
Sending raw JSONB to Postgres might also be interesting, but I'd start with receiving.
Would implementing raw_jsonb be as trivial as it sounds? What about usages like SELECT raw_jsonb(col3->'foo'); does the subobject returned by '->' share structure with the containing object, making the conversion to a self-contained JSONB value less direct?
Can these conversions be implemented without copying the bytes?
An open question about the API contract would be how raw_jsonb would be affected if Postgres introduces a version 2 of JSONB encoding. My intuition is to punt that problem to the application, and define that raw_jsonb returns whatever version of JSONB is most convenient for Postgres for that particular datum; this minimizes conversion work on the Postgres side, which is the purpose of the mechanism. Applications that want a stable format can use the conventional textual JSON format. But I could see a case for making the function raw_jsonb(int, jsonb) and allowing the caller to specify what (maximum?) version of JSONB they want.
On Mon, Feb 26, 2018 at 09:36:52PM +0800, Craig Ringer wrote: > On 26 February 2018 at 04:05, Anthony Communier <anthony.communier@gmail.com > > wrote: > > It would be nice if application connected to a Postrgesql database could > > send and receive JSONB in binary. It could save some useless text > > conversion. All works could be done on application side which are often > > more scalable than database itself. > > To support this, you'd need to extract PostgreSQL's jsonb support into a C > library that could be used independently of backend server infrastructure > like 'palloc' and memory contexts, ereport(), etc. Or write a parallel > implementation. > > It's one of the reasons some people questioned the wisdom of doing jsonb as > a bespoke format not using protobufs or whatever. I'm not one of them, > since I wasn't the one doing the work, and I also know how hard it can be > to neatly fit general purpose library code into the DB server where we > expect it to do little things like not abort() on malloc() failure. > > If you're interested in writing such a library, I suggest proposing a broad > design for how you intend to do it here. I suspect that duplicating enough > server backend infrastructure to make the existing jsonb implementation > friendly to frontend code would be frustrating, but maybe it's not as bad > as I think. Certainly if you did such a thing, many people would thank you, > because the inability to use ereport() and elog(), PG_TRY, the List API, > etc, in FRONTEND code is a constant if minor source of irritation in > PostgreSQL development. A good starting point might be to take the description of the JSONB on-disk encoding from src/include/utils/jsonb.h, fill in some of the blanks (the encoding of scalar types is not discussed in sufficient detail in that file, IIRC), and put it in an XML document, perhaps with some ASCII art. Eventually this could become yet another binary encoding of JSON that could be standardized at the IETF, say. The JSONB format is actually really neat, and deserves wider use outside PG, both as an interchange format, and as a storage format. Nico --
On 06/22/2018 11:52 PM, Christian Ohler wrote: > (continuing an old thread from > https://www.postgresql.org/message-id/CAMsr%2BYEtamQYZ5EocsuthQCvyvmRnQrucDP6GZynPtf0gsMbuw%40mail.gmail.com > ) > > Craig Ringer <craig(at)2ndquadrant(dot)com>, 2018-02-26: > >> On 26 February 2018 at 04:05, Anthony Communier > <anthony(dot)communier(at)gmail(dot)com> wrote: >> >> > It would be nice if application connected to a Postrgesql database could >> > send and receive JSONB in binary. It could save some useless text >> > conversion. All works could be done on application side which are often >> > more scalable than database itself. >> >> To support this, you'd need to extract PostgreSQL's jsonb support into a C >> library that could be used independently of backend server infrastructure >> like 'palloc' and memory contexts, ereport(), etc. Or write a parallel >> implementation. > > At Shift, we also have use cases that would likely be sped up quite a > bit if we could avoid the conversion from JSONB to JSON, and instead > pass binary JSONB to the application side and parse it there (in Go). I > doubt we'd want to reuse any of Postgres's C code, and would instead go > with your "parallel implementation" idea; I can't imagine it being > particularly difficult to implement a JSONB parser from scratch. > > All we need, I think, is a Postgres function raw_jsonb(jsonb) that > returns bytea but is the identity function at the byte level. (Or allow > a cast from jsonb to bytea.) > > Our Go code would then send queries like SELECT col1, col2, > raw_jsonb(col3) FROM table1 WHERE ...; I haven't thought in depth about > how we'd parse the JSONB in Go, but perhaps we can synthesize a stream > of JSON tokens from the binary JSONB (one token at a time, to avoid > copies and allocations) and adapt the streaming parser > https://github.com/json-iterator/go to turn it into Go values. > > Sending raw JSONB to Postgres might also be interesting, but I'd start > with receiving. > > Would implementing raw_jsonb be as trivial as it sounds? What about > usages like SELECT raw_jsonb(col3->'foo'); does the subobject returned > by '->' share structure with the containing object, making the > conversion to a self-contained JSONB value less direct? > > Can these conversions be implemented without copying the bytes? > I don't think you need the function, actually. PostgreSQL protocol supports both text and binary mode - in the text mode the server formats everything as text before sending it to the client. I guess this is what you mean by "convert to json". But with the extended protocol you (or rather the connection library you're using) can specify that the output should be handed in binary, i.e. as exact copy of the data. This happens at "bind" phase, see the "Bind" message docs here: https://www.postgresql.org/docs/current/static/protocol-message-formats.html > > An open question about the API contract would be how raw_jsonb would be > affected if Postgres introduces a version 2 of JSONB encoding. My > intuition is to punt that problem to the application, and define that > raw_jsonb returns whatever version of JSONB is most convenient for > Postgres for that particular datum; this minimizes conversion work on > the Postgres side, which is the purpose of the mechanism. Applications > that want a stable format can use the conventional textual JSON format. > But I could see a case for making the function raw_jsonb(int, jsonb) and > allowing the caller to specify what (maximum?) version of JSONB they want. > I doubt we'll introduce a new JSONB any time soon, so I wouldn't be particularly worried about this. If it eventually happens, you'll have to adapt your parser to that, I think. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 06/24/2018 12:42 PM, Tomas Vondra wrote: >> Sending raw JSONB to Postgres might also be interesting, but I'd start >> with receiving. >> >> Would implementing raw_jsonb be as trivial as it sounds? What about >> usages like SELECT raw_jsonb(col3->'foo'); does the subobject returned >> by '->' share structure with the containing object, making the >> conversion to a self-contained JSONB value less direct? >> >> Can these conversions be implemented without copying the bytes? >> > I don't think you need the function, actually. PostgreSQL protocol > supports both text and binary mode - in the text mode the server formats > everything as text before sending it to the client. I guess this is what > you mean by "convert to json". > > But with the extended protocol you (or rather the connection library > you're using) can specify that the output should be handed in binary, > i.e. as exact copy of the data. This happens at "bind" phase, see the > "Bind" message docs here: > > https://www.postgresql.org/docs/current/static/protocol-message-formats.html > > jsonb_send just sends 1 followed by the stringified value. If you want real binary transmission I think you'll need a new output version, by adjusting jsonb_send and jsonb_recv. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services