Thread: Benchmark of using JSON to transport query results in node.js

Benchmark of using JSON to transport query results in node.js

From
Mitar
Date:
Hi!

I made some benchmarks of using JSON to transport query results to
node.js, and it seems it really makes a difference compared to using
the native or standard PostgreSQL driver. The idea is that you simply
wrap all results into JSON, like SELECT to_json(t) FROM (... original
query ...) AS t. I am guessing this is because node.js/JavaScript has
a really fast JSON parser, while everything else incurs parsing
overhead. See my blog post for more details [1]. Any feedback welcome.
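
To illustrate, here is a minimal sketch of the wrapping approach using
the node-postgres ("pg") package (the table name is made up and this
is not my exact benchmark code):

    // Sketch only: compare a plain query with a to_json-wrapped query.
    // Connection settings come from the usual PG* environment variables.
    const { Client } = require('pg');

    async function main() {
      const client = new Client();
      await client.connect();

      // Plain query: every column of every row goes through the driver's
      // JavaScript type parsers.
      const plain = await client.query('SELECT * FROM measurements');

      // Wrapped query: PostgreSQL serializes each row to JSON, so the driver
      // only has to run the fast, native JSON.parse once per row.
      const wrapped = await client.query(
        'SELECT to_json(t) AS row FROM (SELECT * FROM measurements) AS t'
      );
      const rows = wrapped.rows.map((r) => r.row);

      console.log(plain.rows.length, rows.length);
      await client.end();
    }

    main().catch(console.error);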

This makes me wonder: if serialization/deserialization makes such a
big impact, have there been efforts to improve how results are
serialized for over-the-wire transmission? For example, using
something like Cap'n Proto [2] to serialize results into a structure
which can be used directly, without any real deserialization?

[1] https://mitar.tnode.com/post/181893159351/in-nodejs-always-query-in-json-from-postgresql
[2] https://capnproto.org/


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Benchmark of using JSON to transport query results in node.js

From
Tony Shelver
Date:
I'm fairly new to Postgres, but one question is how node.js implements the native driver when fetching the data: fetchall, fetchmany or fetch single? Also, which native driver is it using?
Does the native driver do a round trip for each record fetched, or can it batch them into multiples?

For example, in the Oracle native driver (for Python, in my case), setting the cursor arraysize makes a huge performance difference when pulling back large datasets.
Pulling back 800k+ records through a cursor on a remote machine with the default arraysize took far too long (3 hours before I canceled it).
Upping the arraysize to 800 dropped that to around 40 minutes, including loading each record into a local Postgres via a function call (a more complex database structure had to be handled).
This was on low-end test equipment.
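
For comparison, the closest node.js analogue seems to be the pg-cursor
package, which pulls rows from the server in batches instead of
materializing the whole result set at once. A sketch only, with a
made-up table name and batch size:

    // Sketch only: batched fetching with the "pg" and "pg-cursor" packages.
    const { Client } = require('pg');
    const Cursor = require('pg-cursor');

    async function main() {
      const client = new Client();
      await client.connect();

      // Rows come back 800 at a time, roughly like setting arraysize = 800.
      const cursor = client.query(new Cursor('SELECT * FROM tracking_events'));
      let batch;
      do {
        batch = await cursor.read(800);
        // ... process the batch here ...
      } while (batch.length > 0);

      await cursor.close();
      await client.end();
    }

    main().catch(console.error);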

This is a relevant issue for us, as we will be developing a new front end to our application, and we still haven't finalized the architecture.
The backend built to date uses Python / Postgres. Python/Flask is one option, possibly serving the data to Android / web via JSON / REST.
Another option is to query directly from node.js and get JSON or native query results from the database (extensive use of functions / stored procedures).

Our application is data-intensive, involving a lot of geotracking data across hundreds of devices at its core, and then quite a bit of geo/mapping/analytics around that.




Re: Benchmark of using JSON to transport query results in node.js

From
Mitar
Date:
Hi!

On Fri, Jan 11, 2019 at 3:06 AM Tony Shelver <tshelver@gmail.com> wrote:
> I'm fairly new to Postgres, but one question is how node.js implements the native driver when fetching the data: fetchall, fetchmany or fetch single? Also, which native driver is it using?

The package I used is here [1]. It uses libpq. Is there a better
native driver to use? It seems it mostly uses the PQexec call; I am
not sure how that call internally fetches the data [2]. I think the
reason for the slowness is that parsing of the results is still done
in JavaScript [3].

[1] https://github.com/brianc/node-pg-native
[2] https://www.postgresql.org/docs/devel/libpq-exec.html
[3] https://github.com/brianc/node-pg-native/blob/master/lib/build-result.js

> Does the native driver do a round trip for each record fetched, or can it batch them into multiples?

I think it waits for all results to arrive first through the native
driver, and only then starts processing them in JavaScript.

> Another option is to query directly from node.js and get JSON or native query results from the database (extensive use of functions / stored procedures).

For web applications, I was even thinking about this crazy approach:
have PostgreSQL encode all results in JSON, and then in node.js do not
parse the JSON at all, but send it as a string directly to the
browser. I have seen the opposite pattern many times in my other
projects: having REST and other layers in between means you read from
the database, serialize the data into something, deserialize it back
on the server side, maybe even wrap it into ORM objects (Django, I am
looking at you), just so you can hand it to your REST code, which then
converts it to JSON and sends it over. In my older benchmarks with
Python/Django this quickly added a few hundred milliseconds when, for
example, time-series GeoJSON data with a few thousand points was read
and sent to the client through a REST interface. I hacked around it by
passing GeoJSON fields through as strings: I injected the GeoJSON into
a placeholder in the otherwise serialized JSON (luckily JSON is
recursive) through pure string manipulation, and it improved things
greatly. I also disabled all REST content types other than JSON; we
have not used XML or YAML over our API anyway.
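
In JavaScript terms the trick looked roughly like this (a sketch with
made-up names; my actual code was Python/Django):

    // Sketch only: serialize the envelope with a placeholder, then splice
    // the already-serialized GeoJSON in as a raw string.
    const geojsonString = '{"type":"Point","coordinates":[14.5,46.0]}';
    const envelope = JSON.stringify({ id: 42, geometry: '__GEOJSON__' });
    const body = envelope.replace('"__GEOJSON__"', geojsonString);
    // body === '{"id":42,"geometry":{"type":"Point","coordinates":[14.5,46.0]}}'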

So to me it seems that encoding all results in JSON is always a win in
node.js. If you consume the results in node.js, great. If you are
planning to pass them on, maybe just pass them on as-is.
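
A sketch of that pass-through idea, assuming Express and the pg
package (route and table names are made up); the ::text cast keeps the
driver from parsing the JSON back into objects:

    // Sketch only: send the JSON produced by PostgreSQL to the browser
    // without ever parsing it in node.js.
    const express = require('express');
    const { Pool } = require('pg');

    const app = express();
    const pool = new Pool();

    app.get('/api/measurements', async (req, res) => {
      try {
        // json_agg builds one JSON array for all rows; casting to ::text
        // keeps node-postgres from running JSON.parse on it.
        const result = await pool.query(
          "SELECT coalesce(json_agg(t), '[]'::json)::text AS body " +
          'FROM (SELECT * FROM measurements) AS t'
        );
        res.type('application/json').send(result.rows[0].body);
      } catch (err) {
        res.status(500).send(err.message);
      }
    });

    app.listen(3000);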

> Our application is data-intensive, involving a lot of geotracking data across hundreds of devices at its core, and then quite a bit of geo/mapping/analytics around that.

That sounds very similar to what I had with Python/Django/REST in the past. :-)


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m


Re: Benchmark of using JSON to transport query results in node.js

From
Michel Pelletier
Date:
On Fri, Jan 11, 2019 at 10:31 AM Mitar <mmitar@gmail.com> wrote:
> Hi!
>
> > Another option is to query directly from node.js and get JSON or native query results from the database (extensive use of functions / stored procedures).
>
> For web applications, I was even thinking about this crazy approach:
> have PostgreSQL encode all results in JSON, and then in node.js do not
> parse the JSON at all, but send it as a string directly to the browser.

Not crazy at all, that's basically how PostgREST works: https://postgrest.org/
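
For example (a sketch only, with a made-up endpoint; it assumes
PostgREST on its default port 3000 and node.js 18+ for the built-in
fetch), a node.js server can forward a PostgREST response without
re-parsing it:

    // Sketch only: proxy a PostgREST endpoint without re-parsing the JSON.
    const express = require('express');
    const app = express();

    app.get('/api/measurements', async (req, res) => {
      const upstream = await fetch('http://localhost:3000/measurements');
      // PostgREST already returns JSON, so forward the body as a string.
      res.type('application/json').send(await upstream.text());
    });

    app.listen(8080);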


-Michel