Thread: [GENERAL] performance considerations of jsonb vs separate rows
Hi,
I’ve got a web app where I want to store users’ session data. The schema of this data changes a lot, so it may make sense to store the session properties either in a jsonb column or in multiple rows. Something like:
session_id | data
-----------+-------------------
       100 | { a: 1, bar: 2 ...
       101 | { a: 3, baz: 123 ...

or

session_id | name | value
-----------+------+------
       100 | a    |     1
       100 | bar  |     2
       101 | baz  |   123
       101 | a    |     3
       ...
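For concreteness, the two options might look like the following DDL (table and column names here are just placeholders I made up for illustration):

```sql
-- Option 1: one row per session, all properties in a jsonb document
CREATE TABLE session_doc (
    session_id bigint PRIMARY KEY,
    data       jsonb NOT NULL
);

-- Option 2: one row per (session, property) pair, EAV-style
CREATE TABLE session_kv (
    session_id bigint NOT NULL,
    name       text   NOT NULL,
    value      text   NOT NULL,
    PRIMARY KEY (session_id, name)
);
```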
The app currently does something like option 1, but on an older, pre-jsonb version of Postgres, so the field is just text. I’m hoping to upgrade Postgres soon, so jsonb is an option.
I'm wondering about the tradeoffs, specifically: is it possible to update one piece of a jsonb value without having to rewrite the entire field? There are cases where that data field was getting pretty big (500kb). Would you expect any obvious performance differences between these two options?
Yes, I’ll need to build performance tests myself, but that’s a lot of work to get two realistic situations with millions of rows, so I’m wondering about guesses or common knowledge on this.
thanks,
Rob
On 19 June 2017 at 21:29:40 MESZ, Rob Nikander <rob.nikander@gmail.com> wrote:
>
> I'm wondering about the tradeoffs, specifically: is it possible to
> update one piece of a jsonb value without having to rewrite the entire
> field?

Updates in PostgreSQL are always delete & insert, so the answer is no.

For session data, please consider unlogged tables.

Regards, Andreas
--
2ndQuadrant - The PostgreSQL Support Company
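An unlogged table skips write-ahead logging, which makes writes cheaper at the cost of losing the table's contents after a crash and not replicating it to standbys — often an acceptable trade for session data. A minimal sketch, reusing a hypothetical session table:

```sql
-- UNLOGGED: no WAL written for this table, so inserts/updates are
-- faster, but the table is truncated after a crash and is not
-- replicated to standby servers.
CREATE UNLOGGED TABLE session_doc (
    session_id bigint PRIMARY KEY,
    data       jsonb NOT NULL
);
```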
> On 19 June 2017 at 21:29, Rob Nikander <rob.nikander@gmail.com> wrote:
>
> I'm wondering about the tradeoffs, specifically: is it possible to update one piece of a jsonb value without having to rewrite the entire field? There are cases where that data field was getting pretty big (500kb). Would you expect any obvious performance differences between these two options?
Unfortunately no, an entire jsonb field has to be written back even if you've touched only one key.
From my own benchmarks, it looks like you'll scarcely notice this (e.g. in comparison with MongoDB) only if you work with small enough documents (about 2kb); everything above that limit hits performance more or less seriously.
You can take a look at this presentation [1] from Oleg Bartunov; it contains the results of some benchmarks (from slide 44).
[1]: http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconf.us-2017.pdf
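To make the "no partial update" point concrete: since 9.5 you can express a single-key change with jsonb_set, but the UPDATE still produces a complete new row version containing the whole document (table and key names below are illustrative):

```sql
-- Convenient syntax, but not a partial write: the full document
-- (e.g. the ~500kb case mentioned above) is read, modified, and
-- written back in its entirety as a new row version.
UPDATE session_doc
SET data = jsonb_set(data, '{a}', '42')
WHERE session_id = 100;
```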
On Mon, Jun 19, 2017 at 2:29 PM, Rob Nikander <rob.nikander@gmail.com> wrote:
> I'm wondering about the tradeoffs, specifically: is it possible to update
> one piece of a jsonb value without having to rewrite the entire field? There
> are cases where that data field was getting pretty big (500kb). Would you
> expect any obvious performance differences between these two options?

You are basically asking: what are the relative advantages and disadvantages of the document model vs. the relational data store? This is a complicated discussion. Here are some tradeoffs:

*) A document is always read/written in bulk. Row data reads/writes are more discrete (but Postgres generally reads/writes a minimum of 8kb anyway!).

*) For documents, transaction tracking covers the entire document. This is more efficient for storage but can have very serious consequences if sub-portions of the document are updated under heavy concurrency.

*) Documents are a pain if the structure changes in such a way as to require invalidation of all of them.

*) Documents can be a *real* pain if the data relationships change in some fundamental way. This is a pain with traditional tables as well, but relational models tend to be the most flexible of the various approaches.

Basically, there is a reason why SQL and relational systems won the 'data wars' of the 1970s and 1980s. There are downsides to the basic approach (especially the performance cost of joining), but the simplicity and elegance of being able to model just about any problem tend to compensate for certain performance disadvantages. Document-style storage tends to move the data model out of the database and into the application (which is a very poor tradeoff IMO), and it falls over when some of the initial assumptions about the documents modeling discrete business units break down; you end up storing the same information over and over in different documents, which causes all kinds of problems. Document stores do tend to work well in low- or no-update applications, however.

merlin
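The bulk-vs-discrete access tradeoff above can be sketched with the two candidate schemas (table and column names here are hypothetical, matching the options in the original post):

```sql
-- Document model: the whole jsonb value is fetched and de-TOASTed,
-- then one key is extracted from it.
SELECT data ->> 'a' FROM session_doc WHERE session_id = 100;

-- Row model: only the one matching row is read, and an update
-- rewrites just that small row instead of the whole document.
SELECT value FROM session_kv WHERE session_id = 100 AND name = 'a';
UPDATE session_kv SET value = '42' WHERE session_id = 100 AND name = 'a';
```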