Thread: [GENERAL] performance considerations of jsonb vs separate rows

[GENERAL] performance considerations of jsonb vs separate rows

From
Rob Nikander
Date:
Hi,

I’ve got a web app where I want to store user’s session data. The schema in this data changes a lot so it may be useful here to store the session properties in either a jsonb column, or in multiple rows. Something like:

  session_id | data 
  100          { a: 1, bar: 2 ...
  101          { a: 3, baz: 123 …

or

  session_id | name | value
  100          a      1
  100          bar    2
  101          baz    123
  101          a      3
  ... 

The app currently does something like option 1, but on an older pre-jsonb version of postgres, so the field is just text. I’m hoping to upgrade Postgres soon so jsonb is an option.

I'm wondering about the tradeoffs, specifically: is it possible to update one piece of a jsonb value without having to rewrite the entire field? There are cases where that data field was getting pretty big (500kb). Would you expect any obvious performance differences between these two options?

Yes, I’ll need to build performance tests myself, but that’s a lot of work to get two realistic situations with millions of rows, so I’m wondering about guesses or common knowledge on this.

thanks,
Rob

Re: [GENERAL] performance considerations of jsonb vs separate rows

From
Andreas Kretschmer
Date:
Am 19. Juni 2017 21:29:40 MESZ schrieb Rob Nikander <rob.nikander@gmail.com>:

>
>I'm wondering about the tradeoffs, specifically: is it possible to
>update one piece of a jsonb value without having to rewrite the entire
>field?


Updates in PostgreSQL are always Delete & Insert. So the answer is no. For session-data please consider unlogged
tables.


Regards, Andreas


--
2ndQuadrant - The PostgreSQL Support Company


Re: [GENERAL] performance considerations of jsonb vs separate rows

From
Dmitry Dolgov
Date:
> On 19 June 2017 at 21:29, Rob Nikander <rob.nikander@gmail.com> wrote:
>
> I'm wondering about the tradeoffs, specifically: is it possible to update one piece of a jsonb value without having to rewrite the entire field? There are cases where that data field was getting pretty big (500kb). Would you expect any obvious performance differences between these two options?

Unfortunately no, an entire jsonb field has to be written back even if you've touched only one key.
From my own benchmarks it looks like you'll scarcely notice this (e.g. in comparison with MongoDB) only if you work
with small enough documents (about 2kb), and everything above this limit more or less seriously hit the performance.
You can take a look at this presentation [1] from Oleg Bartunov, it contains results of some benchmarks (from slide 44).

[1]: http://www.sai.msu.su/~megera/postgres/talks/jsonb-pgconf.us-2017.pdf

Re: [GENERAL] performance considerations of jsonb vs separate rows

From
Merlin Moncure
Date:
On Mon, Jun 19, 2017 at 2:29 PM, Rob Nikander <rob.nikander@gmail.com> wrote:
> I'm wondering about the tradeoffs, specifically: is it possible to update
> one piece of a jsonb value without having to rewrite the entire field? There
> are cases where that data field was getting pretty big (500kb). Would you
> expect any obvious performance differences between these two options?

You are basically asking, what are the relative
advantages/disadvantages of document model vs data store?  This is a
complicated discussion.  Here are some tradeoffs:

*) Document is always read/written in bulk.  Row data reads/writes are
more discrete (but generally postgres always reads/writes 8kb
minimum!)

*) for documents transaction tracking is for the entire document.
This is more efficient for storage but can have very serious
consequences if sub-portions of the document are updated under heavy
concurrency.

*) Documents are a pain if the structure changes in such a way so as
to require invalidation of all of them.

*) Documents can be a *real* pain if the data relationships change in
some fundamental way.   This is a pain with traditional tables as
well, but relational type models tend to be the most flexible vs other
approaches.

Basically there is a reason why SQL and relational type systems won
the 'data wars' of the 1970's and 1980's.   There are downsides to the
basic approach (especially performance due to joining) but the
simplicity and elegance of being able to model just about any problem
tends to compensate certain performance disadvantages.

Document style storage tends to move the database model out of the
database and into the application (which is a very poor tradeoff IMO)
and fall over when some of the initial assumptions with respect to the
document modeling discrete business units break down; you end up
storing the same information over and over in different documents
which causes all kinds of problems.  They do tend to work well in low-
or no- update applications however.

merlin