Thread: JSONB performance enhancement for 9.6
Hello:
Is there a plan for 9.6 to resolve the issue of very slow query/retrieval of jsonb field values when there are a large number (maybe several thousands) of top level keys? Currently, if I save a large json document with thousands of top level keys and query/retrieve field values, the whole document has to be first decompressed and loaded into memory before searching for the specific field key/value.
Thanks in Advance
On 11/28/2015 6:27 PM, Tom Smith wrote:
> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
> of jsonb fields when there are a large number (maybe several thousands)
> of top level keys? [...]

If it was my data, I'd be decomposing that large JSON thing into multiple SQL records, and storing as much stuff as possible in named SQL fields, using JSON in the database only for things that are too ambiguous for SQL.

--
john r pierce, recycling bits in santa cruz
On Sat, 28 Nov 2015 21:27:51 -0500 Tom Smith <tomsmith1989sk@gmail.com> wrote:
> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
> of jsonb fields when there are a large number (maybe several thousands)
> of top level keys? [...]

I could be off-base here, but have you tried:

ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;

? The default storage for a JSONB field is EXTENDED; switching it to EXTERNAL will disable compression. You'll have to insert your data over again, since this change doesn't alter any existing data, but see if that change improves performance.

--
Bill Moran
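[A minimal sketch of trying this end to end, on a hypothetical table "docs" with a jsonb column "doc" — the table and column names are assumptions:

    -- jsonb defaults to EXTENDED (allows compression and out-of-line
    -- storage); EXTERNAL keeps values out of line but disables compression.
    ALTER TABLE docs ALTER COLUMN doc SET STORAGE EXTERNAL;

    -- The new mode applies only to newly written values, so rewrite the
    -- existing ones, e.g. by forcing each value through a round trip:
    UPDATE docs SET doc = doc::text::jsonb;

    -- Verify the column's storage mode ('e' = external):
    SELECT attname, attstorage FROM pg_attribute
    WHERE attrelid = 'docs'::regclass AND attname = 'doc';
]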
Hi:

On Sun, Nov 29, 2015 at 1:09 PM, Bill Moran <wmoran@potentialtech.com> wrote:
> On Sat, 28 Nov 2015 21:27:51 -0500
>> Currently, if I save a large json document with top level keys of thousands

** LARGE **

> The default storage for a JSONB field is EXTENDED. Switching it to
> EXTERNAL will disable compression. You'll have to insert your data over
> again, since this change doesn't alter any existing data, but see
> if that change improves performance.

Good recommendation, but see if it improves performance AND whether it affects other queries in the system. Turning off compression CAN decrease the efficiency (hit ratio) of the shared buffers and the cache, IIRC (but worth testing anyway).

Francisco Olarte.
Tom Smith wrote on 29.11.2015 at 03:27:
> Hello:
>
> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
> of jsonb fields when there are a large number (maybe several thousands)
> of top level keys? [...]
>
> Thanks in Advance

If you are concerned about the compression overhead, then why don't you use (or try) JSON instead?
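[For what it's worth, a quick way to compare the two representations side by side — a sketch; the table and key names are made up:

    -- json stores the original text and reparses it on every key extraction;
    -- jsonb stores a parsed binary form whose keys can be binary-searched,
    -- though the whole (possibly compressed) datum is still loaded first.
    CREATE TABLE fmt_test (id serial PRIMARY KEY, j json, jb jsonb);

    EXPLAIN ANALYZE SELECT j  ->> 'some_key' FROM fmt_test;
    EXPLAIN ANALYZE SELECT jb ->> 'some_key' FROM fmt_test;
]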
Hi, Thanks for everyone's response.
The issue is not just compression, but lack of "indexing" or "segmentation" when a single doc has, say, 2000 top level keys (with multiple levels of subkeys). Right now, if I query for one key, the whole doc has to be first uncompressed and loaded, and then searched for the single key.
Compared to the traditional way of storing each top level key in a separate column, this is huge overhead when a table scan is required. Some kind of "keyed/slotted" storage for the doc could help. For illustration, all keys starting with 'A' would have their own storage unit, and so on, so when I search for key "A1" only that unit would be unpacked and traversed to get the "A1" value. It is as if postgresql predefined 26 columns/slots for the whole doc: an internal index within each doc for fast retrieval of individual field values.
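[For illustration, a rough sketch of approximating that slotted layout manually with today's features, one row per (document, first-letter bucket) — the table and column names are made up:

    -- Each document is split across up to 26 rows, one per leading letter;
    -- a key lookup then detoasts only the bucket that can hold the key.
    CREATE TABLE doc_slots (
        doc_id  int     NOT NULL,
        slot    char(1) NOT NULL,   -- first character of the top level key
        data    jsonb   NOT NULL,   -- only the keys falling in this bucket
        PRIMARY KEY (doc_id, slot)
    );

    -- Fetch key "A1" of document 42: only the 'A' bucket is loaded.
    SELECT data -> 'A1' FROM doc_slots WHERE doc_id = 42 AND slot = 'A';
]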
Someone mentioned a plan in the roadmap for this route, but I'd like to know if it is in the 9.6 plan.
The below url mentions a similar issue; I am not sure if it has been completely resolved:
http://stackoverflow.com/questions/26259374/jsonb-performance-degrades-as-number-of-keys-increase

The below url mentions the potential issue:
https://www.reddit.com/r/PostgreSQL/comments/36rdlr/improving_select_performance_with_jsonb_vs_hstore/
Thanks
On Sun, Nov 29, 2015 at 7:35 AM, Thomas Kellerer <spam_eater@gmx.net> wrote:
> Tom Smith wrote on 29.11.2015 at 03:27:
>> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
>> of jsonb fields when there are a large number (maybe several thousands)
>> of top level keys? [...]
>
> If you are concerned about the compression overhead, then why don't you
> use (or try) JSON instead?
Unfortunately, the keys can not be predefined or fixed; it is a doc, which is the reason jsonb is used. It works well for small docs with a small number of keys. If this issue were resolved, postgresql would be an absolutely superior choice over MongoDB for document data.
On Sun, Nov 29, 2015 at 12:37 AM, John R Pierce <pierce@hogranch.com> wrote:
> On 11/28/2015 6:27 PM, Tom Smith wrote:
>> Is there a plan for 9.6 to resolve the issue of very slow query/retrieval
>> of jsonb fields when there are a large number (maybe several thousands)
>> of top level keys? [...]
>
> If it was my data, I'd be decomposing that large JSON thing into multiple
> SQL records, and storing as much stuff as possible in named SQL fields,
> using JSON in the database only for things that are too ambiguous for SQL.
>
> --
> john r pierce, recycling bits in santa cruz
On Sun, 29 Nov 2015 08:24:12 -0500 Tom Smith <tomsmith1989sk@gmail.com> wrote:
> Hi, Thanks for everyone's response.
>
> The issue is not just compression, but lack of "indexing" or
> "segmentation" when a single doc has, say 2000 top level keys (with
> multiple levels of subkeys). [...]

Sounds like you're pushing the limits of what JSONB is designed to do (at this stage, at least). I'm not aware of any improvements in recent versions (or head) that would do much to improve the situation, but I don't track every commit either. If you really need this improvement and you're willing to wait for 9.6, then I suggest you check out the latest git version and test on that to see if anything has been done. I doubt you'll see much, though.

As a thought experiment, the only way I can think to improve this use case is to ditch the current TOAST system and replace it with something that stores large JSON values in a form optimized for indexed access. That's a pretty massive change to some fairly core stuff just to optimize a single use-case of a single data type. Not saying it won't happen ... in fact, all things considered, it's pretty likely to happen at some point.

As far as a current solution: my solution would be to decompose the JSON into an optimized table. I.e.:

    CREATE TABLE store1 (
        id SERIAL PRIMARY KEY,
        data JSONB
    );

    CREATE TABLE store2 (
        id INT NOT NULL REFERENCES store1(id),
        top_level_key VARCHAR(1024),
        data JSONB,
        PRIMARY KEY(top_level_key, id)
    );

You can then use a trigger to ensure that store2 is always in sync with store1. Lookups can then use store2 and will be quite fast because of the index. A lot of the design is conjectural: do you even still need the data column on store1? Are there other useful indexes? etc. But hopefully the general idea is clear.

These probably aren't the answers you want, but (to the best of my knowledge) they're the best answers available at this time. I'd really like to build the alternate TOAST storage, but I'm not in a position to start on a project that ambitious right now ... I'm not even really keeping up with the project I'm currently supposed to be doing.

--
Bill Moran
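[A minimal sketch of the sync trigger Bill describes, under the store1/store2 schema above — the function and trigger names, and the one-row-per-top-level-key decomposition, are assumptions:

    CREATE OR REPLACE FUNCTION sync_store2() RETURNS trigger AS $$
    BEGIN
        -- Drop any previously decomposed rows for this document.
        DELETE FROM store2 WHERE id = NEW.id;
        -- Re-insert one row per top-level key of the new document.
        INSERT INTO store2 (id, top_level_key, data)
        SELECT NEW.id, e.key, e.value
        FROM jsonb_each(NEW.data) AS e(key, value);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER store1_sync
        AFTER INSERT OR UPDATE OF data ON store1
        FOR EACH ROW EXECUTE PROCEDURE sync_store2();

A lookup then probes the (top_level_key, id) primary key instead of detoasting the whole document:

    SELECT data FROM store2 WHERE top_level_key = 'A1' AND id = 42;
]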
Is this correct? I'm fairly sure jsonb supports lazily parsing objects and each object level is actually searched using binary search.
On 29/11/2015 11:25 AM, "Tom Smith" <tomsmith1989sk@gmail.com> wrote:
> Hi, Thanks for everyone's response.
>
> The issue is not just compression, but lack of "indexing" or
> "segmentation" when a single doc has, say 2000 top level keys (with
> multiple levels of subkeys). Right now, if I query for one key, the whole
> doc has to be first uncompressed and loaded, and then searched for the
> single key. [...]
Bill Moran <wmoran@potentialtech.com> writes:
> Tom Smith <tomsmith1989sk@gmail.com> wrote:
>> Is there a plan for 9.6 to resolve the issue of very slow
>> query/retrieval of jsonb fields when there are large number (maybe
>> several thousands) of top level keys. Currently, if I save a large
>> json document with top level keys of thousands and query/retrieve field
>> values, the whole document has to be first decompressed and load to
>> memory before searching for the specific field key/value.

> I could be off-base here, but have you tried:
> ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;

There is just about zero chance we'll ever worry about this for compressed columns. However, even the uncompressed case does currently involve loading the whole column value, as Tom says. We did consider the possibility of such an optimization when designing the JSONB storage format, but I don't know of anyone actively working on it.

In any case, it's unlikely that it'd ever be super fast, since it's certainly going to involve at least a couple of TOAST fetches.

Personally I'd be looking for a different representation. If there are specific fields that are known to be needed a lot, maybe functional indexes would help?

			regards, tom lane
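[A sketch of the functional-index idea, on a hypothetical docs table with a jsonb column "data" and a frequently fetched key 'status' — all names here are made up:

    -- Index the extracted text of one hot key. Equality lookups on that
    -- expression can then use the index rather than detoasting and
    -- searching every document.
    CREATE INDEX docs_status_idx ON docs ((data ->> 'status'));

    SELECT id FROM docs WHERE data ->> 'status' = 'active';
]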
Hi:
The goal is fast retrieval of a field value within a row when the row is already located.

On Sun, Nov 29, 2015 at 12:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Bill Moran <wmoran@potentialtech.com> writes:
>> I could be off-base here, but have you tried:
>> ALTER TABLE $table ALTER COLUMN $json_column SET STORAGE EXTERNAL;
>
> There is just about zero chance we'll ever worry about this for
> compressed columns. However, even the uncompressed case does currently
> involve loading the whole column value, as Tom says. [...]
>
> Personally I'd be looking for a different representation. If there are
> specific fields that are known to be needed a lot, maybe functional
> indexes would help?
>
>			regards, tom lane
On 11/29/15 9:30 AM, Arthur Silva wrote:
> Is this correct? I'm fairly sure jsonb supports lazily parsing objects
> and each object level is actually searched using binary search.

The problem is there's no support for loading just part of a TOASTed field. Even if that existed, we'd still need a way to know what byte position in the TOASTed field a key lived at. It's possible to add all that, but I think it'd be a serious amount of work.

Since someone else was just wondering about storing more specific types in JSON, it might be more useful/interesting to devise a Postgres-specific way to store variable schema documents. That would give us a lot more flexibility over implementation than the JSON type ever will.

Or think about it this way: there's really no great reason why everyone chose JSON. There's tons of other serialization storage formats for other languages out there, so why not one specialized to Postgres? (And of course we'd want to be able to cast from that to JSON and back...)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
Hi,
Congrats on the official release of 9.5.

And I'd like to bring up the issue again about whether 9.6 will address the jsonb performance issue with a large number of top level keys. It is true that it does not have to use the JSON format; it is about serialization and fast retrieval of dynamic tree structure objects (at the top level, they might be called dynamic columns). So if postgresql can have its own way, that would work out too, as long as it can have intuitive querying (like what is implemented for json and jsonb) and fast retrieval of a tree-like object; it could be called a no-sql data type. After all, most of the motivation for using no-sql dbs like MongoDB is about working with dynamic tree objects.

If postgresql can have high performance on this, then many no-sql dbs would become history.
Thanks
On Thu, Dec 3, 2015 at 5:31 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 11/29/15 9:30 AM, Arthur Silva wrote:
>> Is this correct? I'm fairly sure jsonb supports lazily parsing objects
>> and each object level is actually searched using binary search.
>
> The problem is there's no support for loading just part of a TOASTed
> field. Even if that existed, we'd still need a way to know what byte
> position in the TOASTed field a key lived at. It's possible to add all
> that, but I think it'd be a serious amount of work. [...]
On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
> Congrats on the official release of 9.5
>
> And I'd like bring up the issue again about if 9.6 would address the
> jsonb performance issue with large number of top level keys. [...]

I can give you some backstory on this. TOAST was designed in 2001 as a way to store, in a data-type-agnostic way, long strings compressed and any other long data type, e.g. long arrays.

In all previous cases, _part_ of the value wasn't useful. JSONB is a unique case because it is one of the few types that can be processed without reading the entire value, e.g. it has an index.

We are going to be hesitant to do something data-type-specific for JSONB. It would be good if we could develop a data-type-agnostic approach to how TOAST can be improved. I know of no such work for 9.6, and it is unlikely it will be done in time for 9.6.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + As you are, so once was I. As I am, so you will be. +
  +                     Roman grave inscription         +
Bill Moran wrote:
> As far as a current solution: my solution would be to decompose the
> JSON into an optimized table. I.e.:
>
> CREATE TABLE store1 (
>     id SERIAL PRIMARY KEY,
>     data JSONB
> );
>
> CREATE TABLE store2 (
>     id INT NOT NULL REFERENCES store1(id),
>     top_level_key VARCHAR(1024),
>     data JSONB,
>     PRIMARY KEY(top_level_key, id)
> );

Isn't this what ToroDB already does?
https://www.8kdata.com/torodb/

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 20, 2016 at 4:51 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>> And I'd like bring up the issue again about if 9.6 would address the
>> jsonb performance issue with large number of top level keys. [...]
>
> I can give you some backstory on this. TOAST was designed in 2001 as a
> way to store, in a data-type-agnostic way, long strings compressed and
> any other long data type, e.g. long arrays. [...]
>
> We are going to be hesitant to do something data-type-specific for
> JSONB. It would be good if we could develop a data-type-agnostic
> approach to how TOAST can be improved. I know of no such work for 9.6,
> and it is unlikely it will be done in time for 9.6.

I'm looking at this from time to time.
Is there any database that actually supports what the original poster wanted?

The only thing I know of that's similar is the bigtable/hbase/hypertable wide column store. They break the lexicographically sorted rows into blocks of compressed data (XXKB each), and then keep an index on the start_key+end_key of each block. If the index is too big, you can store the index itself in a toast value.

Note: I have no idea how to (or if it can be) actually code this.
On Wed, Jan 20, 2016 at 9:32 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
> On Wed, Jan 20, 2016 at 4:51 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> We are going to be hesitant to do something data-type-specific for
>> JSONB. It would be good if we could develop a data-type-agnostic
>> approach to how TOAST can be improved. I know of no such work for 9.6,
>> and it is unlikely it will be done in time for 9.6.
>
> I'm looking at this from time to time.
On Tue, 19 Jan 2016 23:53:19 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Bill Moran wrote:
>> As far as a current solution: my solution would be to decompose the
>> JSON into an optimized table. [...]
>
> Isn't this what ToroDB already does?
> https://www.8kdata.com/torodb/

Looks like. I wasn't aware of ToroDB, thanks for the link.

--
Bill Moran
Using the JSON/JSONB type in postgresql is usually due to the use case where the keys (top level included) can not be predefined. This is the major difference between NoSQL/Document and RDBMS.
Why would TOAST have to be used? Could some specially structured "raw" files be used instead?

On Wed, Jan 20, 2016 at 7:32 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> On Tue, 19 Jan 2016 23:53:19 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Isn't this what ToroDB already does?
>> https://www.8kdata.com/torodb/
>
> Looks like. I wasn't aware of ToroDB, thanks for the link.
>
> --
> Bill Moran
On 20/01/16 13:32, Bill Moran wrote:
> On Tue, 19 Jan 2016 23:53:19 -0300 Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Isn't this what ToroDB already does?
>> https://www.8kdata.com/torodb/
>
> Looks like. I wasn't aware of ToroDB, thanks for the link.

Hi Bill. Effectively, that's what ToroDB does. You will have a dynamic schema, but automatically created for you. It will be a relational schema; no json/jsonb needed for your data. Please read the FAQ (https://github.com/torodb/torodb/wiki/FAQ) and let us know (https://groups.google.com/forum/#!forum/torodb-dev) if you have any additional questions.

Thanks, Álvaro, for the reference :)

Cheers,
Álvaro
Hello:
I'd like to bring up this JSONB performance issue again. Below is a link to MySQL's way of storing/retrieving JSON key/values. Instead of providing column indexing (like GIN for JSONB in Postgresql), it provides only internal data-structure-level indexing within each individual json object, for fast retrieval; compression is not used.

Perhaps without implementing complicated column-level GIN indexing, implementing a new variant JSON type that only handles individual json object indexing would be feasible? Combined with the current JSONB implementation, both common use cases (one is global doc indexing, the other is fast retrieval of individual values) would work out and make postgresql unbeatable.
On Tue, Jan 19, 2016 at 8:51 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, Jan 11, 2016 at 09:01:03PM -0500, Tom Smith wrote:
>> If postgresql can have high performance on this, then many no-sql dbs
>> would become history. [...]
>
> I can give you some backstory on this. TOAST was designed in 2001 as a
> way to store, in a data-type-agnostic way, long strings compressed and
> any other long data type, e.g. long arrays.
>
> In all previous cases, _part_ of the value wasn't useful. JSONB is a
> unique case because it is one of the few types that can be processed
> without reading the entire value, e.g. it has an index.
>
> We are going to be hesitant to do something data-type-specific for
> JSONB. It would be good if we could develop a data-type-agnostic
> approach to how TOAST can be improved. I know of no such work for 9.6,
> and it is unlikely it will be done in time for 9.6.
On Sun, May 1, 2016 at 6:46 AM, Tom Smith <tomsmith1989sk@gmail.com> wrote:
> Hello:
>
> I'd like to bring up this JSONB performance issue again. Below is a link
> to MySQL's way of storing/retrieving JSON key/values. Instead of
> providing column indexing (like GIN for JSONB in Postgresql), it provides
> only internal data-structure-level indexing within each individual json
> object, for fast retrieval; compression is not used. [...]

It's called an expression index?
No, it is within the individual json object storage. In a way, it would be part of the query plan, but strictly for the individual json object storage structure; it is not necessarily an "index".

On Sun, May 1, 2016 at 4:14 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
> On Sun, May 1, 2016 at 6:46 AM, Tom Smith <tomsmith1989sk@gmail.com> wrote:
>> Perhaps without implementing complicated column-level GIN indexing,
>> implementing a new variant JSON type that only handles individual json
>> object indexing would be feasible? [...]
>
> It's called an expression index?