Thread: jsonb subscripting assignment performance

jsonb subscripting assignment performance

From
"Joel Jacobson"
Date:
Hi,

commit 676887a3 added support for jsonb subscripting.

Many thanks for working on this. I really like the improved syntax.

I was also hoping for some performance benefits,
but my testing shows that

   jsonb_value['existing_key'] = new_value;

takes just as long time as

   jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value);

which is a bit surprising to me. Shouldn't subscripting be a lot faster, since it could modify the existing data structure in-place? What am I missing here?

I came to think of the this new functionality when trying to optimize some
PL/pgSQL code where the bottle-neck turned out to be lots of calls to jsonb_set() for large jsonb objects.

Here is the output from attached bench:

n=10000
00:00:00.002628 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.002778 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.002332 jsonb[existing key] := value;
00:00:00.002794 jsonb[new key] := value;
n=100000
00:00:00.042843 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.046515 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.044974 jsonb[existing key] := value;
00:00:00.075429 jsonb[new key] := value;
n=1000000
00:00:00.420808 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.449622 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.31834 jsonb[existing key] := value;
00:00:00.527904 jsonb[new key] := value;

Many thanks for clarifying.

Best regards,

Joel
Attachment

Re: jsonb subscripting assignment performance

From
Pavel Stehule
Date:


st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson <joel@compiler.org> napsal:
Hi,

commit 676887a3 added support for jsonb subscripting.

Many thanks for working on this. I really like the improved syntax.

I was also hoping for some performance benefits,
but my testing shows that

   jsonb_value['existing_key'] = new_value;

takes just as long time as

   jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value);

which is a bit surprising to me. Shouldn't subscripting be a lot faster, since it could modify the existing data structure in-place? What am I missing here?

no - it doesn't support in-place modification. Only arrays and records support it.


I came to think of the this new functionality when trying to optimize some
PL/pgSQL code where the bottle-neck turned out to be lots of calls to jsonb_set() for large jsonb objects.

sure - there is big room for optimization. But this patch was big enough without its optimization. And it was not clean, if I will be committed or not (it waited in commitfest application for 4 years). So I accepted implemented behaviour (without inplace update). Now, this patch is in core, and anybody can work on others possible optimizations.

Regards

Pavel
 

Here is the output from attached bench:

n=10000
00:00:00.002628 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.002778 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.002332 jsonb[existing key] := value;
00:00:00.002794 jsonb[new key] := value;
n=100000
00:00:00.042843 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.046515 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.044974 jsonb[existing key] := value;
00:00:00.075429 jsonb[new key] := value;
n=1000000
00:00:00.420808 jsonb := jsonb_set(jsonb, ARRAY[existing key], value);
00:00:00.449622 jsonb := jsonb_set(jsonb, ARRAY[new key], value);
00:00:00.31834 jsonb[existing key] := value;
00:00:00.527904 jsonb[new key] := value;

Many thanks for clarifying.

Best regards,

Joel

Re: jsonb subscripting assignment performance

From
"Joel Jacobson"
Date:
On Wed, Apr 14, 2021, at 09:20, Pavel Stehule wrote:
sure - there is big room for optimization. But this patch was big enough without its optimization. And it was not clean, if I will be committed or not (it waited in commitfest application for 4 years). So I accepted implemented behaviour (without inplace update). Now, this patch is in core, and anybody can work on others possible optimizations.

Thanks for explaining.

Do we a rough idea on how in-place could be implemented in a non-invasive non-controversial way that ought to be accepted by the project, if done right? Or are there other complicated problems that needs to be solved first?

I'm asking because I could be interested in working on this, but I know my limitations when it comes to C, so I want to get an idea on if it should be more or less straightforward, or if we already know on beforehand it would require committer-level expertise of the PostgreSQL code base for any realistic chance of being successful.

/Joel

Re: jsonb subscripting assignment performance

From
Dmitry Dolgov
Date:
> On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote:
> st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson <joel@compiler.org> napsal:
>
> > Hi,
> >
> > commit 676887a3 added support for jsonb subscripting.
> >
> > Many thanks for working on this. I really like the improved syntax.
> >
> > I was also hoping for some performance benefits,
> > but my testing shows that
> >
> >    jsonb_value['existing_key'] = new_value;
> >
> > takes just as long time as
> >
> >    jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value);
> >
> > which is a bit surprising to me. Shouldn't subscripting be a lot faster,
> > since it could modify the existing data structure in-place? What am I
> > missing here?
> >
>
> no - it doesn't support in-place modification. Only arrays and records
> support it.
>
>
> > I came to think of the this new functionality when trying to optimize some
> > PL/pgSQL code where the bottle-neck turned out to be lots of calls
> > to jsonb_set() for large jsonb objects.
> >
>
> sure - there is big room for optimization. But this patch was big enough
> without its optimization. And it was not clean, if I will be committed or
> not (it waited in commitfest application for 4 years). So I accepted
> implemented behaviour (without inplace update). Now, this patch is in core,
> and anybody can work on others possible optimizations.

Right, jsonb subscripting deals mostly with the syntax part and doesn't
change internal jsonb behaviour. If I understand the original question
correctly, "in-place" here means updating of e.g. just one particular
key within a jsonb object, since jsonb_set looks like an overwrite of
the whole jsonb. If so, then update will still cause the whole jsonb to
be updated, there is no partial update functionality for the on-disk
format. Although there is work going on to optimize this in case when
jsonb is big enough to be put into a toast table (partial toast
decompression thread, or bytea appendable toast).



Re: jsonb subscripting assignment performance

From
Pavel Stehule
Date:


st 14. 4. 2021 v 9:57 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote:
> st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson <joel@compiler.org> napsal:
>
> > Hi,
> >
> > commit 676887a3 added support for jsonb subscripting.
> >
> > Many thanks for working on this. I really like the improved syntax.
> >
> > I was also hoping for some performance benefits,
> > but my testing shows that
> >
> >    jsonb_value['existing_key'] = new_value;
> >
> > takes just as long time as
> >
> >    jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value);
> >
> > which is a bit surprising to me. Shouldn't subscripting be a lot faster,
> > since it could modify the existing data structure in-place? What am I
> > missing here?
> >
>
> no - it doesn't support in-place modification. Only arrays and records
> support it.
>
>
> > I came to think of the this new functionality when trying to optimize some
> > PL/pgSQL code where the bottle-neck turned out to be lots of calls
> > to jsonb_set() for large jsonb objects.
> >
>
> sure - there is big room for optimization. But this patch was big enough
> without its optimization. And it was not clean, if I will be committed or
> not (it waited in commitfest application for 4 years). So I accepted
> implemented behaviour (without inplace update). Now, this patch is in core,
> and anybody can work on others possible optimizations.

Right, jsonb subscripting deals mostly with the syntax part and doesn't
change internal jsonb behaviour. If I understand the original question
correctly, "in-place" here means updating of e.g. just one particular
key within a jsonb object, since jsonb_set looks like an overwrite of
the whole jsonb. If so, then update will still cause the whole jsonb to
be updated, there is no partial update functionality for the on-disk
format. Although there is work going on to optimize this in case when
jsonb is big enough to be put into a toast table (partial toast
decompression thread, or bytea appendable toast).

Almost all and almost everywhere Postgres's values are immutable. There is only one exception - runtime plpgsql. "local variables" can hold values of complex values unboxed. Then the repeated update is significantly cheaper. Normal non repeated updates have the same speed, because the value should be unboxed and boxed. Outside plpgsql the values are immutable. I think this is a very hard problem, how to update big toasted values effectively, and I am not sure if there is a solution. TOAST value is immutable. It needs to introduce some alternative to TOAST. The benefits are clear - it can be nice to have fast append arrays for time series. But this is a very different topic.

Regards

Pavel







Re: jsonb subscripting assignment performance

From
Oleg Bartunov
Date:


On Wed, Apr 14, 2021 at 11:09 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


st 14. 4. 2021 v 9:57 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote:
> st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson <joel@compiler.org> napsal:
>
> > Hi,
> >
> > commit 676887a3 added support for jsonb subscripting.
> >
> > Many thanks for working on this. I really like the improved syntax.
> >
> > I was also hoping for some performance benefits,
> > but my testing shows that
> >
> >    jsonb_value['existing_key'] = new_value;
> >
> > takes just as long time as
> >
> >    jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value);
> >
> > which is a bit surprising to me. Shouldn't subscripting be a lot faster,
> > since it could modify the existing data structure in-place? What am I
> > missing here?
> >
>
> no - it doesn't support in-place modification. Only arrays and records
> support it.
>
>
> > I came to think of the this new functionality when trying to optimize some
> > PL/pgSQL code where the bottle-neck turned out to be lots of calls
> > to jsonb_set() for large jsonb objects.
> >
>
> sure - there is big room for optimization. But this patch was big enough
> without its optimization. And it was not clean, if I will be committed or
> not (it waited in commitfest application for 4 years). So I accepted
> implemented behaviour (without inplace update). Now, this patch is in core,
> and anybody can work on others possible optimizations.

Right, jsonb subscripting deals mostly with the syntax part and doesn't
change internal jsonb behaviour. If I understand the original question
correctly, "in-place" here means updating of e.g. just one particular
key within a jsonb object, since jsonb_set looks like an overwrite of
the whole jsonb. If so, then update will still cause the whole jsonb to
be updated, there is no partial update functionality for the on-disk
format. Although there is work going on to optimize this in case when
jsonb is big enough to be put into a toast table (partial toast
decompression thread, or bytea appendable toast).

Almost all and almost everywhere Postgres's values are immutable. There is only one exception - runtime plpgsql. "local variables" can hold values of complex values unboxed. Then the repeated update is significantly cheaper. Normal non repeated updates have the same speed, because the value should be unboxed and boxed. Outside plpgsql the values are immutable. I think this is a very hard problem, how to update big toasted values effectively, and I am not sure if there is a solution. TOAST value is immutable. It needs to introduce some alternative to TOAST. The benefits are clear - it can be nice to have fast append arrays for time series. But this is a very different topic.

 

Regards

Pavel









--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonb subscripting assignment performance

From
Pavel Stehule
Date:


st 14. 4. 2021 v 11:07 odesílatel Oleg Bartunov <obartunov@postgrespro.ru> napsal:


On Wed, Apr 14, 2021 at 11:09 AM Pavel Stehule <pavel.stehule@gmail.com> wrote:


st 14. 4. 2021 v 9:57 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Wed, Apr 14, 2021 at 09:20:08AM +0200, Pavel Stehule wrote:
> st 14. 4. 2021 v 7:39 odesílatel Joel Jacobson <joel@compiler.org> napsal:
>
> > Hi,
> >
> > commit 676887a3 added support for jsonb subscripting.
> >
> > Many thanks for working on this. I really like the improved syntax.
> >
> > I was also hoping for some performance benefits,
> > but my testing shows that
> >
> >    jsonb_value['existing_key'] = new_value;
> >
> > takes just as long time as
> >
> >    jsonb_value := jsonb_set(jsonb_value, ARRAY['existing_key'], new_value);
> >
> > which is a bit surprising to me. Shouldn't subscripting be a lot faster,
> > since it could modify the existing data structure in-place? What am I
> > missing here?
> >
>
> no - it doesn't support in-place modification. Only arrays and records
> support it.
>
>
> > I came to think of the this new functionality when trying to optimize some
> > PL/pgSQL code where the bottle-neck turned out to be lots of calls
> > to jsonb_set() for large jsonb objects.
> >
>
> sure - there is big room for optimization. But this patch was big enough
> without its optimization. And it was not clean, if I will be committed or
> not (it waited in commitfest application for 4 years). So I accepted
> implemented behaviour (without inplace update). Now, this patch is in core,
> and anybody can work on others possible optimizations.

Right, jsonb subscripting deals mostly with the syntax part and doesn't
change internal jsonb behaviour. If I understand the original question
correctly, "in-place" here means updating of e.g. just one particular
key within a jsonb object, since jsonb_set looks like an overwrite of
the whole jsonb. If so, then update will still cause the whole jsonb to
be updated, there is no partial update functionality for the on-disk
format. Although there is work going on to optimize this in case when
jsonb is big enough to be put into a toast table (partial toast
decompression thread, or bytea appendable toast).

Almost all and almost everywhere Postgres's values are immutable. There is only one exception - runtime plpgsql. "local variables" can hold values of complex values unboxed. Then the repeated update is significantly cheaper. Normal non repeated updates have the same speed, because the value should be unboxed and boxed. Outside plpgsql the values are immutable. I think this is a very hard problem, how to update big toasted values effectively, and I am not sure if there is a solution. TOAST value is immutable. It needs to introduce some alternative to TOAST. The benefits are clear - it can be nice to have fast append arrays for time series. But this is a very different topic.


+1

Pavel

 

Regards

Pavel









--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: jsonb subscripting assignment performance

From
"Joel Jacobson"
Date:
On Wed, Apr 14, 2021, at 11:07, Oleg Bartunov wrote:

Page 49/55 in the PDF:
"UPDATE test_toast SET jb = jsonb_set(jb, {keyN,0}, ?);"

Would you get similar improvements if updating jsonb variables in PL/pgSQL?
If not, could the infrastructure somehow be reused to improve the PL/pgSQL use-case as well?

I would be happy to help out if there is something I can do, such as testing.

/Joel

Re: jsonb subscripting assignment performance

From
Alexander Korotkov
Date:
On Wed, Apr 14, 2021 at 10:57 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > sure - there is big room for optimization. But this patch was big enough
> > without its optimization. And it was not clean, if I will be committed or
> > not (it waited in commitfest application for 4 years). So I accepted
> > implemented behaviour (without inplace update). Now, this patch is in core,
> > and anybody can work on others possible optimizations.
>
> Right, jsonb subscripting deals mostly with the syntax part and doesn't
> change internal jsonb behaviour. If I understand the original question
> correctly, "in-place" here means updating of e.g. just one particular
> key within a jsonb object, since jsonb_set looks like an overwrite of
> the whole jsonb. If so, then update will still cause the whole jsonb to
> be updated, there is no partial update functionality for the on-disk
> format. Although there is work going on to optimize this in case when
> jsonb is big enough to be put into a toast table (partial toast
> decompression thread, or bytea appendable toast).

+1

------
Regards,
Alexander Korotkov