Re: jsonb_set() strictness considered harmful to data - Mailing list pgsql-general

From Adrian Klaver
Subject Re: jsonb_set() strictness considered harmful to data
Date
Msg-id 7299c63c-fcdc-20fc-c955-4cd330cc4580@aklaver.com
Whole thread Raw
In response to Re: jsonb_set() strictness considered harmful to data  (Ariadne Conill <ariadne@dereferenced.org>)
Responses Re: jsonb_set() strictness considered harmful to data
List pgsql-general
On 10/18/19 4:31 PM, Ariadne Conill wrote:
> Hello,
> 
> On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 10/18/19 3:11 PM, Ariadne Conill wrote:
>>> Hello,
>>>
>>> On Fri, Oct 18, 2019 at 5:01 PM David G. Johnston
>>> <david.g.johnston@gmail.com> wrote:
>>>>
>>>> On Fri, Oct 18, 2019 at 2:50 PM Christoph Moench-Tegeder <cmt@burggraben.net> wrote:
>>>>>
>>>>> ## Ariadne Conill (ariadne@dereferenced.org):
>>>>>
>>>>>>      update users set info=jsonb_set(info, '{bar}', info->'foo');
>>>>>>
>>>>>> Typically, this works nicely, except for cases where evaluating
>>>>>> info->'foo' results in an SQL null being returned.  When that happens,
>>>>>> jsonb_set() returns an SQL null, which then results in data loss.[3]
>>>>>
>>>>> So why don't you use the facilities of SQL to make sure to only
>>>>> touch the rows which match the prerequisites?
>>>>>
>>>>>     UPDATE users SET info = jsonb_set(info, '{bar}', info->'foo')
>>>>>       WHERE info->'foo' IS NOT NULL;
>>>>>
>>>>
>>>> There are many ways to add code to queries to make working with this function safer - though using them
presupposesone remembers at the time of writing the query that there is danger and caveats in using this function.  I
agreethat we should have (and now) provided sane defined behavior when one of the inputs to the function is null
insteadblowing off the issue and defining the function as being strict.  Whether that is "ignore and return the
originalobject" or "add the key with a json null scalar value" is debatable but either is considerably more useful than
returningSQL NULL.
 
>>>
>>> A great example of how we got burned by this last year: Pleroma
>>> maintains pre-computed counters in JSONB for various types of
>>> activities (posts, followers, followings).  Last year, another counter
>>> was added, with a migration.  But some people did not run the
>>> migration, because they are users, and that's what users do.  This
>>
>> So you are more forgiving of your misstep, allowing users to run
>> outdated code, then of running afoul of Postgres documented behavior:
> 
> I'm not forgiving of either.
> 
>> https://www.postgresql.org/docs/11/functions-json.html
>> " The field/element/path extraction operators return NULL, rather than
>> failing, if the JSON input does not have the right structure to match
>> the request; for example if no such element exists"
> 
> It is known that the extraction operators return NULL.  The problem
> here is jsonb_set() returning NULL when it encounters SQL NULL.

I'm not following. Your original case was:

jsonb_set(info, '{bar}', info->'foo');

where info->'foo' is equivalent to:

test=# select '{"f1":1,"f2":null}'::jsonb ->'f3';
  ?column?
----------
  NULL

So you know there is a possibility that a value extraction could return 
NULL and from your wrapper that COALESCE is the way to deal with this.


> 
>> Just trying to figure why one is worse then the other.
> 
> Any time a user loses data, it is worse.  The preference for not
> having data loss is why Pleroma uses PostgreSQL as it's database of
> choice, as PostgreSQL has traditionally valued durability.  If we
> should not use PostgreSQL, just say so.

There are any number of ways you can make Postgres lose data that are 
not related to durability e.g build the following in code:

DELETE FROM some_table;

and forget the WHERE.

> 
> Ariadne
> 
>>
>>> resulted in Pleroma blanking out the `info` structure for users as
>>> they performed new activities that incremented that counter.  At that
>>> time, Pleroma maintained various things like private keys used to sign
>>> things in that JSONB column (we no longer do this because of being
>>> burned by this several times now), which broke federation temporarily
>>> for the affected accounts with other servers for up to a week as those
>>> servers had to learn new public keys for those accounts (since the
>>> original private keys were lost).
>>>
>>> I believe that anything that can be catastrophically broken by users
>>> not following upgrade instructions precisely is a serious problem, and
>>> can lead to serious problems.  I am sure that this is not the only
>>> project using JSONB which have had users destroy their own data in
>>> such a completely preventable fashion.
>>>
>>> Ariadne
>>>
>>>
>>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



pgsql-general by date:

Previous
From: Stephen Frost
Date:
Subject: Re: jsonb_set() strictness considered harmful to data
Next
From: Ariadne Conill
Date:
Subject: Re: jsonb_set() strictness considered harmful to data