Re: jsonb_set() strictness considered harmful to data - Mailing list pgsql-general
| From | Tomas Vondra |
|---|---|
| Subject | Re: jsonb_set() strictness considered harmful to data |
| Date | |
| Msg-id | 20191019110831.7y7d2yys5xlujqw6@development |
| In response to | Re: jsonb_set() strictness considered harmful to data (Ariadne Conill <ariadne@dereferenced.org>) |
| Responses | Re: jsonb_set() strictness considered harmful to data |
| List | pgsql-general |
On Fri, Oct 18, 2019 at 09:14:09PM -0500, Ariadne Conill wrote:
>Hello,
>
>On Fri, Oct 18, 2019 at 6:52 PM Stephen Frost <sfrost@snowman.net> wrote:
>>
>> Greetings,
>>
>> * Ariadne Conill (ariadne@dereferenced.org) wrote:
>> > On Fri, Oct 18, 2019 at 6:01 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> > > https://www.postgresql.org/docs/11/functions-json.html
>> > > "The field/element/path extraction operators return NULL, rather than
>> > > failing, if the JSON input does not have the right structure to match
>> > > the request; for example if no such element exists"
>> >
>> > It is known that the extraction operators return NULL. The problem
>> > here is jsonb_set() returning NULL when it encounters SQL NULL.
>> >
>> > > Just trying to figure why one is worse than the other.
>> >
>> > Any time a user loses data, it is worse. The preference for not
>> > having data loss is why Pleroma uses PostgreSQL as its database of
>> > choice, as PostgreSQL has traditionally valued durability. If we
>> > should not use PostgreSQL, just say so.
>>
>> Your contention that the documented, clear, and easily addressed
>> behavior of a particular strict function equates to "the database system
>> loses data and isn't durable" is really hurting your argument here, not
>> helping it.
>>
>> The argument about how it's unintuitive and can cause application
>> developers to misuse the function (which is clearly an application bug,
>> but perhaps an understandable one if the function interface isn't
>> intuitive or is confusing) is a reasonable one and might be convincing
>> enough to result in a change here.
>>
>> I'd suggest sticking to the latter argument when making this case.
>>
>> > > > I believe that anything that can be catastrophically broken by users
>> > > > not following upgrade instructions precisely is a serious problem, and
>> > > > can lead to serious problems. I am sure that this is not the only
>> > > > project using JSONB which has had users destroy their own data in
>> > > > such a completely preventable fashion.
>>
>> Let's be clear here that the issue with the upgrade instructions was
>> that the user didn't follow your *application's* upgrade instructions,
>> and your later code wasn't written to use the function, as documented,
>> properly - this isn't a case of PG destroying your data. It's fine to
>> contend that the interface sucks and that we should change it, but the
>> argument that PG is eating data because the application sent a query to
>> the database telling it, based on our documentation, to eat the data,
>> isn't appropriate. Again, let's have a reasonable discussion here about
>> if it makes sense to make a change here because the interface isn't
>> intuitive and doesn't match what other systems do (I'm guessing it isn't
>> in the SQL standard either, so we unfortunately can't look to that for
>> help; though I'd hardly be surprised if they supported what PG does
>> today anyway).
>
>Okay, I will admit that saying PG is eating data is perhaps
>hyperbolic,

My experience is that using such hyperbole is pretty detrimental, even
when one is trying to make a pretty sensible case. The problem is that
people often respond with similarly hyperbolic claims, particularly when
you hit a nerve. And that's exactly what happened here, because we're
*extremely* sensitive about data corruption issues, so when you claim
PostgreSQL is "eating data" people are likely to jump on you, beating
you with the documentation stick. It's unfortunate, but it's also
entirely predictable.

>but I will also say that the behaviour of jsonb_set()
>under this type of edge case is unintuitive and frequently results in
>unintended data loss. So, while PostgreSQL is not actually eating the
>data, it is putting the user in a position where they may suffer data
>loss if they are not extremely careful.
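A minimal illustration of the failure mode under discussion (the table and column names below are hypothetical, and this assumes a PostgreSQL session):

```sql
-- jsonb_set() is declared STRICT, so a SQL NULL in any argument
-- makes the whole call return SQL NULL:
SELECT jsonb_set('{"a":1,"b":2,"c":3}'::jsonb, '{a}', NULL);
-- => NULL

-- Inside an UPDATE, that NULL silently replaces the entire document:
UPDATE accounts
   SET data = jsonb_set(data, '{flags}', to_jsonb(some_value));
-- if some_value is NULL, data becomes NULL for every matched row
```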
>
>Here is how other implementations handle this case:
>
>MySQL/MariaDB:
>
>select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> {"a":null,"b":2,"c":3}
>
>Microsoft SQL Server:
>
>select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> {"b":2,"c":3}
>
>Both of these outcomes make sense, given the nature of JSON objects.
>I am actually more in favor of what MSSQL does, however; I think that
>makes the most sense of all.
>

I do mostly agree with this. The json[b]_set behavior seems rather
surprising, and I think I've seen a couple of cases running into exactly
this issue. I've solved that with a simple CASE, but maybe changing the
behavior would be better. That's unlikely to be back-patchable, though,
so maybe a better option is to create non-strict wrappers. But that does
not work when the user is unaware of the behavior :-(

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
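The CASE guard and non-strict wrapper mentioned above could be sketched roughly as follows (the wrapper name `jsonb_set_safe` is my own invention for illustration, not an existing function; this assumes a PostgreSQL session):

```sql
-- CASE guard: keep the old document when the new value is SQL NULL,
-- instead of letting the strict jsonb_set() null out the column.
UPDATE accounts
   SET data = CASE
                WHEN to_jsonb(some_value) IS NULL THEN data
                ELSE jsonb_set(data, '{flags}', to_jsonb(some_value))
              END;

-- A hypothetical non-strict wrapper: map a NULL new value to JSON
-- null (what MySQL/MariaDB do) rather than returning SQL NULL.
CREATE FUNCTION jsonb_set_safe(target jsonb, path text[], new_value jsonb)
RETURNS jsonb AS $$
  SELECT jsonb_set(target, path, COALESCE(new_value, 'null'::jsonb))
$$ LANGUAGE sql IMMUTABLE;
```

As noted, a wrapper only helps callers who already know about the strictness issue; it does nothing for code that calls jsonb_set() directly.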