Re: [HACKERS] [PATCH] Generic type subscripting - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: [HACKERS] [PATCH] Generic type subscripting
Date
Msg-id CAFj8pRBcsLTFi0hT6Uz2K7zkgNianV4Zk8xqranEm3wcvEv+Mg@mail.gmail.com
In response to Re: [HACKERS] [PATCH] Generic type subscripting  (Pavel Stehule <pavel.stehule@gmail.com>)
Responses Re: [HACKERS] [PATCH] Generic type subscripting  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers


On Sun, 10. 1. 2021 at 19:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi


I'm thinking of the update path as a kind of implicit schema. JSON is
intentionally not bound to any schema on creation, so I don't see a
failure to enforce another schema at runtime (and outside the WHERE
clause, at that) as an error exactly.

This concept is not consistent with the rest of the implemented behaviour.

1. The schema is dynamically extended - although the path doesn't exist, it is created and the data are changed:

postgres=# create table foo(a jsonb);
CREATE TABLE
postgres=# insert into foo values('{}');
INSERT 0 1
postgres=# update foo set a['a']['a'][10] = '0';
UPDATE 1
postgres=# select * from foo;
┌───────────────────────────────────────────────────────────────────────────────┐
│                                       a                                       │
╞═══════════════════════════════════════════════════════════════════════════════╡
│ {"a": {"a": [null, null, null, null, null, null, null, null, null, null, 0]}} │
└───────────────────────────────────────────────────────────────────────────────┘
(1 row)

So although the path [a,a,10] did not exist, it was created.

2. This update fails (and that is correct):

postgres=# update foo set a['a']['a']['b'] = '0';
ERROR:  path element at position 3 is not an integer: "b"

Although the path [a,a,b] doesn't exist, it is not silently ignored - an error is raised.

This implementation doesn't do only UPDATE (so the analogy with the WHERE clause isn't fully adequate); it does a MERGE. That is necessary, because without it the behaviour would be pretty unfriendly - there is no external schema. I think this is important, and it can be a little bit messy. I am not sure I am using the correct technical terms: we try a LAX update in the first step, and if it is not successful, we try a LAX insert. That may be correct from the JSON semantics, but for a developer it is unfriendly, because there is no way to detect whether the insert was successful or not. In the dedicated JSON functions I can control the behaviour and specify LAX or STRICT as needed, but in this interface (subscripting) that possibility is missing.
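
For comparison, the existing jsonb_set() function exposes this choice to the caller through its fourth parameter (create_if_missing); the subscripting interface has no equivalent. A minimal illustration (results shown in comments):

postgres=# select jsonb_set('{"x": 1}', '{a}', '0', true);   -- {"a": 0, "x": 1}
postgres=# select jsonb_set('{"x": 1}', '{a}', '0', false);  -- {"x": 1}, silently unchanged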

I think there should be a final (semantic) check of whether the value was updated, and whether the value was actually changed; if not, an error should be raised. It would be very similar to an RLS update. I know and understand that there is more than one possible implementation, but only one is safe - after a successful update I want to see the new value inside, and when that is not possible, I expect an exception. I think it is more practical too. I can control filtering with the WHERE clause, but I cannot control the MERGE process. A manual recheck after every update can be terribly slow.
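
To make the "manual recheck" concrete, this is roughly the extra round trip I mean (just a sketch, reusing the foo table from above and the existing #> operator for the verification):

postgres=# update foo set a['a']['a'][10] = '1';
postgres=# -- a second statement only to verify that the write really landed
postgres=# select count(*) from foo where a #> '{a,a,10}' is distinct from '1'::jsonb;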

I tested the behaviour and I didn't find anything other than the issue mentioned above.

Now I can exercise this feature from plpgsql, and it is working. Because there is no special support in the plpgsql runtime, updating a jsonb value is significantly slower than updating an array, and it looks like a jsonb update has O(N^2) cost. I don't think that is important at this moment - more important is the fact that I didn't find any memory problems.


postgres=# do $$
declare v int[] = array_fill(0, ARRAY[10,10,10,20]);
begin
  for i1 in 1..10 loop
    for i2 in 1..10 loop
      for i3 in 1..10 loop
        for i4 in 1..20 loop
          v[i1][i2][i3][i4] = 10;
          raise notice '% % % %', i1, i2, i3, i4;
        end loop;
      end loop;
    end loop;
  end loop;
end;
$$;

postgres=# do $$
declare v jsonb;
begin
  for i1 in 1..10 loop
    for i2 in 1..10 loop
      for i3 in 1..10 loop
        for i4 in 1..20 loop
          v[i1][i2][i3][i4] = '10'::jsonb;
          raise notice '% % % %', i1, i2, i3, i4;
        end loop;
      end loop;
    end loop;
  end loop;
end;
$$;

There are some unwanted whitespace changes in the patch:

+       Jsonb   *jsonbSource = DatumGetJsonbP(*op->resvalue);
+       sbsrefstate->prevvalue = jsonb_get_element(jsonbSource,
+                                                  sbsrefstate->upperindex,
+                                                  sbsrefstate->numupper,
+                                                  &sbsrefstate->prevnull,
+                                                  false);
 

+   workspace = palloc0(MAXALIGN(sizeof(JsonbSubWorkspace)) +
+                       nupper * (sizeof(Datum) + sizeof(Oid)));
+   workspace->expectArray = false;
+   ptr = ((char *) workspace) + MAXALIGN(sizeof(JsonbSubWorkspace));

Regards

Pavel






But I looked into the bulk case a little further, and "outside the
WHERE clause" cuts both ways. The server reports an update whether or
not the JSON could have been modified, which suggests triggers will
fire for no-op updates. That's more clearly a problem.

insert into j (val) values
 ('{"a": 100}'),
 ('{"a": "200"}'),
 ('{"b": "300"}'),
 ('{"c": {"d": 400}}'),
 ('{"a": {"z": 500}}');

INSERT 0 5
update j set val['a']['z'] = '600' returning *;
                val                 
────────────────────────────────────
 {"a": 100}
 {"a": "200"}
 {"a": {"z": 600}, "b": "300"}
 {"a": {"z": 600}, "c": {"d": 400}}
 {"a": {"z": 600}}
(5 rows)

*UPDATE 5*
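
A possible stopgap for the no-op rows: the built-in suppress_redundant_updates_trigger() skips updates where the new row is identical to the old one, so the command tag and row-level triggers would then reflect only real changes. A sketch, assuming the table j from the example above:

create trigger j_skip_noop_updates
  before update on j
  for each row
  execute function suppress_redundant_updates_trigger();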
