Re: [HACKERS] [PATCH] Generic type subscripting - Mailing list pgsql-hackers
From | Nikita Glukhov |
---|---|
Subject | Re: [HACKERS] [PATCH] Generic type subscripting |
Date | |
Msg-id | 099309f0-4b41-2c78-aac8-0835fe9a9ee7@postgrespro.ru Whole thread Raw |
In response to | Re: [HACKERS] [PATCH] Generic type subscripting (Dmitry Dolgov <9erthalion6@gmail.com>) |
Responses |
Re: [HACKERS] [PATCH] Generic type subscripting
|
List | pgsql-hackers |
On 30.09.2019 14:57, Dmitry Dolgov wrote:
On Wed, Sep 25, 2019 at 10:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: This broke recently. Can you please rebase again?Thanks for noticing! Sure, here it is. We're quite close to the records.
Hi. I added new 5th patch to this patch set. Jsonb subscripting uses text for representing subscript values. This is Ok for object keys, but integer arrays indexes should be parsed at runtime. Another problem is that floats can't be used as array indexes because integers simply can't be parsed from a string containing a floating point. But we can use float indexes in ordinary Postgres arrays: SELECT ('{1,2,3}'::int[])[2.3];int4 ------ 2 (1 row) Also SQL standard allows to use float indexes in JSON path with implementation- defined rounding or truncation: SELECT jsonb_path_query('[1, 2, 3]', '$[1.3]');jsonb_path_query ------------------2 (1 row) So, I decided to fix these two issues introducing polymorphic subscripting, in which each subscript expression variant interpreted depending on the result of previous subscripting step. There are two variants of jsonb subscript expressions -- the first is casted to text and the second is casted to int4. Executor at each subscripting step selects which variant to execute by calling callback jsonb_subscript_selectexpr(). To manage the subscripting state, another callback jsonb_subscript_step() was introduced along with the new field SubscriptingRefState.privatedata. Such expression selecting has noticeable overhead, which we can eliminate by generating only one expression variant when subscript is of int2/int4 or text type. After float subscripts start to works as expected: SELECT ('[1, 2, 3]'::jsonb)[1.2];jsonb -------2 (1 row) SELECT ('{"1": "a", "1.0": "b", "1.2": "c"}'::jsonb)[i] FROM unnest('{1,1.0,1.2}'::numeric[]) i; jsonb -------"a""b""c" (3 rows) Performance was compared on 4 tables with 10M rows: -- [ i ] CREATE TABLE arr_1 AS SELECT jsonb_build_array(i)::jsonb js FROM generate_series(1, 10000000) i; -- { "a": i } CREATE TABLE obj_1 AS SELECT jsonb_build_object('a', i) js FROM generate_series(1, 10000000) i; -- [[[[[[[[[[ i ]]]]]]]]]] CREATE TABLE arr_10 AS SELECT (repeat('[', 10) || i || repeat(']', 10))::jsonb js FROM generate_series(1, 10000000) i; -- {"a": {"a": ... {"a": {"a": i } } ... } } CREATE TABLE obj_10 AS SELECT (repeat('{"a":', 10) || i || repeat('}', 10))::jsonb js FROM generate_series(1, 10000000) i; Queries were like "SELECT FROM table WHERE expression IS [NOT] NULL". Compared previous v27 version (4 patches) with v28 version (5 patches). New patch #5 contains one small but important optimization -- elimination of unnecessary palloc() in getIthJsonbValueFromContainer() and jsonb_get_element(). It should be posted separately, but for simplicity I included it the patch now. For the correctness of comparison, it was evaluated separately on top of v27 (v27opt). Table | Expression | Query time, ms | | v27 | v27opt| v28 --------+--------------------------------------+-------+-------+-------arr_1 | js->0 | 1811 | 1809 | 1813 arr_1 | js[0] | 2273 | 2294 | 2028 arr_1 | js['0'] | 2276 | 2286 | 2339arr_1 | js->1 | 808 | 844 | 809arr_1 | js[1] | 1180 | 1187 | 1008 obj_1 | js->'a' | 1941 | 1935 | 1939obj_1 | js['a'] | 2079 | 2083 | 2102obj_1 | js->'b' | 917 | 915 | 902obj_1 | js['b'] | 960 | 961 | 1059 |arr_10 | js->0->0 ... ->0->0 | 4530 | 4068 | 4052arr_10 | js[0][0] ... [0][0] | 6197 | 5513 | 3766arr_10 | js['0']['0'] ... ['0']['0'] | 6202 | 5519 | 5983arr_10 | js #> '{0,0,0,0,0,0,0,0,0,0}' | 6412 | 5850 | 5835arr_10 | js #>> '{0,0,0,0,0,0,0,0,0,0}' | 5904 | 5181 | 5192 obj_10 | js->'a'->'a' ... ->'a'->'a' | 4970 | 4717 | 4704obj_10 | js['a']['a'] ... ['a']['a'] | 4331 | 3698 | 4032obj_10 | js #> '{a,a,a,a,a,a,a,a,a,a}' | 4570 | 3941 | 3949 obj_10 | js #>> '{a,a,a,a,a,a,a,a,a,a}' | 4055 | 3395 | 3392 As it can be seen, array access time reduced from 10% in single subscripts to 40% in 10-subscript chains, and subscripting event started to overtake chained "->" operators. But there is 10% slowdown of object key access that needs further investigation. The elimination of unnecessary palloc()s also gives good results. I had to write new assignment logic reusing only some parts of setPath(), because the loop in setPath() should be broken on every level. During this process, I decided to implement assignment behavior similar to PostgreSQL's array behavior and added two new features:- creation of jsonb arrays/objects container from NULL values- appending/prepending array elements on the specified position, gaps filled with nulls (JavaScript has similar behavior) These features are not so easy to extract into a separate patch on top of the first 4 patches, but I can try if necessary. Here is examples of new features: CREATE TABLE t AS SELECT NULL::jsonb js, NULL::int[] a; -- create array from NULL UPDATE t SET js[0] = 1, a[1] = 1; SELECT * FROM t;js | a -----+-----[1] | {1} (1 row) -- append 4th element UPDATE t SET js[3] = 4, a[4] = 4; SELECT * FROM t; js | a --------------------+-----------------[1, null, null, 4] | {1,NULL,NULL,4} (1 row) -- prepend element when index is negative (position = size + index) UPDATE t SET js[-6] = -2; SELECT js FROM t; js ------------------------------[-2, null, 1, null, null, 4] (1 row)
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
pgsql-hackers by date: