Home > mailing lists

Re: [HACKERS] [PATCH] Generic type subscripting - Mailing list pgsql-hackers

From	Nikita Glukhov
Subject	Re: [HACKERS] [PATCH] Generic type subscripting
Date	October 31, 2019 02:35:28
Msg-id	099309f0-4b41-2c78-aac8-0835fe9a9ee7@postgrespro.ru Whole thread Raw
In response to	Re: [HACKERS] [PATCH] Generic type subscripting (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses	Re: [HACKERS] [PATCH] Generic type subscripting
List	pgsql-hackers

Tree view

On 30.09.2019 14:57, Dmitry Dolgov wrote:

On Wed, Sep 25, 2019 at 10:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

This broke recently. Can you please rebase again?

Thanks for noticing! Sure, here it is. We're quite close to the records.

Hi. I added new 5th patch to this patch set.


Jsonb subscripting uses text for representing subscript values.  This is Ok for
object keys, but integer arrays indexes should be parsed at runtime.  Another
problem is that floats can't be used as array indexes because integers simply
can't be parsed from a string containing a floating point.

But we can use float indexes in ordinary Postgres arrays:

SELECT ('{1,2,3}'::int[])[2.3];int4 
------   2
(1 row)

Also SQL standard allows to use float indexes in JSON path with implementation-
defined rounding or truncation:

SELECT jsonb_path_query('[1, 2, 3]', '$[1.3]');jsonb_path_query 
------------------2
(1 row)


So, I decided to fix these two issues introducing polymorphic subscripting,
in which each subscript expression variant interpreted depending on the result
of previous subscripting step. There are two variants of jsonb subscript
expressions -- the first is casted to text and the second is casted to int4.
Executor at each subscripting step selects which variant to execute by calling
callback jsonb_subscript_selectexpr().  To manage the subscripting state,
another callback jsonb_subscript_step() was introduced along with the new field
SubscriptingRefState.privatedata.

Such expression selecting has noticeable overhead, which we can eliminate by
generating only one expression variant when subscript is of int2/int4 or
text type.

After float subscripts start to works as expected:

SELECT ('[1, 2, 3]'::jsonb)[1.2];jsonb 
-------2
(1 row)

SELECT ('{"1": "a", "1.0": "b", "1.2": "c"}'::jsonb)[i]
FROM unnest('{1,1.0,1.2}'::numeric[]) i;
jsonb 
-------"a""b""c"
(3 rows)




Performance was compared on 4 tables with 10M rows:

-- [ i ]
CREATE TABLE arr_1 AS
SELECT jsonb_build_array(i)::jsonb js
FROM generate_series(1, 10000000) i;

-- { "a": i }
CREATE TABLE obj_1 AS
SELECT jsonb_build_object('a', i)  js
FROM generate_series(1, 10000000) i;

-- [[[[[[[[[[ i ]]]]]]]]]]
CREATE TABLE arr_10 AS
SELECT (repeat('[', 10) || i || repeat(']', 10))::jsonb js
FROM generate_series(1, 10000000) i;

-- {"a": {"a": ... {"a": {"a": i } } ... } }
CREATE TABLE obj_10 AS
SELECT (repeat('{"a":', 10) || i || repeat('}', 10))::jsonb js
FROM generate_series(1, 10000000) i;

Queries were like "SELECT FROM table WHERE expression IS [NOT] NULL".

Compared previous v27 version (4 patches) with v28 version (5 patches).
New patch #5 contains one small but important optimization -- elimination of
unnecessary palloc() in getIthJsonbValueFromContainer() and jsonb_get_element().
It should be posted separately, but for simplicity I included it the patch now.
For the correctness of comparison, it was evaluated separately on top of v27
(v27opt).

Table  |               Expression             |      Query time, ms       |                                      |  v27  | v27opt|  v28 
--------+--------------------------------------+-------+-------+-------arr_1  | js->0                                |  1811 |  1809 |  1813  arr_1  | js[0]                                |  2273 |  2294 |  2028 arr_1  | js['0']                              |  2276 |  2286 |  2339arr_1  | js->1                                |   808 |   844 |   809arr_1  | js[1]                                |  1180 |  1187 |  1008 obj_1  | js->'a'                              |  1941 |  1935 |  1939obj_1  | js['a']                              |  2079 |  2083 |  2102obj_1  | js->'b'                              |   917 |   915 |   902obj_1  | js['b']                              |   960 |   961 |  1059       |arr_10 | js->0->0 ... ->0->0                  |  4530 |  4068 |  4052arr_10 | js[0][0] ... [0][0]                  |  6197 |  5513 |  3766arr_10 | js['0']['0'] ... ['0']['0']          |  6202 |  5519 |  5983arr_10 | js #>  '{0,0,0,0,0,0,0,0,0,0}'       |  6412 |  5850 |  5835arr_10 | js #>> '{0,0,0,0,0,0,0,0,0,0}'       |  5904 |  5181 |  5192
obj_10 | js->'a'->'a' ... ->'a'->'a'          |  4970 |  4717 |  4704obj_10 | js['a']['a'] ... ['a']['a']          |  4331 |  3698 |  4032obj_10 | js #>  '{a,a,a,a,a,a,a,a,a,a}'       |  4570 |  3941 |  3949         obj_10 | js #>> '{a,a,a,a,a,a,a,a,a,a}'       |  4055 |  3395 |  3392       

As it can be seen, array access time reduced from 10% in single subscripts
to 40% in 10-subscript chains, and subscripting event started to overtake
chained "->" operators.  But there is 10% slowdown of object key access that
needs further investigation.  The elimination of unnecessary palloc()s also
gives good results.



I had to write new assignment logic reusing only some parts of setPath(),
because the loop in setPath() should be broken on every level.  During this
process, I decided to implement assignment behavior similar to PostgreSQL's
array behavior and added two new features:- creation of jsonb arrays/objects container from NULL values- appending/prepending array elements on the specified position, gaps filled  with nulls (JavaScript has similar behavior)
These features are not so easy to extract into a separate patch on top of the 
first 4 patches, but I can try if necessary.

Here is examples of new features:

CREATE TABLE t AS SELECT NULL::jsonb js, NULL::int[] a;

-- create array from NULL
UPDATE t SET js[0] = 1, a[1] = 1;
SELECT * FROM t;js  |  a  
-----+-----[1] | {1}
(1 row)

-- append 4th element
UPDATE t SET js[3] = 4, a[4] = 4;
SELECT * FROM t;        js         |        a        
--------------------+-----------------[1, null, null, 4] | {1,NULL,NULL,4}
(1 row)

-- prepend element when index is negative (position = size + index)
UPDATE t SET js[-6] = -2;
SELECT js FROM t;             js              
------------------------------[-2, null, 1, null, null, 4]
(1 row)

--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

pgsql-hackers by date:

From: Fujii Masao
Date: 31 October 2019, 02:27:12
Subject: Allow CREATE OR REPLACE VIEW to rename the columns

From: Thomas Munro
Date: 31 October 2019, 02:54:17
Subject: Re: Allow cluster_name in log_line_prefix

Re: [HACKERS] [PATCH] Generic type subscripting - Mailing list pgsql-hackers

Attachment

Previous

Next