Thread: [PATCH] Generic type subscription

[PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
Hi,

Regarding to the previous conversations [1], [2], here is a patch (with some
improvements from Nikita Glukhov) for the generic type subscription. It allows
to define type-specific subscription logic for any data type and has
implementations for the array and jsonb types. There are following changes in this
patch:

* A new column for `pg_type` (`regproc typsubscription`) which points out a
  type-specific subscription procedure for particular data type. It can be none
  (and it's a default value), which means that this data type doesn't support
  subscription.

* Type-specific code (e.g. any kind of verification, type coercion, actual
  data extraction and update) in the array subscription implementation was
  separated from generic code into `array_subscription` procedure. Generic
  implementation assumes that subscription expression can be in form `[a]` or
  `[a:b]`, and there isn't any restrictions for type of `a` and `b`.

* Using the same api a new subscription logic was implemented for `jsonb` type
  in `jsonb_subscription` procedure. Several changes were introduced into jsonb
  functions just to be able to share common code.

I believe that this patch is more or less in good shape, so I would like to know
what do you think about it? Feedback is welcome.

Attachment

Re: [PATCH] Generic type subscription

From
Jim Nasby
Date:
On 9/9/16 6:29 AM, Dmitry Dolgov wrote:
> Regarding to the previous conversations [1], [2], here is a patch (with some
> improvements from Nikita Glukhov) for the generic type subscription.

Awesome! Please make sure to add it to the Commit Fest app.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461



Re: [PATCH] Generic type subscription

From
Victor Wagner
Date:
On Fri, 9 Sep 2016 18:29:23 +0700
Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> Hi,
> 
> Regarding to the previous conversations [1], [2], here is a patch
> (with some improvements from Nikita Glukhov) for the generic type
> subscription. It allows
> to define type-specific subscription logic for any data type and has
> implementations for the array and jsonb types. There are following
> changes in this
> patch:

I've tried to compile this patch with current state of master (commit
51c3e9fade76c12)  and found out that, when configured with
--enable-cassert, it doesn't pass make check.

LOG:  server process (PID 4643) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: 
update test_jsonb_subscript set test_json['a'] = '{"b":
1}'::jsonb;



--Victor 



Re: [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
> I've tried to compile this patch with current state of master (commit
> 51c3e9fade76c12)  and found out that, when configured with --enable-cassert,
> it doesn't pass make check.

Thanks for the feedback. Yes, unexpectedly for me, `ExecEvalExpr` can return
expanded `jbvArray` and `jbvObject` instead `jbvBinary` in both cases. It's
interesting, that this doesn't break anything, but obviously violates
the `pushJsonbValueScalar` semantics. I don't think `ExecEvalExpr` should be
changed for jsonb, we can handle this situation in `pushJsonbValue` instead. I've
attached a new version of patch with this modification.

On 27 September 2016 at 19:08, Victor Wagner <vitus@wagner.pp.ru> wrote:
On Fri, 9 Sep 2016 18:29:23 +0700
Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> Hi,
>
> Regarding to the previous conversations [1], [2], here is a patch
> (with some improvements from Nikita Glukhov) for the generic type
> subscription. It allows
> to define type-specific subscription logic for any data type and has
> implementations for the array and jsonb types. There are following
> changes in this
> patch:

I've tried to compile this patch with current state of master (commit
51c3e9fade76c12)  and found out that, when configured with
--enable-cassert, it doesn't pass make check.

LOG:  server process (PID 4643) was terminated by signal 6: Aborted
DETAIL:  Failed process was running:
update test_jsonb_subscript set test_json['a'] = '{"b":
1}'::jsonb;



--
 Victor


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [PATCH] Generic type subscription

From
Victor Wagner
Date:
On Sat, 1 Oct 2016 16:52:34 +0700
Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> > I've tried to compile this patch with current state of master
> > (commit 51c3e9fade76c12)  and found out that, when configured with  
> --enable-cassert,
> > it doesn't pass make check.  
> 
> Thanks for the feedback. Yes, unexpectedly for me, `ExecEvalExpr` can
> return expanded `jbvArray` and `jbvObject` instead `jbvBinary` in
> both cases. It's interesting, that this doesn't break anything, but
> obviously violates the `pushJsonbValueScalar` semantics. I don't
> think `ExecEvalExpr` should be changed for jsonb, we can handle this
> situation in `pushJsonbValue` instead. I've
> attached a new version of patch with this modification.

Thanks for you quick responce. I've not yet thoroughly investigated new
patch, but already noticed some minor promlems:

Git complains about whitespace in this version of patch:

$ git apply ../generic_type_subscription_v2.patch
../generic_type_subscription_v2.patch:218: tab in indent.int    first;
../generic_type_subscription_v2.patch:219: tab in indent.int    second;
../generic_type_subscription_v2.patch:225: tab in indent.SubscriptionExecData        *sbsdata = (SubscriptionExecData
*)PG_GETARG_POINTER(1);
 
../generic_type_subscription_v2.patch:226: tab in indent.Custom                        *result = (Custom *)
sbsdata->containerSource;
../generic_type_subscription_v2.patch:234: tab in indent.SubscriptionRef       *sbsref = (SubscriptionRef *)
PG_GETARG_POINTER(0);
warning: squelched 29 whitespace errors
warning: 34 lines add whitespace errors.

It doesn't prevent me from further testing of patch, but worth noticing.

-- 

Victor




Re: [PATCH] Generic type subscription

From
Oleg Bartunov
Date:


On Sat, Oct 1, 2016 at 12:52 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I've tried to compile this patch with current state of master (commit
> 51c3e9fade76c12)  and found out that, when configured with --enable-cassert,
> it doesn't pass make check.

Thanks for the feedback. Yes, unexpectedly for me, `ExecEvalExpr` can return
expanded `jbvArray` and `jbvObject` instead `jbvBinary` in both cases. It's
interesting, that this doesn't break anything, but obviously violates
the `pushJsonbValueScalar` semantics. I don't think `ExecEvalExpr` should be
changed for jsonb, we can handle this situation in `pushJsonbValue` instead. I've
attached a new version of patch with this modification.


have you ever run 'make check' ?

=========================
 53 of 168 tests failed.
=========================

 
On 27 September 2016 at 19:08, Victor Wagner <vitus@wagner.pp.ru> wrote:
On Fri, 9 Sep 2016 18:29:23 +0700
Dmitry Dolgov <9erthalion6@gmail.com> wrote:

> Hi,
>
> Regarding to the previous conversations [1], [2], here is a patch
> (with some improvements from Nikita Glukhov) for the generic type
> subscription. It allows
> to define type-specific subscription logic for any data type and has
> implementations for the array and jsonb types. There are following
> changes in this
> patch:

I've tried to compile this patch with current state of master (commit
51c3e9fade76c12)  and found out that, when configured with
--enable-cassert, it doesn't pass make check.

LOG:  server process (PID 4643) was terminated by signal 6: Aborted
DETAIL:  Failed process was running:
update test_jsonb_subscript set test_json['a'] = '{"b":
1}'::jsonb;



--
 Victor


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
On 5 October 2016 at 03:00, Oleg Bartunov <obartunov@gmail.com> wrote:

have you ever run 'make check' ?

=========================
 53 of 168 tests failed.
=========================


Sure, how else could I write tests for this feature? But right now on my machine
everything is ok (the same for `make installcheck`):

$ make check
....
=======================
 All 168 tests passed. 
======================= 

Re: [PATCH] Generic type subscription

From
Oleg Bartunov
Date:


On Wed, Oct 5, 2016 at 6:48 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
On 5 October 2016 at 03:00, Oleg Bartunov <obartunov@gmail.com> wrote:

have you ever run 'make check' ?

=========================
 53 of 168 tests failed.
=========================


Sure, how else could I write tests for this feature? But right now on my machine
everything is ok (the same for `make installcheck`):

$ make check
....
=======================
 All 168 tests passed. 
======================= 

Oops, something was wrong in my setup :)

 

Re: [PATCH] Generic type subscription

From
Artur Zakirov
Date:
Hello,

On 04.10.2016 11:28, Victor Wagner wrote:
>
> Git complains about whitespace in this version of patch:
>
> $ git apply ../generic_type_subscription_v2.patch
> ../generic_type_subscription_v2.patch:218: tab in indent.
>     int    first;
> ../generic_type_subscription_v2.patch:219: tab in indent.
>     int    second;
> ../generic_type_subscription_v2.patch:225: tab in indent.
>     SubscriptionExecData        *sbsdata = (SubscriptionExecData *) PG_GETARG_POINTER(1);
> ../generic_type_subscription_v2.patch:226: tab in indent.
>     Custom                        *result = (Custom *) sbsdata->containerSource;
> ../generic_type_subscription_v2.patch:234: tab in indent.
>     SubscriptionRef       *sbsref = (SubscriptionRef *) PG_GETARG_POINTER(0);
> warning: squelched 29 whitespace errors
> warning: 34 lines add whitespace errors.
>
> It doesn't prevent me from further testing of patch, but worth noticing.
>

I agree with Victor. In sgml files whitespaces instead of tabs are 
usually used.

I've looked at your patch to make some review.

"subscription"
--------------

The term "subscription" is confusing me. I'm not native english speaker. 
But I suppose that this term is not related with the "subscript". So 
maybe you should use the "subscripting" or "container" (because you 
already use the "container" term in the code). For example:

T_SubscriptingRef
SubscriptingRef
deparseSubscriptingRef()
xsubscripting.sgml
CREATE TYPE custom (   internallength = 4,   input = custom_in,   output = custom_out,   subscripting =
custom_subscripting
);
etc.

Subscripting interface
----------------------

In tests I see the query:

> +select ('[1, "2", null]'::jsonb)['1'];
> + jsonb
> +-------
> + "2"
> +(1 row)

Here '1' is used as a second item index. But from the last discussion 
https://www.postgresql.org/message-id/55D24517.8080609%40agliodbs.com it 
should be a key:

> There is one ambiguous case you need to address:
>
> testjson = '{ "a" : { } }'
>
> SET testjson['a']['a1']['1'] = 42
>
> ... so in this case, is '1' a key, or the first item of an array?  how
> do we determine that? How does the user assign something to an array?

And should return null. Is this right? Or this behaviour was not 
accepted in discussion? I didn't find it.

> +Datum
> +custom_subscription(PG_FUNCTION_ARGS)
> +{
> +    int                        op_type = PG_GETARG_INT32(0);
> +    FunctionCallInfoData    target_fcinfo = get_slice_arguments(fcinfo, 1,
> +                                                                fcinfo->nargs);
> +
> +    if (op_type & SBS_VALIDATION)
> +        return custom_subscription_prepare(&target_fcinfo);
> +
> +    if (op_type & SBS_EXEC)
> +        return custom_subscription_evaluate(&target_fcinfo);
> +
> +    elog(ERROR, "incorrect op_type for subscription function: %d", op_type);
> +}

I'm not sure but maybe we should use here two different functions? For 
example:

Datum
custom_subscripting_validate(PG_FUNCTION_ARGS)
{
}

Datum
custom_subscripting_evaluate(PG_FUNCTION_ARGS)
{
}

CREATE TYPE custom (   internallength = 8,   input = custom_in,   output = custom_out,   subscripting_validate =
custom_subscripting_validate,  subscripting_evalute = custom_subscripting_evaluate,
 
);

What do you think?

Documentation
-------------

> +<!-- doc/src/sgml/xsubscription.sgml -->
> +
> + <sect1 id="xsubscription">
> +  <title>User-defined subscription procedure</title>

There is the extra whitespace before <sect1>.

> +  </indexterm>
> +  When you define a new base type, you can also specify a custom procedure
> +  to handle subscription expressions. It should contains logic for verification
> +  and for extraction or update your data. For instance:

I suppose a <para> tag is missed after the </indexterm>.

> +</programlisting>
> +
> + </para>
> +
> +  <para>

So </para> is redundant here.

Code
----

> +static Oid
> +findTypeSubscriptionFunction(List *procname, Oid typeOid)
> +{
> +    Oid            argList[1];

Here typeOid argument is not used. Is it should be here?

> +ExecEvalSubscriptionRef(SubscriptionRefExprState *sbstate,

I think in this function local arguments have a lot of tabs. It is 
normal if not all variables is aligned on the same line. A lot of tabs 
are occur in other functions too. Because of this some lines exceed 80 
characters.

> +    if (sbstate->refupperindexpr != NULL)
> +        upper = (Datum *) palloc(sbstate->refupperindexpr->length * sizeof(Datum *));
> +
> +    if (sbstate->reflowerindexpr != NULL)
> +        lower = (Datum *) palloc(sbstate->reflowerindexpr->length * sizeof(Datum *));

Could we use palloc() before the "foreach(l, sbstate->refupperindexpr)" 
? I think it could be a little optimization.

> -transformArrayType(Oid *arrayType, int32 *arrayTypmod)
> +transformArrayType(Oid *containerType, int32 *containerTypmod)

I think name of the function is confusing. Maybe use 
transformContainerType()?

> +JsonbValue *
> +JsonbToJsonbValue(Jsonb *jsonb, JsonbValue *val)
> +{

In this function we could palloc JsonbValue every time and don't use val 
argument. And maybe better to copy all JsonbContainer from jsonb->root 
using memcpy(). Instead of assigning reference to jsonb->root. As is the 
case in JsonbValueToJsonb().

It is necessary to remove the notice about JsonbToJsonbValue() in 
JsonbValueToJsonb() comments.

> -static JsonbValue *
> +JsonbValue *
>  setPath(JsonbIterator **it, Datum *path_elems,

Why did you remove static keyword? setPath() is still static.

> -    JsonbValue    v;
> +    JsonbValue    v, *new = (JsonbValue *) newval;

Is declaration of "new" variable necessary here? I think it is extra 
declaration. Also its name may bring some problems. For example, there 
is a thread where guys try to port PostgreSQL to C++.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
>On 5 October 2016 at 22:59, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
>I've looked at your patch to make some review.

Thanks for the feedback.

> On 04.10.2016 11:28, Victor Wagner wrote: Git complains about whitespace in
> this version of patch:

Fixed.

> The term "subscription" is confusing me

Yes, you're right. "container" is too general I think, so I renamed everything
to "subscripting".

> Here '1' is used as a second item index. But from the last discussion
> should be a key

Well, I missed that, since I used already implemented function "setPath", and
this function implies that "all path elements before the last must already
exist", so in this case:

select jsonb_set('{"a": {}}', '{a, a1, 1}', '42');

nothing will be changed. I agree, we can implement this as discussed, but we need
to do it inside "setPath", so it will be like "globally".

> I'm not sure but maybe we should use here two different functions?

I thought about that, but finally decided to keep changes into "pg_type" as
small as possible (anyway, these two functions will serve one logical purpose).

> Here typeOid argument is not used. Is it should be here?

No, it shouldn't, fixed. It's interesting, that the same is correct for
"findTypeAnalyzeFunction" (I used this function as an example).

> I think name of the function is confusing. Maybe use
> transformContainerType()?

The point is that "transformArrayType" still contains some array-specific code,
although it doesn't affect to any other type. So I think it's not completely
correct to use "transformContainerType" name.

> Why did you remove static keyword? setPath() is still static
> Is declaration of "new" variable necessary here?

I changed it back.

Here is a new version of patch.
Attachment

Re: [PATCH] Generic type subscription

From
Artur Zakirov
Date:
Hello,

Do you have an updated version of the patch?

2016-10-18 20:41 GMT+03:00 Dmitry Dolgov <9erthalion6@gmail.com>:

> The term "subscription" is confusing me

Yes, you're right. "container" is too general I think, so I renamed everything
to "subscripting".

There is another occurrences of "subscription" including file names. Please fix them.

Also I've sent a personal email, that we have performance degradation of SELECT queries:

create table test (
    a int2[],
    b int4[][][]);

With patch:

=> insert into test (a[1:5], b[1:1][1:2][1:2])
      select '{1,2,3,4,5}', '{{{0,0},{1,2}}}'
      from generate_series(1,100000);
Time: 936,285 ms

=> UPDATE test SET a[0] = '2';
Time: 1605,406 ms (00:01,605)

=> UPDATE test SET b[1:1][1:1][1:2] = '{113, 117}';
Time: 1603,076 ms (00:01,603)

=> explain analyze select a[1], b[1][1][1] from test;
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..3858.00 rows=100000 width=6) (actual time=0.035..241.280 rows=100000 loops=1)
 Planning time: 0.087 ms
 Execution time: 246.916 ms
(3 rows)

And without patch:

=> insert into test (a[1:5], b[1:1][1:2][1:2])
      select '{1,2,3,4,5}', '{{{0,0},{1,2}}}'
      from generate_series(1,100000);
Time: 971,677 ms

=> UPDATE test SET a[0] = '2';
Time: 1508,262 ms (00:01,508)

=> UPDATE test SET b[1:1][1:1][1:2] = '{113, 117}';
Time: 1473,459 ms (00:01,473)

=> explain analyze select a[1], b[1][1][1] from test;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..5286.00 rows=100000 width=6) (actual time=0.024..98.475 rows=100000 loops=1)
 Planning time: 0.081 ms
 Execution time: 105.055 ms
(3 rows)

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Re: [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
Hi

Sorry for late reply and thank you for the feedback (especially about the
problem with performance).

> There is another occurrences of "subscription" including file names

Fixed.

> we have performance degradation of SELECT queries

Yes, I figured it out. The main problem was about type info lookups, some of
them were made for each data extraction operation, which is unnecessary.
Here are results on my machine:

with the fixed version of patch

=# explain analyze select a[1], b[1][1][1] from test;
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..5286.00 rows=100000 width=6) (actual time=0.950..48.370 rows=100000 loops=1)
 Planning time: 0.054 ms
 Execution time: 51.859 ms
(3 rows)

and without the patch

=# explain analyze select a[1], b[1][1][1] from test;                                                                                 
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..5286.00 rows=100000 width=6) (actual time=3.443..43.452 rows=100000 loops=1)
 Planning time: 0.117 ms
 Execution time: 46.875 ms
(3 rows)

It's still slightly slower because of all new logic aroung subscripting
operation, but not that much.
Attachment

Re: [PATCH] Generic type subscription

From
Aleksander Alekseev
Date:
Hello.

I took a look on the latest -v4 patch. I would like to note that this
patch breaks a backward compatibility. For instance sr_plan extension[1]
stop to compile with errors like:

```
serialize.c:38:2: error: unknown type name ‘ArrayRef’ JsonbValue *ArrayRef_ser(const ArrayRef *node, JsonbParseState
*state,
bool sub_object); ^
serialize.c:913:2: error: unknown type name ‘ArrayRef’ JsonbValue *ArrayRef_ser(const ArrayRef *node, JsonbParseState
*state,
bool sub_object) ^
In file included from sr_plan.h:4:0,                from serialize.c:1:

...

```

Other extensions could be affected as well. I'm not saying that it's a
fatal drawback, but it's definitely something you should be aware of.

I personally strongly believe that we shouldn't break extensions between
major releases or at least make it trivial to properly rewrite them.
Unfortunately it's not a case currently.

[1] https://github.com/postgrespro/sr_plan

--
Best regards,
Aleksander Alekseev

Re: [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
> On 15 November 2016 at 15:03, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
> Hello.
> I took a look on the latest -v4 patch. I would like to note that this
> patch breaks a backward compatibility. For instance sr_plan extension[1]
> stop to compile with errors

Thank you for the feedback.

Well, if we're speaking about this particular extension, if I understood
correctly, it fetches all parse tree nodes from Postgres and generates code
using this information. So to avoid compilation problems I believe you need to
run `make USE_PGXS=1 genparser` again (it worked for me, I don't see any
mentions of `ArrayRef`).

But speaking generally, I don't see how we can provide backward compatibility
for those extensions, who are strongly coupled with implementation details of
parsing tree. I mean, in terms of interface it's mostly about to replace
`ArrayRef` to `SubscriptingRef`, but I think it's better to do it in the
extension code.

Re: [PATCH] Generic type subscription

From
Haribabu Kommi
Date:


On Thu, Nov 17, 2016 at 10:56 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On 15 November 2016 at 15:03, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
> Hello.
> I took a look on the latest -v4 patch. I would like to note that this
> patch breaks a backward compatibility. For instance sr_plan extension[1]
> stop to compile with errors

Thank you for the feedback.

Well, if we're speaking about this particular extension, if I understood
correctly, it fetches all parse tree nodes from Postgres and generates code
using this information. So to avoid compilation problems I believe you need to
run `make USE_PGXS=1 genparser` again (it worked for me, I don't see any
mentions of `ArrayRef`).

But speaking generally, I don't see how we can provide backward compatibility
for those extensions, who are strongly coupled with implementation details of
parsing tree. I mean, in terms of interface it's mostly about to replace
`ArrayRef` to `SubscriptingRef`, but I think it's better to do it in the
extension code.


Moved to next CF with "needs review" status.


Regards,
Hari Babu
Fujitsu Australia

Re: [HACKERS] [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
> On 5 December 2016 at 12:03, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Moved to next CF with "needs review" status.

Looks like we stuck here little bit. Does anyone else have any suggestions/improvements, or this patch is in good enough shape?

Re: [HACKERS] [PATCH] Generic type subscription

From
Artur Zakirov
Date:
2016-12-26 18:49 GMT+03:00 Dmitry Dolgov <9erthalion6@gmail.com>:
>> On 5 December 2016 at 12:03, Haribabu Kommi <kommi.haribabu@gmail.com>
>> wrote:
>
>> Moved to next CF with "needs review" status.
>
> Looks like we stuck here little bit. Does anyone else have any
> suggestions/improvements, or this patch is in good enough shape?

Would you rebase the patch, please? It seems it is necessary. It can't
be applied now.

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
> On 27 December 2016 at 04:05, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> Would you rebase the patch, please? It seems it is necessary. It can't
> be applied now.

Sure, here is a new rebased version.
Attachment

Re: [HACKERS] [PATCH] Generic type subscription

From
Aleksander Alekseev
Date:
As I mentioned above [1] in my humble opinion this patch is not at all in a
"good shape" until it breaks existing extensions.

[1] http://postgr.es/m/20161115080324.GA5351%40e733.localdomain

On Mon, Dec 26, 2016 at 10:49:30PM +0700, Dmitry Dolgov wrote:
> > On 5 December 2016 at 12:03, Haribabu Kommi <kommi.haribabu@gmail.com>
>  wrote:
>
> > > Moved to next CF with "needs review" status.
>
> Looks like we stuck here little bit. Does anyone else have any
> suggestions/improvements, or this patch is in good enough shape?

--
Best regards,
Aleksander Alekseev

Re: [HACKERS] [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
> On 27 December 2016 at 16:09, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
> until it breaks existing extensions.

Hm...I already answered, that I managed to avoid compilation problems for this particular extension
using the `genparser` command again:

> On Thu, Nov 17, 2016 at 10:56 PM, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
> wrote:
>
> > On 15 November 2016 at 15:03, Aleksander Alekseev <
> a(dot)alekseev(at)postgrespro(dot)ru> wrote:
> > Hello.
> >
> > I took a look on the latest -v4 patch. I would like to note that this
> > patch breaks a backward compatibility. For instance sr_plan extension[1]
> > stop to compile with errors
>
> Thank you for the feedback.
>
> Well, if we're speaking about this particular extension, if I understood
> correctly, it fetches all parse tree nodes from Postgres and generates code
> using this information. So to avoid compilation problems I believe you need to
> run `make USE_PGXS=1 genparser` again (it worked for me, I don't see any
> mentions of `ArrayRef`).
>
> But speaking generally, I don't see how we can provide backward
> compatibility for those extensions, who are strongly coupled with implementation details
> of parsing tree. I mean, in terms of interface it's mostly about to replace
> `ArrayRef` to `SubscriptingRef`, but I think it's better to do it in the extension code.

Or is there something else that I missed?

Re: [HACKERS] [PATCH] Generic type subscription

From
Artur Zakirov
Date:
2016-12-27 14:42 GMT+05:00 Dmitry Dolgov <9erthalion6@gmail.com>:
>> On 27 December 2016 at 16:09, Aleksander Alekseev
>> <a.alekseev@postgrespro.ru> wrote:
>> until it breaks existing extensions.
>
> Hm...I already answered, that I managed to avoid compilation problems for
> this particular extension
> using the `genparser` command again:

I suppose that a separate node type could solve it. But I'm not
convinced about how to distinguish ArrayRef node with new
SubscriptingRef node. Maybe it could be done in the
transformIndirection() function. If I understand all correctly.

Also Tom pointed that he had bad experience with using ArrayRef node:
https://www.postgresql.org/message-id/518.1439846343%40sss.pgh.pa.us
> No.  Make a new expression node type.
>
> (Salesforce did something similar for an internal feature, and it was a
> disaster both for code modularity and performance.  We had to change it to
> a separate node type, which I just got finished doing.  Don't go down that
> path.  While you're at it, I'd advise that fetch and assignment be two
> different node types rather than copying ArrayRef's bad precedent of using
> only one.)

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
> On 4 January 2017 at 18:06, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> But I'm not convinced about how to distinguish ArrayRef node with new
> SubscriptingRef node.

I'm not sure I understood you correctly. You're talking about having two nodes
`ArrayRef` and `SubscriptingRef` at the same time for the sake of backward
compatibility, am I right? But they're basically the same, since
`SubscriptingRef` name is used just to indicate more general purpose of this
node.

> Also Tom pointed that he had bad experience with using ArrayRef node:

Yes, but it was related to the idea of having `ArrayRef` and `JsonbRef` nodes
for specific types. Since now there is generic `SubscriptingRef` node, I think
it should be ok.

>> Hm...I already answered, that I managed to avoid compilation problems for
>> this particular extension using the `genparser` command again:

> I suppose that a separate node type could solve it.

Just to be clear - as far as I understood, these compilation problems were
caused not because the extension knew something about ArrayRef node in
particular, but because the extension tried to extract all nodes to generate
code from them. It means any change will require "refetching", so I think it's
natural for this extension.

Re: [HACKERS] [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
Here is a small improvement to patch documentation - I forgot to add link
to the page "user-defined subscripting procedure" into the "Extending ..." section.
Attachment

Re: [HACKERS] [PATCH] Generic type subscription

From
Artur Zakirov
Date:
> Yes, but it was related to the idea of having `ArrayRef` and `JsonbRef` nodes
> for specific types. Since now there is generic `SubscriptingRef` node, I think
> it should be ok.

Sorry I misunderstood it.

> Just to be clear - as far as I understood, these compilation problems were
> caused not because the extension knew something about ArrayRef node in
> particular, but because the extension tried to extract all nodes to generate
> code from them. It means any change will require "refetching", so I think it's
> natural for this extension.

Agree. It will be hard to maintain both nodes. And it is not so smart
to have two nodes ArrayRef (deprecated) and SubscriptingRef. It is not
hard to replace ArrayRef node with SubscriptingRef  in other
extensions.

There is a little note:

>  #include "utils/lsyscache.h"
> +#include "utils/syscache.h"
>  #include "utils/memutils.h"

I think "utils/syscache.h" isn't necessary here. PostgreSQL could be
compiled without this include.

I suppose that this patch can be marked as "Ready for commiter". Any opinions?

-- 
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] [PATCH] Generic type subscription

From
Tom Lane
Date:
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> [ generic_type_subscription_v6.patch ]

Not too surprisingly, this doesn't apply anymore in the wake of commit
ea15e1867.  Could you rebase?  Changes for that should be pretty trivial
I'd expect.

I took an extremely quick look over the patch --- mostly looking
at the header file changes not the code --- and have a couple of
comments:

1. As I mentioned previously, it's a seriously bad idea that ArrayRef
is used for both array subscripting and array assignment.  Let's fix
that while we're at it, rather than setting that mistake in even more
stone by embedding it in a datatype extension API.

2. I'm not very pleased that the subscripting functions have signature
"subscripting(internal) returns internal"; we have more than enough of
those already, and each one is a hazard for plugging the wrong function
into the wrong place.  Worse yet, you're flat out lying about the number
of arguments that these functions actually expect to receive, which is
something that could get broken by any number of plausible future changes.
Can we arrange to do that differently?  I'd prefer something in which the
argument and result types are visibly connected to the actual datatypes
at hand, for instance  array_subscript(anyarray, internal) returns anyelement  array_assign(anyarray, internal,
anyelement)returns anyarray
 
where the "internal" argument is some representation of only the subscript
expressions.  This would allow CREATE TYPE to perform some amount of
checking that the right function(s) had been specified.  (If that means
we use two functions not one per datatype, that's fine with me.)  If that
seems impractical, let's at least pick a signature that doesn't conflict
with any other INTERNAL-using APIs, and preferably has some connection
to what the arguments really are.

3. The contents of ArrayRef are designed on the assumption that the same
typmod and collation values apply to both an array and its elements.
That's okay for standard arrays, but I do not think it holds for every
other container situation.  For example, hstore doesn't have collation
last I checked, but it would likely want to treat its element type as
being text, which does.  So this needs to be generalized.

4. It looks like your work on the node processing infrastructure has been
limited to s/ArrayRef/SubscriptingRef/g, but that's not nearly enough.
SubscriptingRef needs to be regarded as an opportunity to invoke a
user-defined function, which means that it now acts quite a bit like
FuncExpr.  For example, the function-to-be-invoked needs to be checked for
volatility, leakproofness, parallel safety, etc in operations that want to
know about such things.  So check_functions_in_node(), for one, needs to
consider SubscriptingRef, and really you'll have to look at everyplace
that deals with FuncExpr and see if it needs a case for SubscriptingRef
now.  I'd advise adding the OID of the subscripting function to
SubscriptingRef, so that those places don't need to do additional catalog
lookups to get it.

BTW, a different approach that might be worth considering is to say that
the nodetree representation of one of these things *is* a FuncExpr, and
the new magic thing is just that we invent a new CoercionForm value
which causes ruleutils.c to print the expression as a subscripting op.
I'm not quite convinced that that's a good idea --- a "coercion format"
that says "subscript" seems a bit weird --- but it would greatly reduce
the number of places you'd have to touch.
        regards, tom lane



Re: [HACKERS] [PATCH] Generic type subscription

From
Tom Lane
Date:
I wrote:
> ... (If that means
> we use two functions not one per datatype, that's fine with me.)

Actually, after thinking about that a bit more: you've really squeezed
*three* different APIs into one function.  Namely, subscript-reference
parse analysis, array subscripting execution, and array assignment
execution.  It would be cleaner, and would reduce runtime overhead a bit,
if those were embodied as three separate functions.

It might be possible to get away with having only one pg_type column,
pointing at the parse-analysis function.  That function would generate
a SubscriptingRef tree node containing the OID of the appropriate
execution function, which execQual.c could call without ever knowing
its name explicitly.

This clearly would work for built-in types, since the parse-analysis
function could rely on fmgroids.h for the OIDs of the execution functions.
But I'm less sure if it would work in extensions, which won't have
predetermined OIDs for their functions.  Is there a way for a function
in an extension to find the OID of one of its sibling functions?
        regards, tom lane



Re: [HACKERS] [PATCH] Generic type subscription

From
Jim Nasby
Date:
On 1/23/17 1:36 PM, Tom Lane wrote:
> Is there a way for a function
> in an extension to find the OID of one of its sibling functions?

Obviously there's regprocedure (or it's C equivalent), but then you're 
stuck re-computing at runtime. I've messed around with that a bit in an 
effort to have an extension depend on another extension that allows the 
user to specify it's schema. If you're already doing metaprogramming 
it's not an enormous problem... if you're not already doing that it 
sucks. Trying to make that work in C would be somewhere between 
impossible and a nightmare.

Since this kind of thing affects extensions that depend on extensions, 
it'd certainly be nice if there was some way to address it.

BTW, I actually do use SPI to call one of the reg casts in my variant 
type, but that's just a hack I used in the beginning and haven't gotten 
around to replacing. Since there's a static variable that gets set to 
the relevant OID it's not that bad performance-wise from what I can 
tell, but I suspect that's not something we want to be recommending to 
others...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] [PATCH] Generic type subscription

From
Tom Lane
Date:
I wrote:
> BTW, a different approach that might be worth considering is to say that
> the nodetree representation of one of these things *is* a FuncExpr, and
> the new magic thing is just that we invent a new CoercionForm value
> which causes ruleutils.c to print the expression as a subscripting op.
> I'm not quite convinced that that's a good idea --- a "coercion format"
> that says "subscript" seems a bit weird --- but it would greatly reduce
> the number of places you'd have to touch.

After sleeping on it, that approach is starting to sound better to me.
Consider a design like this:

* Leave ArrayRef strictly alone, and introduce new infrastructure beside
it for non-array containers.  That sounds ugly at first, but it has two
significant advantages: you don't have to refactor or even touch any
array-related code, and you do not have to worry about somebody objecting
to the patch because it adds unacceptable overhead to existing array
operations.  Converting array ops to go through a function-call API surely
must add *some* overhead, and at this point we don't have enough info to
be certain it would be negligibly small.  (Testing the existing patch
cannot prove that, since as I noted yesterday, you're missing a lot of
plan-time manipulations that need to happen for a generic function call.)

* Let the node representation for non-array-container access be FuncExpr
with new value(s) of funcformat.  You'd need to design the exact
representation to be used for the subscript arguments, but that doesn't
seem horribly complicated.  In this way, you're not on the hook to
duplicate all the node-processing infrastructure for either ArrayRef
or FuncExpr --- ideally, you won't need to do much more in the core
system than provide the parse-time callout and write suitable deparsing
logic in ruleutils.c.

The ugliness of thinking that "subscripting" is a kind of coercion could
be dealt with by changing funcformat to some new enum type named something
like, say, FuncDisplayForm.  This would require touching all the places
that mess with funcformat, but that could be a good thing anyway, because
you'd be sure that you'd looked at everything that might need changes.
        regards, tom lane



Re: [HACKERS] [PATCH] Generic type subscription

From
Robert Haas
Date:
On Mon, Jan 23, 2017 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can we arrange to do that differently?  I'd prefer something in which the
> argument and result types are visibly connected to the actual datatypes
> at hand, for instance
>           array_subscript(anyarray, internal) returns anyelement
>           array_assign(anyarray, internal, anyelement) returns anyarray

What about having no internal arguments here at all?  Like if you want
to support foo[4] define a subscript function that takes (mytype, int)
and returns whatever.  You might have to allow for multiple
subscripting functions with different argument types for this to
really work, though.

/me ducks

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Generic type subscription

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 23, 2017 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can we arrange to do that differently?  I'd prefer something in which the
>> argument and result types are visibly connected to the actual datatypes
>> at hand, for instance
>> array_subscript(anyarray, internal) returns anyelement
>> array_assign(anyarray, internal, anyelement) returns anyarray

> What about having no internal arguments here at all?  Like if you want
> to support foo[4] define a subscript function that takes (mytype, int)
> and returns whatever.  You might have to allow for multiple
> subscripting functions with different argument types for this to
> really work, though.

Yeah, the problem is if you're trying to support the full generality
of the array slice syntax, it gets kind of painful to shoehorn that
into simple SQL types.  Consider array[1][2:3][4:] or something like
that.  We could say that extensions only have access to the basic
non-slice notation, but I'm sure someone would be unhappy with that
--- and even then, you don't really want to define six functions to
deal with six possible numbers of subscripts.

Another issue is that, if someone is trying to use this facility to define
array semantics that are less screwball than what Berkeley bequeathed us,
they might not be happy with the idea that a single function
"array_subscript(anyarray, internal) returns anyelement" is what
determines the result type of a subscripting operation.  It might be for
example that you need to look at the subscripted object, as well as the
number of subscripts, before you know whether the result is an element or
a lower-dimension array.  So I'd really prefer that the functionality
involve a parser callout, and that would certainly need "internal"
argument(s).

Given that it's a parser callout, what the signatures of the runtime
function(s) are is really not our problem; the callout can construct
any darn expression tree it pleases.  There would only be some subset
of that space that ruleutils.c would know how to print as a subscripting
construct, but many people might be happy with the results reverse-listing
as a regular function or operator call.
        regards, tom lane



Re: [HACKERS] [PATCH] Generic type subscription

From
Robert Haas
Date:
On Thu, Jan 26, 2017 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I'd really prefer that the functionality
> involve a parser callout, and that would certainly need "internal"
> argument(s).

Thanks, I see now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
> On 24 January 2017 at 02:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I took an extremely quick look over the patch

Thank you for the feedback. It took some time for me to think about all
suggestions and notes.

> 1. As I mentioned previously, it's a seriously bad idea that ArrayRef
> is used for both array subscripting and array assignment.  Let's fix
> that while we're at it, rather than setting that mistake in even more
> stone by embedding it in a datatype extension API.

Sure. But I assume that `SubscriptingRef` and `SubscriptingAssignmentRef` will
be almost identical since they carry the same information to get a value
and to assign a new value (so, probably it will be just an alias with a
different related function).

> 2. I'm not very pleased that the subscripting functions have signature
> "subscripting(internal) returns internal";

Basically, current implementation of subscripting operation contains node
related logic (e.g. like to verify that we're not using slice syntax for jsonb)
and data type related logic (e.g. to get/to assign a value in an array). And if
it's easy enough to use:
    `array_subscript(anyarray, internal) returns anyelement`
    `array_assign(anyarray, internal, anyelement) returns anyarray`
form for the second one, the first one should accept node as an argument and
return node - I'm not sure if it's possible to use something else than
`internal` here. Speaking about other signature issues, sure, I'll fix them.

> 3. The contents of ArrayRef are designed on the assumption that the same
> typmod and collation values apply to both an array and its elements.

Yes, I missed that. It looks straightforward for me, we can just split
`refcollid` and `reftypmod` to `refcontainercollid`, `refelementcollid` and
`refcontainertypmod`, `refelementtypmod`.

> 4. It looks like your work on the node processing infrastructure has been
> limited to s/ArrayRef/SubscriptingRef/g, but that's not nearly enough.
> SubscriptingRef needs to be regarded as an opportunity to invoke a
> user-defined function, which means that it now acts quite a bit like
> FuncExpr.  For example, the function-to-be-invoked needs to be checked for
> volatility, leakproofness, parallel safety, etc in operations that want to
> know about such things.
> ....
> I noted yesterday, you're missing a lot of plan-time manipulations that need
> to happen for a generic function call.

Yes, I totally missed these too. I'm going to improve this situation soon.

> Actually, after thinking about that a bit more: you've really squeezed
> *three* different APIs into one function.  Namely, subscript-reference
> parse analysis, array subscripting execution, and array assignment
> execution.  It would be cleaner, and would reduce runtime overhead a bit,
> if those were embodied as three separate functions.
>
> It might be possible to get away with having only one pg_type column,
> pointing at the parse-analysis function.  That function would generate
> a SubscriptingRef tree node containing the OID of the appropriate
> execution function, which execQual.c could call without ever knowing
> its name explicitly.
>
> This clearly would work for built-in types, since the parse-analysis
> function could rely on fmgroids.h for the OIDs of the execution functions.
> But I'm less sure if it would work in extensions, which won't have
> predetermined OIDs for their functions.  Is there a way for a function
> in an extension to find the OID of one of its sibling functions?
>
> On 24 January 2017 at 07:54, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>
> Obviously there's regprocedure (or it's C equivalent), but then you're stuck
> re-computing at runtime. I've messed around with that a bit in an effort to
> have an extension depend on another extension that allows the user to specify
> it's schema. If you're already doing metaprogramming it's not an enormous
> problem... if you're not already doing that it sucks. Trying to make that
> work in C would be somewhere between impossible and a nightmare.

The idea of having one function that will generate SubscriptingRef node sounds
good. But I'm afraid of consequences about using it for extensions (especially
since the request for general subscripting implementation came also from
their side). Is there a way to make it less troublesome?

To summarize, right now there are three options to handle a `SubscriptingRef`
node analysis, subscripting execution and assignment execution:

* one `pg_type` column with an OID of corresponding function for each purpose
  (which isn't cool)

* one "controller" function that will call directly another function with
  required logic (which is a "squeezing" and it's the current implementation)

* one function that will return `SubscriptingRef` with an OID of required
  function (which is potentially troublesome for extensions)

> BTW, a different approach that might be worth considering is to say that
> the nodetree representation of one of these things *is* a FuncExpr, and
> the new magic thing is just that we invent a new CoercionForm value
> which causes ruleutils.c to print the expression as a subscripting op.
>
> Leave ArrayRef strictly alone, and introduce new infrastructure beside it for
> non-array containers.

Yes, it will eliminate any objections about an overhead to existing array
operations. But I'm concerned that it will "virtually" reduce flexibility of
the solution on the whole, because it means we think only about one data type
as a target for implementation and it's easy to miss something for more complex
(in terms of subscripting logic) data structures.

> Let the node representation for non-array-container access be FuncExpr with
> new value(s) of funcformat.

Probably I misunderstood something, but if we want to keep all necessary
information about subscripting (like index value, slices etc.) together with
OID's of required functions, it will be almost the same as `SubscriptingRef`
isn't it? For me the current implementation looks more natural - there was a
component that was responsible for the subscripting for one data type, and it's
implementation evolved and became more general (but of course that's probably
just because I implemented this and it looks more natural only for me).

Re: [HACKERS] [PATCH] Generic type subscription

From
Michael Paquier
Date:
On Sat, Jan 28, 2017 at 2:31 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>> On 24 January 2017 at 02:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I took an extremely quick look over the patch
>
> Thank you for the feedback. It took some time for me to think about all
> suggestions and notes.

Okay, I am marking the patch as returned with feedback seeing those
reviews. Please don't hesitate to submit a new version.
-- 
Michael



Re: [HACKERS] [PATCH] Generic type subscription

From
Dmitry Dolgov
Date:
On 24 January 2017 at 02:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> It might be possible to get away with having only one pg_type column,
> pointing at the parse-analysis function.  That function would generate
> a SubscriptingRef tree node containing the OID of the appropriate
> execution function, which execQual.c could call without ever knowing
> its name explicitly.

Btw, is it acceptable if such generated SubscriptingRef will contain just a
function pointer to the appropriate execution function instead of an OID (e.g.
like `ExprStateEvalFunc`)? It will help to avoid problems in case of extensions.