Thread: [PATCH] Generic type subscription
Hi,
Following up on the previous conversations [1], [2], here is a patch (with some
improvements from Nikita Glukhov) for generic type subscription. It makes it
possible to define type-specific subscription logic for any data type, and it
includes implementations for the array and jsonb types. The patch contains the
following changes:
* A new `pg_type` column (`regproc typsubscription`) that points to the
type-specific subscription procedure for a particular data type. It can be
zero (the default), which means the data type doesn't support subscription.
* Type-specific code in the array subscription implementation (e.g. any kind
of verification, type coercion, and the actual data extraction and update) was
separated from the generic code into the `array_subscription` procedure. The
generic implementation assumes that a subscription expression can have the
form `[a]` or `[a:b]`, with no restrictions on the types of `a` and `b`.
* Using the same API, new subscription logic was implemented for the `jsonb`
type in the `jsonb_subscription` procedure. Several changes were introduced
into the jsonb functions just to be able to share common code.
I believe this patch is more or less in good shape, so I would like to know
what you think about it. Feedback is welcome.
Attachment
On 9/9/16 6:29 AM, Dmitry Dolgov wrote:
> Regarding to the previous conversations [1], [2], here is a patch (with some
> improvements from Nikita Glukhov) for the generic type subscription.

Awesome! Please make sure to add it to the Commit Fest app.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On Fri, 9 Sep 2016 18:29:23 +0700 Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Hi,
>
> Regarding to the previous conversations [1], [2], here is a patch
> (with some improvements from Nikita Glukhov) for the generic type
> subscription. It allows
> to define type-specific subscription logic for any data type and has
> implementations for the array and jsonb types. There are following
> changes in this
> patch:

I've tried to compile this patch with current state of master (commit
51c3e9fade76c12) and found out that, when configured with --enable-cassert,
it doesn't pass make check.

LOG: server process (PID 4643) was terminated by signal 6: Aborted
DETAIL: Failed process was running:
    update test_jsonb_subscript set test_json['a'] = '{"b": 1}'::jsonb;

--
Victor
> I've tried to compile this patch with current state of master (commit
> 51c3e9fade76c12) and found out that, when configured with --enable-cassert,
> it doesn't pass make check.
Thanks for the feedback. Yes, unexpectedly for me, `ExecEvalExpr` can return
an expanded `jbvArray` or `jbvObject` instead of `jbvBinary` in both cases.
It's interesting that this doesn't break anything, but it obviously violates
the `pushJsonbValueScalar` semantics. I don't think `ExecEvalExpr` should be
changed for jsonb; we can handle this situation in `pushJsonbValue` instead.
I've attached a new version of the patch with this modification.
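The idea is roughly the following (a simplified sketch rather than the actual
hunk from the attached patch; it relies on the existing `JsonbValueToJsonb`
round trip and assumes the usual jsonb headers):

/*
 * Sketch only: if the evaluated value arrived as an in-memory jbvArray or
 * jbvObject, serialize it back into a jbvBinary so the scalar push path
 * still sees what it expects.
 */
static JsonbValue *
flatten_jsonb_value(JsonbValue *jbval)
{
    if (jbval->type == jbvArray || jbval->type == jbvObject)
    {
        Jsonb      *jb = JsonbValueToJsonb(jbval);
        JsonbValue *res = palloc(sizeof(JsonbValue));

        res->type = jbvBinary;
        res->val.binary.data = &jb->root;
        res->val.binary.len = VARSIZE(jb) - VARHDRSZ;
        return res;
    }
    return jbval;
}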
On 27 September 2016 at 19:08, Victor Wagner <vitus@wagner.pp.ru> wrote:
On Fri, 9 Sep 2016 18:29:23 +0700
Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Hi,
>
> Regarding to the previous conversations [1], [2], here is a patch
> (with some improvements from Nikita Glukhov) for the generic type
> subscription. It allows
> to define type-specific subscription logic for any data type and has
> implementations for the array and jsonb types. There are following
> changes in this
> patch:
I've tried to compile this patch with current state of master (commit
51c3e9fade76c12) and found out that, when configured with
--enable-cassert, it doesn't pass make check.
LOG: server process (PID 4643) was terminated by signal 6: Aborted
DETAIL: Failed process was running:
update test_jsonb_subscript set test_json['a'] = '{"b":
1}'::jsonb;
--
Victor
Attachment
On Sat, 1 Oct 2016 16:52:34 +0700 Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > I've tried to compile this patch with current state of master
> > (commit 51c3e9fade76c12) and found out that, when configured with
> > --enable-cassert, it doesn't pass make check.
>
> Thanks for the feedback. Yes, unexpectedly for me, `ExecEvalExpr` can
> return expanded `jbvArray` and `jbvObject` instead `jbvBinary` in
> both cases. It's interesting, that this doesn't break anything, but
> obviously violates the `pushJsonbValueScalar` semantics. I don't
> think `ExecEvalExpr` should be changed for jsonb, we can handle this
> situation in `pushJsonbValue` instead. I've
> attached a new version of patch with this modification.

Thanks for your quick response.

I've not yet thoroughly investigated the new patch, but I already noticed some
minor problems. Git complains about whitespace in this version of the patch:

$ git apply ../generic_type_subscription_v2.patch
../generic_type_subscription_v2.patch:218: tab in indent.
        int first;
../generic_type_subscription_v2.patch:219: tab in indent.
        int second;
../generic_type_subscription_v2.patch:225: tab in indent.
        SubscriptionExecData *sbsdata = (SubscriptionExecData *) PG_GETARG_POINTER(1);
../generic_type_subscription_v2.patch:226: tab in indent.
        Custom *result = (Custom *) sbsdata->containerSource;
../generic_type_subscription_v2.patch:234: tab in indent.
        SubscriptionRef *sbsref = (SubscriptionRef *) PG_GETARG_POINTER(0);
warning: squelched 29 whitespace errors
warning: 34 lines add whitespace errors.

It doesn't prevent me from further testing of the patch, but it's worth noticing.

--
Victor
On Sat, Oct 1, 2016 at 12:52 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > I've tried to compile this patch with current state of master (commit
> > 51c3e9fade76c12) and found out that, when configured with --enable-cassert,
> > it doesn't pass make check.
>
> Thanks for the feedback. Yes, unexpectedly for me, `ExecEvalExpr` can return
> expanded `jbvArray` and `jbvObject` instead `jbvBinary` in both cases. It's
> interesting, that this doesn't break anything, but obviously violates
> the `pushJsonbValueScalar` semantics. I don't think `ExecEvalExpr` should be
> changed for jsonb, we can handle this situation in `pushJsonbValue` instead.
> I've attached a new version of patch with this modification.
have you ever run 'make check' ?
=========================
53 of 168 tests failed.
=========================
On 5 October 2016 at 03:00, Oleg Bartunov <obartunov@gmail.com> wrote:
> have you ever run 'make check' ?
> =========================
>  53 of 168 tests failed.
> =========================

Sure, how else could I write tests for this feature? But right now on my machine
everything is ok (the same for `make installcheck`):

$ make check
....
=======================
 All 168 tests passed.
=======================

On Wed, Oct 5, 2016 at 6:48 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Sure, how else could I write tests for this feature? But right now on my
> machine everything is ok (the same for `make installcheck`):
Oops, something was wrong in my setup :)
Hello,

On 04.10.2016 11:28, Victor Wagner wrote:
>
> Git complains about whitespace in this version of patch:
>
> $ git apply ../generic_type_subscription_v2.patch
> ../generic_type_subscription_v2.patch:218: tab in indent.
> int first;
> ../generic_type_subscription_v2.patch:219: tab in indent.
> int second;
> ../generic_type_subscription_v2.patch:225: tab in indent.
> SubscriptionExecData *sbsdata = (SubscriptionExecData *) PG_GETARG_POINTER(1);
> ../generic_type_subscription_v2.patch:226: tab in indent.
> Custom *result = (Custom *) sbsdata->containerSource;
> ../generic_type_subscription_v2.patch:234: tab in indent.
> SubscriptionRef *sbsref = (SubscriptionRef *) PG_GETARG_POINTER(0);
> warning: squelched 29 whitespace errors
> warning: 34 lines add whitespace errors.
>
> It doesn't prevent me from further testing of patch, but worth noticing.

I agree with Victor. In sgml files whitespace is usually used instead of tabs.

I've looked at your patch to do some review.

"subscription"
--------------

The term "subscription" confuses me. I'm not a native English speaker, but I
suppose this term is not related to "subscript". So maybe you should use
"subscripting" or "container" (because you already use the "container" term in
the code). For example:

T_SubscriptingRef
SubscriptingRef
deparseSubscriptingRef()
xsubscripting.sgml

CREATE TYPE custom (
   internallength = 4,
   input = custom_in,
   output = custom_out,
   subscripting = custom_subscripting
);

etc.

Subscripting interface
----------------------

In the tests I see the query:

> +select ('[1, "2", null]'::jsonb)['1'];
> + jsonb
> +-------
> + "2"
> +(1 row)

Here '1' is used as the index of the second item. But from the last discussion
https://www.postgresql.org/message-id/55D24517.8080609%40agliodbs.com it
should be a key:

> There is one ambiguous case you need to address:
>
> testjson = '{ "a" : { } }'
>
> SET testjson['a']['a1']['1'] = 42
>
> ... so in this case, is '1' a key, or the first item of an array? how
> do we determine that? How does the user assign something to an array?

and it should return null. Is this right? Or was this behaviour not accepted
in the discussion? I didn't find it.

> +Datum
> +custom_subscription(PG_FUNCTION_ARGS)
> +{
> + int op_type = PG_GETARG_INT32(0);
> + FunctionCallInfoData target_fcinfo = get_slice_arguments(fcinfo, 1,
> + fcinfo->nargs);
> +
> + if (op_type & SBS_VALIDATION)
> + return custom_subscription_prepare(&target_fcinfo);
> +
> + if (op_type & SBS_EXEC)
> + return custom_subscription_evaluate(&target_fcinfo);
> +
> + elog(ERROR, "incorrect op_type for subscription function: %d", op_type);
> +}

I'm not sure, but maybe we should use two different functions here? For
example:

Datum
custom_subscripting_validate(PG_FUNCTION_ARGS)
{
}

Datum
custom_subscripting_evaluate(PG_FUNCTION_ARGS)
{
}

CREATE TYPE custom (
   internallength = 8,
   input = custom_in,
   output = custom_out,
   subscripting_validate = custom_subscripting_validate,
   subscripting_evaluate = custom_subscripting_evaluate
);

What do you think?

Documentation
-------------

> +<!-- doc/src/sgml/xsubscription.sgml -->
> +
> + <sect1 id="xsubscription">
> + <title>User-defined subscription procedure</title>

There is extra whitespace before <sect1>.

> + </indexterm>
> + When you define a new base type, you can also specify a custom procedure
> + to handle subscription expressions. It should contains logic for verification
> + and for extraction or update your data. For instance:

I suppose a <para> tag is missing after the </indexterm>.

> +</programlisting>
> +
> + </para>
> +
> + <para>

So </para> is redundant here.

Code
----

> +static Oid
> +findTypeSubscriptionFunction(List *procname, Oid typeOid)
> +{
> + Oid argList[1];

Here the typeOid argument is not used. Should it be here?

> +ExecEvalSubscriptionRef(SubscriptionRefExprState *sbstate,

I think the local variables in this function have a lot of tabs. It is normal
if not all variables are aligned on the same line. A lot of tabs occur in
other functions too, and because of this some lines exceed 80 characters.

> + if (sbstate->refupperindexpr != NULL)
> + upper = (Datum *) palloc(sbstate->refupperindexpr->length * sizeof(Datum *));
> +
> + if (sbstate->reflowerindexpr != NULL)
> + lower = (Datum *) palloc(sbstate->reflowerindexpr->length * sizeof(Datum *));

Could we use palloc() before the "foreach(l, sbstate->refupperindexpr)"? I
think it could be a little optimization.

> -transformArrayType(Oid *arrayType, int32 *arrayTypmod)
> +transformArrayType(Oid *containerType, int32 *containerTypmod)

I think the name of the function is confusing. Maybe use
transformContainerType()?

> +JsonbValue *
> +JsonbToJsonbValue(Jsonb *jsonb, JsonbValue *val)
> +{

In this function we could palloc a JsonbValue every time and not use the val
argument. And maybe it would be better to copy the whole JsonbContainer from
jsonb->root using memcpy(), instead of assigning a reference to jsonb->root,
as is done in JsonbValueToJsonb(). It is also necessary to remove the notice
about JsonbToJsonbValue() from the JsonbValueToJsonb() comments.

> -static JsonbValue *
> +JsonbValue *
> setPath(JsonbIterator **it, Datum *path_elems,

Why did you remove the static keyword? setPath() is still static.

> - JsonbValue v;
> + JsonbValue v, *new = (JsonbValue *) newval;

Is the declaration of the "new" variable necessary here? I think it is an
extra declaration. Also its name may bring some problems; for example, there
is a thread where people are trying to port PostgreSQL to C++.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 5 October 2016 at 22:59, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> I've looked at your patch to do some review.
Thanks for the feedback.
> On 04.10.2016 11:28, Victor Wagner wrote:
> > Git complains about whitespace in this version of patch:
Fixed.
> The term "subscription" is confusing me
Yes, you're right. "container" is too general I think, so I renamed everything
to "subscripting".
> Here '1' is used as a second item index. But from the last discussion
> should be a key
Well, I missed that, since I used the already implemented function "setPath",
and this function implies that "all path elements before the last must already
exist", so in this case:
select jsonb_set('{"a": {}}', '{a, a1, 1}', '42');
nothing will be changed. I agree, we can implement this as discussed, but we
need to do it inside "setPath", so it will apply "globally".
> I'm not sure but maybe we should use here two different functions?
I thought about that, but finally decided to keep the changes to "pg_type" as
small as possible (anyway, these two functions will serve one logical purpose).
> Here typeOid argument is not used. Is it should be here?
No, it shouldn't; fixed. It's interesting that the same is true for
"findTypeAnalyzeFunction" (I used this function as an example).
> I think name of the function is confusing. Maybe use
> transformContainerType()?
The point is that "transformArrayType" still contains some array-specific
code, although it doesn't affect any other type. So I think it's not
completely correct to use the name "transformContainerType".
> Why did you remove static keyword? setPath() is still static
> Is declaration of "new" variable necessary here?
I changed it back.
Here is a new version of patch.
Attachment
Hello,
Do you have an updated version of the patch?
2016-10-18 20:41 GMT+03:00 Dmitry Dolgov <9erthalion6@gmail.com>:
> The term "subscription" is confusing meYes, you're right. "container" is too general I think, so I renamed everythingto "subscripting".
There is another occurrences of "subscription" including file names. Please fix them.
Also, as I mentioned in a personal email, we see a performance degradation of SELECT queries:
create table test (
a int2[],
b int4[][][]);
With patch:
=> insert into test (a[1:5], b[1:1][1:2][1:2])
select '{1,2,3,4,5}', '{{{0,0},{1,2}}}'
from generate_series(1,100000);
Time: 936,285 ms
=> UPDATE test SET a[0] = '2';
Time: 1605,406 ms (00:01,605)
=> UPDATE test SET b[1:1][1:1][1:2] = '{113, 117}';
Time: 1603,076 ms (00:01,603)
=> explain analyze select a[1], b[1][1][1] from test;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..3858.00 rows=100000 width=6) (actual time=0.035..241.280 rows=100000 loops=1)
Planning time: 0.087 ms
Execution time: 246.916 ms
(3 rows)
And without patch:
=> insert into test (a[1:5], b[1:1][1:2][1:2])
select '{1,2,3,4,5}', '{{{0,0},{1,2}}}'
from generate_series(1,100000);
Time: 971,677 ms
=> UPDATE test SET a[0] = '2';
Time: 1508,262 ms (00:01,508)
=> UPDATE test SET b[1:1][1:1][1:2] = '{113, 117}';
Time: 1473,459 ms (00:01,473)
=> explain analyze select a[1], b[1][1][1] from test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..5286.00 rows=100000 width=6) (actual time=0.024..98.475 rows=100000 loops=1)
Planning time: 0.081 ms
Execution time: 105.055 ms
(3 rows)
Hi
Sorry for the late reply, and thank you for the feedback (especially about the
performance problem).
> There is another occurrences of "subscription" including file names
Fixed.
> we have performance degradation of SELECT queries
Yes, I figured it out. The main problem was type info lookups: some of them
were made for each data extraction operation, which is unnecessary.
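For illustration, the usual way to avoid repeating such lookups at the C level
is to do them once and cache the result; the sketch below shows the general
pattern (not the literal hunk from the patch), using `flinfo->fn_extra` and the
standard lsyscache helpers:

/* General caching pattern (sketch); assumes the usual fmgr/lsyscache headers. */
typedef struct SubscriptTypeCache
{
    int16       typlen;
    bool        typbyval;
    char        typalign;
} SubscriptTypeCache;

static SubscriptTypeCache *
get_subscript_type_cache(FunctionCallInfo fcinfo, Oid elemtype)
{
    SubscriptTypeCache *cache = (SubscriptTypeCache *) fcinfo->flinfo->fn_extra;

    if (cache == NULL)
    {
        /* first call in this query: do the syscache lookups only once */
        cache = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
                                   sizeof(SubscriptTypeCache));
        get_typlenbyvalalign(elemtype,
                             &cache->typlen, &cache->typbyval, &cache->typalign);
        fcinfo->flinfo->fn_extra = cache;
    }
    return cache;
}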
Here are results on my machine:
With the fixed version of the patch:
=# explain analyze select a[1], b[1][1][1] from test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..5286.00 rows=100000 width=6) (actual time=0.950..48.370 rows=100000 loops=1)
Planning time: 0.054 ms
Execution time: 51.859 ms
(3 rows)
and without the patch
=# explain analyze select a[1], b[1][1][1] from test;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..5286.00 rows=100000 width=6) (actual time=3.443..43.452 rows=100000 loops=1)
Planning time: 0.117 ms
Execution time: 46.875 ms
(3 rows)
It's still slightly slower because of all the new logic around the subscripting
operation, but not by much.
Attachment
Hello. I took a look at the latest -v4 patch. I would like to note that this
patch breaks backward compatibility. For instance, the sr_plan extension [1]
stops compiling, with errors like:

```
serialize.c:38:2: error: unknown type name ‘ArrayRef’
 JsonbValue *ArrayRef_ser(const ArrayRef *node, JsonbParseState *state, bool sub_object);
 ^
serialize.c:913:2: error: unknown type name ‘ArrayRef’
 JsonbValue *ArrayRef_ser(const ArrayRef *node, JsonbParseState *state, bool sub_object)
 ^
In file included from sr_plan.h:4:0,
                 from serialize.c:1:
...
```

Other extensions could be affected as well. I'm not saying that it's a fatal
drawback, but it's definitely something you should be aware of. I personally
strongly believe that we shouldn't break extensions between major releases, or
at least we should make it trivial to properly rewrite them. Unfortunately
that's not the case currently.

[1] https://github.com/postgrespro/sr_plan

--
Best regards,
Aleksander Alekseev
> On 15 November 2016 at 15:03, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
> Hello.
>
> I took a look on the latest -v4 patch. I would like to note that this
> patch breaks a backward compatibility. For instance sr_plan extension[1]
> stop to compile with errors
Thank you for the feedback.
Well, speaking about this particular extension: if I understood correctly, it
fetches all parse tree nodes from Postgres and generates code from this
information. So to avoid the compilation problems I believe you need to run
`make USE_PGXS=1 genparser` again (it worked for me; after that I don't see
any mentions of `ArrayRef`).
But speaking generally, I don't see how we can provide backward compatibility
for extensions that are strongly coupled with the implementation details of
the parse tree. In terms of the interface it's mostly about replacing
`ArrayRef` with `SubscriptingRef`, but I think it's better to do that in the
extension code.
On Thu, Nov 17, 2016 at 10:56 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
Moved to next CF with "needs review" status.
Regards,
Hari Babu
Fujitsu Australia
> On 5 December 2016 at 12:03, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Moved to next CF with "needs review" status.
Looks like we're stuck here a little bit. Does anyone else have any suggestions or improvements, or is this patch in good enough shape?
2016-12-26 18:49 GMT+03:00 Dmitry Dolgov <9erthalion6@gmail.com>:
>> On 5 December 2016 at 12:03, Haribabu Kommi <kommi.haribabu@gmail.com>
>> wrote:
>
>> Moved to next CF with "needs review" status.
>
> Looks like we stuck here little bit. Does anyone else have any
> suggestions/improvements, or this patch is in good enough shape?

Would you rebase the patch, please? It seems it is necessary; it can't be
applied now.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 27 December 2016 at 04:05, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> Would you rebase the patch, please? It seems it is necessary. It can't
> be applied now.
Sure, here is a new rebased version.
Attachment
As I mentioned above [1], in my humble opinion this patch is not at all in
"good shape" as long as it breaks existing extensions.

[1] http://postgr.es/m/20161115080324.GA5351%40e733.localdomain

On Mon, Dec 26, 2016 at 10:49:30PM +0700, Dmitry Dolgov wrote:
> > On 5 December 2016 at 12:03, Haribabu Kommi <kommi.haribabu@gmail.com>
> > wrote:
> >
> > Moved to next CF with "needs review" status.
>
> Looks like we stuck here little bit. Does anyone else have any
> suggestions/improvements, or this patch is in good enough shape?

--
Best regards,
Aleksander Alekseev
> On 27 December 2016 at 16:09, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote:
> as long as it breaks existing extensions.
Hm... I already answered that I managed to avoid the compilation problems for
this particular extension by using the `genparser` command again:
> On Thu, Nov 17, 2016 at 10:56 PM, Dmitry Dolgov <9erthalion6@gmail.com>
> wrote:
>
> > On 15 November 2016 at 15:03, Aleksander Alekseev <
> a.alekseev@postgrespro.ru> wrote:
> > Hello.
> >
> > I took a look on the latest -v4 patch. I would like to note that this
> > patch breaks a backward compatibility. For instance sr_plan extension[1]
> > stop to compile with errors
>
> Thank you for the feedback.
>
> Well, if we're speaking about this particular extension, if I understood
> correctly, it fetches all parse tree nodes from Postgres and generates code
> using this information. So to avoid compilation problems I believe you need to
> run `make USE_PGXS=1 genparser` again (it worked for me, I don't see any
> mentions of `ArrayRef`).
>
> But speaking generally, I don't see how we can provide backward
> compatibility for those extensions, who are strongly coupled with implementation details
> of parsing tree. I mean, in terms of interface it's mostly about to replace
> `ArrayRef` to `SubscriptingRef`, but I think it's better to do it in the extension code.
Or is there something else that I missed?
2016-12-27 14:42 GMT+05:00 Dmitry Dolgov <9erthalion6@gmail.com>:
>> On 27 December 2016 at 16:09, Aleksander Alekseev
>> <a.alekseev@postgrespro.ru> wrote:
>> as long as it breaks existing extensions.
>
> Hm... I already answered that I managed to avoid the compilation problems for
> this particular extension by using the `genparser` command again:

I suppose that a separate node type could solve it. But I'm not convinced
about how to distinguish the ArrayRef node from the new SubscriptingRef node.
Maybe it could be done in the transformIndirection() function, if I understand
it all correctly.

Also Tom pointed out that he had a bad experience with using the ArrayRef node:
https://www.postgresql.org/message-id/518.1439846343%40sss.pgh.pa.us

> No. Make a new expression node type.
>
> (Salesforce did something similar for an internal feature, and it was a
> disaster both for code modularity and performance. We had to change it to
> a separate node type, which I just got finished doing. Don't go down that
> path. While you're at it, I'd advise that fetch and assignment be two
> different node types rather than copying ArrayRef's bad precedent of using
> only one.)

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 4 January 2017 at 18:06, Artur Zakirov <a.zakirov@postgrespro.ru> wrote:
> But I'm not convinced about how to distinguish the ArrayRef node from the
> new SubscriptingRef node.
I'm not sure I understood you correctly. You're talking about having the two
nodes `ArrayRef` and `SubscriptingRef` at the same time for the sake of
backward compatibility, am I right? But they're basically the same, since the
`SubscriptingRef` name is used just to indicate the more general purpose of
this node.
> Also Tom pointed out that he had a bad experience with using the ArrayRef node:
Yes, but that was related to the idea of having `ArrayRef` and `JsonbRef`
nodes for specific types. Since there is now a generic `SubscriptingRef` node,
I think it should be ok.
>> Hm...I already answered, that I managed to avoid compilation problems for
>> this particular extension using the `genparser` command again:
> I suppose that a separate node type could solve it.
Just to be clear: as far as I understood, these compilation problems were
caused not because the extension knew something about the ArrayRef node in
particular, but because the extension extracts all nodes to generate code from
them. That means any change to the node list will require "refetching", so I
think it's natural for this extension.
Here is a small improvement to the patch documentation - I forgot to add a link
to the "user-defined subscripting procedure" page to the "Extending ..." section.
Attachment
> Yes, but it was related to the idea of having `ArrayRef` and `JsonbRef` nodes
> for specific types. Since now there is generic `SubscriptingRef` node, I think
> it should be ok.

Sorry, I misunderstood it.

> Just to be clear - as far as I understood, these compilation problems were
> caused not because the extension knew something about ArrayRef node in
> particular, but because the extension tried to extract all nodes to generate
> code from them. It means any change will require "refetching", so I think it's
> natural for this extension.

Agreed. It would be hard to maintain both nodes, and it is not so smart to have
the two nodes ArrayRef (deprecated) and SubscriptingRef. It is not hard to
replace the ArrayRef node with SubscriptingRef in other extensions.

There is one little note:

> #include "utils/lsyscache.h"
> +#include "utils/syscache.h"
> #include "utils/memutils.h"

I think "utils/syscache.h" isn't necessary here. PostgreSQL can be compiled
without this include.

I suppose that this patch can be marked as "Ready for Committer". Any opinions?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> [ generic_type_subscription_v6.patch ]

Not too surprisingly, this doesn't apply anymore in the wake of commit
ea15e1867. Could you rebase? Changes for that should be pretty trivial,
I'd expect.

I took an extremely quick look over the patch --- mostly looking at the
header file changes not the code --- and have a couple of comments:

1. As I mentioned previously, it's a seriously bad idea that ArrayRef is used
for both array subscripting and array assignment. Let's fix that while we're
at it, rather than setting that mistake in even more stone by embedding it in
a datatype extension API.

2. I'm not very pleased that the subscripting functions have signature
"subscripting(internal) returns internal"; we have more than enough of those
already, and each one is a hazard for plugging the wrong function into the
wrong place. Worse yet, you're flat out lying about the number of arguments
that these functions actually expect to receive, which is something that
could get broken by any number of plausible future changes. Can we arrange to
do that differently? I'd prefer something in which the argument and result
types are visibly connected to the actual datatypes at hand, for instance

	array_subscript(anyarray, internal) returns anyelement
	array_assign(anyarray, internal, anyelement) returns anyarray

where the "internal" argument is some representation of only the subscript
expressions. This would allow CREATE TYPE to perform some amount of checking
that the right function(s) had been specified. (If that means we use two
functions not one per datatype, that's fine with me.) If that seems
impractical, let's at least pick a signature that doesn't conflict with any
other INTERNAL-using APIs, and preferably has some connection to what the
arguments really are.

3. The contents of ArrayRef are designed on the assumption that the same
typmod and collation values apply to both an array and its elements. That's
okay for standard arrays, but I do not think it holds for every other
container situation. For example, hstore doesn't have collation last I
checked, but it would likely want to treat its element type as being text,
which does. So this needs to be generalized.

4. It looks like your work on the node processing infrastructure has been
limited to s/ArrayRef/SubscriptingRef/g, but that's not nearly enough.
SubscriptingRef needs to be regarded as an opportunity to invoke a
user-defined function, which means that it now acts quite a bit like FuncExpr.
For example, the function-to-be-invoked needs to be checked for volatility,
leakproofness, parallel safety, etc in operations that want to know about
such things. So check_functions_in_node(), for one, needs to consider
SubscriptingRef, and really you'll have to look at everyplace that deals with
FuncExpr and see if it needs a case for SubscriptingRef now. I'd advise
adding the OID of the subscripting function to SubscriptingRef, so that those
places don't need to do additional catalog lookups to get it.

BTW, a different approach that might be worth considering is to say that the
nodetree representation of one of these things *is* a FuncExpr, and the new
magic thing is just that we invent a new CoercionForm value which causes
ruleutils.c to print the expression as a subscripting op. I'm not quite
convinced that that's a good idea --- a "coercion format" that says
"subscript" seems a bit weird --- but it would greatly reduce the number of
places you'd have to touch.

			regards, tom lane
I wrote:
> ... (If that means
> we use two functions not one per datatype, that's fine with me.)

Actually, after thinking about that a bit more: you've really squeezed *three*
different APIs into one function. Namely, subscript-reference parse analysis,
array subscripting execution, and array assignment execution. It would be
cleaner, and would reduce runtime overhead a bit, if those were embodied as
three separate functions.

It might be possible to get away with having only one pg_type column, pointing
at the parse-analysis function. That function would generate a SubscriptingRef
tree node containing the OID of the appropriate execution function, which
execQual.c could call without ever knowing its name explicitly.

This clearly would work for built-in types, since the parse-analysis function
could rely on fmgroids.h for the OIDs of the execution functions. But I'm less
sure if it would work in extensions, which won't have predetermined OIDs for
their functions. Is there a way for a function in an extension to find the OID
of one of its sibling functions?

			regards, tom lane
On 1/23/17 1:36 PM, Tom Lane wrote:
> Is there a way for a function
> in an extension to find the OID of one of its sibling functions?

Obviously there's regprocedure (or its C equivalent), but then you're stuck
re-computing at runtime. I've messed around with that a bit in an effort to
have an extension depend on another extension that allows the user to specify
its schema. If you're already doing metaprogramming it's not an enormous
problem... if you're not already doing that it sucks. Trying to make that work
in C would be somewhere between impossible and a nightmare.

Since this kind of thing affects extensions that depend on extensions, it'd
certainly be nice if there was some way to address it.

BTW, I actually do use SPI to call one of the reg casts in my variant type,
but that's just a hack I used in the beginning and haven't gotten around to
replacing. Since there's a static variable that gets set to the relevant OID
it's not that bad performance-wise from what I can tell, but I suspect that's
not something we want to be recommending to others...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
I wrote:
> BTW, a different approach that might be worth considering is to say that
> the nodetree representation of one of these things *is* a FuncExpr, and
> the new magic thing is just that we invent a new CoercionForm value
> which causes ruleutils.c to print the expression as a subscripting op.
> I'm not quite convinced that that's a good idea --- a "coercion format"
> that says "subscript" seems a bit weird --- but it would greatly reduce
> the number of places you'd have to touch.

After sleeping on it, that approach is starting to sound better to me.
Consider a design like this:

* Leave ArrayRef strictly alone, and introduce new infrastructure beside it
for non-array containers. That sounds ugly at first, but it has two
significant advantages: you don't have to refactor or even touch any
array-related code, and you do not have to worry about somebody objecting to
the patch because it adds unacceptable overhead to existing array operations.
Converting array ops to go through a function-call API surely must add *some*
overhead, and at this point we don't have enough info to be certain it would
be negligibly small. (Testing the existing patch cannot prove that, since as I
noted yesterday, you're missing a lot of plan-time manipulations that need to
happen for a generic function call.)

* Let the node representation for non-array-container access be FuncExpr with
new value(s) of funcformat. You'd need to design the exact representation to
be used for the subscript arguments, but that doesn't seem horribly
complicated.

In this way, you're not on the hook to duplicate all the node-processing
infrastructure for either ArrayRef or FuncExpr --- ideally, you won't need to
do much more in the core system than provide the parse-time callout and write
suitable deparsing logic in ruleutils.c.

The ugliness of thinking that "subscripting" is a kind of coercion could be
dealt with by changing funcformat to some new enum type named something like,
say, FuncDisplayForm. This would require touching all the places that mess
with funcformat, but that could be a good thing anyway, because you'd be sure
that you'd looked at everything that might need changes.

			regards, tom lane
On Mon, Jan 23, 2017 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can we arrange to do that differently? I'd prefer something in which the
> argument and result types are visibly connected to the actual datatypes
> at hand, for instance
> array_subscript(anyarray, internal) returns anyelement
> array_assign(anyarray, internal, anyelement) returns anyarray

What about having no internal arguments here at all? Like if you want to
support foo[4] define a subscript function that takes (mytype, int) and
returns whatever. You might have to allow for multiple subscripting functions
with different argument types for this to really work, though.

/me ducks

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Jan 23, 2017 at 2:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Can we arrange to do that differently? I'd prefer something in which the
>> argument and result types are visibly connected to the actual datatypes
>> at hand, for instance
>> array_subscript(anyarray, internal) returns anyelement
>> array_assign(anyarray, internal, anyelement) returns anyarray

> What about having no internal arguments here at all? Like if you want
> to support foo[4] define a subscript function that takes (mytype, int)
> and returns whatever. You might have to allow for multiple
> subscripting functions with different argument types for this to
> really work, though.

Yeah, the problem is if you're trying to support the full generality of the
array slice syntax, it gets kind of painful to shoehorn that into simple SQL
types. Consider array[1][2:3][4:] or something like that. We could say that
extensions only have access to the basic non-slice notation, but I'm sure
someone would be unhappy with that --- and even then, you don't really want
to define six functions to deal with six possible numbers of subscripts.

Another issue is that, if someone is trying to use this facility to define
array semantics that are less screwball than what Berkeley bequeathed us,
they might not be happy with the idea that a single function
"array_subscript(anyarray, internal) returns anyelement" is what determines
the result type of a subscripting operation. It might be for example that you
need to look at the subscripted object, as well as the number of subscripts,
before you know whether the result is an element or a lower-dimension array.
So I'd really prefer that the functionality involve a parser callout, and
that would certainly need "internal" argument(s).

Given that it's a parser callout, what the signatures of the runtime
function(s) are is really not our problem; the callout can construct any darn
expression tree it pleases. There would only be some subset of that space
that ruleutils.c would know how to print as a subscripting construct, but
many people might be happy with the results reverse-listing as a regular
function or operator call.

			regards, tom lane
On Thu, Jan 26, 2017 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So I'd really prefer that the functionality
> involve a parser callout, and that would certainly need "internal"
> argument(s).

Thanks, I see now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 24 January 2017 at 02:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I took an extremely quick look over the patch
Thank you for the feedback. It took some time for me to think about all
suggestions and notes.
> 1. As I mentioned previously, it's a seriously bad idea that ArrayRef
> is used for both array subscripting and array assignment. Let's fix
> that while we're at it, rather than setting that mistake in even more
> stone by embedding it in a datatype extension API.
Sure. But I assume that `SubscriptingRef` and `SubscriptingAssignmentRef` will
be almost identical since they carry the same information to get a value
and to assign a new value (so, probably it will be just an alias with a
different related function).
> 2. I'm not very pleased that the subscripting functions have signature
> "subscripting(internal) returns internal";
Basically, the current implementation of the subscripting operation contains
node-related logic (e.g. verifying that we're not using slice syntax for
jsonb) and data-type-related logic (e.g. getting/assigning a value in an
array). And while it's easy enough to use the
`array_subscript(anyarray, internal) returns anyelement`
`array_assign(anyarray, internal, anyelement) returns anyarray`
form for the second kind, the first kind should accept a node as an argument
and return a node - I'm not sure if it's possible to use anything other than
`internal` here. Speaking about the other signature issues, sure, I'll fix
them.
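To make that concrete, here is roughly the shape I mean for the parse-analysis
("validation") part - a provisional sketch with illustrative names, not the
exact code from the patch:

PG_FUNCTION_INFO_V1(jsonb_subscript_parse);

Datum
jsonb_subscript_parse(PG_FUNCTION_ARGS)
{
    SubscriptingRef *sbsref = (SubscriptingRef *) PG_GETARG_POINTER(0);

    /* type-specific verification, e.g. jsonb does not support slices */
    if (sbsref->reflowerindexpr != NIL)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("jsonb subscript does not support slices")));

    /* further checks/coercions of the index expressions would go here */

    PG_RETURN_POINTER(sbsref);
}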
> 3. The contents of ArrayRef are designed on the assumption that the same
> typmod and collation values apply to both an array and its elements.
Yes, I missed that. It looks straightforward to me: we can just split
`refcollid` and `reftypmod` into `refcontainercollid`/`refelementcollid` and
`refcontainertypmod`/`refelementtypmod`.
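Roughly how I picture `SubscriptingRef` after that change - a provisional
sketch whose field names mirror the existing `ArrayRef` plus the split
proposed above:

/* Provisional sketch only, not the final node definition. */
typedef struct SubscriptingRef
{
    Expr        xpr;
    Oid         refcontainertype;   /* type of the container itself */
    Oid         refelemtype;        /* type of the container's elements */
    int32       refcontainertypmod; /* typmod of the container */
    int32       refelementtypmod;   /* typmod of an element */
    Oid         refcontainercollid; /* collation of the container */
    Oid         refelementcollid;   /* collation of an element */
    List       *refupperindexpr;    /* expressions for (upper) subscripts */
    List       *reflowerindexpr;    /* lower subscripts, non-NIL for slices */
    Expr       *refexpr;            /* expression yielding the container */
    Expr       *refassgnexpr;       /* new value, for assignment only */
} SubscriptingRef;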
> 4. It looks like your work on the node processing infrastructure has been
> limited to s/ArrayRef/SubscriptingRef/g, but that's not nearly enough.
> SubscriptingRef needs to be regarded as an opportunity to invoke a
> user-defined function, which means that it now acts quite a bit like
> FuncExpr. For example, the function-to-be-invoked needs to be checked for
> volatility, leakproofness, parallel safety, etc in operations that want to
> know about such things.
> ....
> I noted yesterday, you're missing a lot of plan-time manipulations that need
> to happen for a generic function call.
Yes, I totally missed these too. I'm going to improve this situation soon.
> Actually, after thinking about that a bit more: you've really squeezed
> *three* different APIs into one function. Namely, subscript-reference
> parse analysis, array subscripting execution, and array assignment
> execution. It would be cleaner, and would reduce runtime overhead a bit,
> if those were embodied as three separate functions.
>
> It might be possible to get away with having only one pg_type column,
> pointing at the parse-analysis function. That function would generate
> a SubscriptingRef tree node containing the OID of the appropriate
> execution function, which execQual.c could call without ever knowing
> its name explicitly.
>
> This clearly would work for built-in types, since the parse-analysis
> function could rely on fmgroids.h for the OIDs of the execution functions.
> But I'm less sure if it would work in extensions, which won't have
> predetermined OIDs for their functions. Is there a way for a function
> in an extension to find the OID of one of its sibling functions?
>
> On 24 January 2017 at 07:54, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>
> Obviously there's regprocedure (or it's C equivalent), but then you're stuck
> re-computing at runtime. I've messed around with that a bit in an effort to
> have an extension depend on another extension that allows the user to specify
> it's schema. If you're already doing metaprogramming it's not an enormous
> problem... if you're not already doing that it sucks. Trying to make that
> work in C would be somewhere between impossible and a nightmare.
The idea of having one function that generates the SubscriptingRef node sounds
good. But I'm afraid of the consequences of using it for extensions
(especially since the request for a general subscripting implementation came
from their side as well). Is there a way to make it less troublesome?
To summarize, right now there are three options to handle `SubscriptingRef`
node analysis, subscripting execution and assignment execution:
* one `pg_type` column with the OID of the corresponding function for each
purpose (which isn't great)
* one "controller" function that directly calls another function with the
required logic (which is the "squeezing" and is the current implementation)
* one function that returns a `SubscriptingRef` with the OID of the required
function (which is potentially troublesome for extensions - see the lookup
sketched below)
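For the third option, the workaround for an extension would essentially be the
hack Jim describes above: resolve the sibling function's OID once at run time
and cache it in a static variable. A rough sketch (the function name and
signature are hypothetical, and the schema-qualification issues Jim mentions
are ignored):

static Oid
custom_subscripting_exec_oid(void)
{
    static Oid  exec_oid = InvalidOid;

    /* resolve "our" execution function by name on first use and remember it */
    if (!OidIsValid(exec_oid))
        exec_oid = DatumGetObjectId(
            DirectFunctionCall1(regprocedurein,
                CStringGetDatum("custom_subscripting_evaluate(internal)")));

    return exec_oid;
}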
> BTW, a different approach that might be worth considering is to say that
> the nodetree representation of one of these things *is* a FuncExpr, and
> the new magic thing is just that we invent a new CoercionForm value
> which causes ruleutils.c to print the expression as a subscripting op.
>
> Leave ArrayRef strictly alone, and introduce new infrastructure beside it for
> non-array containers.
Yes, it would eliminate any objections about overhead for existing array
operations. But I'm concerned that it would effectively reduce the flexibility
of the solution as a whole, because it means we think only about one data type
as a target for the implementation, and it's easy to miss something for data
structures that are more complex in terms of subscripting logic.
> Let the node representation for non-array-container access be FuncExpr with
> new value(s) of funcformat.
Probably I misunderstood something, but if we want to keep all the necessary
information about subscripting (like index values, slices etc.) together with
the OIDs of the required functions, it will be almost the same as
`SubscriptingRef`, won't it? To me the current implementation looks more
natural - there was a component that was responsible for subscripting for one
data type, and its implementation evolved and became more general (but of
course that's probably just because I implemented it, so it looks more natural
only to me).
On Sat, Jan 28, 2017 at 2:31 AM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>> On 24 January 2017 at 02:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I took an extremely quick look over the patch
>
> Thank you for the feedback. It took some time for me to think about all
> suggestions and notes.

Okay, I am marking the patch as returned with feedback seeing those reviews.
Please don't hesitate to submit a new version.

--
Michael
On 24 January 2017 at 02:36, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It might be possible to get away with having only one pg_type column,
> pointing at the parse-analysis function. That function would generate
> a SubscriptingRef tree node containing the OID of the appropriate
> execution function, which execQual.c could call without ever knowing
> its name explicitly.
Btw, would it be acceptable if such a generated SubscriptingRef contained just
a function pointer to the appropriate execution function instead of an OID
(e.g. like `ExprStateEvalFunc`)? It would help to avoid problems in the case
of extensions.