Thread: [HACKERS] [PATCH] Generic type subscripting
Hi all
Regarding the previous conversation [1], here is the patch for generic type
subscripting with several improvements. It contains the following changes:
* The subscripting node was separated into `SubscriptingRef` (only for data
extraction) and `SubscriptingAssignRef` (only for assignment), with common
data for both nodes stored in `SubscriptingBase`. Where it made sense, I
also separated the correlated pieces of code. Code common to both nodes works
with `SubscriptingRef`.
* Type-related logic is separated into several functions, and the purpose of
procedures like `jsonb_subscripting`/`array_subscripting` is now just to
generate the proper node with the correct function OID (I also tried using a
function pointer instead of a function OID, and it worked for me in the general
case, but I'm not sure whether it will be OK for `_outSubscriptingRef` and
`_readSubscriptingRef`).
* Function signatures are fixed.
* Functions for type-dependent logic are now properly verified (in the places I
found, e.g. `check_functions_in_node`).
The only thing left is separate typmod and collation handling. But since that's
necessary only for future data types, and there is already a big enough list of
changes from the previous version of this patch, I think it would be great to
discuss it now and implement this feature a little bit later.
Generally speaking, the functionality implemented in this patch is the same as
before. The code itself is a little bit clumsy, I guess (it looks like I need to
rename `SubscriptingRef` to something shorter), but I hope I can refine it soon
enough.
As always, any feedback is welcome.
Attachment
On 2/28/17 13:02, Dmitry Dolgov wrote:
> +<programlisting>
> +-- Extract value by key
> +SELECT ('{"a": 1}'::jsonb)['a'];
> +
> +-- Extract nested value by key path
> +SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
> +
> +-- Extract element by index
> +SELECT ('[1, "2", null]'::jsonb)['1'];
> +
> +-- Update value by key
> +UPDATE table_name set jsonb_field['key'] = 1;
> +
> +-- Select records using where clause with subscripting
> +SELECT * from table_name where jsonb_field['key'] = '"value"';
> +</programlisting>

I see a possible problem here: This design only allows one subscripting
function. But what you'd really want in this case is at least two: one taking
an integer type for selecting by array index, and one taking text for selecting
by field name.

I suppose that since a given value can only be either an array or an object,
there is no ambiguity, but I think this might also lose some error checking.
It might also not work the same way for other types.

It looks like your jsonb subscripting function just returns null if it can't
find a field, which is also a bit dubious.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I see a possible problem here: This design only allows one subscripting
> function. But what you'd really want in this case is at least two: one
> taking an integer type for selecting by array index, and one taking text
> for selecting by field name.

No, I think you're missing the point: there is just one function per datatype
for *parse analysis* of a subscripting operation applied to that datatype.
What it chooses to allow as subscript type, and what function it determines
will be used at execution, is up to it.

I agree that the given jsonb_subscript is failing to handle the
subscript-an-array-with-an-integer case, but that's a datatype-specific
shortcoming not a failure of the overall design. I would guess that what we
really want for jsonb is the ability to intermix integer and text subscripts,
so that

	jsonbcol['foo'][42]['bar']

would extract the "bar" field of an object in position 42 of an array in field
"foo" of the given jsonb value. So you probably end up still having one jsonb
execution function, not two, and it would have different code paths depending
on whether it sees the type of the next subscript expression to be integer or
text.

> It looks like your jsonb subscripting function just returns null if it
> can't find a field, which is also a bit dubious.

Nah, that seems fine. Our precedent for standard array subscripting is to
return NULL for out-of-range subscripts, and the jsonb -> operator also
returns NULL if there's no such field. It would be rather surprising if jsonb
subscripting threw an error instead; and I do not think it would be more
useful.

			regards, tom lane
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> [ generic_type_subscription_v7.patch ]

I looked through this a bit. I think that the basic design of having a
type-specific parse analysis function that returns a constructed
SubscriptingRef node is fine.

I'm not totally excited about the naming you've chosen though, particularly
the function names "array_subscripting()" and "jsonb_subscripting()" --- those
are too generic, and a person coming to them for the first time would probably
expect that they actually execute subscripting, when they do no such thing.
Names like "array_subscript_parse()" might be better. Likewise the name of the
new pg_type column doesn't really convey what it does, though I suppose
"typsubscriptparse" is too much of a mouthful. "typsubparse" seems short
enough but might be confusing too.

I wonder also if we should try to provide some helper functions rather than
expecting every data type to know all there is to know about parsing and
execution of subscripting. Not sure what would be helpful, however. One thing
that needs more thought for sure is the nested-assignment case (the logic
around isAssignmentIndirectionExpr) --- the design you've got here implies
that *every* container-like datatype would need to duplicate that logic, and I
don't think we want to go there.

The documentation needs a lot of work of course, and I do not think you're
doing either yourself or anybody else any favors with the proposed additions
to src/tutorial/. You'll never be sure that that stuff even compiles let alone
accurately represents what people need to do. Actual running code is much
better. It may be that jsonb_subscript is enough of an extension example, but
perhaps adding a subscripting feature to some contrib module would be better.

Aren't SBS_VALIDATION and SBS_EXEC just hangovers from the previous design?
They're still in parse_node.h, and they're still mentioned in the docs, but I
don't see them used in actual code anywhere. get_slice_arguments seems to be a
hangover as well, which is good because it's mighty ugly and undocumented.

It seems rather silly for ExecEvalSubscriptingRef to be palloc'ing some
per-subscript arrays each time through when it's got other arrays that are
still of fixed size MAXDIM. I can believe that it might be a good idea to
increase or remove the MAXDIM limit, but this doesn't do it. In any case, you
don't want to add the overhead of a couple of pallocs per execution. Using
OidFunctionCall2 is even worse: that's adding a system catalog lookup per
execution. You need to be caching the function address as is done for regular
function and operator calls. (I take it you've not done performance testing
yet.)

I'm not really finding this to be an improvement:

-				 errmsg("array subscript in assignment must not be null")));
+				 errmsg("container subscript in assignment must not be null")));

"Container" is going to seem like jargon to users. Maybe it'd be okay to drop
the word altogether and just say "subscript in assignment must not be null".
(Another question here is whether every datatype will be on board with the
current rules about null subscripts, or whether we need to delegate
null-handling to the datatype-specific execution function. I'm not sure; it
would complicate the API significantly for what might be useless flexibility.)

I'm tempted to propose that it'd be a good idea to separate the regular
(varlena) array code paths from the fixed-length-array code paths altogether,
which you could do in this implementation by having separate execution
functions for them. That would buy back some fraction of whatever overhead
we're adding with the additional level of function call. Maybe even separate
execution functions for the slice and not-slice cases, though that might be
overkill.

I'm not on board with these APPLY_OPERATOR_TO_TYPE macros. If you think you
have a cute idea for improving the notation in the node support files, great;
submit a patch that changes all of the support functions at once. Patches that
introduce one or two support functions that look radically different from all
the rest are not acceptable. Likewise, what you did in places like JumbleExpr
is too cute by half. Just make two separate cases for the two new node types.
You're not saving enough code to justify the additional intellectual
complexity and maintenance burden of doing it like that.

I do not think that the extra SubscriptingBase data structure is paying for
itself either; you're taking a hit in readability from the extra level of
struct, and as far as I can find it's not buying you one single thing, because
there's no code that operates on a SubscriptingBase argument. I'd just drop
that idea and make two independent struct types, or else stick with the
original ArrayRef design that made one struct serve both purposes. (IOW, my
feeling that a separate assign struct would be a readability improvement isn't
exactly getting borne out by what you've done here. But maybe there's a better
way to do that.)

I wouldn't suggest putting a lot of work into the execQual.c part of the patch
right now, as execQual.c is going to look completely different if Andres'
patch gets in. Better to concentrate on cleaning up the parsenode struct types
and thinking about a less messy answer for nested assignment.

			regards, tom lane
Hi Dmitry,

On 3/14/17 7:10 PM, Tom Lane wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> [ generic_type_subscription_v7.patch ]
>
> I looked through this a bit.

This thread has been idle for over a week. Please respond and/or post a new
patch by 2017-03-24 00:00 AoE (UTC-12) or this submission will be marked
"Returned with Feedback".

Thanks,
-- 
-David
david@pgmasters.net
> On 21 March 2017 at 18:16, David Steele <david@pgmasters.net> wrote:
>
> This thread has been idle for over a week.
Yes, sorry for the late reply. I'm still trying to find a better solution for
some of the problems that arose in this patch.
> On 15 March 2017 at 00:10, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
>
> I looked through this a bit.
>
Thank you for the feedback.
> > On 10 March 2017 at 06:20, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > I see a possible problem here: This design only allows one subscripting
> > function. But what you'd really want in this case is at least two: one
> > taking an integer type for selecting by array index, and one taking text
> > for selecting by field name.
>
> I would guess that what we really want for jsonb is the ability to
> intermix integer and text subscripts, so that
> jsonbcol['foo'][42]['bar']
> would extract the "bar" field of an object in position 42 of an
> array in field "foo" of the given jsonb value.
>
Maybe I misunderstood you, but isn't this already possible?
```
=# select ('{"a": [{"b": 1}, {"c": 2}]}'::jsonb)['a'][0]['b'];
jsonb
-------
1
(1 row)
=# select * from test;
data
-----------------------------
{"a": [{"b": 1}, {"c": 2}]}
(1 row)
=# update test set data['a'][0]['b'] = 42;
UPDATE 1
=# select * from test;
data
-----------------------------
{"a": [{"b": 42}, {"c": 2}]}
(1 row)
```
> I'm not totally excited about the naming you've chosen though,
> particularly the function names "array_subscripting()" and
> "jsonb_subscripting()" --- those are too generic, and a person coming to
> them for the first time would probably expect that they actually execute
> subscripting, when they do no such thing. Names like
> "array_subscript_parse()" might be better. Likewise the name of the
> new pg_type column doesn't really convey what it does, though I suppose
> "typsubscriptparse" is too much of a mouthful. "typsubparse" seems short
> enough but might be confusing too.
It looks quite tempting to me to replace the word "subscript" with an
abbreviation all over the patch. Then it will become something like
"typsbsparse", which is not such a mouthful, but I'm not sure whether it will
be easily recognizable.
> I wonder also if we should try to provide some helper functions rather
> than expecting every data type to know all there is to know about parsing
> and execution of subscripting. Not sure what would be helpful, however.
I don't really see what details we could hide behind such a helper, because
almost all of the code there is type-specific (e.g. checking whether subscript
indexes are correct). Can you elaborate on that?
> The documentation needs a lot of work of course, and I do not think
> you're doing either yourself or anybody else any favors with the proposed
> additions to src/tutorial/.
Yes, unfortunately, I forgot to update the documentation from the previous
version of the patch. I'll fix it soon in the next version.
> Aren't SBS_VALIDATION and SBS_EXEC just hangovers from the previous
> design? They're still in parse_node.h, and they're still mentioned in
> the docs, but I don't see them used in actual code anywhere.
Yes, these are from the previous version too, I'll remove them.
> Another question here is whether every datatype will be on board
> with the current rules about null subscripts, or whether we need to
> delegate null-handling to the datatype-specific execution function.
> I'm not sure; it would complicate the API significantly for what might be
> useless flexibility.
It seems to me that performing null-handling inside the datatype-specific code
is a great idea, and I'm not sure what would be complicated about it. All the
necessary information is already in `SubscriptingExecData` (I just have to use
`eisnull` in a different way).
> I do not think that the extra SubscriptingBase data structure is paying
> for itself either; you're taking a hit in readability from the extra level
> of struct, and as far as I can find it's not buying you one single thing,
> because there's no code that operates on a SubscriptingBase argument.
> I'd just drop that idea and make two independent struct types, or else
> stick with the original ArrayRef design that made one struct serve both
> purposes. (IOW, my feeling that a separate assign struct would be a
> readability improvement isn't exactly getting borne out by what you've
> done here. But maybe there's a better way to do that.)
I'm thinking of replacing these structures with more meaningful ones, something like:
```
typedef struct SubscriptContainerRef
{
	Expr	xpr;
	Oid		refcontainertype;
	Oid		refelemtype;
	int32	reftypmod;
	Oid		refcollid;
} SubscriptContainerRef;

typedef struct SubscriptExtractRef
{
	Expr	xpr;
	SubscriptContainerRef *containerExpr;
	List   *refupperindexpr;
	List   *reflowerindexpr;
	Oid		refevalfunc;
} SubscriptExtractRef;

typedef struct SubscriptAssignRef
{
	Expr	xpr;
	SubscriptContainerRef *containerExpr;
	Expr   *refassgnexpr;
	List   *refupperindexpr;
	List   *reflowerindexpr;
	Oid		refevalfunc;
} SubscriptAssignRef;
```
It's close to the previous version, but there will be a bit less duplicated
code compared with two independent structures; we can still use different
nodes for data assignment and data extraction, and the node processing for
these nodes can be a little bit more separated, e.g.:
```
case T_SubscriptExtractRef:
	{
		/* extract-specific logic */
		sbsref->containerExpr = ExecInitExpr((Expr *) sbsref->containerExpr, parent);
		break;
	}
case T_SubscriptAssignRef:
	{
		/* assign-specific logic */
		sbsref->containerExpr = ExecInitExpr((Expr *) sbsref->containerExpr, parent);
		break;
	}
case T_SubscriptContainerRef:
	{
		/* subscript container logic */
		break;
	}
```
Hi Dmitry,

On 3/21/17 4:42 PM, Dmitry Dolgov wrote:
>> On 21 March 2017 at 18:16, David Steele <david@pgmasters.net> wrote:
>>
>> This thread has been idle for over a week.
>
> Yes, sorry for the late reply. I'm still trying to find a better
> solution for some of the problems, that arose in this patch.

Do you have an idea when you will have a patch ready? We are now into the last
week of the commitfest. I see one question for Tom, but it's not clear that
this would prevent you from producing a new patch.

Please post a new patch by 2017-03-28 00:00 AoE (UTC-12) or this submission
will be marked "Returned with Feedback".

Thanks,
-- 
-David
david@pgmasters.net
On 24 March 2017 at 15:39, David Steele <david@pgmasters.net> wrote:
>
> Do you have an idea when you will have a patch ready?
Yes, I'll prepare a new version with most important changes in two days.
David Steele <david@pgmasters.net> writes:
> Do you have an idea when you will have a patch ready? We are now into
> the last week of the commitfest. I see one question for Tom, but it's
> not clear that this would prevent you from producing a new patch.

FWIW, I'm up to my eyeballs in Andres' faster-expressions patch, and won't
have time to think about this one for at least a couple more days.

			regards, tom lane
On 24.03.2017 18:29, Tom Lane wrote:
> David Steele <david@pgmasters.net> writes:
>> Do you have an idea when you will have a patch ready? We are now into
>> the last week of the commitfest. I see one question for Tom, but it's
>> not clear that this would prevent you from producing a new patch.
>
> FWIW, I'm up to my eyeballs in Andres' faster-expressions patch, and
> won't have time to think about this one for at least a couple more
> days.

I can try to review the new version of the patch when Dmitry sends it, if no
one objects. Besides, I've seen the previous versions of the patch from the
previous commitfest.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Here is a new version of this patch. What was changed:
* I rebased the code against the latest version of master and adapted it to
the recent changes in expression execution
* Several names (functions and the related pg_type column) were changed
* A new function OID field was introduced to handle the nested assignment situation
* I updated the documentation for the patch
* `MAXDIM` was replaced by `MAX_SUBSCRIPT_DEPTH`
* I returned to one `SubscriptingRef` for both fetch & assign operations, since
there is no real readability improvement at this point (they're already
separated at the time of evaluation, and apart from the evaluation code fetch &
assign are handled almost identically)
Attachment
Hello,

On 27.03.2017 23:28, Dmitry Dolgov wrote:
> Here is a new version of this patch.

Your patch reverts commits from 25-26 March, and therefore contains 15000
lines. I think the patch needs rebasing.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 28 March 2017 at 11:58, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> Your patch reverts commits from 25-26 march. And therefore contains 15000 lines.
Wow, I didn't notice that, sorry - will fix it shortly.
On 28 March 2017 at 12:08, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Wow, I didn't notice that, sorry - will fix it shortly.
So, here is the corrected version of the patch.
Attachment
On 28.03.2017 19:31, Dmitry Dolgov wrote:
> So, here is the corrected version of the patch.

I have some picky comments.

I'm not sure that "typsbsparse" is better than "typsubscripting" or
"typsubparse". Maybe "typsubsparse"?

>   <row>
> +  <entry><structfield>typsubscripting</structfield></entry>
> +  <entry><type>regproc</type></entry>

Here you didn't fix "typsubscripting" to the new name.

> + <title>JSON subscripting</title>
> + <para>
> + JSONB data type support array-style subscripting expressions to extract or update particular element. An example of subscripting syntax:

Should be "JSONB data type supports".

> + to handle subscripting expressions. It should contains logic for verification
> + and decide which function must be used for evaluation of this expression.
> + For instance:

Should be "It should contain".

> + <sect2 id="json-subscripting">
> + <title>JSON subscripting</title>

You have implemented jsonb subscripting. The documentation should be fixed to:

+ <sect2 id="jsonb-subscripting">
+ <title><type>jsonb</> Subscripting</title>
+ <para>
+ <type>jsonb</> data type supports array-style subscripting expressions to extract or update particular element. An example of subscripting syntax:

I think the IsOneOf() macro should be removed, since it is not used anywhere.

> + Assert(subexpr != NULL);
> +
> + if (subexpr == NULL)
> + 	ereport(ERROR,
> + 			(errcode(ERRCODE_DATATYPE_MISMATCH),
> + 			 errmsg("jsonb subscript does not support slices"),
> + 			 parser_errposition(pstate, exprLocation(
> + 				((Node *) lfirst(sbsref->refupperindexpr->head))))));

Here one of the conditions is redundant. Better to delete the Assert
condition, I think.

I've tried the tests from message [1]. They look good. Performance looks
similar for subscripting without the patch and with the patch.

I wanted to implement subscripting for the ltree or hstore extensions.
Subscripting for ltree looks more interesting, especially with slicing. But I
haven't done it yet. I hope that I will implement it tomorrow.

1. https://www.postgresql.org/message-id/CAKNkYnz_WWkzzxyFx934N%3DEp47CAFju-Rk-sGeZo0ui8QdrGmw%40mail.gmail.com

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 29 March 2017 at 19:14, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> I'm not sure that "typsbsparse" is better than "typsubscripting" or "typsubparse". Maybe "typsubsparse"?
`typsubparse` is more confusing to me, but I like `typsubsparse`.
> I've tried the tests from message [1]. They look good. Performance looks similar for subscripting without the patch and with the patch.
Great, thank you for confirmation.
I've attached a new version of the patch with related improvements.
Attachment
On 29.03.2017 20:14, Arthur Zakirov wrote:
> I wanted to implement subscripting for ltree or hstore extensions.
> Subscripting for ltree looks more interesting. Especially with slicing.
> But I haven't done it yet. I hope that I will implement it tomorrow.

ltree
-----

I've implemented fetching ltree elements using subscripting, but haven't
implemented assigning ltree elements yet. I'll send a second patch after
implementing assigning. Now you can execute the following query:

SELECT ('Top.Science.Astronomy.Astrophysics'::ltree)[1:2];
    ltree
-------------
 Top.Science

Comments
--------

But I've noticed some points. In array_subscript_parse(), uninitialized values
of the "typesource" and "typeneeded" variables are passed to
coerce_to_target_type().

> + typesource = exprType(assignExpr);
> + typesource = is_slice ? sbsref->refcontainertype : sbsref->refelemtype;

Here is the bug: the second variable should be "typeneeded". Moreover, these
assignments should be moved up before the first coerce_to_target_type()
execution.

> + foreach(l, sbsref->reflowerindexpr)
> + {
> + 	List *expr_ai = (List *) lfirst(l);
> + 	A_Indices *ai = (A_Indices *) lfirst(list_tail(expr_ai));
> +
> + 	subexpr = (Node *) lfirst(list_head(expr_ai));

This code looks like magic. This happens because of appending A_Indices to
lowerIndexpr:

> - subexpr = NULL;
> + lowerIndexpr = lappend(lowerIndexpr, list_make2(subexpr, ai));
> }

And this A_Indices is used only when slicing is not used, to make a constant 1.
Maybe there is another way?

Also it would be better if "refevalfunc" and "refnestedfunc" were pointers to
functions, not of Oid type. Now you need to create "..._fetch" and "..._assign"
functions in the catalog, and in the "..._parse" function you need to get
their Oids using the to_regproc() function. Can we use the IndexAmRoutine
structure method, where you use only pointers to the necessary functions? You
can see an example in the blhandler() function in blutils.c.

The last point is about the tutorial. As Tom pointed out, it is not useful
when the tutorial doesn't execute. It happens because there is no "custom"
type in subscripting.sql. Also it contradicts the README of the tutorials.

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 30 March 2017 at 19:36, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> The last point is about the tutorial. As Tom pointed out, it is not useful when the tutorial doesn't execute. It happens because there is no "custom" type in subscripting.sql.
I'm confused. Maybe I'm missing something, but there is a "custom" type in this file:
```
-- subscripting.sql
CREATE TYPE custom (
internallength = 8,
input = custom_in,
output = custom_out,
subscripting = custom_subscript_parse
);
```

```
> \i subscripting.sql
psql:subscripting.sql:39: NOTICE: 42704: type "custom" is not yet defined
DETAIL: Creating a shell type definition.
LOCATION: compute_return_type, functioncmds.c:141
CREATE FUNCTION
Time: 4.257 ms
psql:subscripting.sql:47: NOTICE: 42809: argument type custom is only a shell
LOCATION: interpret_function_parameter_list, functioncmds.c:245
CREATE FUNCTION
Time: 37.038 ms
CREATE FUNCTION
Time: 13.891 ms
CREATE FUNCTION
Time: 0.946 ms
CREATE FUNCTION
Time: 1.161 ms
CREATE TYPE
Time: 1.336 ms
CREATE TABLE
Time: 2.129 ms
INSERT 0 1
Time: 2.501 ms
data
------
2
(1 row)
Time: 0.960 ms
UPDATE 1
Time: 0.887 ms
```
So the only problem I see is the notice about "type 'custom' is not yet defined", but it's the same for the "complex" tutorial:
```
> \i complex.sql
psql:complex.sql:39: NOTICE: 42704: type "complex" is not yet defined
DETAIL: Creating a shell type definition.
LOCATION: compute_return_type, functioncmds.c:141
CREATE FUNCTION
Time: 1.741 ms
psql:complex.sql:47: NOTICE: 42809: argument type complex is only a shell
LOCATION: interpret_function_parameter_list, functioncmds.c:245
CREATE FUNCTION
Time: 0.977 ms
psql:complex.sql:55: NOTICE: 42809: return type complex is only a shell
LOCATION: compute_return_type, functioncmds.c:105
CREATE FUNCTION
Time: 0.975 ms
psql:complex.sql:63: NOTICE: 42809: argument type complex is only a shell
LOCATION: interpret_function_parameter_list, functioncmds.c:245
CREATE FUNCTION
Time: 0.893 ms
CREATE TYPE
Time: 0.992 ms
...
```

Can you clarify this point?
2017-03-31 5:32 GMT+03:00 Dmitry Dolgov <9erthalion6@gmail.com>:
> On 30 March 2017 at 19:36, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>>
>> The last point is about the tutorial. As Tom pointed it is not useful when
>> the tutorial doesn't execute. It happens because there is not "custom" type
>> in subscripting.sql.
>
> I'm confused. Maybe I'm missing something, but there is "custom" type in
> this file:

Sorry for the confusion. I should have been more careful. I mixed up the NOTICE message with an error message, and I hadn't noticed the CREATE TYPE command.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
2017-03-28 19:31 GMT+03:00 Dmitry Dolgov <9erthalion6@gmail.com>:
> On 28 March 2017 at 12:08, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>>
>> Wow, I didn't notice that, sorry - will fix it shortly.
>
> So, here is the corrected version of the patch.

Thank you! The patch looks good to me. But it is not applied :) I think it is because of the new FTS functions for jsonb. The patch needs rebasing.

But, from my point of view, it would be nice if the code mentioned earlier was improved:

> + foreach(l, sbsref->reflowerindexpr)
> + {
> + List *expr_ai = (List *) lfirst(l);
> + A_Indices *ai = (A_Indices *) lfirst(list_tail(expr_ai));
> +
> + subexpr = (Node *) lfirst(list_head(expr_ai));

It is necessary for arrays, of course, because of the logic mentioned in the documentation:

> If any dimension is written as a slice, i.e., contains a colon, then all dimensions are treated as slices. Any dimension that has only a single number (no colon) is treated as being from 1 to the number specified.

But it would be better if the SubscriptingRef structure had a new field of List type. This field could store a list of booleans indicating whether each subscript is a slice or not. I think it could improve the readability of the code.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 1 April 2017 at 21:26, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> The patch looks good to me. But it is not applied :) I think it is
> because of new FTS functions for jsonb. The patch need rebasing.
Sorry for late reply. Here is a new version of the patch, I rebased it and
fixed those issues you've mentioned (pretty nasty problems, thank you for
noticing).
Attachment
On 04.04.2017 15:41, Dmitry Dolgov wrote:
> Sorry for late reply. Here is a new version of the patch, I rebased it and
> fixed those issues you've mentioned (pretty nasty problems, thank you for
> noticing).

Thank you! I've looked at the patch again.

I'd like to focus on the "refevalfunc" and "refnestedfunc" fields, as I did earlier. I think using the Oid type for them is a bad approach. Having "..._fetch" and "..._assign" functions in the catalog is unnecessary movement to me. A user adding subscripting for his type may think the same. He won't see the code and won't know why he needs these functions.

And so "..._fetch" and "..._assign" functions in the catalog are a bad design to me. But, of course, it is just my opinion. This approach is the main thing which we should resolve first, because after committing the patch it will be hard to fix.

> static int ArrayCount(const char *str, int *dim, char typdelim);
> +bool isAssignmentIndirectionExpr(ExprState *exprstate);
> static void ReadArrayStr(char *arrayStr, const char *origStr,

I think this isAssignmentIndirectionExpr() declaration was forgotten and should be deleted, because isAssignmentIndirectionExpr() is in execExpr.c now.

> + if (subexpr == NULL)
> + {
> + lowerIndexpr = lappend(lowerIndexpr, subexpr);
> + continue;
> + }
> +
> +

There is an extra blank line here after the brace.

> if (array_type != sbsref->refcontainertype)
> {
>
> node = coerce_to_target_type(pstate,
> node, array_type,
> sbsref->refcontainertype, sbsref->reftypmod,
> COERCION_ASSIGNMENT,
> COERCE_IMPLICIT_CAST,
> -1);
>
> /* can fail if we had int2vector/oidvector, but not for true domains */
> if (node == NULL && node->type != 0)
> ereport(ERROR,
> (errcode(ERRCODE_CANNOT_COERCE),
> errmsg("cannot cast type %s to %s",
> format_type_be(array_type),
> format_type_be(sbsref->refcontainertype)),
> parser_errposition(pstate, 0)));
>
> PG_RETURN_POINTER(node);
> }

Also, I was wondering: do we need this code in array_subscript_parse()? I haven't understood its purpose. If it is necessary, it would be good to add an explanatory comment.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 05.04.2017 16:06, Arthur Zakirov wrote:
> On 04.04.2017 15:41, Dmitry Dolgov wrote:
>> Sorry for late reply. Here is a new version of the patch, I rebased it
>> and fixed those issues you've mentioned (pretty nasty problems, thank you
>> for noticing).
>
> Thank you!
>
> I've looked at the patch again.

Sorry, maybe this is too naive, but I was also wondering:

> + element_type_id = transformArrayType(&array_type, &array_typ_mode);
> + sbsref->refelemtype = element_type_id;

I don't understand this part of the patch. Why is it necessary to execute transformArrayType() a second time? It was already executed in transformContainerSubscripts().

> + if (!OidIsValid(elementType))
> + elementType = containerType;

This part looks strange to me too. If these parts are necessary, it would be good to add comments.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 05.04.2017 16:06, Arthur Zakirov wrote:
>
> I'd like to focus on "refevalfunc" and "refnestedfunc" fields as I did
> earlier. I think using Oid type for them is a bad approach. "..._fetch"
> and "..._assign" functions in catalog is unnecessary movement to me.
> User of subscript of his type may think the same. But he won't see the
> code and won't know why he needs these functions.
>
> And so "..._fetch" and "..._assign" functions in catalog is a bad design
> to me. But, of course, it is just my opinion. This approach is the main
> think which we should resolve first, because after commiting the patch
> it will be hard to fix it.

I've read the older messages in the thread. I see now that this approach was agreed upon with other hackers. I've just been confused while implementing subscripting for ltree. Sorry if I confused you.

Any opinions about the patch?

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 1 April 2017 at 21:26, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> Also I was wondering do we need this code in array_subscript_parse()? I
> haven't understood the purpose of it. If it is necessary then would be
> good to add explain comment.
Well, it's necessary because `array_type` can be modified by
`transformArrayType` and we have to perform coercion again. I'm not sure if
more explanation for that is required, can you suggest something to add here?
> I don't understand this part of the patch. Why is it necessary to
> execute transformArrayType() second time? It was executed in
> transformContainerSubscripts().
Yes, that's my mistake, I removed one from `parse_node.c`.
Here is a new slightly improved version of the patch.
Attachment
On 28 February 2017 at 19:02, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Regarding to the previous conversation [1], here is the patch for generic type
> subscripting with several improvements. It contains the following changes:
So, a few words about the current state of the patch:
* after a lot of serious improvements, the general design of this feature is agreeable
* we introduced a lot of small changes to polish it
* I rebased the patch on the latest version of master, so you can take a look at it again
As always, any feedback is welcome.
Attachment
On Wednesday, 10 May 2017 23:43:10 MSK, Dmitry Dolgov wrote:
> So, a few words about the current state of the patch:
>
> * after a lot of serious improvements, the general design of this feature is
> agreeable
>
> * we introduced a lot of small changes to polish it
>
> * I rebased the patch on the latest version of master, so you can take a
> look at it again
>
> As always, any feedback is welcome.

Hello,

Can you rebase the patch please? It is not applied now. I think it is because of pgindent.

> +
> + scratch->d.sbsref.eval_finfo = eval_finfo;
> + scratch->d.sbsref.nested_finfo = nested_finfo;
> +

Also, I have noticed that assigning eval_finfo and nested_finfo every time an eval step is pushed is unnecessary in the ExecInitSubscriptingRef() function. We need them only for the EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN and EEOP_SBSREF_FETCH steps.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 30 June 2017 at 11:34, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> Can you rebase the patch please? It is not applied now. I think it is because
> of pgindent.
Sure, I've attached the rebased version of the patch.
>
> > +
> > + scratch->d.sbsref.eval_finfo = eval_finfo;
> > + scratch->d.sbsref.nested_finfo = nested_finfo;
> > +
>
> Also I have noticed that assigning eval_finfo and nested_finfo after every time
> eval step is pushed is unnecessary in ExecInitSubscriptingRef() function. We
> need them only for EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN and EEOP_SBSREF_FETCH
> steps.
I'm not sure, because the absence of any of those `eval_finfo`/`nested_finfo`
blocks in `ExecInitSubscriptingRef` breaks a few tests.
Attachment
Here is a new rebased version of this patch (there were some conflicts in commentaries).
Attachment
> On 12 August 2017 at 13:37, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Here is a new rebased version of this patch (there were some conflicts in commentaries).
To make the review a little bit easier I've divided the patch into a few smaller parts.
Attachment
On 29 August 2017 at 22:42, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> To make a review little bit easier I've divided the patch into a few smaller parts.
Apparently I forgot about subscripting for the name data type, so here is a small update of the patch.
Attachment
On Thu, Sep 07, 2017 at 10:49:54PM +0200, Dmitry Dolgov wrote:
> On 29 August 2017 at 22:42, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > To make a review little bit easier I've divided the patch into a few
> > smaller parts.
>
> Apparently I forgot about subscripting for the name data type, so here is a
> small update of the patch.

Thank you for rebasing the patch!

PostgreSQL and the documentation with the patch compile without any errors. All regression tests passed.

But honestly, I still cannot say that I agree with the way the *_extract() and *_assign() functions are created. For example, there is no entry in pg_depend for them (related to the pg_type entry). Because there is no such entry, there is the following bug:

1 - make and install src/tutorial
2 - run src/tutorial/subscripting.sql
3 - run:

=# drop function custom_subscripting_extract(internal);

4 - and we get the error:

=# select data[0] from test_subscripting;
ERROR: function 0x55deb7911bfd returned NULL

But of course it is only my opinion and I could be wrong.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
> On 9 September 2017 at 23:33, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> PostgreSQL and documentation with the patch compiles without any errors. All
> regression tests passed.
Thank you!
> But honestly I still cannot say that I agree with the way the *_extract() and
> *_assign() functions are created. For example, there is no entry in pg_depend for them
> ...
> =# drop function custom_subscripting_extract(internal);
> =# select data[0] from test_subscripting;
> ERROR: function 0x55deb7911bfd returned NULL
Hm... I never thought about the feature in this way. When I was experimenting I
also tried another approach for this - saving to `refevalfunc` a function
pointer to an appropriate function. For simple situations it was ok, but there
were questions about how it would work with the node-related functions from
`outfuncs`/`copyfuncs` etc. Another idea of mine was to resolve the actual
`refevalfunc` not at node creation time but later on - this was also
questionable, since in that case we need to carry a lot of information with a
node just for this sole purpose. Maybe you can suggest something else?
About dependencies between functions - as far as I understand, one cannot create
a `pg_depend` entry or any other kind of dependency between custom
user-defined functions. So yes, it looks like with the current approach the only
solution would be to check in the `_parse` function that the `_extract` and
`_assign` functions exist (which is inconvenient).
> For example, there is no entry in pg_depend
Are there any other disadvantages of this approach?
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> About dependencies between functions - as far as I understand one cannot
> create a `pg_depend` entry or any other kind of dependencies between
> custom user-defined functions.

Uh, what? Sure you can. Just because the existing code never has a reason to create such a dependency doesn't mean it wouldn't work.

			regards, tom lane
> On 11 September 2017 at 23:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Uh, what? Sure you can. Just because the existing code never has a
> reason to create such a dependency doesn't mean it wouldn't work.
Well, I thought that `pg_depend` was not intended to be used from user-defined
code and was something "internal". But if I'm wrong, then maybe the problem
Arthur raised is a valid reason for that.
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On 11 September 2017 at 23:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh, what? Sure you can. Just because the existing code never has a
>> reason to create such a dependency doesn't mean it wouldn't work.

> Well, I thought that `pg_depend` was not intended to be used from
> user-defined code and it's something "internal".

Well, no, we're not expecting that SQL code will manually insert rows there. This feature should have some sort of SQL command that will set up the relevant catalog entries, including the dependencies. If you don't want to do that, you're going to need the runtime tests.

			regards, tom lane
> On 11 September 2017 at 23:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> >> On 11 September 2017 at 23:19, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Uh, what? Sure you can. Just because the existing code never has a
> >> reason to create such a dependency doesn't mean it wouldn't work.
>
> > Well, I thought that `pg_depend` was not intended to be used from
> > user-defined code and it's something "internal".
>
> Well, no, we're not expecting that SQL code will manually insert rows
> there. This feature should have some sort of SQL command that will
> set up the relevant catalog entries, including the dependencies.
> If you don't want to do that, you're going to need the runtime tests.
Sure, an SQL command for that purpose is much better than a runtime check.
I'm going to add such command to the patch, thank you for the information!
> On 11 September 2017 at 23:55, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Sure, an SQL command for that purpose is much better than a runtime check.
> I'm going to add such command to the patch, thank you for the information!
So, I've implemented a patch for that in the form of a `DEPENDS ON` syntax for creating a function.
Basically it looks like this (and initially I was looking for something like that in the documentation,
you can find a complete example in the test `create_function_3.sql`):
```
CREATE FUNCTION custom_subscripting_extract(internal)
RETURNS internal;
CREATE FUNCTION custom_subscripting_assign(internal)
RETURNS internal;
CREATE FUNCTION custom_subscript_parse(internal)
RETURNS internal
DEPENDS ON custom_subscripting_extract, custom_subscripting_assign;
```
I hope it sounds reasonable and can help to address a problem with dependencies between functions.
Attachment
On Fri, Sep 15, 2017 at 10:02:00PM +0200, Dmitry Dolgov wrote:
>
> So, I've implemented a patch for that in form of a `DEPENDS ON` syntax for
> creating a function.

In my opinion, the 'DEPENDS ON' syntax is not actually appropriate here. It also looks like a not very good hack to me.

Moreover, a user can implement subscripting for his own type without using the 'DEPENDS ON' syntax. And then he will face the bug mentioned above too.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 17 September 2017 at 00:04, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> In my opinion, 'DEPENDS ON' syntax is not actually appropriate here. It
> also looks like a not very good hack to me.
Hm...why do you think about it as a hack?
> Moreover user can implement subscripting to its own type without using
> 'DEPENDS ON' syntax. And he will face the bug mentioned above too.
Yes, but since it will require a user to create a few independent custom
functions for subscripting (as we discussed before, there were a few reasons for
having them as proper separate functions), I don't see how to avoid the step
of explicitly marking all of them as related to the subscripting logic for a
particular data type. And therefore it's possible to forget to do that step,
regardless of what form this step takes. Maybe it's possible to make something
like `CREATE FUNCTION ... FOR SUBSCRIPTING`, then verify that the assign/extract
functions are present and notify the user if he missed them (but I would rather
not do this unless it's really necessary, since it looks like overkill).
But I'm open to any suggestions, do you have something in mind?
On Sun, Sep 17, 2017 at 12:27:58AM +0200, Dmitry Dolgov wrote:
> spite of what form this step will be. Maybe it's possible to make something
> like `CREATE FUNCTION ... FOR SUBSCRIPTING`, then verify that assign/extract
> functions are presented and notify user if he missed them (but I would rather
> not do this unless it's really necessary, since it looks like an overkill).
>
> But I'm open to any suggestions, do you have something in mind?

I have put some thought into it. What about the following syntax?

CREATE SUBSCRIPTING FOR type_name
  INITFUNC = subscripting_init_func
  FETCHFUNC = subscripting_fetch_func
  ASSIGNFUNC = subscripting_assign_func

DROP SUBSCRIPTING FOR type_name

But I am not sure if the community will like such syntax.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Arthur Zakirov <a.zakirov@postgrespro.ru> writes:
> CREATE SUBSCRIPTING FOR type_name
>   INITFUNC = subscripting_init_func
>   FETCHFUNC = subscripting_fetch_func
>   ASSIGNFUNC = subscripting_assign_func
>
> DROP SUBSCRIPTING FOR type_name

Reasonable, but let's make the syntax more like other similar utility commands such as CREATE AGGREGATE --- basically just adding some parens, IIRC.

			regards, tom lane
> On 17 September 2017 at 23:34, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> I have put some thought into it. What about the following syntax?
>
> CREATE SUBSCRIPTING FOR type_name
> INITFUNC = subscripting_init_func
> FETCHFUNC = subscripting_fetch_func
> ASSIGNFUNC = subscripting_assign_func
> DROP SUBSCRIPTING FOR type_name
Just to clarify, do you mean that `CREATE SUBSCRIPTING FOR` would only make a
dependency record? In that case `DROP SUBSCRIPTING FOR` actually means just
dropping the init function.
On Mon, Sep 18, 2017 at 10:31:54AM +0200, Dmitry Dolgov wrote:
> Just to clarify, do you mean that `CREATE SUBSCRIPTING FOR` would only make
> a dependency record? In this case `DROP SUBSCRIPTING FOR` actually means just
> drop an init function.

I think it would be good to add a new catalog table. It may be named pg_type_sbs or pg_subscripting (the second is better, I think). This table may have the fields:

- oid
- sbstype
- sbsinit
- sbsfetch
- sbsassign

And the command 'CREATE SUBSCRIPTING' should add an entry to the pg_subscripting table. It also adds entries to the pg_depend table: a dependency between pg_type and pg_subscripting, and a dependency between pg_type and pg_proc. 'DROP SUBSCRIPTING' should drop these entries; it should not drop the init function.

According to Tom's comment the syntax can be modified in the following way:

CREATE SUBSCRIPTING FOR type_name (
  INITFUNC = subscripting_init_func
  FETCHFUNC = subscripting_fetch_func
  ASSIGNFUNC = subscripting_assign_func
)

DROP SUBSCRIPTING FOR type_name

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 18 September 2017 at 11:39, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> I think it would be good to add new catalog table. It may be named as pg_type_sbs or pg_subscripting (second is better I think).
> This table may have the fields:
> - oid
> - sbstype
> - sbsinit
> - sbsfetch
> - sbsassign
What is `sbstype`?
> And command 'CREATE SUBSCRIPTING' should add an entry to the pg_subscripting table. It also adds entries to the pg_depend table: dependency between pg_type and pg_subscripting, dependency between pg_type and pg_proc.
> 'DROP SUBSCRIPTING' should drop this entries, it should not drop init function.
Why should we keep those subscripting functions? From my understanding they're
totally internal and useless outside of the subscripting context.
Overall I have only one concern about this suggestion - basically it changes
nothing from the perspective of functionality or implementation quality. Its
only purpose is to make a `subscripting` entity more explicit, at the
expense of the overhead of having one more catalog table and a little bit more
complexity. I'm not really sure if it's necessary or not, and I would
appreciate any commentary on this topic from the community (to convince me
whether or not this is a good decision).
On Mon, Sep 18, 2017 at 12:25:04PM +0200, Dmitry Dolgov wrote:
> > I think it would be good to add new catalog table. It may be named as
> > pg_type_sbs or pg_subscripting (second is better I think).
> > This table may have the fields:
> > - oid
> > - sbstype
> > - sbsinit
> > - sbsfetch
> > - sbsassign
>
> What is `sbstype`?

'sbstype' is the oid of the type from pg_type for which the subscripting is created. I.e. pg_type may not need the 'typsubsparse' field.

> > And command 'CREATE SUBSCRIPTING' should add an entry to the
> > pg_subscripting table. It also adds entries to the pg_depend table:
> > dependency between pg_type and pg_subscripting, dependency between pg_type
> > and pg_proc.
> > 'DROP SUBSCRIPTING' should drop this entries, it should not drop init
> > function.
>
> Why we should keep those subscripting functions? From my understanding
> they're totally internal and useless without subscripting context.

Other similar commands do not drop related functions. For example, 'DROP CAST' does not drop the function used to perform the cast.

> It also adds entries to the pg_depend table: dependency between pg_type and pg_subscripting, dependency between pg_type and pg_proc.

There was a typo from me here. The last entry is a dependency between pg_subscripting and pg_proc.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 19 September 2017 at 10:21, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> On Mon, Sep 18, 2017 at 12:25:04PM +0200, Dmitry Dolgov wrote:
>> > I think it would be good to add new catalog table. It may be named as
>> pg_type_sbs or pg_subscripting (second is better I think).
>> > This table may have the fields:
>> > - oid
>> > - sbstype
>> > - sbsinit
>> > - sbsfetch
>> > - sbsassign
>>
>> What is `sbstype`?
>
>'sbstype' is oid of type from pg_type for which subscripting is created. I.e. pg_type may not have the 'typsubsparse' field.
I'm confused, why do we need it? I mean, isn't it enough to have a subscripting
oid in a pg_type record?
> On 18 September 2017 at 12:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Overall I have only one concern about this suggestion - basically it changes
> nothing from the perspective of functionality or implementation quality.
A few more thoughts about this point. Basically, if we're going this way (i.e.
having `pg_subscripting`) there will be one possible change of functionality -
since we store the oids of all the required functions, we can pass
them to the `parse` function (so that a custom extension does not need to
resolve them every time).
At the same time there are consequences of storing `pg_subscripting`, e.g.:
* I assume the performance would be worse because we have to do more actions to
actually call a proper function.
* The implementation itself will be a little bit more complex, I think.
* Should we think about other functionality besides `CREATE` and `DROP`, for
example `ALTER` (as far as I can see, aggregates support that)?
and maybe something else that I don't see now.
On 9/18/17 05:39, Arthur Zakirov wrote:
> On Mon, Sep 18, 2017 at 10:31:54AM +0200, Dmitry Dolgov wrote:
>> Just to clarify, do you mean that `CREATE SUBSCRIPTING FOR` would only make
>> a dependency record? In this case `DROP SUBSCRIPTING FOR` actually means just
>> drop an init function.
> I think it would be good to add new catalog table.

Would you mind posting a summary of how you got here?

Superficially, it seems like this sort of feature should be handled by a couple of type attributes and pg_type columns. But you are talking about a new system catalog and new DDL, and it's not clear to me how you got here.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 19, 2017 at 09:01:57PM -0400, Peter Eisentraut wrote:
> Would you mind posting a summary of how you got here?

There were several points here for me:

- it is necessary to solve the dependency problem (it can also be solved by adding several oid fields to pg_type)
- users may want to add subscripting to an existing type, which is already created in their database, or drop subscripting from an existing type (this cannot be done by CREATE TYPE)
- other type-related functionality has its own CREATE command and system catalog table, for example CREATE CAST, CREATE TRANSFORM (this is the weakest point I think, mostly because several casts and transforms can be defined for one type, while only one subscripting can be defined for one type)

> Superficially, it seems like this sort of feature should be handled by a
> couple of type attributes and pg_type columns. But you are talking
> about a new system catalog and new DDL, and it's not clear to me how you
> got here.

You are right of course. It can be handled by oid columns for the *_parse, *_extract and *_assign functions. Also it is clear to me now that the second point can be handled by the ALTER TYPE command. The command should be modified to handle it, of course. And it can be modified by a separate patch later.

I want to emphasize that I don't insist on the CREATE SUBSCRIPTING command. The patch brings important functionality and I don't want to be the person who blocks it from being committed.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
On 9/20/17 04:12, Arthur Zakirov wrote:
> On Tue, Sep 19, 2017 at 09:01:57PM -0400, Peter Eisentraut wrote:
>> Would you mind posting a summary of how you got here?
>
> There were several points here to me:
> - it is necessary to solve the dependency problem (it can be solved also by adding several oid fields to the pg_type)

A few oid or regproc fields in pg_type seem sensible.

> - users may want to add subscripting to their existing type, which already created in their database, or drop subscripting from existing type (it cannot be done by CREATE TYPE)

That's what ALTER TYPE is for.

> - other type related functionalities have their CREATE command and system catalog table. For example, CREATE CAST, CREATE TRANSFORM (this is a weakest point I think, mostly because of several casts and transforms can be defined to one type, and only one subscripting can be defined to one type).

The difference is that those create associations between two separate objects (cast: type1 <-> type2, transform: type <-> language). A subscripting is just a property of a type.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Sep 20, 2017 at 09:35:06AM -0400, Peter Eisentraut wrote:
>
> The difference is that those create associations between two separate
> objects (cast: type1 <-> type2, transform: type <-> language). A
> subscripting is just a property of a type.

Yes, indeed. I agree. As a conclusion:

- additional fields are needed in pg_type for the *_fetch and *_assign functions to solve the dependency problem
- ALTER TYPE requires a modification to add or drop subscripting from an existing type (I am not sure that it should be done by this patch; maybe it is better to do it in a separate patch)

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 20 September 2017 at 17:19, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> As a conclusion:
> * additional fields are needed in pg_type for *_fetch and *_assign functions to solve the dependency problem
One last thing that I need to clarify. Initially there was an idea to minimize
changes in `pg_type` - that's why I added only one column there, containing the
OID of the main subscripting function (everything else has to be found out
through it). But I have no objections to adding more columns if everyone is ok
with that. Basically, the pros and cons (marked as + and -):
one new column in `pg_type`:
* less intrusive (+)
* it's necessary to make a dependency record between subscripting functions
explicitly (-)
three new columns in `pg_type`:
* more intrusive (-)
* we can create a dependency record between subscripting functions
simultaneously with a custom type creation (+)
* custom subscripting code does not need to resolve `fetch` and `assign`
functions (+)
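For what it's worth, the three-column variant would let the dependency records fall out of the normal CREATE TYPE machinery; a hypothetical definition (the option names follow this patch, the function names are made up) could look like:

```sql
-- Hypothetical sketch: with one pg_type column per function, each option
-- below would get its own dependency record at type-creation time.
CREATE TYPE custom (
    internallength = 8,
    input = custom_in,
    output = custom_out,
    subscripting_parse = custom_subscripting_parse,
    subscripting_fetch = custom_subscripting_fetch,
    subscripting_assign = custom_subscripting_assign
);
```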
On 9/21/17 11:24, Dmitry Dolgov wrote:
> One last thing that I need to clarify. Initially there was an idea to
> minimize changes in `pg_type`

I see, but there is no value in that if it makes everything else more complicated.

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Sep 22, 2017 at 3:51 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 9/21/17 11:24, Dmitry Dolgov wrote:
>> One last thing that I need to clarify. Initially there was an idea to
>> minimize changes in `pg_type`
>
> I see, but there is no value in that if it makes everything else more
> complicated.

+1
> On Fri, Sep 22, 2017 at 3:51 PM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> On 9/21/17 11:24, Dmitry Dolgov wrote:
>> One last thing that I need to clarify. Initially there was an idea to
>> minimize changes in `pg_type`
>
> I see, but there is no value in that if it makes everything else more
> complicated.
I'm still working on that, but obviously I'll not manage to finish it within this CF,
so I'm going to move it to the next one.
> On 30 September 2017 at 22:13, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I'm still working on that, but obviously I'll not manage to finish it within
> this CF, so I'm going to move it to the next one.
So, here is the new version of the patch that contains the modifications we've
discussed, namely:
* store oids of `parse`, `fetch` and `assign` functions
* introduce dependencies from a data type
* as a side effect of the previous two, I also eliminated some unnecessary
arguments in the `parse` function.
I'm going to make a few more improvements, but in the meantime I hope we can
continue to review the patch.
Attachment
On Sun, Oct 29, 2017 at 10:56:19PM +0100, Dmitry Dolgov wrote:
>
> So, here is the new version of patch that contains modifications we've
> discussed, namely:
>
> * store oids of `parse`, `fetch` and `assign` functions
>
> * introduce dependencies from a data type
>
> * as a side effect of previous two I also eliminated some unnecessary arguments
> in `parse` function.

Thank you for the new version of the patch. There are some notes.

Documentation
-------------

The documentation compiles, but there are warnings about end-tags. It is now necessary to have fully named end-tags:

=# make -C doc/src/sgml check
/usr/sbin/onsgmls:json.sgml:574:20:W: empty end-tag
...

The documentation is out of date:
- catalogs.sgml needs to add information about the additional pg_type fields
- create_type.sgml needs information about the subscripting_parse, subscripting_assign and subscripting_fetch options
- xsubscripting.sgml is out of date

Code
----

I think it is necessary to check the Oids of subscripting_parse, subscripting_assign, subscripting_fetch. Maybe within TypeCreate(). Otherwise the following cases are possible:

=# CREATE TYPE custom (
    internallength = 8,
    input = custom_in,
    output = custom_out,
    subscripting_parse = custom_subscripting_parse);

=# CREATE TYPE custom (
    internallength = 8,
    input = custom_in,
    output = custom_out,
    subscripting_fetch = custom_subscripting_fetch);

Are all subscripting_* fields mandatory? If so, then if the user provided at least one of them, all fields should be provided. Should all types support assigning via subscript? If not, then the subscripting_assign parameter is optional.

> +Datum
> +jsonb_subscript_parse(PG_FUNCTION_ARGS)
> +{
> +	bool		isAssignment = PG_GETARG_BOOL(0);

and

> +Datum
> +custom_subscripting_parse(PG_FUNCTION_ARGS)
> +{
> +	bool		isAssignment = PG_GETARG_BOOL(0);

Here isAssignment is an unused variable, so it could be removed.
> +
> +	scratch->d.sbsref.eval_finfo = eval_finfo;
> +	scratch->d.sbsref.nested_finfo = nested_finfo;
> +

As I mentioned earlier, we need to assign eval_finfo and nested_finfo only for the EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN and EEOP_SBSREF_FETCH steps. Also, they should be assigned before calling ExprEvalPushStep(), not after. Otherwise some bugs may appear in the future.

> -	ArrayRef   *aref = makeNode(ArrayRef);
> +	NodeTag		sbstag = nodeTag(src_expr);
> +	Size		nodeSize = sizeof(SubscriptingRef);
> +	SubscriptingRef *sbsref = (SubscriptingRef *) newNode(nodeSize, sbstag);

Is there a necessity to use newNode() instead of makeNode()? The previous code was shorter.

There are no changes in execnodes.h except a removed line. So I think execnodes.h could be removed from the patch.

> I'm going to make few more improvements, but in the meantime I hope we can
> continue to review the patch.

I will wait.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 31 October 2017 at 16:05, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> Documentation is compiled. But there are warnings about end-tags. Now it is necessary to have full named end-tags
Fixed, thanks for noticing.
> I think it is necessary to check Oids of subscripting_parse, subscripting_assign, subscripting_fetch. Maybe within TypeCreate().
Yes, I agree. I implemented it in such a way that all subscripting-related
functions must be provided if `subscripting_parse` is there - if you want to
prevent assign or fetch, you can just do it inside the corresponding function,
which also allows providing a custom error message about it.
> > +Datum
> > +custom_subscripting_parse(PG_FUNCTION_ARGS)
> > +{
> > + bool isAssignment = PG_GETARG_BOOL(0);
>
> Here isAssignment is unused variable, so it could be removed.
In this case I disagree - the purpose of these examples is to show everything
you can use. So I just need to come up with some example that involves
`isAssignment`.
> > + scratch->d.sbsref.eval_finfo = eval_finfo;
> > + scratch->d.sbsref.nested_finfo = nested_finfo;
> > +
> As I mentioned earlier we need assigning eval_finfo and nested_finfo only for EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN and EEOP_SBSREF_FETCH steps.
> Also they should be assigned before calling ExprEvalPushStep(), not after. Otherwise some bugs may appear in future.
I'm really confused about this one. Can you tell me the exact line numbers?
Because if I remove any of these lines "blindly", tests are failing.
> > - ArrayRef *aref = makeNode(ArrayRef);
> > + NodeTag sbstag = nodeTag(src_expr);
> > + Size nodeSize = sizeof(SubscriptingRef);
> > + SubscriptingRef *sbsref = (SubscriptingRef *) newNode(nodeSize, sbstag);
>
> Is there a necessity to use newNode() instead of makeNode()? The previous code was shorter.
Good catch! It was a leftover from the version when I had two different nodes
for subscripting.
> There are no changes in execnodes.h except a removed line. So I think execnodes.h could be removed from the patch.
Fixed.
Attachment
Thank you for fixing.

On Tue, Nov 07, 2017 at 09:00:43PM +0100, Dmitry Dolgov wrote:
> > > +Datum
> > > +custom_subscripting_parse(PG_FUNCTION_ARGS)
> > > +{
> > > +	bool		isAssignment = PG_GETARG_BOOL(0);
> >
> > Here isAssignment is unused variable, so it could be removed.
>
> In this case I disagree - the purpose of these examples is to show everything
> you can use. So I just need to come up with some example that involves
> `isAssignment`.

Understood.

> > > +	scratch->d.sbsref.eval_finfo = eval_finfo;
> > > +	scratch->d.sbsref.nested_finfo = nested_finfo;
> > > +
> > As I mentioned earlier we need assigning eval_finfo and nested_finfo only
> > for EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN and EEOP_SBSREF_FETCH steps.
> > Also they should be assigned before calling ExprEvalPushStep(), not
> > after. Otherwise some bugs may appear in future.
>
> I'm really confused about this one. Can you tell me the exact line numbers?
> Because if I remove any of these lines "blindly", tests are failing.

The field scratch->d is a union. Its fields should be changed only before calling ExprEvalPushStep(), which copies 'scratch'. To be more specific, I attached the patch 0005-Fix-ExprEvalStep.patch, which can be applied over your patches.

Some other notes are below.

> 	<replaceable class="parameter">type_modifier_output_function</replaceable> and
> -	<replaceable class="parameter">analyze_function</replaceable>
> +	<replaceable class="parameter">analyze_function</replaceable>,
> +	<replaceable class="parameter">subscripting_parse_function</replaceable>
> +	<replaceable class="parameter">subscripting_assign_function</replaceable>
> +	<replaceable class="parameter">subscripting_fetch_function</replaceable>

I think you forgot commas and the conjunction 'and'.
> +   The optional
> +   <replaceable class="parameter">subscripting_parse_function</replaceable>,
> +   <replaceable class="parameter">subscripting_assign_function</replaceable>
> +   <replaceable class="parameter">subscripting_fetch_function</replaceable>
> +   contains type-specific logic for subscripting of the data type.

Here you forgot a comma or 'and'. Also, 'contain' should be used instead of 'contains'.

It seems that in the following you switched the descriptions:

> +    <term><replaceable class="parameter">subscripting_assign_function</replaceable></term>
> +    <listitem>
> +     <para>
> +      The name of a function that contains type-specific subscripting logic for
> +      fetching an element from the data type.
> +     </para>

subscripting_assign_function is used for updating.

> +    <term><replaceable class="parameter">subscripting_fetch_function</replaceable></term>
> +    <listitem>
> +     <para>
> +      The name of a function that contains type-specific subscripting logic for
> +      updating an element in the data type.
> +     </para>

subscripting_fetch_function is used for fetching.

I have a little complaint about how ExprEvalStep gets resvalue. resvalue is assigned in one place (within ExecEvalSubscriptingRefFetch(), ExecEvalSubscriptingRefAssign()), while resnull is assigned in another place (within jsonb_subscript_fetch(), jsonb_subscript_assign()). I'm not sure that it is a good idea, but it is not critical; it is just a complaint.

After your fixing, I think we should wait for the opinion of more senior community members and mark the patch as 'Ready for Committer'. Maybe I will do more tests and try to implement subscripting for another type.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
Attachment
> On 8 November 2017 at 17:25, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
Thanks for your review!
> > On Tue, Nov 07, 2017 at 09:00:43PM +0100, Dmitry Dolgov wrote:
> > > +Datum
> > > +custom_subscripting_parse(PG_FUNCTION_ARGS)
> > > +{
> > > + bool isAssignment = PG_GETARG_BOOL(0);
> >
> > Here isAssignment is unused variable, so it could be removed.
>
> In this case I disagree - the purpose of these examples is to show everything
> you can use. So I just need to come up with some example that involves
> `isAssignment`.
I've incorporated this variable into the tutorial.
> To be more specific I attached the patch 0005-Fix-ExprEvalStep.patch, which can be applyed over your patches.
Oh, now I see, thank you.
> I think you forgot commas and conjunction 'and'.
> Here you forgot comma or 'and'. Also 'contain' should be used instead 'contains'.
> It seems that in the following you switched descriptions:
Shame on me :) Fixed.
> I have a little complain about how ExprEvalStep gets resvalue. resvalue is
> assigned in one place (within ExecEvalSubscriptingRefFetch(),
> ExecEvalSubscriptingRefAssign()), resnull is assigned in another place
> (within jsonb_subscript_fetch(), jsonb_subscript_assign()).
Hm... I'm afraid I don't get this. `resnull` is never assigned inside
`jsonb_subscript_fetch` or `jsonb_subscript_assign`; instead it's coming
from `ExecInterpExpr` as `isnull`, if I remember correctly. Are we talking
about the same thing?
In this version of the patch I also improved NULL handling, you can see it in
the tests.
Attachment
On Sat, Nov 11, 2017 at 04:34:31PM +0100, Dmitry Dolgov wrote:
>
> > > Here isAssignment is unused variable, so it could be removed.
> >
> > In this case I disagree - the purpose of these examples is to show everything
> > you can use. So I just need to come up with some example that involves
> > `isAssignment`.
>
> I've incorporated this variable into the tutorial.

Great. I think users will know how to use isAssignment now.

> > I have a little complain about how ExprEvalStep gets resvalue. resvalue is
> > assigned in one place (within ExecEvalSubscriptingRefFetch(),
> > ExecEvalSubscriptingRefAssign()), resnull is assigned in another place
> > (within jsonb_subscript_fetch(), jsonb_subscript_assign()).
>
> Hm...I'm afraid I don't get this. `resnull` is never assigned inside
> `jsonb_subscript_fetch` or `jsonb_subscript_assign`, instead it's coming
> from `ExecInterpExp` as `isnull` if I remember correctly. Are we talking
> about the same thing?

No, I meant the ExprEvalStep struct. For example, within ExecEvalSubscriptingRefFetch() you assign *op->resvalue, but *op->resnull is left unchanged:

> ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op)
> ...
> 	*op->resvalue = FunctionCall2(op->d.sbsref.eval_finfo,
> 								  PointerGetDatum(*op->resvalue),
> 								  PointerGetDatum(op));

*op->resnull is changed for jsonb within jsonb_subscript_fetch() and for arrays within array_subscript_fetch() (which are called by ExecEvalSubscriptingRefFetch()):

> 	return jsonb_get_element(DatumGetJsonbP(containerSource),
> 							 sbstate->upper,
> 							 sbstate->numupper,
> 							 step->resnull,	/* step->resnull is changed within jsonb_get_element() */
> 							 false);

It is not critical, but it may be good to change them in one place.

> In this version of the patch I also improved NULL handling, you can see it in
> the tests.

The new tests passed.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 13 November 2017 at 14:11, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> > I have a little complain about how ExprEvalStep gets resvalue. resvalue is
> > assigned in one place (within ExecEvalSubscriptingRefFetch(),
> > ExecEvalSubscriptingRefAssign()), resnull is assigned in another place
> > (within jsonb_subscript_fetch(), jsonb_subscript_assign()).
>
> Hm...I'm afraid I don't get this. `resnull` is never assigned inside
> `jsonb_subscript_fetch` or `jsonb_subscript_assign`, instead it's coming
> from `ExecInterpExp` as `isnull` if I remember correctly. Are we talking
> about
> the same thing?
>
> No, I meant ExprEvalStep struct. For example, within
> ExecEvalSubscriptingRefFetch() you assign op->resvalue but op->resnull is
> left unchanged
Oh, I see now, thanks for the explanation. Actually, this is how it was
implemented before for array subscripting, and I tried to be consistent with
that implementation. But now I wonder if `resnull` is really needed in
`jsonb_get_element` and `array_get_element`; it seems to me that I can even get
rid of it, so that everything would be assigned in `jsonb_subscript_fetch` and
`array_subscript_fetch` - I'll send a new version of the patch soon.
> On 14 November 2017 at 22:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> But now I wonder if `resnull` is really needed in `jsonb_get_element`,
> `array_get_element` and it seems to me that I can even get rid of it so
On second thought, no, it looks like I'm wrong and it should stay like this. The
reason is that any `fetch` function should be of the form

	(container, internal) -> extracted value

which means that we need to return an extracted value (for jsonb it's a `jsonb`,
for an array it's an `anyelement`). But at the same time, in the general case we
can figure out whether the result is null only inside a `fetch` function
(`jsonb_get_element` for jsonb, or whatever it may be for a custom data type),
because it returns a Datum. So the only way to return this information is by
reference through the `internal` argument. To summarize, if, as you said, it's
not that critical, I would suggest leaving it as it is.
On Wed, Nov 15, 2017 at 11:02:59PM +0100, Dmitry Dolgov wrote:
> On the second thought, no, looks like I'm wrong and it should be like this.
> The reason is that any `fetch` function should be in form
>
> (container, internal) -> extracted value
>
> which means that we need to return an extracted value (for jsonb it's a
> `jsonb`, for array it's an `anyelement`). But at the same time in general
> case we can figure out if the result is null only inside a `fetch` function,
> (`jsonb_get_element` for jsonb or whatever it may be for a custom data type)
> because it returns Datum. So the only way to return this information is by
> reference through the `internal` argument.

Actually, it is not the only way to return the isnull information. You can also return it using a pointer to a boolean argument. Also, *fetch() functions don't need the ExprEvalStep struct; you can pass the SubscriptingRefState struct instead. I mean the following code:

ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op)
{
	...
	*op->resvalue = FunctionCall3(op->d.sbsref.eval_finfo,
								  PointerGetDatum(*op->resvalue),
								  PointerGetDatum(op->d.sbsref.state),
								  PointerGetDatum(op->resnull));
}

Datum
jsonb_subscript_fetch(PG_FUNCTION_ARGS)
{
	Datum		containerSource = PG_GETARG_DATUM(0);
	SubscriptingRefState *state = (SubscriptingRefState *) PG_GETARG_POINTER(1);
	bool	   *isNull = (bool *) PG_GETARG_POINTER(2);

	return jsonb_get_element(DatumGetJsonbP(containerSource),
							 state->upper,
							 state->numupper,
							 isNull,
							 false);
}

> To summarize, If as you said it's
> not that critical, I would suggest to leave it as it is.

Yes, I just wanted to share an opinion on how to improve the code. I thought that the current approach may confuse programmers who will implement subscripting. Also, you can look at the extractValue() function of GIN [1]. It returns whether a value is null in the same way.
1 - https://www.postgresql.org/docs/current/static/gin-extensibility.html -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
> On 16 November 2017 at 12:40, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> Actually it is not only way to return isnull information.
What I meant is that it's the only way if we want to keep the function
signature. Actually, I would prefer this
(container, internal) -> extracted value
over this (I assume that's exactly what you've suggested?)
(container, internal, internal) -> extracted value
because it makes the purpose of the function more clear (especially for custom
data types). Also it would be consistent with `assign` functions (since
`isnull` is not assigned there). But I see your point, a separate argument for
`isnull` will make it more straightforward in terms of null handling.
> fetch() functions also doesn't need in ExprEvalStep struct
I had a hard time parsing this, but from your examples I assume you're talking
about passing slightly different arguments to the `fetch` function (am I right?).
Just from a functionality point of view, I don't see a big difference in which
argument is used to return `isnull` by reference. So in the end, I think we need
to choose between having one or two `internal` arguments for `fetch` functions.
Any other opinions?
On Thu, Nov 16, 2017 at 10:37:40PM +0100, Dmitry Dolgov wrote:
> I had hard time parsing this, but from your examples I assume you're talking
> about passing little bit different arguments to `fetch` function (am I
> right?).

Yes, I meant to pass the following arguments:

	Datum source, SubscriptingRefState *state, bool *isNull

I think it is time to mark the patch as "Ready for Committer". I've marked it.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 19 November 2017 at 16:13, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> I think it is time to mark the patch as "Ready for Committer". I've
> marked it.
Good, thank you for the comprehensive review.
> On 1 December 2017 at 06:34, Michael Paquier <michael.paquier@gmail.com> wrote:
>
> Documentation in patch 4 has conflicts. Please rebase. I am moving
> this patch to next CF with "waiting on author".
Thanks for noticing. Here is the rebased version (the conflict itself was quite
trivial, but I also cleaned up the function signatures a bit).
Attachment
> On 4 December 2017 at 01:26, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Thanks for noticing. Here is the rebased version (the conflict itself was quite
> trivial, but I also cleaned up functions signature a bit).
Another rebased version of this patch.
Attachment
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> Another rebased version of this patch.

Apologies for not having paid attention to this patch for so long.

Coming back to it now, I wonder what happened to the plan to separate assignment and fetch into two different node types. I can see that that didn't happen so far as primnodes.h is concerned, but you've made some changes that seem to assume it did happen, eg this bit in clauses.c:

@@ -1345,12 +1345,10 @@ contain_nonstrict_functions_walker(Node *node, void *context)
 		/* a window function could return non-null with null input */
 		return true;
 	}
-	if (IsA(node, ArrayRef))
+	if (IsA(node, SubscriptingRef))
 	{
 		/* array assignment is nonstrict, but subscripting is strict */
-		if (((ArrayRef *) node)->refassgnexpr != NULL)
-			return true;
-		/* else fall through to check args */
+		return true;
 	}
 	if (IsA(node, DistinctExpr))
 	{

Treating the two cases alike here is just wrong.

Also, the reason I was looking at clauses.c was I realized that my recent commit 3decd150a broke this patch, because it introduced understanding of ArrayRef into eval_const_expressions(). I think that you can probably just do s/ArrayRef/SubscriptingRef/ there, but it might deserve a closer look than I've given it.

I'm not terribly happy with the cosmetics of this patch at the moment. There are too many places where it's achingly obvious that you did s/ArrayRef/SubscriptingRef/g and nothing else, leaving code that does not pass the test of "does it look like it was written like that to begin with". There are a lot of variables still named "aref" or "arefstate" or similar when that's no longer an apropos name; there are a lot of sentences reading "an SubscriptingRef" which is bad English; there are a lot of comments whose layout is not going to be too hot after pgindent because "SubscriptingRef" is so much longer than "ArrayRef". (I'm tempted to suggest that we call the node type just "Subscript", to buy back some of that.)
I'm not necessarily putting it on you to fix that stuff --- it might be easier for a native English speaker --- but I am saying that if I commit this, there are going to be a lot of differences in detail from what's here now.

			regards, tom lane
> On 4 January 2018 at 03:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wonder what happened to the plan to separate assignment and fetch into two
> different node types. I can see that that didn't happen so far as primnodes.h
> is concerned, but you've made some changes that seem to assume it did happen.

There was one version of this patch that followed this plan. It turned out to be quite an unnatural approach (at least within the current implementation), because I had to duplicate or reuse a lot of code for those two node types. Eventually we decided to roll it back. Unfortunately, I did it somewhat sloppily and forgot about those simple cases - thank you for noticing!

> I'm not terribly happy with the cosmetics of this patch at the moment.
> There are too many places where it's achingly obvious that you did
> s/ArrayRef/SubscriptingRef/g and nothing else, leaving code that does not
> pass the test of "does it look like it was written like that to begin
> with".

Yes, I didn't pay enough attention to these small details. I cleaned this up to make it easier for a native speaker. Here is a new rebased version of the patch with the improvements you've mentioned incorporated.
Attachment
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On 4 January 2018 at 03:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I wonder what happened to the plan to separate assignment and fetch into two
>> different node types. I can see that that didn't happen so far as primnodes.h
>> is concerned, but you've made some changes that seem to assume it did happen.

> There was one version of this patch that followed this plan. It turns out that
> it's quite unnatural approach (at least within the current implementation),
> because I had to duplicate or reuse a lot of code for those two node types.

I'm not entirely convinced by that argument. Sure, there would be a lot of duplicative code in the node support functions, but that's inherent in our approach to those functions. I'm more concerned about readability, especially given that exposing this feature to extensions is going to set all these decisions in concrete forever.

> Here is a new rebased version of the patch
> with incorporated improvements that you've mentioned.

I spent a couple of hours looking through this.

I'm still pretty unhappy with the state of the parser APIs. In the first place, you haven't documented what those APIs are in any meaningful way. I do not think it's adequate to tell people to go read array_subscript_parse as the first and only way to understand what a subscript parse function must do. We do often expect extension authors to pick up small details that way, but there should be a textual API spec of some sort --- for example, look at the index AM API specs in indexam.sgml, which give pretty clear high-level definitions of what each AM API function has to do.

Part of the reason why I'm insistent on that is that I think it will expose that the division of labor between the core parser and the datatype-specific parse function is still a mess. One particular sore spot is the question of who decides what the return data type of a subscripting function is.
Right now you seem to be making that decision in the core parser, at least for the assignment case, which is pretty bad from an extensibility standpoint and also leads to all of this messiness:

* You changed transformArrayType so that it doesn't throw an error if the given type isn't an array --- without, I note, updating either the function header comment or the internal comment that you falsified.

* That in turn means that where transformAssignmentSubscripts thinks it's determining the result type, it may or may not be producing InvalidOid instead (again, with no comment to warn the reader).

* And then you had to kludge transformAssignmentIndirection horribly (and I'm not sure you kludged it enough, either, because there are still a bunch of places where it uses targetTypeId without any concern for the possibility that that's zero). It doesn't seem to me to be acceptable to just ignore coercion failure, as that code does now. If it's no longer the responsibility of this function to guarantee that the result is of the right type, why is it attempting coercion at all? In any case you failed to update its header comment to explain what it's doing differently than before.

In short the division of labor in this area still needs to be thought about. I don't think we really want transformAssignmentSubscripts determining the required input data type at all; that should be farmed out to the datatype-specific code. I'm also pretty unconvinced about refnestedfunc --- why do we need that?

I also notice that you still haven't done anything about the problem of the subscripting operation possibly yielding a different typmod or collation than the container type has. It was okay to let that go for awhile, but we're not shipping it like this, because it's going to be awfully hard to change that struct type once extensions are building them.

While I'm on the topic, I am not really happy with s/array/container/ as you've done in some of this code.
To my mind, "container type" includes composite types. Particularly in the parse_target code, where we're currently dealing with either composites or arrays, making it say that we're dealing with either composites or containers is just a recipe for confusion. Unfortunately I can't think of a better word offhand, but some attention to this is needed. As far as the comments go, we might be able to use the term "subscriptable type", but not sure if that will work for function/variable names.

On the executor side of things, I suspect Andres will be unhappy that you are making ExprEvalStep part of the API for datatypes --- he objected to my exposing it to plpgsql param eval in https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@alap3.anarazel.de and there was a lot more reason to do so there than there is here, IMO. It looks like what you actually need is just the SubscriptingRefState and an isnull flag, so it might be better to specify that the fetch and assign functions have signatures like

	Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull)

(representing both of the last two arguments as INTERNAL at SQL level).

Now on the other hand, maybe the right way to go is to embrace a similar approach to what I did for plpgsql param eval, and let the datatype control what gets generated as the expression execution step. The main point here would be to let the datatype provide the address of a callback function that gets executed for a subscripting step, rather than having it specify the OID of a pg_proc entry to call. There would be two big wins from that:

* The callback function would have a plain C call signature, so we would not have to go through FunctionCallN, saving a few cycles. This is attractive because it would pretty much eliminate any concern about this patch making array access slower at execution time.
* There would no longer be a wired-in restriction that there be two and only two subscripting execution functions per datatype, since there would not be any need for those functions to be identified in pg_type.

Basically, with this approach, a subscriptable data type would need to provide two cataloged support functions: parse, as we have now, and compile. Actual execution functions would be outside that. (Possibly we could merge the support functions into one function that takes an operation code, similar to one of your earlier designs. Not sure that that's better, but it'd be easier to extend in future if we decide we need three support operations...)

The two disadvantages I can see of approaching things this way are:

* There'd be at least some connection of subscriptable types to expression compilation, which is what Andres was objecting to in the message I cited above. Possibly we could alleviate that a bit by providing helper functions that mask exactly what goes into the expression step structs, but I'm not sure that that gets us far.

* We'd not have OIDs of execution functions in the parse trees for subscripting operations, which would mean that we would not have a convenient way to identify subscripting operations that are mutable, parallel-unsafe, or leaky. Probably it'd be fine to assume that subscripting is always immutable and parallel-safe, although I'm slightly more concerned about whether people would want the option to label it leaky vs leakproof. As against that, the approach that's there right now adds planning overhead that wasn't there before for exactly those function property lookups, and again I'm a bit worried about the performance impact.
(I did some crude performance tests today that indicated that the existing patch has small but measurable penalties, maybe on the order of 10 percent; and it'd be more by the time we're done because I'm pretty sure you've missed some places that ought to check these function properties if we're going to have them. So I'm afraid that we'll get pushback from people who don't care about extensible subscripts and do care about array performance.) So roughly speaking, I'm imagining that we'd go back to a design similar to one I recall you had at one point, where there's a single SQL-visible subscripting support function per datatype, with a signature like subscript_support(int opcode, internal other_info) returns internal but the opcodes would now be "parse" and "compile". Actual execution would use callback functions that don't have to be SQL-visible. This is closer to the approach we've been using of late for things like AM APIs: to the extent possible, there's just one SQL-registered handler function and all else is a callback. Actually, we could make it *exactly* like that, and have the registered handler give back a struct full of function pointers rather than doing anything much itself. Maybe that's an even better way to go. regards, tom lane
Hi, Tom pointed me towards this thread. I've not followed the topic, so I might miss a bit of context while commenting on expression eval related bits... On 2018-01-07 17:39:00 -0500, Tom Lane wrote: > On the executor side of things, I suspect Andres will be unhappy that > you are making ExprEvalStep part of the API for datatypes --- he > objected to my exposing it to plpgsql param eval in > https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@alap3.anarazel.de > and there was a lot more reason to do so there than there is here, > IMO. Indeed. > It looks like what you actually need is just the SubscriptingRefState and > an isnull flag, so it might be better to specify that the fetch and assign > functions have signatures like > Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull) > (representing both of the last two arguments as INTERNAL at SQL level). That'd definitely be better. > Now on the other hand, maybe the right way to go is to embrace a similar > approach to what I did for plpgsql param eval, and let the datatype > control what gets generated as the expression execution step. The main > point here would be to let the datatype provide the address of a callback > function that gets executed for a subscripting step, rather than having it > specify the OID of a pg_proc entry to call. There would be two big wins > from that: > > * The callback function would have a plain C call signature, so we would > not have to go through FunctionCallN, saving a few cycles. This is > attractive because it would pretty much eliminate any concern about this > patch making array access slower at execution time. I'll note that I'm not convinced that the goal this paragraph states and having the datatype control the entire expression step are fully dependent on each other. It seems quite possible to have ExecInitSubscriptingRef() call a datatype-specific function that returns C callbacks. 
> The two disadvantages I can see of approaching things this way are: > > * There'd be at least some connection of subscriptable types to > expression compilation, which is what Andres was objecting to in the > message I cited above. Possibly we could alleviate that a bit by > providing helper functions that mask exactly what goes into the > expression step structs, but I'm not sure that that gets us far. Yea, I'm not the greatest fan of that. With plpgsql it's at least something in-core that's exposed, but I suspect the subscripting interface will get used outside of core, and I really want to whack some of the expression stuff around some more. > Actually, we could make it *exactly* like that, and have the > registered handler give back a struct full of function pointers rather > than doing anything much itself. Maybe that's an even better way to > go. I'd definitely advocate for that. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2018-01-07 17:39:00 -0500, Tom Lane wrote: >> Now on the other hand, maybe the right way to go is to embrace a similar >> approach to what I did for plpgsql param eval, and let the datatype >> control what gets generated as the expression execution step. > I'll note that I'm not convinced that the goal this paragraph states and > having the datatype control the entire expression step are fully > dependent on each other. It seems quite possible to have > ExecInitSubscriptingRef() call a datatype specific function that returns > C callbacks. Yeah, that's a good point. We could define the compile support function as simply being allowed to examine the expression tree and give back the address of the callback function to use, with the rest of the compiled expression structure being predetermined. The more general approach would only be useful if you imagine some sort of high-performance extension that wants to compile specialized expr steps for its subscripting activity. Probably the need for that is pretty far away yet. BTW, if you wanted to change the way plpgsql param callbacks work to be like this design (ie push the actual generation of the ExprStep back into core, with plpgsql just supplying a callback address), I wouldn't object. regards, tom lane
> On 7 January 2018 at 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Dmitry Dolgov <9erthalion6@gmail.com> writes: >>> On 4 January 2018 at 03:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I wonder what happened to the plan to separate assignment and fetch into two >>> different node types. I can see that that didn't happen so far as primnodes.h >>> is concerned, but you've made some changes that seem to assume it did happen. > >> There was one version of this patch that followed this plan. It turns out that >> it's quite unnatural approach (at least within the current implementation), >> because I had to duplicate or reuse a lot of code for those two node types. > > I'm not entirely convinced by that argument. Sure, there would be a lot of > duplicative code in the node support functions, but that's inherent in our > approach to those functions. I'm more concerned about readability, > especially given that exposing this feature to extensions is going to set > all these decisions in concrete forever. The version I'm talking about didn't actually gain that much readability from this separation (at least from my perspective). For extensions it may be more justified, but at the same time the `isAssignment` flag is the only thing you need to check to figure out whether it's a fetch or an assignment operation - that doesn't sound significantly worse for readability. Or do you mean something else? Anyway, it's quite possible that I simply failed to find the proper approach here. I'm going to keep pondering it in case I come up with something better - it should be more or less straightforward to change this logic at any time. > * You changed transformArrayType so that it doesn't throw an error if > the given type isn't an array --- without, I note, updating either the > function header comment or the internal comment that you falsified. 
> > * That in turn means that where transformAssignmentSubscripts thinks > it's determining the result type, it may or may not be producing > InvalidOid instead (again, with no comment to warn the reader). > > * And then you had to kludge transformAssignmentIndirection horribly > (and I'm not sure you kludged it enough, either, because there are > still a bunch of places where it uses targetTypeId without any concern > for the possibility that that's zero). It doesn't seem to me to be > acceptable to just ignore coercion failure, as that code does now. > If it's no longer the responsibility of this function to guarantee > that the result is of the right type, why is it attempting coercion > at all? In any case you failed to update its header comment to > explain what it's doing differently than before. > > In short the division of labor in this area still needs to be thought > about. I don't think we really want transformAssignmentSubscripts > determining the required input data type at all; that should be > farmed out to the datatype-specific code. I'm also pretty unconvinced > about refnestedfunc --- why do we need that? Yes, I see, there is room for improvement. I'm going to post a new version soon, where I'll try to address this. As for `refnestedfunc` - I added it as part of my modifications for function caching, but it looks like this function is useless now. > While I'm on the topic, I am not really happy with s/array/container/ > as you've done in some of this code. To my mind, "container type" > includes composite types. Particularly in the parse_target code, where > we're currently dealing with either composites or arrays, making it say > that we're dealing with either composites or containers is just a recipe > for confusion. Unfortunately I can't think of a better word offhand, > but some attention to this is needed. 
As far as the comments go, > we might be able to use the term "subscriptable type", but not sure if > that will work for function/variable names. Is there a plausible use case where a "container type" can be something other than a "composite type"? Maybe it's ok to just say that "container type" means the same as "composite type"? > On the executor side of things, I suspect Andres will be unhappy that > you are making ExprEvalStep part of the API for datatypes --- he > objected to my exposing it to plpgsql param eval in > https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@alap3.anarazel.de > and there was a lot more reason to do so there than there is here, IMO. > It looks like what you actually need is just the SubscriptingRefState and > an isnull flag, so it might be better to specify that the fetch and assign > functions have signatures like > Datum fetch(Datum val, SubscriptingRefState *state, bool *isnull) > (representing both of the last two arguments as INTERNAL at SQL level). Yes, I agree. > Now on the other hand, maybe the right way to go is to embrace a similar > approach to what I did for plpgsql param eval, and let the datatype > control what gets generated as the expression execution step. The main > point here would be to let the datatype provide the address of a callback > function that gets executed for a subscripting step, rather than having it > specify the OID of a pg_proc entry to call. I had one implementation that resembles this approach. But as far as I can see (please correct me if something in the following chain is wrong), it's necessary to store pointers to these callback functions in a `SubscriptingRef` node, which means that they would be missing from the `readfuncs`/`outfuncs`/`copyfuncs` functions. Is there any situation where this can lead to undesirable consequences? 
> * We'd not have OIDs of execution functions in the parse trees for > subscripting operations, which would mean that we would not have > a convenient way to identify subscripting operations that are > mutable, parallel-unsafe, or leaky. Probably it'd be fine to assume > that subscripting is always immutable and parallel-safe, although > I'm slightly more concerned about whether people would want the > option to label it leaky vs leakproof. As against that, the approach > that's there right now adds planning overhead that wasn't there before > for exactly those function property lookups, and again I'm a bit worried > about the performance impact. (I did some crude performance tests > today that indicated that the existing patch has small but measurable > penalties, maybe on the order of 10 percent; and it'd be more by the > time we're done because I'm pretty sure you've missed some places that > ought to check these function properties if we're going to have them. > So I'm afraid that we'll get pushback from people who don't care about > extensible subscripts and do care about array performance.) We also wouldn't have a convenient way to specify the cost of subscripting operations. I had in mind a situation where someone wants a quite heavy subscripting function, which should affect the planner's decisions. It may be a rare case, though.
Hi, On 2018-01-04 16:47:50 +0100, Dmitry Dolgov wrote: > diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c > index 16f908037c..ee50fda4ef 100644 > --- a/src/backend/executor/execExpr.c > +++ b/src/backend/executor/execExpr.c > @@ -64,7 +64,8 @@ static void ExecInitExprSlots(ExprState *state, Node *node); > static bool get_last_attnums_walker(Node *node, LastAttnumInfo *info); > static void ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable, > ExprState *state); > -static void ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref, > +static void ExecInitSubscriptingRef(ExprEvalStep *scratch, > + SubscriptingRef *sbsref, > ExprState *state, > Datum *resv, bool *resnull); > static bool isAssignmentIndirectionExpr(Expr *expr); > @@ -857,11 +858,11 @@ ExecInitExprRec(Expr *node, ExprState *state, > break; > } > > - case T_ArrayRef: > + case T_SubscriptingRef: > { > - ArrayRef *aref = (ArrayRef *) node; > + SubscriptingRef *sbsref = (SubscriptingRef *) node; > > - ExecInitArrayRef(&scratch, aref, state, resv, resnull); > + ExecInitSubscriptingRef(&scratch, sbsref, state, resv, resnull); > break; > } > > @@ -1176,7 +1177,7 @@ ExecInitExprRec(Expr *node, ExprState *state, > /* > * Use the CaseTestExpr mechanism to pass down the old > * value of the field being replaced; this is needed in > - * case the newval is itself a FieldStore or ArrayRef that > + * case the newval is itself a FieldStore or SubscriptingRef that > * has to obtain and modify the old value. It's safe to > * reuse the CASE mechanism because there cannot be a CASE > * between here and where the value would be needed, and a > @@ -2401,34 +2402,40 @@ ExecInitWholeRowVar(ExprEvalStep *scratch, Var *variable, ExprState *state) > } > > /* > - * Prepare evaluation of an ArrayRef expression. > + * Prepare evaluation of a SubscriptingRef expression. 
> */ > static void > -ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref, > +ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, > ExprState *state, Datum *resv, bool *resnull) > { > - bool isAssignment = (aref->refassgnexpr != NULL); > - ArrayRefState *arefstate = palloc0(sizeof(ArrayRefState)); > - List *adjust_jumps = NIL; > - ListCell *lc; > - int i; > + bool isAssignment = (sbsref->refassgnexpr != NULL); > + SubscriptingRefState *sbsrefstate = palloc0(sizeof(SubscriptingRefState)); > + List *adjust_jumps = NIL; > + ListCell *lc; > + int i; > + FmgrInfo *eval_finfo, *nested_finfo; > + > + eval_finfo = palloc0(sizeof(FmgrInfo)); > + nested_finfo = palloc0(sizeof(FmgrInfo)); > + > + fmgr_info(sbsref->refevalfunc, eval_finfo); > + if (OidIsValid(sbsref->refnestedfunc)) > + { > + fmgr_info(sbsref->refnestedfunc, nested_finfo); > + } > > - /* Fill constant fields of ArrayRefState */ > - arefstate->isassignment = isAssignment; > - arefstate->refelemtype = aref->refelemtype; > - arefstate->refattrlength = get_typlen(aref->refarraytype); > - get_typlenbyvalalign(aref->refelemtype, > - &arefstate->refelemlength, > - &arefstate->refelembyval, > - &arefstate->refelemalign); > + /* Fill constant fields of SubscriptingRefState */ > + sbsrefstate->isassignment = isAssignment; > + sbsrefstate->refelemtype = sbsref->refelemtype; > + sbsrefstate->refattrlength = get_typlen(sbsref->refcontainertype); > > /* > * Evaluate array input. It's safe to do so into resv/resnull, because we > * won't use that as target for any of the other subexpressions, and it'll > - * be overwritten by the final EEOP_ARRAYREF_FETCH/ASSIGN step, which is > + * be overwritten by the final EEOP_SBSREF_FETCH/ASSIGN step, which is > * pushed last. > */ > - ExecInitExprRec(aref->refexpr, state, resv, resnull); > + ExecInitExprRec(sbsref->refexpr, state, resv, resnull); > > /* > * If refexpr yields NULL, and it's a fetch, then result is NULL. 
We can > @@ -2440,92 +2447,95 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref, > scratch->opcode = EEOP_JUMP_IF_NULL; > scratch->d.jump.jumpdone = -1; /* adjust later */ > ExprEvalPushStep(state, scratch); > + > adjust_jumps = lappend_int(adjust_jumps, > state->steps_len - 1); > } > > /* Verify subscript list lengths are within limit */ > - if (list_length(aref->refupperindexpr) > MAXDIM) > + if (list_length(sbsref->refupperindexpr) > MAX_SUBSCRIPT_DEPTH) > ereport(ERROR, > (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), > errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", > - list_length(aref->refupperindexpr), MAXDIM))); > + list_length(sbsref->refupperindexpr), MAX_SUBSCRIPT_DEPTH))); > > - if (list_length(aref->reflowerindexpr) > MAXDIM) > + if (list_length(sbsref->reflowerindexpr) > MAX_SUBSCRIPT_DEPTH) > ereport(ERROR, > (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), > errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", > - list_length(aref->reflowerindexpr), MAXDIM))); > + list_length(sbsref->reflowerindexpr), MAX_SUBSCRIPT_DEPTH))); > > /* Evaluate upper subscripts */ > i = 0; > - foreach(lc, aref->refupperindexpr) > + foreach(lc, sbsref->refupperindexpr) > { > Expr *e = (Expr *) lfirst(lc); > > /* When slicing, individual subscript bounds can be omitted */ > if (!e) > { > - arefstate->upperprovided[i] = false; > + sbsrefstate->upperprovided[i] = false; > i++; > continue; > } > > - arefstate->upperprovided[i] = true; > + sbsrefstate->upperprovided[i] = true; > > /* Each subscript is evaluated into subscriptvalue/subscriptnull */ > ExecInitExprRec(e, state, > - &arefstate->subscriptvalue, &arefstate->subscriptnull); > - > - /* ... 
and then ARRAYREF_SUBSCRIPT saves it into step's workspace */ > - scratch->opcode = EEOP_ARRAYREF_SUBSCRIPT; > - scratch->d.arrayref_subscript.state = arefstate; > - scratch->d.arrayref_subscript.off = i; > - scratch->d.arrayref_subscript.isupper = true; > - scratch->d.arrayref_subscript.jumpdone = -1; /* adjust later */ > + &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); > + > + /* ... and then SBSREF_SUBSCRIPT saves it into step's workspace */ > + scratch->opcode = EEOP_SBSREF_SUBSCRIPT; > + scratch->d.sbsref_subscript.state = sbsrefstate; > + scratch->d.sbsref_subscript.off = i; > + scratch->d.sbsref_subscript.isupper = true; > + scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ > ExprEvalPushStep(state, scratch); > + > adjust_jumps = lappend_int(adjust_jumps, > state->steps_len - 1); > i++; > } > - arefstate->numupper = i; > + sbsrefstate->numupper = i; > > /* Evaluate lower subscripts similarly */ > i = 0; > - foreach(lc, aref->reflowerindexpr) > + foreach(lc, sbsref->reflowerindexpr) > { > Expr *e = (Expr *) lfirst(lc); > > /* When slicing, individual subscript bounds can be omitted */ > if (!e) > { > - arefstate->lowerprovided[i] = false; > + sbsrefstate->lowerprovided[i] = false; > i++; > continue; > } > > - arefstate->lowerprovided[i] = true; > + sbsrefstate->lowerprovided[i] = true; > > /* Each subscript is evaluated into subscriptvalue/subscriptnull */ > ExecInitExprRec(e, state, > - &arefstate->subscriptvalue, &arefstate->subscriptnull); > - > - /* ... and then ARRAYREF_SUBSCRIPT saves it into step's workspace */ > - scratch->opcode = EEOP_ARRAYREF_SUBSCRIPT; > - scratch->d.arrayref_subscript.state = arefstate; > - scratch->d.arrayref_subscript.off = i; > - scratch->d.arrayref_subscript.isupper = false; > - scratch->d.arrayref_subscript.jumpdone = -1; /* adjust later */ > + &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); > + > + /* ... 
and then SBSREF_SUBSCRIPT saves it into step's workspace */ > + scratch->opcode = EEOP_SBSREF_SUBSCRIPT; > + scratch->d.sbsref_subscript.state = sbsrefstate; > + scratch->d.sbsref_subscript.off = i; > + scratch->d.sbsref_subscript.isupper = false; > + scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ > ExprEvalPushStep(state, scratch); > + > adjust_jumps = lappend_int(adjust_jumps, > state->steps_len - 1); > i++; > } > - arefstate->numlower = i; > + sbsrefstate->numlower = i; > > /* Should be impossible if parser is sane, but check anyway: */ > - if (arefstate->numlower != 0 && > - arefstate->numupper != arefstate->numlower) > + if (sbsrefstate->numlower != 0 && > + sbsrefstate->numupper != sbsrefstate->numlower) > elog(ERROR, "upper and lower index lists are not same length"); > > if (isAssignment) > @@ -2535,7 +2545,7 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref, > > /* > * We might have a nested-assignment situation, in which the > - * refassgnexpr is itself a FieldStore or ArrayRef that needs to > + * refassgnexpr is itself a FieldStore or SubscriptingRef that needs to > * obtain and modify the previous value of the array element or slice > * being replaced. If so, we have to extract that value from the > * array and pass it down via the CaseTestExpr mechanism. It's safe > @@ -2547,37 +2557,45 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref, > * Since fetching the old element might be a nontrivial expense, do it > * only if the argument actually needs it. 
> */ > - if (isAssignmentIndirectionExpr(aref->refassgnexpr)) > + if (isAssignmentIndirectionExpr(sbsref->refassgnexpr)) > { > - scratch->opcode = EEOP_ARRAYREF_OLD; > - scratch->d.arrayref.state = arefstate; > + scratch->opcode = EEOP_SBSREF_OLD; > + scratch->d.sbsref.state = sbsrefstate; > + scratch->d.sbsref.eval_finfo = eval_finfo; > + scratch->d.sbsref.nested_finfo = nested_finfo; > ExprEvalPushStep(state, scratch); > } > > - /* ARRAYREF_OLD puts extracted value into prevvalue/prevnull */ > + /* SBSREF_OLD puts extracted value into prevvalue/prevnull */ > save_innermost_caseval = state->innermost_caseval; > save_innermost_casenull = state->innermost_casenull; > - state->innermost_caseval = &arefstate->prevvalue; > - state->innermost_casenull = &arefstate->prevnull; > + state->innermost_caseval = &sbsrefstate->prevvalue; > + state->innermost_casenull = &sbsrefstate->prevnull; > > /* evaluate replacement value into replacevalue/replacenull */ > - ExecInitExprRec(aref->refassgnexpr, state, > - &arefstate->replacevalue, &arefstate->replacenull); > + ExecInitExprRec(sbsref->refassgnexpr, state, > + &sbsrefstate->replacevalue, &sbsrefstate->replacenull); > > state->innermost_caseval = save_innermost_caseval; > state->innermost_casenull = save_innermost_casenull; > > /* and perform the assignment */ > - scratch->opcode = EEOP_ARRAYREF_ASSIGN; > - scratch->d.arrayref.state = arefstate; > + scratch->opcode = EEOP_SBSREF_ASSIGN; > + scratch->d.sbsref.state = sbsrefstate; > + scratch->d.sbsref.eval_finfo = eval_finfo; > + scratch->d.sbsref.nested_finfo = nested_finfo; > ExprEvalPushStep(state, scratch); > + > } > else > { > /* array fetch is much simpler */ > - scratch->opcode = EEOP_ARRAYREF_FETCH; > - scratch->d.arrayref.state = arefstate; > + scratch->opcode = EEOP_SBSREF_FETCH; > + scratch->d.sbsref.state = sbsrefstate; > + scratch->d.sbsref.eval_finfo = eval_finfo; > + scratch->d.sbsref.nested_finfo = nested_finfo; > ExprEvalPushStep(state, scratch); > + > } > > /* 
adjust jump targets */ > @@ -2585,10 +2603,10 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref, > { > ExprEvalStep *as = &state->steps[lfirst_int(lc)]; > > - if (as->opcode == EEOP_ARRAYREF_SUBSCRIPT) > + if (as->opcode == EEOP_SBSREF_SUBSCRIPT) > { > - Assert(as->d.arrayref_subscript.jumpdone == -1); > - as->d.arrayref_subscript.jumpdone = state->steps_len; > + Assert(as->d.sbsref_subscript.jumpdone == -1); > + as->d.sbsref_subscript.jumpdone = state->steps_len; > } > else > { > @@ -2600,8 +2618,8 @@ ExecInitArrayRef(ExprEvalStep *scratch, ArrayRef *aref, > } > > /* > - * Helper for preparing ArrayRef expressions for evaluation: is expr a nested > - * FieldStore or ArrayRef that needs the old element value passed down? > + * Helper for preparing SubscriptingRef expressions for evaluation: is expr a nested > + * FieldStore or SubscriptingRef that needs the old element value passed down? > * > * (We could use this in FieldStore too, but in that case passing the old > * value is so cheap there's no need.) 
> @@ -2624,11 +2642,11 @@ isAssignmentIndirectionExpr(Expr *expr) > if (fstore->arg && IsA(fstore->arg, CaseTestExpr)) > return true; > } > - else if (IsA(expr, ArrayRef)) > + else if (IsA(expr, SubscriptingRef)) > { > - ArrayRef *arrayRef = (ArrayRef *) expr; > + SubscriptingRef *sbsRef = (SubscriptingRef *) expr; > > - if (arrayRef->refexpr && IsA(arrayRef->refexpr, CaseTestExpr)) > + if (sbsRef->refexpr && IsA(sbsRef->refexpr, CaseTestExpr)) > return true; > } > return false; > diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c > index 2e88417265..0c72e80b58 100644 > --- a/src/backend/executor/execExprInterp.c > +++ b/src/backend/executor/execExprInterp.c > @@ -364,10 +364,10 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) > &&CASE_EEOP_FIELDSELECT, > &&CASE_EEOP_FIELDSTORE_DEFORM, > &&CASE_EEOP_FIELDSTORE_FORM, > - &&CASE_EEOP_ARRAYREF_SUBSCRIPT, > - &&CASE_EEOP_ARRAYREF_OLD, > - &&CASE_EEOP_ARRAYREF_ASSIGN, > - &&CASE_EEOP_ARRAYREF_FETCH, > + &&CASE_EEOP_SBSREF_SUBSCRIPT, > + &&CASE_EEOP_SBSREF_OLD, > + &&CASE_EEOP_SBSREF_ASSIGN, > + &&CASE_EEOP_SBSREF_FETCH, > &&CASE_EEOP_DOMAIN_TESTVAL, > &&CASE_EEOP_DOMAIN_NOTNULL, > &&CASE_EEOP_DOMAIN_CHECK, > @@ -1367,43 +1367,43 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) > EEO_NEXT(); > } > > - EEO_CASE(EEOP_ARRAYREF_SUBSCRIPT) > + EEO_CASE(EEOP_SBSREF_SUBSCRIPT) > { > /* Process an array subscript */ > > /* too complex for an inline implementation */ > - if (ExecEvalArrayRefSubscript(state, op)) > + if (ExecEvalSubscriptingRef(state, op)) > { > EEO_NEXT(); > } > else > { > - /* Subscript is null, short-circuit ArrayRef to NULL */ > - EEO_JUMP(op->d.arrayref_subscript.jumpdone); > + /* Subscript is null, short-circuit SubscriptingRef to NULL */ > + EEO_JUMP(op->d.sbsref_subscript.jumpdone); > } > } > > - EEO_CASE(EEOP_ARRAYREF_OLD) > + EEO_CASE(EEOP_SBSREF_OLD) > { > /* > - * Fetch the old value in an arrayref assignment, in 
case it's > + * Fetch the old value in an sbsref assignment, in case it's > * referenced (via a CaseTestExpr) inside the assignment > * expression. > */ > > /* too complex for an inline implementation */ > - ExecEvalArrayRefOld(state, op); > + ExecEvalSubscriptingRefOld(state, op); > > EEO_NEXT(); > } > > /* > - * Perform ArrayRef assignment > + * Perform SubscriptingRef assignment > */ > - EEO_CASE(EEOP_ARRAYREF_ASSIGN) > + EEO_CASE(EEOP_SBSREF_ASSIGN) > { > /* too complex for an inline implementation */ > - ExecEvalArrayRefAssign(state, op); > + ExecEvalSubscriptingRefAssign(state, op); > > EEO_NEXT(); > } > @@ -1411,10 +1411,10 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) > /* > * Fetch subset of an array. > */ > - EEO_CASE(EEOP_ARRAYREF_FETCH) > + EEO_CASE(EEOP_SBSREF_FETCH) > { > /* too complex for an inline implementation */ > - ExecEvalArrayRefFetch(state, op); > + ExecEvalSubscriptingRefFetch(state, op); > > EEO_NEXT(); > } > @@ -2702,197 +2702,115 @@ ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext > } > > /* > - * Process a subscript in an ArrayRef expression. > + * Process a subscript in a SubscriptingRef expression. > * > * If subscript is NULL, throw error in assignment case, or in fetch case > * set result to NULL and return false (instructing caller to skip the rest > - * of the ArrayRef sequence). > + * of the SubscriptingRef sequence). > * > * Subscript expression result is in subscriptvalue/subscriptnull. > * On success, integer subscript value has been saved in upperindex[] or > * lowerindex[] for use later. 
> */ > bool > -ExecEvalArrayRefSubscript(ExprState *state, ExprEvalStep *op) > +ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op) > { > - ArrayRefState *arefstate = op->d.arrayref_subscript.state; > - int *indexes; > - int off; > + SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; > + Datum *indexes; > + int off; > > /* If any index expr yields NULL, result is NULL or error */ > - if (arefstate->subscriptnull) > + if (sbsrefstate->subscriptnull) > { > - if (arefstate->isassignment) > + if (sbsrefstate->isassignment) > ereport(ERROR, > (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), > - errmsg("array subscript in assignment must not be null"))); > + errmsg("subscript in assignment must not be null"))); > *op->resnull = true; > return false; > } > > /* Convert datum to int, save in appropriate place */ > - if (op->d.arrayref_subscript.isupper) > - indexes = arefstate->upperindex; > + if (op->d.sbsref_subscript.isupper) > + indexes = sbsrefstate->upper; > else > - indexes = arefstate->lowerindex; > - off = op->d.arrayref_subscript.off; > + indexes = sbsrefstate->lower; > + off = op->d.sbsref_subscript.off; > > - indexes[off] = DatumGetInt32(arefstate->subscriptvalue); > + indexes[off] = sbsrefstate->subscriptvalue; > > return true; > } > > /* > - * Evaluate ArrayRef fetch. > + * Evaluate SubscriptingRef fetch. > * > - * Source array is in step's result variable. > + * Source container is in step's result variable. 
>  */
>  void
> -ExecEvalArrayRefFetch(ExprState *state, ExprEvalStep *op)
> +ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op)
>  {
> -    ArrayRefState *arefstate = op->d.arrayref.state;
> -
> -    /* Should not get here if source array (or any subscript) is null */
> +    /* Should not get here if source container (or any subscript) is null */
>      Assert(!(*op->resnull));
>
> -    if (arefstate->numlower == 0)
> -    {
> -        /* Scalar case */
> -        *op->resvalue = array_get_element(*op->resvalue,
> -                                          arefstate->numupper,
> -                                          arefstate->upperindex,
> -                                          arefstate->refattrlength,
> -                                          arefstate->refelemlength,
> -                                          arefstate->refelembyval,
> -                                          arefstate->refelemalign,
> -                                          op->resnull);
> -    }
> -    else
> -    {
> -        /* Slice case */
> -        *op->resvalue = array_get_slice(*op->resvalue,
> -                                        arefstate->numupper,
> -                                        arefstate->upperindex,
> -                                        arefstate->lowerindex,
> -                                        arefstate->upperprovided,
> -                                        arefstate->lowerprovided,
> -                                        arefstate->refattrlength,
> -                                        arefstate->refelemlength,
> -                                        arefstate->refelembyval,
> -                                        arefstate->refelemalign);
> -    }
> +    *op->resvalue = FunctionCall2(op->d.sbsref.eval_finfo,
> +                                  PointerGetDatum(*op->resvalue),
> +                                  PointerGetDatum(op));
>  }
>
>  /*
> - * Compute old array element/slice value for an ArrayRef assignment
> - * expression. Will only be generated if the new-value subexpression
> - * contains ArrayRef or FieldStore. The value is stored into the
> - * ArrayRefState's prevvalue/prevnull fields.
> + * Compute old container element/slice value for a SubscriptingRef assignment
> + * expression. Will only be generated if the new-value subexpression
> + * contains SubscriptingRef or FieldStore. The value is stored into the
> + * SubscriptingRefState's prevvalue/prevnull fields.
>  */
>  void
> -ExecEvalArrayRefOld(ExprState *state, ExprEvalStep *op)
> +ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op)
>  {
> -    ArrayRefState *arefstate = op->d.arrayref.state;
> +    SubscriptingRefState *sbsrefstate = op->d.sbsref.state;
>
>      if (*op->resnull)
>      {
> -        /* whole array is null, so any element or slice is too */
> -        arefstate->prevvalue = (Datum) 0;
> -        arefstate->prevnull = true;
> -    }
> -    else if (arefstate->numlower == 0)
> -    {
> -        /* Scalar case */
> -        arefstate->prevvalue = array_get_element(*op->resvalue,
> -                                                 arefstate->numupper,
> -                                                 arefstate->upperindex,
> -                                                 arefstate->refattrlength,
> -                                                 arefstate->refelemlength,
> -                                                 arefstate->refelembyval,
> -                                                 arefstate->refelemalign,
> -                                                 &arefstate->prevnull);
> +        /* whole container is null, so any element or slice is too */
> +        sbsrefstate->prevvalue = (Datum) 0;
> +        sbsrefstate->prevnull = true;
>      }
>      else
>      {
> -        /* Slice case */
> -        /* this is currently unreachable */
> -        arefstate->prevvalue = array_get_slice(*op->resvalue,
> -                                               arefstate->numupper,
> -                                               arefstate->upperindex,
> -                                               arefstate->lowerindex,
> -                                               arefstate->upperprovided,
> -                                               arefstate->lowerprovided,
> -                                               arefstate->refattrlength,
> -                                               arefstate->refelemlength,
> -                                               arefstate->refelembyval,
> -                                               arefstate->refelemalign);
> -        arefstate->prevnull = false;
> +        sbsrefstate->prevvalue = FunctionCall2(op->d.sbsref.nested_finfo,
> +                                               PointerGetDatum(*op->resvalue),
> +                                               PointerGetDatum(op));
> +
> +        if (sbsrefstate->numlower != 0)
> +            sbsrefstate->prevnull = false;
> +
>      }
>  }
>
>  /*
> - * Evaluate ArrayRef assignment.
> + * Evaluate SubscriptingRef assignment.
>  *
> - * Input array (possibly null) is in result area, replacement value is in
> - * ArrayRefState's replacevalue/replacenull.
> + * Input container (possibly null) is in result area, replacement value is in
> + * SubscriptingRefState's replacevalue/replacenull.
>  */
>  void
> -ExecEvalArrayRefAssign(ExprState *state, ExprEvalStep *op)
> +ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op)
>  {
> -    ArrayRefState *arefstate = op->d.arrayref.state;
> -
> +    SubscriptingRefState *sbsrefstate = op->d.sbsref.state;
>      /*
> -     * For an assignment to a fixed-length array type, both the original array
> -     * and the value to be assigned into it must be non-NULL, else we punt and
> -     * return the original array.
> +     * For an assignment to a fixed-length container type, both the original
> +     * container and the value to be assigned into it must be non-NULL, else we
> +     * punt and return the original container.
>      */
> -    if (arefstate->refattrlength > 0)    /* fixed-length array? */
> +    if (sbsrefstate->refattrlength > 0)
>      {
> -        if (*op->resnull || arefstate->replacenull)
> +        if (*op->resnull || sbsrefstate->replacenull)
>             return;
>      }
>
> -    /*
> -     * For assignment to varlena arrays, we handle a NULL original array by
> -     * substituting an empty (zero-dimensional) array; insertion of the new
> -     * element will result in a singleton array value. It does not matter
> -     * whether the new element is NULL.
> -     */
> -    if (*op->resnull)
> -    {
> -        *op->resvalue = PointerGetDatum(construct_empty_array(arefstate->refelemtype));
> -        *op->resnull = false;
> -    }
> -
> -    if (arefstate->numlower == 0)
> -    {
> -        /* Scalar case */
> -        *op->resvalue = array_set_element(*op->resvalue,
> -                                          arefstate->numupper,
> -                                          arefstate->upperindex,
> -                                          arefstate->replacevalue,
> -                                          arefstate->replacenull,
> -                                          arefstate->refattrlength,
> -                                          arefstate->refelemlength,
> -                                          arefstate->refelembyval,
> -                                          arefstate->refelemalign);
> -    }
> -    else
> -    {
> -        /* Slice case */
> -        *op->resvalue = array_set_slice(*op->resvalue,
> -                                        arefstate->numupper,
> -                                        arefstate->upperindex,
> -                                        arefstate->lowerindex,
> -                                        arefstate->upperprovided,
> -                                        arefstate->lowerprovided,
> -                                        arefstate->replacevalue,
> -                                        arefstate->replacenull,
> -                                        arefstate->refattrlength,
> -                                        arefstate->refelemlength,
> -                                        arefstate->refelembyval,
> -                                        arefstate->refelemalign);
> -    }
> +    *op->resvalue = FunctionCall2(op->d.sbsref.eval_finfo,
> +                                  PointerGetDatum(*op->resvalue),
> +                                  PointerGetDatum(op));
>  }

You might not love me for this suggestion, but I'd like to see the renaming
here split from the rest of the patch. There's a lot of diff that's just
more or less automatic changes, making it hard to see the actual meaningful
changes.

- Andres
Andres Freund <andres@anarazel.de> writes:
> You might not love me for this suggestion, but I'd like to see the
> renaming here split from the rest of the patch. There's a lot of diff
> that's just more or less automatic changes, making it hard to see the
> actual meaningful changes.

Yeah, I'm beginning to wonder if we should do the renaming at all.
It's useful for being sure we've found everyplace that needs to change
... but if lots of those places don't actually need more than the
name changes, maybe it's just make-work and code thrashing.

There's a set of other issues that are starting to bother me. Perhaps
it's not in this patch's charter to resolve them, but I think we need
to figure out whether that's true. It's a bit hard to explain clearly,
but let me see how well I can state these:

* The complaint I had about the "container" terminology isn't just
terminology. Rather, there is a bunch of knowledge in the system that
some data types can be found embedded in other types; for one example,
see find_composite_type_dependencies. In the case of standard arrays,
it's clear that the array type does contain its element type in this
sense, and we'd better be aware of that in situations such as applying
DDL that changes the element type. It's much less clear what it means
if you say that type X has a subscripting function that yields type Y.
I think the issue can be ignored as long as Y is not a type mutable by
any provided DDL commands, but is that OK as a permanent restriction?
If not, do we need to do anything about it in the v1 patch? If we
don't, do we need to enforce some restriction on what Y can be for
types other than true arrays?

* There are other identifiable array-specific behaviors that people
might expect a generic subscripting feature to let them into. For
example, if we allow JSONB to be subscripted to obtain TEXT, does that
mean a polymorphic function f(x anyarray) should now match JSONB?
It's even more fun if you decide that subscripting JSONB should yield
JSONB, which is probably a more useful definition than TEXT really.
Then ANYARRAY and ANYELEMENT would need to be the same type, which is
likely to blow holes in the polymorphic type code, though I've not
looked closely for problems. In the same vein, if JSONB is
subscriptable, should "x = ANY(y)" work for y of type JSONB?

I'm not actually sure that we'd want these sorts of things to happen,
even as follow-on extensions. For instance, a polymorphic function
f(x anyarray) would very likely think it can apply array_dims(x) or
iterate from array_lower(x,1) to array_upper(x,1). Providing it a
subscripting function won't get you far if the subscripts aren't
consecutive integers.

* There's an awful lot of places in the backend that call
get_element_type or get_base_element_type or type_is_array or
type_is_array_domain, and aren't touched by the patch as it stands.
Some of them might represent dependencies that we need to worry about
that don't fall under either of the above categories. So just touching
the places that mess with ArrayRef isn't getting us real far in terms
of being sure we've considered everything that needs considering.

			regards, tom lane
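The polymorphic-function worry can be made concrete with ordinary SQL that works for true arrays today: the function below silently assumes consecutive integer subscripts via array_lower(), an assumption no jsonb-style subscripting function could honor. (The function name is just illustrative, not from the patch.)

```sql
-- A typical polymorphic function: valid for any true array type today.
CREATE FUNCTION first_elem(x anyarray) RETURNS anyelement AS $$
    SELECT x[array_lower(x, 1)];  -- relies on consecutive integer subscripts
$$ LANGUAGE sql;

SELECT first_elem(ARRAY[10, 20, 30]);  -- 10
```

If JSONB were allowed to match anyarray, first_elem would have no meaningful array_lower() to call, which is exactly the hole Tom describes.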
On Wed, Jan 10, 2018 at 7:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * The complaint I had about the "container" terminology isn't just
> terminology. Rather, there is a bunch of knowledge in the system that
> some data types can be found embedded in other types; for one example,
> see find_composite_type_dependencies. In the case of standard arrays,
> it's clear that the array type does contain its element type in this
> sense, and we'd better be aware of that in situations such as applying
> DDL that changes the element type. It's much less clear what it means
> if you say that type X has a subscripting function that yields type Y.
> I think the issue can be ignored as long as Y is not a type mutable by
> any provided DDL commands, but is that OK as a permanent restriction?
> If not, do we need to do anything about it in the v1 patch? If we don't,
> do we need to enforce some restriction on what Y can be for types
> other than true arrays?

Well, if I set things up so that subscripting foo by integer returns
bar, that doesn't seem to be much different, from the point of view of
type dependencies, from CREATE FUNCTION some_function_name(foo, integer)
RETURNS bar. I suppose that if this gets stored in the pg_type entry
for foo, that type just develops a dependency on the subscripting
functions which, I suppose, makes them indirectly dependent on the
return types of those functions. I'm not sure why that wouldn't be
sufficient.

It's true that you might be able to alter the type in some way that
would break it, but that's equally true of the CREATE FUNCTION case:

rhaas=# create table foo (a int, b text);
CREATE TABLE
rhaas=# create or replace function fooify(foo) returns int as
$$select $1.a$$ language sql;
CREATE FUNCTION
rhaas=# select fooify('(1,tgl)'::foo);
 fooify
--------
      1
(1 row)

rhaas=# alter table foo rename column a to c;
ALTER TABLE
rhaas=# select fooify('(1,tgl)'::foo);
ERROR:  column "a" not found in data type foo
LINE 1: select $1.a
                  ^
QUERY:  select $1.a
CONTEXT:  SQL function "fooify" during inlining

> * There are other identifiable array-specific behaviors that people
> might expect a generic subscripting feature to let them into.
> For example, if we allow JSONB to be subscripted to obtain TEXT,
> does that mean a polymorphic function f(x anyarray) should now match
> JSONB? It's even more fun if you decide that subscripting JSONB
> should yield JSONB, which is probably a more useful definition than
> TEXT really. Then ANYARRAY and ANYELEMENT would need to be the same
> type, which is likely to blow holes in the polymorphic type code,
> though I've not looked closely for problems. In the same vein, if
> JSONB is subscriptable, should "x = ANY(y)" work for y of type JSONB?
> I'm not actually sure that we'd want these sorts of things to happen,
> even as follow-on extensions. For instance, a polymorphic function
> f(x anyarray) would very likely think it can apply array_dims(x) or
> iterate from array_lower(x,1) to array_upper(x,1). Providing it
> a subscripting function won't get you far if the subscripts aren't
> consecutive integers.

Our SQL dialect is statically typed; trying to support duck-typing
seems likely to create a lot of problems. True, we have a single
datatype for one-dimensional and multi-dimensional arrays, but I think
most people would view that as an anti-feature. We also have some
provision for dynamic record types, but it feels pretty kludgy.

> * There's an awful lot of places in the backend that call
> get_element_type or get_base_element_type or type_is_array or
> type_is_array_domain, and aren't touched by the patch as it stands.
> Some of them might represent dependencies that we need to worry about
> that don't fall under either of the above categories. So just touching
> the places that mess with ArrayRef isn't getting us real far in terms
> of being sure we've considered everything that needs considering.

No argument there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Jan 10, 2018 at 7:09 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * The complaint I had about the "container" terminology isn't just
>> terminology. Rather, there is a bunch of knowledge in the system that
>> some data types can be found embedded in other types; for one example,
>> see find_composite_type_dependencies.

> Well, if I set things up so that subscripting foo by integer returns
> bar, that doesn't seem to be much different, from the point of view of
> type dependencies, from CREATE FUNCTION some_function_name(foo,
> integer) RETURNS bar.

I think you missed the point. The question is whether the existence of
a subscripting function means that we need to treat the subscriptable
type as physically containing the subscript result type. For example,
if the subscript result type is composite, do we need to do something
about a column of the subscriptable type when somebody does an ALTER
TYPE ... ALTER ATTRIBUTE TYPE on the result type? The dependency
mechanism doesn't have enough information to answer that.

It's fairly easy to imagine cases where it wouldn't be true --- for
instance, if you had a subscripting conversion from JSONB to
my_composite_type, changing my_composite_type would likely change the
set of JSONB values for which the subscripting function would succeed,
but it wouldn't create a need to physically rewrite any JSONB columns.
But perhaps somebody might try to build a subscriptable type for which
they did need that.

After further thought, I think I'm prepared to say (for the moment)
that only true arrays need be deemed to be containers in this sense.
If you make a subscripting function for anything else, we'll treat it
as just a function that happens to yield the result type but doesn't
imply that that is what is physically stored. Perhaps at some point
that will need to change, but I'm failing to think of near-term use
cases where it would be important to have such a property.

This is, however, a good reason why I don't like the use of "container"
terminology in the patch. I think we want to reserve "container" for
types where physical containment is assumed.

>> * There are other identifiable array-specific behaviors that people
>> might expect a generic subscripting feature to let them into.

> Our SQL dialect is statically typed; trying to support duck-typing
> seems likely to create a lot of problems.

Agreed; that's pretty much what my point was too. I'm just trying to
be clear about how far we expect this capability to reach.

			regards, tom lane
On Thu, Jan 11, 2018 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think you missed the point. The question is whether the existence of a
> subscripting function means that we need to treat the subscriptable type
> as physically containing the subscript result type. For example, if the
> subscript result type is composite, do we need to do something about a
> column of the subscriptable type when somebody does an ALTER TYPE
> ... ALTER ATTRIBUTE TYPE on the result type? The dependency mechanism
> doesn't have enough information to answer that. It's fairly easy to
> imagine cases where it wouldn't be true --- for instance, if you had
> a subscripting conversion from JSONB to my_composite_type, changing
> my_composite_type would likely change the set of JSONB values for which
> the subscripting function would succeed, but it wouldn't create a need
> to physically rewrite any JSONB columns.

I don't think I missed the point at all -- this is the exact same set
of issues that arise with respect to functions. Indeed, I gave an
example of a function that needs to be updated if a column of the input
type is altered. In the case of functions, we've decided that it's not
our problem. If the user updates the composite type and fails to update
the function definitions as needed, things might break, so they should
do that. If they don't, it's not our bug.

> After further thought, I think I'm prepared to say (for the moment) that
> only true arrays need be deemed to be containers in this sense. If you
> make a subscripting function for anything else, we'll treat it as just a
> function that happens to yield the result type but doesn't imply that that
> is what is physically stored. Perhaps at some point that will need to
> change, but I'm failing to think of near-term use cases where it would be
> important to have such a property.

In other words, we're vigorously agreeing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 11, 2018 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think you missed the point. The question is whether the existence of a
>> subscripting function means that we need to treat the subscriptable type
>> as physically containing the subscript result type.

> I don't think I missed the point at all -- this is the exact same set
> of issues that arise with respect to functions. Indeed, I gave an
> example of a function that needs to be updated if a column of the
> input type is altered. In the case of functions, we've decided that
> it's not our problem.

Right, but in the case of stored arrays, we've decided that it *is*
our problem (as indeed it must be, because the user has no tools with
which they could fix a representation change for stored data). The
question is to what extent that need would propagate to pseudo array
types.

> In other words, we're vigorously agreeing.

I think we're agreed on what should be in the v1 version of the patch.
I'm not 100% convinced that the problem won't come up eventually.

			regards, tom lane
On Thu, Jan 11, 2018 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't think I missed the point at all -- this is the exact same set
>> of issues that arise with respect to functions. Indeed, I gave an
>> example of a function that needs to be updated if a column of the
>> input type is altered. In the case of functions, we've decided that
>> it's not our problem.
>
> Right, but in the case of stored arrays, we've decided that it *is*
> our problem (as indeed it must be, because the user has no tools with
> which they could fix a representation change for stored data). The
> question is to what extent that need would propagate to pseudo array
> types.

I think I view the rationale a bit differently. Let's say that a user
defines a composite type as (a int, b text) and uses that composite
type as a column type. Then, somebody tries to change column a to have
type text, and suppose we don't throw an error but simply permit the
operation. If the user now tries to select from the offending column,
the server will very likely crash. In contrast, in the case where the
user has defined an SQL function that selects $1.a and returns it as an
int, they will get a runtime error when they try to use the function.

In my mind, that is the critical difference. An operation that can
cause the server to crash or emit internal errors must be prohibited,
whereas an operation that might cause stuff to fail with suitable error
messages in the future can be allowed. In other words, the problem
isn't that the user has no tools to fix the problem; it's that, with
certain exceptions like superusers indulging in random catalog-hackery,
unprivileged users shouldn't be allowed to break the world.

You might point out that the chances of break-the-world behavior for
type subscripting are pretty high, since we're slinging around arguments
of type internal. But C functions are always an exception to the notion
that we'll trap and report errors.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jan 11, 2018 at 1:37 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Right, but in the case of stored arrays, we've decided that it *is*
>> our problem (as indeed it must be, because the user has no tools with
>> which they could fix a representation change for stored data). The
>> question is to what extent that need would propagate to pseudo array
>> types.

> I think I view the rationale a bit differently. Let's say that a user
> defines a composite type as (a int, b text) and uses that composite
> type as a column type. Then, somebody tries to change column a to
> have type text, and suppose we don't throw an error but simply permit
> the operation. If the user now tries to select from the offending
> column, the server will very likely crash. In contrast, in the case
> where the user has defined an SQL function that selects $1.a and
> returns it as an int, they will get a runtime error when they try to
> use the function. In my mind, that is the critical difference.

There are two critical differences --- that's one, and the other is
that there are SQL-level ways to fix the problem, ie change the
function text with CREATE OR REPLACE FUNCTION. We don't have a SQL
command that says "now go update the representation of table T
column C".

But I think we've probably beaten this topic to death ...

			regards, tom lane
On Thu, Jan 11, 2018 at 2:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But I think we've probably beaten this topic to death ...

Yep.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Sorry for the late reply. I've attached a new version of the patch with
the following changes:

> On 7 January 2018 at 23:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Part of the reason why I'm insistent on that is that I think it will
> expose that the division of labor between the core parser and the
> datatype-specific parse function is still a mess.

I moved all the logic that determines the required input data type into
the datatype-specific code (except everything domain related).
Unfortunately, we need some of those types before the custom code can
perform validation - to address this, the `parse` function is now split
into two phases: initial preparation (when the custom code can provide
all required information) and actual validation (when everything, e.g.
the rhs, is processed). In general I hope this approach makes the
separation of concerns more clear.

> On the executor side of things, I suspect Andres will be unhappy that
> you are making ExprEvalStep part of the API for datatypes --- he
> objected to my exposing it to plpgsql param eval in
> https://postgr.es/m/20171220174243.n4y3hgzf7xd3mm5e@alap3.anarazel.de
> and there was a lot more reason to do so there than there is here, IMO.

I've changed the fetch/assign functions so that ExprEvalStep isn't
exposed anymore.

> Now on the other hand, maybe the right way to go is to embrace a similar
> approach to what I did for plpgsql param eval, and let the datatype
> control what gets generated as the expression execution step. The main
> point here would be to let the datatype provide the address of a callback
> function that gets executed for a subscripting step, rather than having it
> specify the OID of a pg_proc entry to call. There would be two big wins
> from that:
>
> * The callback function would have a plain C call signature, so we would
> not have to go through FunctionCallN, saving a few cycles. This is
> attractive because it would pretty much eliminate any concern about this
> patch making array access slower at execution time.
>
> * There would no longer be a wired-in restriction that there be two and
> only two subscripting execution functions per datatype, since there would
> not be any need for those functions to be identified in pg_type.
>
> The two disadvantages I can see of approaching things this way are:
>
> * There'd be at least some connection of subscriptable types to
> expression compilation, which is what Andres was objecting to in the
> message I cited above. Possibly we could alleviate that a bit by
> providing helper functions that mask exactly what goes into the
> expression step structs, but I'm not sure that that gets us far.
>
> * We'd not have OIDs of execution functions in the parse trees for
> subscripting operations, which would mean that we would not have
> a convenient way to identify subscripting operations that are
> mutable, parallel-unsafe, or leaky. Probably it'd be fine to assume
> that subscripting is always immutable and parallel-safe, although
> I'm slightly more concerned about whether people would want the
> option to label it leaky vs leakproof. As against that, the approach
> that's there right now adds planning overhead that wasn't there before
> for exactly those function property lookups, and again I'm a bit worried
> about the performance impact. (I did some crude performance tests
> today that indicated that the existing patch has small but measurable
> penalties, maybe on the order of 10 percent; and it'd be more by the
> time we're done because I'm pretty sure you've missed some places that
> ought to check these function properties if we're going to have them.
> So I'm afraid that we'll get pushback from people who don't care about
> extensible subscripts and do care about array performance.)

I tried to apply this approach, with callback functions for the
fetch/assign logic (the functions I mentioned above for prepare/validate
are also implemented like that). This part can actually be done more or
less independently from the changes above. Is it close to what you're
suggesting?

As a side note, I'm not sure why this main function should have a
signature with an opcode:

    subscript_support(int opcode, internal other_info) returns internal

since instead of calling it with different opcodes we can just return a
set of callbacks and execute what's necessary at this particular point.

I haven't evaluated the performance of this implementation yet; I will
do that soon. But in the meantime I want to align on what can be
accepted as the best solution here.

> Yeah, I'm beginning to wonder if we should do the renaming at all.
> It's useful for being sure we've found everyplace that needs to change
> ... but if lots of those places don't actually need more than the
> name changes, maybe it's just make-work and code thrashing.

I'm strongly in favor of renaming, just because I don't feel comfortable
when the name of a concept doesn't exactly convey its meaning. In the
current version I've kept the name "container" so far for lack of
feasible alternatives.

> While I'm on the topic, I am not really happy with s/array/container/
> as you've done in some of this code. To my mind, "container type"
> includes composite types. Particularly in the parse_target code, where
> we're currently dealing with either composites or arrays, making it say
> that we're dealing with either composites or containers is just a
> recipe for confusion. Unfortunately I can't think of a better word
> offhand, but some attention to this is needed. As far as the comments
> go, we might be able to use the term "subscriptable type", but not sure
> if that will work for function/variable names.

> After further thought, I think I'm prepared to say (for the moment)
> that only true arrays need be deemed to be containers in this sense.
> If you make a subscripting function for anything else, we'll treat it
> as just a function that happens to yield the result type but doesn't
> imply that that is what is physically stored. Perhaps at some point
> that will need to change, but I'm failing to think of near-term use
> cases where it would be important to have such a property.
>
> This is, however, a good reason why I don't like the use of "container"
> terminology in the patch. I think we want to reserve "container" for
> types where physical containment is assumed.

I see the point. But to my understanding all other datatypes are
"containers" too, since a subscripting function will most likely return
some data from them, maybe in some transformed form. Arrays are just
stricter containers, so maybe reserve "typed/strict_container" for them?

One more note. In the current version of the patch I haven't updated the
tutorial part, since as I said I want to discuss and reach agreement on
the details stated above. Also, this idea of separating the renaming
from everything else has actually paid off - I found a few more places
where I forgot to revert some remnants of the implementation with two
separate node types. I'm going to fix them within a few days in the next
version.
Attachment
> On 22 January 2018 at 23:38, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Sorry for the late reply. I've attached a new version of the patch with
> the following changes:
>
> I found a few more places where I forgot to revert some remnants of the
> implementation with two separate node types. I'm going to fix them
> within a few days in the next version.

Here is a new version of the patch:

* rebased to the latest master
* fixed the issues I mentioned above
* updated an example from the tutorial part
Attachment
Hi,

On Sun, Jan 28, 2018 at 06:26:56PM +0100, Dmitry Dolgov wrote:
> Here is a new version of the patch:
> * rebased to the latest master
> * fixed issues I mentioned above
> * updated an example from the tutorial part

I have a few comments.

0002-Base-implementation-of-subscripting-mechanism-v6.patch:

> -    if (op->d.arrayref_subscript.isupper)
> -        indexes = arefstate->upperindex;
> +    if (op->d.sbsref_subscript.isupper)
> +        indexes = sbsrefstate->upper;

I think upperindex is better here. There was no need to rename it.
Same for lowerindex/lower.

There are a couple of changes which are unrelated to the patch.
For example:

> -  * subscripting. Adjacent A_Indices nodes have to be treated as a single
> +  * subscripting. Adjacent A_Indices nodes have to be treated as a single

It is better to avoid them, for the sake of decreasing the size of the
patch.

> -  * typmod to be applied to the base type. Subscripting a domain is an
> +  * typmod to be applied to the base type. Subscripting a domain is an

Same here.

> +/* Non-inline data for container operations */
> +typedef struct SubscriptingRefState
> +{
> +    bool    isassignment;    /* is it assignment, or just fetch? */
> ...
> +} SubscriptingRefState;

It is not good to move SubscriptingRefState up, because it makes it hard
to see the changes between SubscriptingRefState and ArrayRefState.

> +    FmgrInfo   *eval_finfo;      /* function to evaluate subscript */
> +    FmgrInfo   *nested_finfo;    /* function to handle nested assignment */

I think eval_finfo and nested_finfo are not needed anymore.

> +typedef Datum (*SubscriptingFetch) (Datum source, struct SubscriptingRefState *sbsefstate);
> +
> +typedef Datum (*SubscriptingAssign) (Datum source, struct SubscriptingRefState *sbsefstate);

Typo here? Did you mean sbsrefstate in the second argument?

> +typedef struct SbsRoutines
> +{
> +    SubscriptingPrepare  prepare;
> +    SubscriptingValidate validate;
> +    SubscriptingFetch    fetch;
> +    SubscriptingAssign   assign;
> +
> +} SbsRoutines;

SbsRoutines is not a good name, in my opinion. SubscriptRoutines or
SubscriptingRoutines sound better and are consistent with other
structures.

0005-Subscripting-documentation-v6.patch:

> +    <replaceable class="parameter">type_modifier_output_function</replaceable>,
> +    <replaceable class="parameter">analyze_function</replaceable>,
> +    <replaceable class="parameter">subscripting_handler_function</replaceable>,
>   are optional. Generally these functions have to be coded in C

There is an extra comma here.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
> On 29 January 2018 at 14:41, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
>
> I have a few comments.

Thanks for the suggestions, I've incorporated all of them into a new
version of the patch.

> SbsRoutines is not a good name, in my opinion. SubscriptRoutines or
> SubscriptingRoutines sound better and are consistent with other
> structures.

In general I agree, and I thought about that while implementing this
structure. But my concern is that some parts of the subscripting
infrastructure are annoyingly verbose, so maybe it makes sense to come
up with a short abbreviation and use it everywhere across this code.
Attachment
> On 30 January 2018 at 16:47, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On 29 January 2018 at 14:41, Arthur Zakirov <a.zakirov@postgrespro.ru> wrote:
> >
> > I have a few comments.
>
> Thanks for suggestions, I've incorporated all of them in a new version of the
> patch.
A few more updates. I've attached a new version with some minor changes,
mostly about moving the subscripting depth check into the type-related
logic. Also, I've made some performance tests for arrays using pgbench:
pgbench -c 10 -j 2 -T 600 -f test.sql -r
with queries like:
select (ARRAY[1, 2, 3])[0];
select (ARRAY[1, 2, 3, ..., 98, 99, 100])[0];
select (ARRAY[[[[[[1]]]]]])[1][1][1][1][1][1];
select (ARRAY[[[[[[1, 2, 3]]]]]])[1][1][1][1][1:2];
and the difference in average latency was about 2%:
* with the patch
number of transactions actually processed: 349211
latency average = 1.718 ms
tps = 5820.048783 (including connections establishing)
tps = 5820.264433 (excluding connections establishing)
* without the patch
number of transactions actually processed: 356024
latency average = 1.685 ms
tps = 5933.538195 (including connections establishing)
tps = 5934.124599 (excluding connections establishing)
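For reference, the quoted latency averages do work out to roughly the stated 2% difference; a quick arithmetic check (runnable in any SQL session, nothing here depends on the patch):

```sql
-- (with-patch latency - without-patch latency) / without-patch latency
SELECT round((1.718 - 1.685) / 1.685 * 100, 2) AS slowdown_pct;  -- 1.96
```

The tps figures tell the same story: 5820 vs. 5934 is about a 1.9% drop.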
Attachment
> On 22 February 2018 at 18:30, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> A few more updates. I've attached a new version with some minor changes,
> mostly about moving the subscripting depth check into the type-related
> logic. Also, I've made some performance tests for arrays using pgbench:
>
> pgbench -c 10 -j 2 -T 600 -f test.sql -r
>
> with queries like:
>
> select (ARRAY[1, 2, 3])[0];
> select (ARRAY[1, 2, 3, ..., 98, 99, 100])[0];
> select (ARRAY[[[[[[1]]]]]])[1][1][1][1][1][1];
> select (ARRAY[[[[[[1, 2, 3]]]]]])[1][1][1][1][1:2];
>
> and the difference in average latency was about 2%:
>
> * with the patch
>
> number of transactions actually processed: 349211
> latency average = 1.718 ms
> tps = 5820.048783 (including connections establishing)
> tps = 5820.264433 (excluding connections establishing)
>
> * without the patch
>
> number of transactions actually processed: 356024
> latency average = 1.685 ms
> tps = 5933.538195 (including connections establishing)
> tps = 5934.124599 (excluding connections establishing)

One more small update after fd1a421fe6 is in the attachments.
Attachment
On Tue, Mar 6, 2018 at 6:21 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
One more small update after fd1a421fe6 in attachments.
Before looking at the code I have a few comments about documentation:
in json.sgml:
+-- Extract value by key
+SELECT ('{"a": 1}'::jsonb)['a'];
What is the result of running this query? What is the resulting data type?
+-- Extract nested value by key path
+SELECT ('{"a": {"b": {"c": 1}}}'::jsonb)['a']['b']['c'];
+
+-- Extract element by index
+SELECT ('[1, "2", null]'::jsonb)['1'];
What is the result here? Why is the subscript a string and not a number? Are subscript indexes 0- or 1-based?
+-- Update value by key
+UPDATE table_name set jsonb_field['key'] = 1;
+
+-- Select records using where clause with subscripting
+SELECT * from table_name where jsonb_field['key'] = '"value"';
Please capitalize: SET, FROM, WHERE.
Use of double quotes around "value" requires some explanation, I think.
Should the user expect that a suitable index is used by the query planner for this query?
In other words, I would like to see this part of documentation to be extended beyond just showcasing the syntax.
Regards,
-- Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176 127-59-707
> On 20 March 2018 at 11:09, Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
>> On Tue, Mar 6, 2018 at 6:21 PM, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>>
>> One more small update after fd1a421fe6 in attachments.
>
> Before looking at the code I have a few comments about documentation:
>
> ...
>
> In other words, I would like to see this part of documentation to be
> extended beyond just showcasing the syntax.

Good point, thanks for noticing. The thing is that the implementation of
subscripting for the jsonb data type in this patch relies on the `setPath`
function and follows the same rules as e.g. `jsonb_set`, but I need to mention
this explicitly in the documentation. Speaking about your questions:

> +-- Extract value by key
> +SELECT ('{"a": 1}'::jsonb)['a'];
>
> What is the result of running this query? What is the resulting data type?

A jsonb subscripting expression always returns another jsonb.

> +-- Extract element by index
> +SELECT ('[1, "2", null]'::jsonb)['1'];
>
> What is the result here? Why subscript is a string and not a number? Are
> subscription indexes 0- or 1-based?

For jsonb arrays an index is 0-based. It's also not necessary to have an index
as a string in this situation (so `data['1']` and `data[1]` are actually
equal).

> +-- Select records using where clause with subscripting
> +SELECT * from table_name where jsonb_field['key'] = '"value"';
>
> Use of double quotes around "value" requires some explanation, I think.

In case of comparison, since a subscripting expression returns something of
jsonb data type, we're going to compare two objects of type jsonb. Which means
we need to convert 'value' to a jsonb scalar, and for that purpose it should be
in double quotes.

> Should the user expect that a suitable index is used by the query planner
> for this query?

There is no specific indexing support for subscripting expressions, so if you
need one, you can create a functional index using it.
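The extraction rules discussed above can be sketched as a small Python model (the `jsonb_subscript` helper name is mine and the patch itself is C code inside the server, so treat this purely as an illustration of the semantics): objects are indexed by key, arrays by 0-based position, a numeric string subscript works for arrays, and a missing path yields SQL NULL.

```python
import json

def jsonb_subscript(value, key):
    """Model of jsonb subscript extraction: returns the element, or None
    (standing in for SQL NULL) when the key/index does not exist."""
    if isinstance(value, dict):
        return value.get(str(key))
    if isinstance(value, list):
        try:
            idx = int(key)           # data['1'] and data[1] behave the same
        except (TypeError, ValueError):
            return None
        return value[idx] if -len(value) <= idx < len(value) else None
    return None                      # subscripting a scalar yields NULL

doc = json.loads('{"a": {"b": {"c": 1}}}')
assert jsonb_subscript(jsonb_subscript(jsonb_subscript(doc, 'a'), 'b'), 'c') == 1

arr = json.loads('[1, "2", null]')
assert jsonb_subscript(arr, '1') == jsonb_subscript(arr, 1) == "2"
assert jsonb_subscript(doc, 'missing') is None
```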
Here is the updated version of patch, rebased after recent conflicts and with suggested documentation improvements.
Attachment
> On 22 March 2018 at 23:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Here is the updated version of patch, rebased after recent conflicts and with
> suggested documentation improvements.

Another rebased version of the patch.
Attachment
> On Thu, 26 Apr 2018 at 16:44, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On 22 March 2018 at 23:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > Here is the updated version of patch, rebased after recent conflicts and with
> > suggested documentation improvements.
>
> Another rebased version of the patch.

I've noticed that I never updated the llvmjit code for the arrayref
expressions, and it's important to do so, since the patch introduces another
layer of flexibility. Hence here is the new version.
Attachment
> On Fri, 20 Jul 2018 at 23:32, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Thu, 26 Apr 2018 at 16:44, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > > On 22 March 2018 at 23:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > >
> > > Here is the updated version of patch, rebased after recent conflicts and with
> > > suggested documentation improvements.
> >
> > Another rebased version of the patch.
>
> I've noticed, that I never updated llvmjit code for the arrayref expressions,
> and it's important to do so, since the patch introduces another layer of
> flexibility. Hence here is the new version.

Here is another rebased version, and a bit of history: the first prototypes of
this patch were sent more than 3 years ago. Of course the patch has evolved
significantly over this period, and I take it as a good sign that it wasn't
rejected and keeps moving through the commitfests. At the same time the lack
of attention makes things a bit frustrating. I have an impression that this is
a fairly common situation, and I wonder if there are any ideas (besides the
well-known advice of putting some effort into reviewing patches from other
people, since I'm already doing my best and enjoying this) how to make
progress in such cases?
Attachment
On Sun, 30 Sep 2018 at 0:21, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Fri, 20 Jul 2018 at 23:32, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Thu, 26 Apr 2018 at 16:44, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > > On 22 March 2018 at 23:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > >
> > > Here is the updated version of patch, rebased after recent conflicts and with
> > > suggested documentation improvements.
> >
> > Another rebased version of the patch.
>
> I've noticed, that I never updated llvmjit code for the arrayref expressions,
> and it's important to do so, since the patch introduces another layer of
> flexibility. Hence here is the new version.
Here is another rebased version, and a bit of history: the first prototypes of
this patch were sent more than 3 years ago. Of course the patch evolved
significantly over this period, and I take it as a good sign that it wasn't
rejected and keeps moving through the commitfests. At the same time the lack of
attention makes things a bit frustrating. I have an impression that it's sort
of regular situation and wonder if there are any ideas (besides the well known
advice of putting some efforts into review patches from other people, since I'm
already doing my best and enjoying this) how to make progress in such cases?
This feature looks nice, and it can be great when some values of a non-atomic type need to be updated.
Regards
Pavel
Hi
On Sun, 30 Sep 2018 at 8:23, Pavel Stehule <pavel.stehule@gmail.com> wrote:
On Sun, 30 Sep 2018 at 0:21, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Fri, 20 Jul 2018 at 23:32, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Thu, 26 Apr 2018 at 16:44, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > > On 22 March 2018 at 23:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > >
> > > Here is the updated version of patch, rebased after recent conflicts and with
> > > suggested documentation improvements.
> >
> > Another rebased version of the patch.
>
> I've noticed, that I never updated llvmjit code for the arrayref expressions,
> and it's important to do so, since the patch introduces another layer of
> flexibility. Hence here is the new version.
Here is another rebased version, and a bit of history: the first prototypes of
this patch were sent more than 3 years ago. Of course the patch evolved
significantly over this period, and I take it as a good sign that it wasn't
rejected and keeps moving through the commitfests. At the same time the lack of
attention makes things a bit frustrating. I have an impression that it's sort
of regular situation and wonder if there are any ideas (besides the well known
advice of putting some efforts into review patches from other people, since I'm
already doing my best and enjoying this) how to make progress in such cases?

This feature looks nice, and it can be great when some values of some not atomic type should be updated.
I am playing with this feature a little bit.
I have one idea - could it be possible to use an integer subscript for record fields? It could help with iterating over a record.
example:
select ('{"a":{"a":[10,20]}}'::jsonb)[0]; --> NULL, but it could be more practical if it returned the same as select ('{"a":{"a":[10,"20"]}}'::jsonb)['a'];
I don't like silently ignoring a bad subscript in an update:
postgres=# insert into test(v) values( '[]');
INSERT 0 1
postgres=# update test set v[1000] = 'a';
UPDATE 1
postgres=# update test set v[1000] = 'a';
UPDATE 1
postgres=# update test set v[1000] = 'a';
UPDATE 1
postgres=# select * from test;
┌────┬─────────────────┐
│ id │ v │
╞════╪═════════════════╡
│ │ ["a", "a", "a"] │
└────┴─────────────────┘
(1 row)
It should raise an exception in this case. The current behavior allows simple appending, but it can be a source of errors. For this case we could introduce some special symbol - something like -0 :)
It may be strange, but I prefer the less magical syntax
update test set v['a']['a'] = v['a']['a'] || '1000';
more readable than
update test set v['a']['a'][1000000] = 1000;
My first impression is very good - updating jsonb or xml documents can be very friendly.
Regards
Pavel
> On Wed, 10 Oct 2018 at 14:26, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> I am playing with this feature little bit

Thanks a lot!

> I have one idea - can be possible to use integer subscript for record fields? It can helps with iteration over record.
>
> example:
>
> select ('{"a":{"a":[10,20]}}'::jsonb)[0]; --> NULL, but can be more practical if it returns same like select ('{"a":{"a":[10,"20"]}}'::jsonb)['a'];

Sounds interesting, but I'm not sure how consistent it would be with the rest
of jsonb functionality, and someone may want to get an error in this case. At
the same time I believe that this can be achieved quite nicely with json_query
or json_table from the SQL/JSON patch (see examples here [1]). What do you
think about this approach?

> I don't like quite ignoring bad subsript in update

Can you show an example of such ignoring of a bad subscript in an update?

> postgres=# insert into test(v) values( '[]');
> INSERT 0 1
> postgres=# update test set v[1000] = 'a';
> UPDATE 1
> postgres=# update test set v[1000] = 'a';
> UPDATE 1
> postgres=# update test set v[1000] = 'a';
> UPDATE 1
> postgres=# select * from test;
> ┌────┬─────────────────┐
> │ id │        v        │
> ╞════╪═════════════════╡
> │    │ ["a", "a", "a"] │
> └────┴─────────────────┘
> (1 row)
>
> It should to raise exception in this case. Current behave allows append simply, but can be source of errors. For this case we can introduce some special symbol - some like -0 :)

Yeah, it may look strange, but there is a reason behind it. I tried to keep the
behaviour of this feature consistent with the jsonb_set function (and in fact
they're sharing the same functionality). And for jsonb_set it's documented:

    If the item (of a path, in our case an index) is out of the range
    -array_length .. array_length -1, and create_missing is true, the new value
    is added at the beginning of the array if the item is negative, and at the
    end of the array if it is positive.

So, the index 1000 is way above the end of the array v, and every new item is
appended at the end. Of course no one said that they should behave similarly,
but I believe it's quite nice to have consistency here. Any other opinions?

> It is maybe strange, but I prefer less magic syntax like
>
> update test set v['a']['a'] = v['a']['a'] || '1000';
>
> more readable than
>
> update test set v['a']['a'][1000000] = 1000;

Yep, with this patch it's possible to use both ways:

    =# table test;
                v
    -------------------------
     {"a": {"a": [1, 2, 3]}}
    (1 row)

    =# update test set v['a']['a'] = v['a']['a'] || '1000';
    UPDATE 1

    =# table test;
                  v
    -------------------------------
     {"a": {"a": [1, 2, 3, 1000]}}
    (1 row)

> My first impression is very good - update jsonb, xml documents can be very friendly.

Thanks!

[1]: https://www.postgresql.org/message-id/flat/732208d3-56c3-25a4-8f08-3be1d54ad51b@postgrespro.ru
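The quoted jsonb_set rule for array indexes can be modeled in a few lines of Python (`jsonb_set_index` is a hypothetical name for illustration; the real logic lives in the server's `setPath`): within -array_length .. array_length - 1 the element is replaced, a too-large positive index appends, a too-negative index prepends.

```python
def jsonb_set_index(arr, idx, new_value):
    """Model of the jsonb_set out-of-range rule: replace in range,
    append for a too-large positive index, prepend for a too-negative one."""
    out = list(arr)
    if idx >= len(out):
        out.append(new_value)
    elif idx < -len(out):
        out.insert(0, new_value)
    else:
        out[idx] = new_value
    return out

arr = []
for _ in range(3):                       # mirrors the three UPDATEs above
    arr = jsonb_set_index(arr, 1000, "a")
print(arr)   # ['a', 'a', 'a'] -- same result as v[1000] = 'a' three times
```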
On Thu, 11 Oct 2018 at 22:48, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Wed, 10 Oct 2018 at 14:26, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> I am playing with this feature little bit
Thanks a lot!
> I have one idea - can be possible to use integer subscript for record fields? It can helps with iteration over record.
>
> example:
>
> select ('{"a":{"a":[10,20]}}'::jsonb)[0];--> NULL, but can be more practical if it returns same like select ('{"a":{"a":[10,"20"]}}'::jsonb)['a'];
Sounds interesting, but I'm not sure how consistent it would be with the rest
of jsonb functionality, and someone may want to get an error in this case. At
the same time I believe that this can be achieved quite nicely with json_query
or json_table from SQL/JSON patch (see examples here [1]). What do you think
about this approach?
In this case, I don't see any problem - an array or multidimensional array can be indexed by numbers or by special keys, but numbers are natural every time.
For me, SQL/JSON and JSONPath support are a different topic. Moreover, the generic support can be used for other types than jsonb. I can imagine an integrated dictionary type - and the SQL/JSON support doesn't help there.
This is not too strong a theme for me - I just don't see a reason for strong restrictiveness there.
> I don't like quite ignoring bad subsript in update
Can you show an example of such ignoring of a bad subscript in an update?
> postgres=# insert into test(v) values( '[]');
> INSERT 0 1
> postgres=# update test set v[1000] = 'a';
> UPDATE 1
> postgres=# update test set v[1000] = 'a';
> UPDATE 1
> postgres=# update test set v[1000] = 'a';
> UPDATE 1
> postgres=# select * from test;
> ┌────┬─────────────────┐
> │ id │ v │
> ╞════╪═════════════════╡
> │ │ ["a", "a", "a"] │
> └────┴─────────────────┘
> (1 row)
>
> It should to raise exception in this case. Current behave allows append simply, but can be source of errors. For this case we can introduce some special symbol - some like -0 :)
Yeah, it may look strange, but there is a reason behind it. I tried to keep the
behaviour of this feature consistent with jsonb_set function (and in fact
they're sharing the same functionality). And for jsonb_set it's documented:
If the item (of a path, in our case an index) is out of the range
-array_length .. array_length -1, and create_missing is true, the new value
is added at the beginning of the array if the item is negative, and at the
end of the array if it is positive.
So, the index 1000 is way above the end of the array v, and every new item is
appended at the end.
Of course no one said that they should behave similarly, but I believe it's
quite nice to have consistency here. Any other opinions?
Aha - although I understand your motivation, I think it is a bad design - and the jsonb_set behavior is unfortunate.
I think it is the wrong idea, because you lose some information - the field position - I push a value at index 10, but it will be stored at the second position.
Regards
Pavel
> It is maybe strange, but I prefer less magic syntax like
>
> update test set v['a']['a'] = v['a']['a'] || '1000';
>
> more readable than
>
> update test set v['a']['a'][1000000] = 1000;
Yep, with this patch it's possible to use both ways:
=# table test;
v
-------------------------
{"a": {"a": [1, 2, 3]}}
(1 row)
=# update test set v['a']['a'] = v['a']['a'] || '1000';
UPDATE 1
=# table test;
v
-------------------------------
{"a": {"a": [1, 2, 3, 1000]}}
(1 row)
> My first impression is very good - update jsonb, xml documents can be very friendly.
Thanks!
1: https://www.postgresql.org/message-id/flat/732208d3-56c3-25a4-8f08-3be1d54ad51b@postgrespro.ru
> On Fri, 12 Oct 2018 at 07:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> > postgres=# insert into test(v) values( '[]');
>> > INSERT 0 1
>> > postgres=# update test set v[1000] = 'a';
>> > UPDATE 1
>> > postgres=# update test set v[1000] = 'a';
>> > UPDATE 1
>> > postgres=# update test set v[1000] = 'a';
>> > UPDATE 1
>> > postgres=# select * from test;
>> > ┌────┬─────────────────┐
>> > │ id │        v        │
>> > ╞════╪═════════════════╡
>> > │    │ ["a", "a", "a"] │
>> > └────┴─────────────────┘
>> > (1 row)
>> >
>> > It should to raise exception in this case. Current behave allows append simply, but can be source of errors. For this case we can introduce some special symbol - some like -0 :)
>>
>> Yeah, it may look strange, but there is a reason behind it. I tried to keep the
>> behaviour of this feature consistent with jsonb_set function (and in fact
>> they're sharing the same functionality). And for jsonb_set it's documented:
>>
>>     If the item (of a path, in our case an index) is out of the range
>>     -array_length .. array_length -1, and create_missing is true, the new value
>>     is added at the beginning of the array if the item is negative, and at the
>>     end of the array if it is positive.
>>
>> So, the index 1000 is way above the end of the array v, and every new item has
>> being appended at the end.
>>
>> Of course no one said that they should behave similarly, but I believe it's
>> quite nice to have consistency here. Any other opinions?
>
> Aha - although I understand to your motivation, I am think so it is bad design - and jsonb_set behave is not happy.
>
> I am think so it is wrong idea, because you lost some information - field position - I push value on index 10, but it will be stored on second position.

The thing is that we don't store the field position in this sense anyway in
jsonb. For arrays there are dimensions, boundaries and null bitmaps stored,
but for jsonb it's just an array of elements. If we want to store this data,
we either have to change the format, or fill in a jsonb with null values up to
the required position (the first option is out of the scope of this patch, the
second doesn't sound that good).
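To make the second option concrete, here is what "fill in a jsonb with null values up to the required position" would look like, sketched in Python (`set_with_padding` is a hypothetical helper of mine, not anything the patch implements):

```python
def set_with_padding(arr, idx, new_value):
    """The 'fill with nulls' alternative: preserve the requested position
    by padding the array with SQL NULLs (None) up to idx."""
    out = list(arr)
    if idx >= len(out):
        out.extend([None] * (idx - len(out) + 1))
    out[idx] = new_value
    return out

# The current jsonb_set-style behavior loses the position (it appends),
# while the padding alternative keeps it, at the cost of many nulls:
assert set_with_padding([], 10, "a") == [None] * 10 + ["a"]
```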
On Wed, 7 Nov 2018 at 16:25, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Fri, 12 Oct 2018 at 07:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> > postgres=# insert into test(v) values( '[]');
>> > INSERT 0 1
>> > postgres=# update test set v[1000] = 'a';
>> > UPDATE 1
>> > postgres=# update test set v[1000] = 'a';
>> > UPDATE 1
>> > postgres=# update test set v[1000] = 'a';
>> > UPDATE 1
>> > postgres=# select * from test;
>> > ┌────┬─────────────────┐
>> > │ id │ v │
>> > ╞════╪═════════════════╡
>> > │ │ ["a", "a", "a"] │
>> > └────┴─────────────────┘
>> > (1 row)
>> >
>> > It should to raise exception in this case. Current behave allows append simply, but can be source of errors. For this case we can introduce some special symbol - some like -0 :)
>>
>> Yeah, it may look strange, but there is a reason behind it. I tried to keep the
>> behaviour of this feature consistent with jsonb_set function (and in fact
>> they're sharing the same functionality). And for jsonb_set it's documented:
>>
>> If the item (of a path, in our case an index) is out of the range
>> -array_length .. array_length -1, and create_missing is true, the new value
>> is added at the beginning of the array if the item is negative, and at the
>> end of the array if it is positive.
>>
>> So, the index 1000 is way above the end of the array v, and every new item has
>> being appended at the end.
>>
>> Of course no one said that they should behave similarly, but I believe it's
>> quite nice to have consistency here. Any other opinions?
>
>
> Aha - although I understand to your motivation, I am think so it is bad design - and jsonb_set behave is not happy.
>
> I am think so it is wrong idea, because you lost some information - field position - I push value on index 10, but it will be stored on second position.
The thing is that we don't store the field position in this sense anyway in
jsonb. For arrays there are dimensions, boundaries and null bitmaps stored, but
for jsonb it's just an array of elements. If we want to store this data, we
either have to change the format, or fill in a jsonb with null values up to the
required position (the first option is out of the scope of this patch, the
second doesn't sound that good).
I don't agree. If we use the same syntax for some object types, we should enforce some consistency.
I don't think you should introduce nulls for JSONs. In this case, the most correct solution is raising an exception.
Regards
Pavel
> On Wed, 7 Nov 2018 at 17:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> I don't agree. If we use a same syntax for some objects types, we should to enforce some consistency.

Just to make it clear, consistency between what?

> I don't think so you should to introduce nulls for JSONs. In this case, the most correct solution is raising a exception.

Now it's my turn to disagree. As an argument I have this thread [1], where a
similar discussion happened about the flexibility of jsonb and throwing errors
(in this particular case, whether or not to throw an error when a non-existing
path was given to jsonb_set). I can imagine a significant number of use cases
where adding a value to jsonb like that is the desirable outcome, and I'm not
sure I can come up with an example where strictness is the best result. Maybe
if you have something in mind, you can describe what would be the case for
that? Also, as I've mentioned before, consistency between jsonb_set and the
jsonb subscripting operator will help us avoid tons of questions about why I
can do this using one option, but not another.

[1]: https://www.postgresql.org/message-id/CAM3SWZT3uZ7aFktx-nNEWGbapN1oy2t2gt10pnOzygZys_Ak1Q%40mail.gmail.com
On Wed, 7 Nov 2018 at 19:35, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Wed, 7 Nov 2018 at 17:09, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> I don't agree. If we use a same syntax for some objects types, we should to enforce some consistency.
Just to make it clear, consistency between what?
> I don't think so you should to introduce nulls for JSONs. In this case, the most correct solution is raising a exception.
Now it's my turn to disagree. As an argument I have this thread [1], where
similar discussion happened about flexibility of jsonb and throwing an errors
(in this particular case whether or not to throw an error when a non existing
path was given to jsonb_set).
That doesn't mean it is designed well.
I can imagine significant number of use cases when adding a value to jsonb like
that is desirable outcome, and I'm not sure if I can come up with an example
when strictness is the best result. Maybe if you have something in mind, you
can describe what would be the case for that? Also as I've mentioned before,
consistency between jsonb_set and jsonb subscripting operator will help us to
avoid tons of question about why I can do this and this using one option, but
not another.
I have only one argument - with this behavior nobody knows whether the value was appended or updated.
[1]: https://www.postgresql.org/message-id/CAM3SWZT3uZ7aFktx-nNEWGbapN1oy2t2gt10pnOzygZys_Ak1Q%40mail.gmail.com
> On Thu, 8 Nov 2018 at 06:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
>> Now it's my turn to disagree. As an argument I have this thread [1], where
>> similar discussion happened about flexibility of jsonb and throwing an errors
>> (in this particular case whether or not to throw an error when a non existing
>> path was given to jsonb_set).
>
> It doesn't mean so it is designed well.
>
>> I can imagine significant number of use cases when adding a value to jsonb like
>> that is desirable outcome, and I'm not sure if I can come up with an example
>> when strictness is the best result. Maybe if you have something in mind, you
>> can describe what would be the case for that? Also as I've mentioned before,
>> consistency between jsonb_set and jsonb subscripting operator will help us to
>> avoid tons of question about why I can do this and this using one option, but
>> not another.
>
> I have only one argument - with this behave nobody knows if value was appended or updated.

Well, maybe you're right, and I would love to discuss our approach to
modifying jsonb values, but the point is that the purpose of this patch is to
provide a new "friendly" syntax to do so, not to change how it works or
provide an alternative version of the update functionality.

Even if you'll convince me that subscripting for jsonb should behave
differently from jsonb_set, I would suggest doing this within a separate patch
set, since the current one is already too big (probably that's why the review
process is so slow).
> On Thu, 8 Nov 2018 at 16:19, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Thu, 8 Nov 2018 at 06:14, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >
> >> Now it's my turn to disagree. As an argument I have this thread [1], where
> >> similar discussion happened about flexibility of jsonb and throwing an errors
> >> (in this particular case whether or not to throw an error when a non existing
> >> path was given to jsonb_set).
> >
> > It doesn't mean so it is designed well.
> >
> >> I can imagine significant number of use cases when adding a value to jsonb like
> >> that is desirable outcome, and I'm not sure if I can come up with an example
> >> when strictness is the best result. Maybe if you have something in mind, you
> >> can describe what would be the case for that? Also as I've mentioned before,
> >> consistency between jsonb_set and jsonb subscripting operator will help us to
> >> avoid tons of question about why I can do this and this using one option, but
> >> not another.
> >
> > I have only one argument - with this behave nobody knows if value was appended or updated.
>
> Well, maybe you're right, and I would love to discuss our approach to modify
> jsonb values, but the point is that the purpose of this patch is to provide a new
> "friendly" syntax to do so, not to change how it works or provide an
> alternative version of update functionality.
>
> Even if you'll convince me that subscripting for jsonb now should behave
> differently from jsonb_set, I would suggest to do this within a separate patch
> set, since the current one is already too big (probably that's why the review
> process is so slow).

I've noticed that the patch has some conflicts, so here is the rebased
version. Also, since people are concerned about the performance impact for
arrays, I've done some tests similar to [1], but against the current master -
the results are similar so far; I've got a quite insignificant difference
between the master and the patched version.
[1]: https://www.postgresql.org/message-id/CA%2Bq6zcV8YCKcMHkUKiiUM3eOsq-ubb%3DT1D%2Bki4YbE%3DBYbt1PxQ%40mail.gmail.com
Attachment
> On Fri, Nov 9, 2018 at 1:55 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> I've noticed, that patch has some conflicts, so here is the rebased version.
> Also, since people are concern about performance impact for arrays, I've done
> some tests similar to [1], but agains the current master - results are similar
> so far, I've got quite insignificant difference between the master and the
> patched version.
>
> [1]: https://www.postgresql.org/message-id/CA%2Bq6zcV8YCKcMHkUKiiUM3eOsq-ubb%3DT1D%2Bki4YbE%3DBYbt1PxQ%40mail.gmail.com

One more rebased version. This time I also decided to use this opportunity to
write more descriptive commit messages.
Attachment
On Mon, Nov 26, 2018 at 6:07 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Fri, Nov 9, 2018 at 1:55 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> >
> > I've noticed, that patch has some conflicts, so here is the rebased version.
> > Also, since people are concern about performance impact for arrays, I've done
> > some tests similar to [1], but agains the current master - results are similar
> > so far, I've got quite insignificant difference between the master and the
> > patched version.
> >
> > [1]: https://www.postgresql.org/message-id/CA%2Bq6zcV8YCKcMHkUKiiUM3eOsq-ubb%3DT1D%2Bki4YbE%3DBYbt1PxQ%40mail.gmail.com
>
> One more rebased version. This time I also decided to use this opportunity, to
> write more descriptive commit messages.

Hi Dmitry,

Noticed on cfbot.cputube.org:

pg_type.c:167:10: error: passing argument 3 of
‘GenerateTypeDependencies’ makes integer from pointer without a cast
[-Werror]
   false);
   ^
In file included from pg_type.c:28:0:
../../../src/include/catalog/pg_type.h:335:13: note: expected ‘Oid’ but
argument is of type ‘void *’
 extern void GenerateTypeDependencies(Oid typeObjectId,
             ^

--
Thomas Munro
http://www.enterprisedb.com
> On Sun, Nov 25, 2018 at 9:31 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> Noticed on cfbot.cputube.org:
>
> pg_type.c:167:10: error: passing argument 3 of
> ‘GenerateTypeDependencies’ makes integer from pointer without a cast
> [-Werror]
>    false);

Thanks for noticing! I was in a rush when I did the rebase, and made a few
mistakes (the moral of the story - do not rush). Here is the fix, which is
more aligned with commit ab69ea9fee.
Attachment
> On Mon, Nov 26, 2018 at 1:37 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Thanks for noticing! I was in rush when did rebase, and made few mistakes (the
> moral of the story - do not rush). Here is the fix, which is more aligned with
> the commit ab69ea9fee.

And one more rebase with pretty much the same functionality so far.
Attachment
Would anybody object to me pushing part 0001 soon? It seems pointless to
force Dmitry to keep rebasing a huge renaming patch all this time. I think
the general feeling is that this is a desirable change, so let's keep things
moving.

That having been said ... while the current 0001 patch does apply
semi-cleanly (`git apply -3` does it), it does not compile, probably because
of header refactoring. Please rebase and make sure that each individual
patch compiles cleanly.

--
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, 31 Jan 2019 at 16:39, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
Would anybody object to me pushing part 0001 soon? It seems pointless
to force Dmitry keep rebasing a huge renaming patch all this time. I
think the general feeling is that this is a desirable change, so let's
keep things moving.
That having been said ... while the current 0001 patch does apply
semi-cleanly (`git apply -3` does it), it does not compile, probably
because of header refactoring. Please rebase and make sure that each
individual patch compiles cleanly.
+1
Pavel
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Thu, Jan 31, 2019 at 4:39 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> That having been said ... while the current 0001 patch does apply
> semi-cleanly (`git apply -3` does it), it does not compile, probably
> because of header refactoring. Please rebase and make sure that each
> individual patch compiles cleanly.

Oh, sorry for that, I'll fix it in a moment.
> On Thu, Jan 31, 2019 at 4:43 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> On Thu, 31 Jan 2019 at 16:39, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>>
>> Would anybody object to me pushing part 0001 soon? It seems pointless
>> to force Dmitry keep rebasing a huge renaming patch all this time. I
>> think the general feeling is that this is a desirable change, so let's
>> keep things moving.
>>
>> That having been said ... while the current 0001 patch does apply
>> semi-cleanly (`git apply -3` does it), it does not compile, probably
>> because of header refactoring. Please rebase and make sure that each
>> individual patch compiles cleanly.
>
> +1

> On Thu, Jan 31, 2019 at 4:45 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> Oh, sorry for that, I'll fix it in a moment.

The moment was longer than I expected, but here is the rebased version, where
all the individual patches can be applied and compiled cleanly (although there
is still a functional dependency between 0002 and 0003, since the former
introduces new subscripting without any implementation, and the latter
introduces an implementation for the array data type).
Attachment
I think it's worth pointing out that "git format-patch -v" exists :-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Feb-01, Alvaro Herrera wrote: > I think it's worth pointing out that "git format-patch -v" exists :-) ... and you're going to need "git format-patch -v19", because contrib doesn't build with 18. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Feb-01, Alvaro Herrera wrote: > On 2019-Feb-01, Alvaro Herrera wrote: > > > I think it's worth pointing out that "git format-patch -v" exists :-) > > ... and you're going to need "git format-patch -v19", because contrib > doesn't build with 18. And that suggests that maybe we should keep the old names working, to avoid breaking every extension out there that deals with ArrayRef, though I'm not sure if after patches 0002 ff it'll be possible to keep them working without changes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Fri, Feb 1, 2019 at 12:54 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2019-Feb-01, Alvaro Herrera wrote:
>
> > On 2019-Feb-01, Alvaro Herrera wrote:
> >
> > ... and you're going to need "git format-patch -v19", because contrib
> > doesn't build with 18.
>
> And that suggests that maybe we should keep the old names working, to
> avoid breaking every extension out there that deals with ArrayRef,
> though I'm not sure if after patches 0002 ff it'll be possible to keep
> them working without changes.

Can you please point out for me what exactly doesn't build? I just tried to
build contrib and ran all the tests, everything finished successfully, which is
also confirmed by bot [1].

> I think it's worth pointing out that "git format-patch -v" exists :-)

Fortunately, I know. But yeah, no idea why I started to add a version number at
the end of the patch name :)

[1]: https://travis-ci.org/postgresql-cfbot/postgresql/builds/487401077
On 2019-Feb-01, Dmitry Dolgov wrote:

> Can you please point out for me what exactly doesn't build? I just tried to
> build contrib and ran all the tests, everything finished successfully, which is
> also confirmed by bot [1].

Well, this is strange: I removed the changes then re-applied the diff, and it
worked fine this time. Strange. I can only offer my terminal log, where it's
obvious that the contrib changes were not applied by "git apply" (even though
it did finish successfully):

alvin: src 141$ git apply /tmp/0001-Renaming-for-new-subscripting-mechanism-v18.patch
alvin: src 0$ runpg head install
building heads/master ...
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c: In function 'JumbleExpr':
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2582:8: error: 'T_ArrayRef' undeclared (first use in this function)
   case T_ArrayRef:
        ^~~~~~~~~~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2582:8: note: each undeclared identifier is reported only once for each function it appears in
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2584:5: error: unknown type name 'ArrayRef'
     ArrayRef *aref = (ArrayRef *) node;
     ^~~~~~~~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2584:25: error: 'ArrayRef' undeclared (first use in this function)
     ArrayRef *aref = (ArrayRef *) node;
                      ^~~~~~~~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2584:35: error: expected expression before ')' token
     ArrayRef *aref = (ArrayRef *) node;
                                   ^
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2586:37: error: request for member 'refupperindexpr' in something not a structure or union
     JumbleExpr(jstate, (Node *) aref->refupperindexpr);
                                     ^~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2587:37: error: request for member 'reflowerindexpr' in something not a structure or union
     JumbleExpr(jstate, (Node *) aref->reflowerindexpr);
                                     ^~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2588:37: error: request for member 'refexpr' in something not a structure or union
     JumbleExpr(jstate, (Node *) aref->refexpr);
                                     ^~
/pgsql/source/master/contrib/pg_stat_statements/pg_stat_statements.c:2589:37: error: request for member 'refassgnexpr' in something not a structure or union
     JumbleExpr(jstate, (Node *) aref->refassgnexpr);
                                     ^~
make[1]: *** [pg_stat_statements.o] Error 1
make[1]: Target 'install' not remade because of errors.
make: *** [install-pg_stat_statements-recurse] Error 2
/pgsql/source/master/contrib/postgres_fdw/deparse.c:152:29: error: unknown type name 'ArrayRef'
 static void deparseArrayRef(ArrayRef *node, deparse_expr_cxt *context);
                             ^~~~~~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c: In function 'foreign_expr_walker':
/pgsql/source/master/contrib/postgres_fdw/deparse.c:404:8: error: 'T_ArrayRef' undeclared (first use in this function)
   case T_ArrayRef:
        ^~~~~~~~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:404:8: note: each undeclared identifier is reported only once for each function it appears in
/pgsql/source/master/contrib/postgres_fdw/deparse.c:406:5: error: unknown type name 'ArrayRef'
     ArrayRef *ar = (ArrayRef *) node;
     ^~~~~~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:406:23: error: 'ArrayRef' undeclared (first use in this function)
     ArrayRef *ar = (ArrayRef *) node;
                    ^~~~~~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:406:33: error: expected expression before ')' token
     ArrayRef *ar = (ArrayRef *) node;
                                 ^
/pgsql/source/master/contrib/postgres_fdw/deparse.c:409:11: error: request for member 'refassgnexpr' in something not a structure or union
   if (ar->refassgnexpr != NULL)
         ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:417:41: error: request for member 'refupperindexpr' in something not a structure or union
    if (!foreign_expr_walker((Node *) ar->refupperindexpr,
                                        ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:420:41: error: request for member 'reflowerindexpr' in something not a structure or union
    if (!foreign_expr_walker((Node *) ar->reflowerindexpr,
                                        ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:423:41: error: request for member 'refexpr' in something not a structure or union
    if (!foreign_expr_walker((Node *) ar->refexpr,
                                        ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:431:19: error: request for member 'refcollid' in something not a structure or union
     collation = ar->refcollid;
                   ^~
/pgsql/source/master/contrib/postgres_fdw/deparse.c: In function 'deparseExpr':
/pgsql/source/master/contrib/postgres_fdw/deparse.c:2273:8: error: 'T_ArrayRef' undeclared (first use in this function)
   case T_ArrayRef:
        ^~~~~~~~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:2274:4: warning: implicit declaration of function 'deparseArrayRef' [-Wimplicit-function-declaration]
    deparseArrayRef((ArrayRef *) node, context);
    ^~~~~~~~~~~~~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:2274:21: error: 'ArrayRef' undeclared (first use in this function)
    deparseArrayRef((ArrayRef *) node, context);
                    ^~~~~~~~
/pgsql/source/master/contrib/postgres_fdw/deparse.c:2274:31: error: expected expression before ')' token
    deparseArrayRef((ArrayRef *) node, context);
                              ^
/pgsql/source/master/contrib/postgres_fdw/deparse.c: At top level:
/pgsql/source/master/contrib/postgres_fdw/deparse.c:2524:17: error: unknown type name 'ArrayRef'
 deparseArrayRef(ArrayRef *node, deparse_expr_cxt *context)
                 ^~~~~~~~
make[1]: *** [deparse.o] Error 1
make[1]: Target 'install' not remade because of errors.
make: *** [install-postgres_fdw-recurse] Error 2
make: Target 'install' not remade because of errors.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2019-Feb-01, Dmitry Dolgov wrote: > The moment was longer than I expected, but here is the rebased version, where > all the individual patches can be applied and compiled cleanly (although there > is still functional dependency between 0002 and 0003, since the former > introduces a new subscripting without any implementation, and the latter > introduces an implementation for array data type). Cool, pushed 0001. I'm afraid I included some pgindenting, so you'll have to rebase again. Maybe you already know how to do it without manually rebasing, but if not, a quick trick to avoid rebasing manually over all those whitespace changes might be to un-apply with "git show | patch -p1 -R", then apply your original 0001, commit, apply 0002, then pgindent; if you now do a git diff to the original commit, you should get an almost clean diff. Or you could just try to apply with -w. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Fri, Feb 1, 2019 at 4:55 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > On 2019-Feb-01, Dmitry Dolgov wrote: > > > The moment was longer than I expected, but here is the rebased version, where > > all the individual patches can be applied and compiled cleanly (although there > > is still functional dependency between 0002 and 0003, since the former > > introduces a new subscripting without any implementation, and the latter > > introduces an implementation for array data type). > > Cool, pushed 0001. I'm afraid I included some pgindenting, so you'll > have to rebase again. Maybe you already know how to do it without > manually rebasing, but if not, a quick trick to avoid rebasing manually > over all those whitespace changes might be to un-apply with "git show | > patch -p1 -R", then apply your original 0001, commit, apply 0002, then > pgindent; if you now do a git diff to the original commit, you should > get an almost clean diff. Or you could just try to apply with -w. Great, thank you!
> On Fri, Feb 1, 2019 at 5:02 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Fri, Feb 1, 2019 at 4:55 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > > On 2019-Feb-01, Dmitry Dolgov wrote: > > > > > The moment was longer than I expected, but here is the rebased version, where > > > all the individual patches can be applied and compiled cleanly (although there > > > is still functional dependency between 0002 and 0003, since the former > > > introduces a new subscripting without any implementation, and the latter > > > introduces an implementation for array data type). > > > > Cool, pushed 0001. I'm afraid I included some pgindenting, so you'll > > have to rebase again. Maybe you already know how to do it without > > manually rebasing, but if not, a quick trick to avoid rebasing manually > > over all those whitespace changes might be to un-apply with "git show | > > patch -p1 -R", then apply your original 0001, commit, apply 0002, then > > pgindent; if you now do a git diff to the original commit, you should > > get an almost clean diff. Or you could just try to apply with -w. > > Great, thank you! And here is the rebased version.
Attachment
> On Tue, Feb 19, 2019 at 5:22 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > And here is the rebased version. One more, after jsonpath changes.
Attachment
> On Tue, Mar 19, 2019 at 2:30 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Tue, Feb 19, 2019 at 5:22 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > > And here is the rebased version. > > One more, after jsonpath changes. Oh, I forgot to fix duplicating oids, here is it.
Attachment
Rebase after pg_indent. Besides, off the list there was a suggestion that this could be useful to accept more than one data type as a key for subscripting. E.g. for jsonb it probably makes sense to understand both a simple key name and jsonpath: jsonb['a'] and jsonb['$.a'] While to implement it can be technically relatively straightforward I guess, I wonder if there is any opinion about how valuable it could be and what it should looks like from the syntax point of view (since I believe a user needs to specify which type needs to be used).
Attachment
st 29. 5. 2019 v 17:49 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
Rebase after pg_indent. Besides, off the list there was a suggestion that it
could be useful to accept more than one data type as a key for subscripting.
E.g. for jsonb it probably makes sense to understand both a simple key name and
jsonpath:
jsonb['a'] and jsonb['$.a']
While implementing it would probably be technically straightforward, I wonder
whether there is any opinion on how valuable it could be and what it should
look like from the syntax point of view (since I believe a user needs to
specify which type should be used).
It is a difficult decision - the possibility to use jsonpath looks great, but the necessity to cast every time is not friendly.
Probably there can be a preferred type if the subscript is of unknown type, with rules similar to those for function parameters.
so jsonb['a'] -- key
jsonb['$.a'] -- key
jsonb['$.a'::jsonpath] -- json path
but it can be a source of bad issues - so I think we don't need this feature at the moment. It can be implemented later, I think.
Regards
Pavel
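The resolution rules sketched above can be illustrated in a few lines of Python. This is only an analogy, not PostgreSQL code: `resolve_subscript` and its return values are invented for the example. The point is that an unknown-type subscript literal falls back to the preferred "key" interpretation, and only an explicit cast selects the jsonpath variant.

```python
# Hypothetical sketch of the proposed resolution rules (not PostgreSQL
# code; resolve_subscript() is an invented name for illustration only).
def resolve_subscript(value, explicit_type=None):
    """Return which subscripting variant would handle the value."""
    if explicit_type == "jsonpath":
        return ("jsonpath", value)
    # An unknown-type literal falls back to the preferred type: a plain
    # key, even if the string happens to look like a jsonpath expression.
    return ("key", value)

print(resolve_subscript("a"))                              # ('key', 'a')
print(resolve_subscript("$.a"))                            # ('key', '$.a')
print(resolve_subscript("$.a", explicit_type="jsonpath"))  # ('jsonpath', '$.a')
```

This also makes visible the "source of bad issues" mentioned above: without the explicit cast, a string that looks like a jsonpath is still treated as a plain key.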
> On Wed, May 29, 2019 at 6:17 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > st 29. 5. 2019 v 17:49 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal: >> >> Rebase after pg_indent. Besides, off the list there was a suggestion that this >> could be useful to accept more than one data type as a key for subscripting. >> E.g. for jsonb it probably makes sense to understand both a simple key name and >> jsonpath: >> >> jsonb['a'] and jsonb['$.a'] >> >> While to implement it can be technically relatively straightforward I guess, I >> wonder if there is any opinion about how valuable it could be and what it >> should looks like from the syntax point of view (since I believe a user needs >> to specify which type needs to be used). > > > It is difficult decision - possibility to use jsonpath looks great, but > necessity to cast every time is not friendly. Thanks. Yes, I also wonder if it's possible to avoid type casting every time, but other ideas seems syntactically equally not friendly. > Probably there can be preferred type if subscripting is of unknown type. > There can be similar rules to function's parameters. > > so jsonb['a'] -- key > jsonb['$.a'] -- key > jsonb['$.a'::jsonpath'] -- json path > > but it can be source of bad issues - so I think we don't need this feature in > this moment. This feature can be implemented later, I think. Yeah, I agree it's something that looks like a good potential improvement, not now but in the future.
> On Thu, May 30, 2019 at 4:17 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Wed, May 29, 2019 at 6:17 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > st 29. 5. 2019 v 17:49 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal: > >> > >> Rebase after pg_indent. Besides, off the list there was a suggestion that this > >> could be useful to accept more than one data type as a key for subscripting. > >> E.g. for jsonb it probably makes sense to understand both a simple key name and > >> jsonpath: And one more rebase.
Attachment
> On Thu, Jun 6, 2019 at 3:17 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Thu, May 30, 2019 at 4:17 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > > > On Wed, May 29, 2019 at 6:17 PM Pavel Stehule <pavel.stehule@gmail.com> wrote: > > > > > > st 29. 5. 2019 v 17:49 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal: > > >> > > >> Rebase after pg_indent. Besides, off the list there was a suggestion that this > > >> could be useful to accept more than one data type as a key for subscripting. > > >> E.g. for jsonb it probably makes sense to understand both a simple key name and > > >> jsonpath: > > And one more rebase. Oh, looks like I was just confused and it wasn't necessary - for some reason starting from v22 cfbot tries to apply v6 instead of the latest one.
On Fri, Jun 7, 2019 at 6:22 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > >> Rebase after pg_indent. Besides, off the list there was a suggestion that this > > > >> could be useful to accept more than one data type as a key for subscripting. > > > >> E.g. for jsonb it probably makes sense to understand both a simple key name and > > > >> jsonpath: > > > > And one more rebase. > > Oh, looks like I was just confused and it wasn't necessary - for some reason > starting from v22 cfbot tries to apply v6 instead of the latest one. Hi Dmitry, Sorry about that. It looks like I broke the cfbot code that picks which thread to pull patches from when there are several registered in the CF app, the last time the HTML format changed. Now it's back to picking whichever thread has the most recent message on it. Such are the joys of web scraping (obviously we need better integration and that will happen, I just haven't had time yet). Anyway, I fixed that. But now you really do need to rebase :-) -- Thomas Munro https://enterprisedb.com
> On Mon, Jul 8, 2019 at 6:46 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > On Fri, Jun 7, 2019 at 6:22 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > > >> Rebase after pg_indent. Besides, off the list there was a suggestion that this > > > > >> could be useful to accept more than one data type as a key for subscripting. > > > > >> E.g. for jsonb it probably makes sense to understand both a simple key name and > > > > >> jsonpath: > > > > > > And one more rebase. > > > > Oh, looks like I was just confused and it wasn't necessary - for some reason > > starting from v22 cfbot tries to apply v6 instead of the latest one. > > Hi Dmitry, > > Sorry about that. It looks like I broke the cfbot code that picks > which thread to pull patches from when there are several registered in > the CF app, the last time the HTML format changed. Now it's back to > picking whichever thread has the most recent message on it. Such are > the joys of web scraping (obviously we need better integration and > that will happen, I just haven't had time yet). > > Anyway, I fixed that. But now you really do need to rebase :-) Thanks for fixing and for the reminder! Here is the new rebased version. It contradicts a bit with 44982e7d09, because I'm actually using indexprSlice, but I guess we can figure this out. And I must admit, it's a pure fun to maintain such a large patch set in sync for already several years :)
Attachment
On Tue, Jul 09, 2019 at 02:23:57PM +0200, Dmitry Dolgov wrote: > > On Mon, Jul 8, 2019 at 6:46 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > > > On Fri, Jun 7, 2019 at 6:22 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > > > >> Rebase after pg_indent. Besides, off the list there was a suggestion that this > > > > > >> could be useful to accept more than one data type as a key for subscripting. > > > > > >> E.g. for jsonb it probably makes sense to understand both a simple key name and > > > > > >> jsonpath: > > > > > > > > And one more rebase. > > > > > > Oh, looks like I was just confused and it wasn't necessary - for some reason > > > starting from v22 cfbot tries to apply v6 instead of the latest one. > > > > Hi Dmitry, > > > > Sorry about that. It looks like I broke the cfbot code that picks > > which thread to pull patches from when there are several registered in > > the CF app, the last time the HTML format changed. Now it's back to > > picking whichever thread has the most recent message on it. Such are > > the joys of web scraping (obviously we need better integration and > > that will happen, I just haven't had time yet). > > > > Anyway, I fixed that. But now you really do need to rebase :-) > > Thanks for fixing and for the reminder! Here is the new rebased version. It > contradicts a bit with 44982e7d09, because I'm actually using indexprSlice, but > I guess we can figure this out. > > And I must admit, it's a pure fun to maintain such a large patch set in sync > for already several years :) Looks great! The tutorial piece has bit-rotted slightly. Please find attached a patch atop yours that fixes it. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Attachment
> On Thu, Jul 11, 2019 at 9:47 AM David Fetter <david@fetter.org> wrote:
>
> Looks great!
>
> The tutorial piece has bit-rotted slightly. Please find attached a
> patch atop yours that fixes it.

Indeed, I've missed this change, thank you! Although it's supposed to be
upperindex, not numupper (since the latter is just the number of upper indexes).
Attachment
On Thu, Jul 11, 2019 at 04:30:46PM +0200, Dmitry Dolgov wrote: > > On Thu, Jul 11, 2019 at 9:47 AM David Fetter <david@fetter.org> wrote: > > > > Looks great! > > > > The tutorial piece has bit-rotted slightly. Please find attached a > > patch atop yours that fixes it. > > Indeed, I've missed this change, thank you! Although there supposed to be an > upperindex, not numupper (since the latter is just a number of upper indexes). Oops! Fooled by a suggestion from the compiler. My bad for not checking more carefully. Thanks for making this happen. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On 2019-Jul-11, Dmitry Dolgov wrote: > > On Thu, Jul 11, 2019 at 9:47 AM David Fetter <david@fetter.org> wrote: > > > > Looks great! > > > > The tutorial piece has bit-rotted slightly. Please find attached a > > patch atop yours that fixes it. > > Indeed, I've missed this change, thank you! Although there supposed to be an > upperindex, not numupper (since the latter is just a number of upper indexes). Can you please send an updated version? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Thu, Sep 12, 2019 at 3:58 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Can you please send an updated version? Sure, I'll send it in a few days.
> On Fri, Sep 13, 2019 at 10:29 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > > On Thu, Sep 12, 2019 at 3:58 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Can you please send an updated version? > > Sure, I'll send it in a few days. Here it is.
Attachment
This broke recently. Can you please rebase again? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Wed, Sep 25, 2019 at 10:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > This broke recently. Can you please rebase again? Thanks for noticing! Sure, here it is. We're quite close to the records.
Attachment
On 30.09.2019 14:57, Dmitry Dolgov wrote:
> On Wed, Sep 25, 2019 at 10:22 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > This broke recently. Can you please rebase again?
> Thanks for noticing! Sure, here it is. We're quite close to the records.
Hi. I added a new 5th patch to this patch set.

Jsonb subscripting uses text for representing subscript values. This is OK for
object keys, but integer array indexes have to be parsed at runtime. Another
problem is that floats can't be used as array indexes, because an integer
simply can't be parsed from a string containing a floating point number. But
we can use float indexes in ordinary Postgres arrays:

SELECT ('{1,2,3}'::int[])[2.3];
 int4
------
    2
(1 row)

Also, the SQL standard allows float indexes in JSON path, with implementation-
defined rounding or truncation:

SELECT jsonb_path_query('[1, 2, 3]', '$[1.3]');
 jsonb_path_query
------------------
 2
(1 row)

So, I decided to fix these two issues by introducing polymorphic subscripting,
in which each subscript expression variant is interpreted depending on the
result of the previous subscripting step. There are two variants of jsonb
subscript expressions -- the first is cast to text and the second is cast to
int4. At each subscripting step the executor selects which variant to execute
by calling the callback jsonb_subscript_selectexpr(). To manage the
subscripting state, another callback jsonb_subscript_step() was introduced,
along with the new field SubscriptingRefState.privatedata. Such expression
selection has noticeable overhead, which we can eliminate by generating only
one expression variant when the subscript is of int2/int4 or text type.
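The per-step variant selection described above can be modeled with a short Python sketch. This is only an analogy, assuming nothing about the actual C implementation: the list/dict type of the intermediate value plays the role of the jsonb container kind that jsonb_subscript_selectexpr() would inspect, and subscript_step() is an invented name.

```python
import json

def subscript_step(container, subscript):
    # Select the int4 or text variant of the subscript depending on the
    # result of the previous subscripting step, by analogy with the
    # jsonb_subscript_selectexpr() callback described above.
    if isinstance(container, list):
        idx = int(float(subscript))   # int4 variant; float indexes truncate
        if -len(container) <= idx < len(container):
            return container[idx]
        return None                   # out of range yields NULL
    if isinstance(container, dict):
        return container.get(str(subscript))  # text variant
    return None

doc = json.loads('{"a": [1, 2, 3]}')
# ('{"a": [1, 2, 3]}'::jsonb)['a'][1.2] selects the text variant first,
# then the int4 variant with truncation:
print(subscript_step(subscript_step(doc, "a"), "1.2"))  # 2
```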
After this, float subscripts start to work as expected:

SELECT ('[1, 2, 3]'::jsonb)[1.2];
 jsonb
-------
 2
(1 row)

SELECT ('{"1": "a", "1.0": "b", "1.2": "c"}'::jsonb)[i]
FROM unnest('{1,1.0,1.2}'::numeric[]) i;
 jsonb
-------
 "a"
 "b"
 "c"
(3 rows)

Performance was compared on 4 tables with 10M rows:

-- [ i ]
CREATE TABLE arr_1 AS
SELECT jsonb_build_array(i)::jsonb js
FROM generate_series(1, 10000000) i;

-- { "a": i }
CREATE TABLE obj_1 AS
SELECT jsonb_build_object('a', i) js
FROM generate_series(1, 10000000) i;

-- [[[[[[[[[[ i ]]]]]]]]]]
CREATE TABLE arr_10 AS
SELECT (repeat('[', 10) || i || repeat(']', 10))::jsonb js
FROM generate_series(1, 10000000) i;

-- {"a": {"a": ... {"a": {"a": i } } ... } }
CREATE TABLE obj_10 AS
SELECT (repeat('{"a":', 10) || i || repeat('}', 10))::jsonb js
FROM generate_series(1, 10000000) i;

Queries were like "SELECT FROM table WHERE expression IS [NOT] NULL".

Compared previous v27 version (4 patches) with v28 version (5 patches).
New patch #5 contains one small but important optimization -- elimination of
unnecessary palloc() in getIthJsonbValueFromContainer() and jsonb_get_element().
It should be posted separately, but for simplicity I included it in the patch
now. For the correctness of the comparison, it was evaluated separately on top
of v27 (v27opt).

 Table  | Expression                      | Query time, ms
        |                                 |  v27  | v27opt|  v28
--------+---------------------------------+-------+-------+-------
 arr_1  | js->0                           |  1811 |  1809 |  1813
 arr_1  | js[0]                           |  2273 |  2294 |  2028
 arr_1  | js['0']                         |  2276 |  2286 |  2339
 arr_1  | js->1                           |   808 |   844 |   809
 arr_1  | js[1]                           |  1180 |  1187 |  1008
 obj_1  | js->'a'                         |  1941 |  1935 |  1939
 obj_1  | js['a']                         |  2079 |  2083 |  2102
 obj_1  | js->'b'                         |   917 |   915 |   902
 obj_1  | js['b']                         |   960 |   961 |  1059
        |                                 |       |       |
 arr_10 | js->0->0 ... ->0->0             |  4530 |  4068 |  4052
 arr_10 | js[0][0] ... [0][0]             |  6197 |  5513 |  3766
 arr_10 | js['0']['0'] ... ['0']['0']     |  6202 |  5519 |  5983
 arr_10 | js #> '{0,0,0,0,0,0,0,0,0,0}'   |  6412 |  5850 |  5835
 arr_10 | js #>> '{0,0,0,0,0,0,0,0,0,0}'  |  5904 |  5181 |  5192

 obj_10 | js->'a'->'a' ... ->'a'->'a'     |  4970 |  4717 |  4704
 obj_10 | js['a']['a'] ... ['a']['a']     |  4331 |  3698 |  4032
 obj_10 | js #> '{a,a,a,a,a,a,a,a,a,a}'   |  4570 |  3941 |  3949
 obj_10 | js #>> '{a,a,a,a,a,a,a,a,a,a}'  |  4055 |  3395 |  3392

As can be seen, array access time was reduced, from 10% for single subscripts
to 40% for 10-subscript chains, and subscripting even started to overtake
chained "->" operators. But there is a 10% slowdown of object key access that
needs further investigation. The elimination of unnecessary palloc()s also
gives good results.

I had to write new assignment logic reusing only some parts of setPath(),
because the loop in setPath() has to be broken on every level. During this
process, I decided to implement assignment behavior similar to PostgreSQL's
array behavior and added two new features:
- creation of a jsonb array/object container from NULL values
- appending/prepending array elements at the specified position, with gaps
  filled with nulls (JavaScript has similar behavior)

These features are not so easy to extract into a separate patch on top of the
first 4 patches, but I can try if necessary. Here are examples of the new
features:

CREATE TABLE t AS SELECT NULL::jsonb js, NULL::int[] a;

-- create array from NULL
UPDATE t SET js[0] = 1, a[1] = 1;
SELECT * FROM t;
 js  |  a
-----+-----
 [1] | {1}
(1 row)

-- append 4th element
UPDATE t SET js[3] = 4, a[4] = 4;
SELECT * FROM t;
         js         |        a
--------------------+-----------------
 [1, null, null, 4] | {1,NULL,NULL,4}
(1 row)

-- prepend element when index is negative (position = size + index)
UPDATE t SET js[-6] = -2;
SELECT js FROM t;
              js
------------------------------
 [-2, null, 1, null, null, 4]
(1 row)
--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
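The gap-filling assignment examples above can be modeled with a short Python sketch. This mirrors only the described semantics, not the patch's actual setPath() replacement, and `jsonb_array_set` is an invented name: NULL becomes a fresh array, out-of-range positive indexes append with null gaps, and a negative index addresses position size + index, prepending with null gaps when that position is still negative.

```python
def jsonb_array_set(arr, idx, value):
    # Assignment with gap filling, mirroring the examples above
    # (illustrative Python, not the patch's C code).
    arr = list(arr) if arr is not None else []   # NULL becomes a fresh array
    pos = len(arr) + idx if idx < 0 else idx
    if pos >= len(arr):
        # Append, filling the gap with nulls.
        arr.extend([None] * (pos - len(arr)))
        arr.append(value)
    elif pos < 0:
        # Prepend, filling the gap with nulls.
        arr = [value] + [None] * (-pos - 1) + arr
    else:
        arr[pos] = value
    return arr

a = jsonb_array_set(None, 0, 1)   # [1]                -- created from NULL
a = jsonb_array_set(a, 3, 4)      # [1, None, None, 4]
a = jsonb_array_set(a, -6, -2)    # [-2, None, 1, None, None, 4]
print(a)
```

The three calls reproduce the three UPDATE statements shown in the mail, with Python's None standing in for jsonb null.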
> On Thu, Oct 31, 2019 at 05:35:28AM +0300, Nikita Glukhov wrote:
> Hi. I added new 5th patch to this patch set.

Thank you!

> Performance was compared on 4 tables with 10M rows:
>
> -- [ i ]
> CREATE TABLE arr_1 AS
> SELECT jsonb_build_array(i)::jsonb js
> FROM generate_series(1, 10000000) i;
>
> -- { "a": i }
> CREATE TABLE obj_1 AS
> SELECT jsonb_build_object('a', i) js
> FROM generate_series(1, 10000000) i;
>
> -- [[[[[[[[[[ i ]]]]]]]]]]
> CREATE TABLE arr_10 AS
> SELECT (repeat('[', 10) || i || repeat(']', 10))::jsonb js
> FROM generate_series(1, 10000000) i;
>
> -- {"a": {"a": ... {"a": {"a": i } } ... } }
> CREATE TABLE obj_10 AS
> SELECT (repeat('{"a":', 10) || i || repeat('}', 10))::jsonb js
> FROM generate_series(1, 10000000) i;
>
> Queries were like "SELECT FROM table WHERE expression IS [NOT] NULL".
>
> Compared previous v27 version (4 patches) with v28 version (5 patches).
> New patch #5 contains one small but important optimization -- elimination of
> unnecessary palloc() in getIthJsonbValueFromContainer() and jsonb_get_element().
> It should be posted separately, but for simplicity I included it the patch now.
> For the correctness of comparison, it was evaluated separately on top of v27
> (v27opt).
>
>  Table  | Expression                      | Query time, ms
>         |                                 |  v27  | v27opt|  v28
> --------+---------------------------------+-------+-------+-------
>  arr_1  | js->0                           |  1811 |  1809 |  1813
>  arr_1  | js[0]                           |  2273 |  2294 |  2028
>  arr_1  | js['0']                         |  2276 |  2286 |  2339
>  arr_1  | js->1                           |   808 |   844 |   809
>  arr_1  | js[1]                           |  1180 |  1187 |  1008
>  obj_1  | js->'a'                         |  1941 |  1935 |  1939
>  obj_1  | js['a']                         |  2079 |  2083 |  2102
>  obj_1  | js->'b'                         |   917 |   915 |   902
>  obj_1  | js['b']                         |   960 |   961 |  1059
>         |                                 |       |       |
>  arr_10 | js->0->0 ... ->0->0             |  4530 |  4068 |  4052
>  arr_10 | js[0][0] ... [0][0]             |  6197 |  5513 |  3766
>  arr_10 | js['0']['0'] ... ['0']['0']     |  6202 |  5519 |  5983
>  arr_10 | js #> '{0,0,0,0,0,0,0,0,0,0}'   |  6412 |  5850 |  5835
>  arr_10 | js #>> '{0,0,0,0,0,0,0,0,0,0}'  |  5904 |  5181 |  5192
>
>  obj_10 | js->'a'->'a' ... ->'a'->'a'     |  4970 |  4717 |  4704
>  obj_10 | js['a']['a'] ... ['a']['a']     |  4331 |  3698 |  4032
>  obj_10 | js #> '{a,a,a,a,a,a,a,a,a,a}'   |  4570 |  3941 |  3949
>  obj_10 | js #>> '{a,a,a,a,a,a,a,a,a,a}'  |  4055 |  3395 |  3392
>
> As it can be seen, array access time reduced from 10% in single subscripts
> to 40% in 10-subscript chains, and subscripting even started to overtake
> chained "->" operators. But there is 10% slowdown of object key access that
> needs further investigation. The elimination of unnecessary palloc()s also
> gives good results.

I've tested the 5th patch a bit and can confirm the numbers in the last column
for v28 (I've got similar proportions). Let's see what the reason is for the
10% slowdown of object key access.

> I had to write new assignment logic reusing only some parts of setPath(),
> because the loop in setPath() should be broken on every level. During this
> process, I decided to implement assignment behavior similar to PostgreSQL's
> array behavior and added two new features:
> - creation of jsonb arrays/objects container from NULL values
> - appending/prepending array elements on the specified position, gaps filled
>   with nulls (JavaScript has similar behavior)

What is the reason for the last one?
> On Sun, Nov 10, 2019 at 01:32:08PM +0100, Dmitry Dolgov wrote:
>
> > I had to write new assignment logic reusing only some parts of setPath(),
> > because the loop in setPath() should be broken on every level. During this
> > process, I decided to implement assignment behavior similar to PostgreSQL's
> > array behavior and added two new features:
> > - creation of jsonb arrays/objects container from NULL values
> > - appending/prepending array elements on the specified position, gaps filled
> >   with nulls (JavaScript has similar behavior)
>
> What is the reason for the last one?

I've split the last patch into the polymorphic part itself and the jsonb array
behaviour changes, since I'm afraid the latter could be a questionable part.
Attachment
Hi
čt 19. 12. 2019 v 15:20 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sun, Nov 10, 2019 at 01:32:08PM +0100, Dmitry Dolgov wrote:
>
> > I had to write new assignment logic reusing only some parts of setPath(),
> > because the loop in setPath() should be broken on every level. During this
> > process, I decided to implement assignment behavior similar to PostgreSQL's
> > array behavior and added two new features:
> > - creation of jsonb arrays/objects container from NULL values
> > - appending/prepending array elements on the specified position, gaps filled
> > with nulls (JavaScript has similar behavior)
>
> What is the reason for the last one?
> I've split the last patch into the polymorphic part itself and the jsonb
> array behaviour changes, since I'm afraid the latter could be a
> questionable part.
I tested last set of patches.
I like patch 0006 - filling gaps by NULLs - it fixed my objections, if I remember correctly. Patch 0005 - polymorphic subscripting - I had no idea what the use case is. Maybe it would be good to postpone this patch. I have no strong opinion about it, but generally it is good to reduce the size of the initial patch. I have nothing against compatibility with SQL, but this case doesn't look too realistic to me, and it can be postponed without future compatibility issues.
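For readers unfamiliar with the "filling gaps by NULLs" behaviour discussed above, here is a rough sketch of the intended semantics in plain Python. The helper name is made up for illustration; this is not PostgreSQL code, just a model of the JavaScript-like padding the patch proposes:

```python
def jsonb_array_set(arr, idx, value):
    # Writing past the end of a jsonb array pads the gap with nulls,
    # the way JavaScript handles `a[10] = x` on a shorter array.
    if idx >= len(arr):
        arr.extend([None] * (idx - len(arr)))  # fill the gap with nulls
        arr.append(value)
    else:
        arr[idx] = value
    return arr

print(jsonb_array_set([], 3, "x"))       # → [None, None, None, 'x']
print(jsonb_array_set([1, 2, 3], 1, 9))  # → [1, 9, 3]
```

Under these semantics `update foo set a[10] = 'ahoj'` on `'[]'` would produce ten leading nulls followed by `"ahoj"`.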
I did some notes:
It needs rebase, I had to fix some issues.
I miss deeper comments for
+static Oid
+findTypeSubscriptingFunction(List *procname, Oid typeOid, bool parseFunc)
+/* Callback function signatures --- see xsubscripting.sgml for more info. */
+typedef SubscriptingRef * (*SubscriptingPrepare) (bool isAssignment, SubscriptingRef *sbsef);
+
+typedef SubscriptingRef * (*SubscriptingValidate) (bool isAssignment, SubscriptingRef *sbsef,
+                                                   struct ParseState *pstate);
+
+typedef Datum (*SubscriptingFetch) (Datum source, struct SubscriptingRefState *sbsrefstate);
+
+typedef Datum (*SubscriptingAssign) (Datum source, struct SubscriptingRefState *sbrsefstate);
+
+typedef struct SubscriptRoutines
+{
+    SubscriptingPrepare  prepare;
+    SubscriptingValidate validate;
+    SubscriptingFetch    fetch;
+    SubscriptingAssign   assign;
+
+} SubscriptRoutines;
+
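For illustration only, the kind of dispatch the SubscriptRoutines table enables can be modelled in a few lines of Python: the "executor" calls whatever fetch/assign handlers a data type registered. Names and signatures here are simplified stand-ins, not the actual PostgreSQL definitions:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Toy model of the SubscriptRoutines dispatch table: each data type
# supplies its own handlers, and generic code calls them through it.
@dataclass
class SubscriptRoutines:
    fetch: Callable[[Any, Any], Any]
    assign: Callable[[Any, Any, Any], Any]

def jsonb_fetch(container, subscript):
    try:
        return container[subscript]
    except (KeyError, IndexError):
        return None  # a missing key/index yields NULL

def jsonb_assign(container, subscript, value):
    result = dict(container)   # copy-on-write, like a new Datum
    result[subscript] = value
    return result

jsonb_routines = SubscriptRoutines(fetch=jsonb_fetch, assign=jsonb_assign)

print(jsonb_routines.fetch({"a": 1}, "a"))      # → 1
print(jsonb_routines.assign({"a": 1}, "b", 2))  # → {'a': 1, 'b': 2}
```

The point is only that the container type, not the core executor, decides how a subscript is interpreted.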
regression tests fail
+Datum
+array_subscript_fetch(Datum containerSource, SubscriptingRefState *sbstate)
there is a variable "is_slice". The original code did not have this variable. Personally I think the original code was more readable without it.
so instead
+    if (is_slice)
+    {
+        for (i = 0; i < sbstate->numlower; i++)
+            l_index.indx[i] = DatumGetInt32(sbstate->lowerindex[i]);
+    }
is more readable
if (sbstate->numlower > 0)
{
/* read lower part of indexes */
for (i = 0; i < sbstate->numlower; ...
I miss comments (what is checked here - something like: subscripts have to be int4 and the number of subscripts should be less than MAXDIM)
+
+SubscriptingRef *
+array_subscript_prepare(bool isAssignment, SubscriptingRef *sbsref)
+
+SubscriptingRef *
+array_subscript_validate(bool isAssignment, SubscriptingRef *sbsref,
+                         ParseState *pstate)
Regression tests fail - see attachment
I really miss PL/pgSQL support
postgres=# do $$
declare j jsonb = '{"a":10, "b":20}';
begin
raise notice '%', j;
raise notice '%', j['a'];
j['a'] = '20';
raise notice '%', j;
end;
$$;
NOTICE: {"a": 10, "b": 20}
NOTICE: 10
ERROR: subscripted object is not an array
CONTEXT: PL/pgSQL function inline_code_block line 6 at assignment
With PL/pgSQL support it will be a great patch, and really important functionality. It can perfectly cover some gaps in plpgsql.
Regards
Pavel
Attachment
> On Thu, Feb 13, 2020 at 10:15:14AM +0100, Pavel Stehule wrote:
>
> I tested last set of patches.

Thanks a lot for testing!

> I like patch 0006 - filling gaps by NULLs - it fixed my objections if I
> remember correctly. Patch 0005 - polymorphic subscribing - I had not a
> idea, what is a use case? Maybe can be good to postpone this patch. I have
> not strong opinion about it, but generally is good to reduce size of
> initial patch. I have nothing against a compatibility with SQL, but this
> case doesn't looks too realistic for me, and can be postponed without
> future compatibility issues.

The idea about 0005 is mostly performance related, since this change
(aside from being more pedantic with the standard) also allows to
squeeze out some visible processing time improvement. But I agree that
the patch series itself is too big to add something more, that's why I
consider 0005/0006 mostly as interesting ideas for the future.

> I miss deeper comments for
>
> +static Oid
> +findTypeSubscriptingFunction(List *procname, Oid typeOid, bool parseFunc)
>
> +/* Callback function signatures --- see xsubscripting.sgml for more info. */
> +typedef SubscriptingRef * (*SubscriptingPrepare) (bool isAssignment, SubscriptingRef *sbsef);
> +
> +typedef SubscriptingRef * (*SubscriptingValidate) (bool isAssignment, SubscriptingRef *sbsef,
> +                                                   struct ParseState *pstate);
> +
> +typedef Datum (*SubscriptingFetch) (Datum source, struct SubscriptingRefState *sbsrefstate);
> +
> +typedef Datum (*SubscriptingAssign) (Datum source, struct SubscriptingRefState *sbrsefstate);
> +
> +typedef struct SubscriptRoutines
> +{
> +    SubscriptingPrepare  prepare;
> +    SubscriptingValidate validate;
> +    SubscriptingFetch    fetch;
> +    SubscriptingAssign   assign;
> +
> +} SubscriptRoutines;
> +
>
> ....
> I miss comments (what is checked here - some like - subscript have to be
> int4 and number of subscripts should be less than MAXDIM)
>
> +
> +SubscriptingRef *
> +array_subscript_prepare(bool isAssignment, SubscriptingRef *sbsref)
>
> +SubscriptingRef *
> +array_subscript_validate(bool isAssignment, SubscriptingRef *sbsref,
> +                         ParseState *pstate)

Sure, I can probably add more commentaries there.

> +Datum
> +array_subscript_fetch(Datum containerSource, SubscriptingRefState *sbstate)
>
> there is a variable "is_slice". Original code had not this variable.
> Personally I think so original code was better readable without this
> variable.
>
> so instead
>
> +    if (is_slice)
> +    {
> +        for (i = 0; i < sbstate->numlower; i++)
> +            l_index.indx[i] = DatumGetInt32(sbstate->lowerindex[i]);
> +    }
>
> is more readable

Hm, IIRC this is actually necessary, but I'll check one more time.

> I really miss a PLpgSQL support
>
> postgres=# do $$
> declare j jsonb = '{"a":10, "b":20}';
> begin
> raise notice '%', j;
> raise notice '%', j['a'];
> j['a'] = '20';
> raise notice '%', j;
> end;
> $$;
> NOTICE: {"a": 10, "b": 20}
> NOTICE: 10
> ERROR: subscripted object is not an array
> CONTEXT: PL/pgSQL function inline_code_block line 6 at assignment
>
> With PLpgSQL support it will be great patch, and really important
> functionality. It can perfectly cover some gaps of plpgsql.

Oh, interesting, I never thought about this part. Thanks for mentioning it,
I'll take a look at how we can support the same for PL/pgSQL.

> It needs rebase, I had to fix some issues.
>
> ...
>
> regress tests fails

Yep, I wasn't paying much attention recently to this patch, will post a
rebased and fixed version soon. At the same time I must admit that even
though at the moment I can pursue two goals - either to make this feature
accepted somehow, or to make the longest-living CF item ever - neither of
those goals seems reachable.
čt 13. 2. 2020 v 14:11 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Thu, Feb 13, 2020 at 10:15:14AM +0100, Pavel Stehule wrote:
>
> I tested last set of patches.
Thanks a lot for testing!
> I like patch 0006 - filling gaps by NULLs - it fixed my objections if I
> remember correctly. Patch 0005 - polymorphic subscribing - I had not a
> idea, what is a use case? Maybe can be good to postpone this patch. I have
> not strong opinion about it, but generally is good to reduce size of
> initial patch. I have nothing against a compatibility with SQL, but this
> case doesn't looks too realistic for me, and can be postponed without
> future compatibility issues.
The idea about 0005 is mostly performance related, since this change
(aside from being more pedantic with the standard) also allows to
squeeze out some visible processing time improvement. But I agree that
the patch series itself is too big to add something more, that's why I
consider 0005/0006 mostly as interesting ideas for the future.
patch 0006 is necessary from my perspective. Without it, the behaviour of update is not practical. I didn't review this patch earlier mainly due to issues that were fixed by the 0006 patch.
> I miss deeper comments for
>
> +static Oid
> +findTypeSubscriptingFunction(List *procname, Oid typeOid, bool parseFunc)
>
> +/* Callback function signatures --- see xsubscripting.sgml for more info.
> */
> +typedef SubscriptingRef * (*SubscriptingPrepare) (bool isAssignment,
> SubscriptingRef *sbsef);
> +
> +typedef SubscriptingRef * (*SubscriptingValidate) (bool isAssignment,
> SubscriptingRef *sbsef,
> +                                                   struct ParseState
> *pstate);
> +
> +typedef Datum (*SubscriptingFetch) (Datum source, struct
> SubscriptingRefState *sbsrefstate);
> +
> +typedef Datum (*SubscriptingAssign) (Datum source, struct
> SubscriptingRefState *sbrsefstate);
> +
> +typedef struct SubscriptRoutines
> +{
> +    SubscriptingPrepare  prepare;
> +    SubscriptingValidate validate;
> +    SubscriptingFetch    fetch;
> +    SubscriptingAssign   assign;
> +
> +} SubscriptRoutines;
> +
>
> ....
>
> I miss comments (what is checked here - some like - subscript have to be
> int4 and number of subscripts should be less than MAXDIM)
>
> +
> +SubscriptingRef *
> +array_subscript_prepare(bool isAssignment, SubscriptingRef *sbsref)
>
> +SubscriptingRef *
> +array_subscript_validate(bool isAssignment, SubscriptingRef *sbsref,
> +                         ParseState *pstate)
>
Sure, I can probably add more commentaries there.
> +Datum
> +array_subscript_fetch(Datum containerSource, SubscriptingRefState *sbstate)
>
> there is a variable "is_slice". Original code had not this variable.
> Personally I think so original code was better readable without this
> variable.
>
> so instead
>
> +    if (is_slice)
> +    {
> +        for (i = 0; i < sbstate->numlower; i++)
> +            l_index.indx[i] = DatumGetInt32(sbstate->lowerindex[i]);
> +    }
>
> is more readable
Hm, IIRC this is actually necessary, but I'll check one more time.
> I really miss a PLpgSQL support
>
> postgres=# do $$
> declare j jsonb = '{"a":10, "b":20}';
> begin
> raise notice '%', j;
> raise notice '%', j['a'];
> j['a'] = '20';
> raise notice '%', j;
> end;
> $$;
> NOTICE: {"a": 10, "b": 20}
> NOTICE: 10
> ERROR: subscripted object is not an array
> CONTEXT: PL/pgSQL function inline_code_block line 6 at assignment
>
> With PLpgSQL support it will be great patch, and really important
> functionality. It can perfectly cover some gaps of plpgsql.
Oh, interesting, I never thought about this part. Thanks for mentioning it,
I'll take a look at how we can support the same for PL/pgSQL.
> It needs rebase, I had to fix some issues.
>
> ...
>
> regress tests fails
Yep, I wasn't paying much attention recently to this patch, will post a
rebased and fixed version soon. At the same time I must admit that even
though at the moment I can pursue two goals - either to make this feature
accepted somehow, or to make the longest-living CF item ever - neither of
those goals seems reachable.
I think this feature is not important for existing applications. But it allows working with JSON data (or any other type) more comfortably (creatively) in plpgsql.
Pavel
> On Thu, Feb 13, 2020 at 02:25:46PM +0100, Pavel Stehule wrote:
>
> > > I like patch 0006 - filling gaps by NULLs - it fixed my objections if I
> > > remember correctly. Patch 0005 - polymorphic subscribing - I had not a
> > > idea, what is a use case? Maybe can be good to postpone this patch. I have
> > > not strong opinion about it, but generally is good to reduce size of
> > > initial patch. I have nothing against a compatibility with SQL, but this
> > > case doesn't looks too realistic for me, and can be postponed without
> > > future compatibility issues.
> >
> > The idea about 0005 is mostly performance related, since this change
> > (aside from being more pedantic with the standard) also allows to
> > squeeze out some visible processing time improvement. But I agree that
> > the patch series itself is too big to add something more, that's why I
> > consider 0005/0006 mostly as interesting ideas for the future.
>
> patch 0006 is necessary from my perspective. Without it, behave of update
> is not practical. I didn't review of this patch mainly due issues that was
> fixed by 0006 patch

Oh, I see. The thing is that, as it is implemented right now, 0006 depends
on 0005. Originally I was against doing anything different from what the
regular jsonb functionality would do, but after the discussion about
jsonb_set and null arguments I figured that indeed it probably makes sense
to deviate in certain cases. Eventually it depends on the community
feedback, so I can try to make 0006 an independent change and we will see.

> > Yep, I wasn't paying much attention recently to this patch, will post
> > rebased and fixed version soon. At the same time I must admit, even if
> > at the moment I can pursue two goals - either to make this feature
> > accepted somehow, or make a longest living CF item ever - neither of
> > those goals seems reachable.
>
> I think so this feature is not important for existing applications. But it
> allows to work with JSON data (or any other) more comfortable (creative) in
> plpgsql.

Yes, hopefully.
Hi Dmitry,

On 2/13/20 8:12 AM, Dmitry Dolgov wrote:
>
> Yep, I wasn't paying much attention recently to this patch, will post
> rebased and fixed version soon.

The last CF for PG13 has begun. Do you know when you'll have a rebased and
updated patch available? The current patch no longer applies.

> At the same time I must admit, even if
> at the moment I can pursue two goals - either to make this feature
> accepted somehow, or make a longest living CF item ever - neither of
> those goals seems reachable.

Well, you are only the third oldest patch in this CF!

Regards,
--
-David
david@pgmasters.net
> On Tue, Mar 03, 2020 at 12:55:38PM -0500, David Steele wrote:
>
> > Yep, I wasn't paying much attention recently to this patch, will post
> > rebased and fixed version soon.
>
> The last CF for PG13 has begun. Do you know when you'll have a rebased and
> updated patch available? The current patch no longer applies.

Thanks for the reminder! Here is the rebased version, with one extra
detail. Looks like tests for the non-strict version of jsonb_set are doing:

    \pset null NULL

and then changing it back via:

    \pset null

which does not change it back. This showed up in the tests for subscripting
further down the file, so I've changed it to what I guess it was supposed
to be:

    \pset null ''
Attachment
st 4. 3. 2020 v 18:04 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Tue, Mar 03, 2020 at 12:55:38PM -0500, David Steele wrote:
>
> > Yep, I wasn't paying much attention recently to this patch, will post
> > rebased and fixed version soon.
>
> The last CF for PG13 has begun. Do you know when you'll have a rebased and
> updated patch available? The current patch no longer applies.
Thanks for the reminder! Here is the rebased version, with one extra
detail. Looks like tests for non-strict version of jsonb_set are doing:
\pset null NULL
and backwards, but via:
\pset null
which does not change it back. This showed up in the tests for
subscripting further down the file, so I've changed it to what I guess it
was supposed to be:
Hi
please rebase this patch
Regards
Pavel
Hi
> Hi
> please rebase this patch
here is the attached fixed first patch
v30-0001-Base-implementation-of-subscripting-mechanism.patch
My objections are fixed. I checked this patch and:
There are not problems with build (code, documentation)
All tests passed
The code is well documented
I like the functionality introduced by this patch. It opens a door for easy work with json, jsonb, xml, ... and a lot of other types with array access syntax.
This is the first step, but a necessary one. Write operations are not supported by PL/pgSQL yet, but plpgsql developers still get some benefits: it is working for read operations (in plpgsql).
I'll mark this patch as ready for committers
Thank you for your work.
Regards
Pavel
Attachment
> On Tue, Mar 17, 2020 at 11:03:22AM +0100, Pavel Stehule wrote:
>
> here is a attached fixed first patch
>
> v30-0001-Base-implementation-of-subscripting-mechanism.patch
>
> My objectives are fixed. I checked this patch and
>
> There are not problems with build (code, documentation)
> All tests passed
> The code is well documented
>
> I like the functionality introduced by this patch. It opens a door for easy
> work with json, jsonb, xml, ... and lot of other types with array access
> syntax.
>
> This is first step, but necessary steps. A write operations are not
> supported by PL/pgSQL. But plpgsql developers still has some benefits. It
> is working for read operations (in plpgsql).
>
> I'll mark this patch as ready for commiters
>
> Thank you for your work.

Thanks a lot, Pavel!
Pavel Stehule <pavel.stehule@gmail.com> writes: >> please rebase this patch > here is a attached fixed first patch cfbot reports this as failing because of missing include files. Somebody please post a complete patch set? regards, tom lane
ne 22. 3. 2020 v 18:47 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
>> please rebase this patch
> here is a attached fixed first patch
cfbot reports this as failing because of missing include files.
Somebody please post a complete patch set?
here it is
Regards
Pavel
regards, tom lane
Attachment
Pavel Stehule <pavel.stehule@gmail.com> writes: > ne 22. 3. 2020 v 18:47 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: >> cfbot reports this as failing because of missing include files. >> Somebody please post a complete patch set? > here it is That set doesn't even apply. http://cfbot.cputube.org/patch_27_1062.log regards, tom lane
ne 22. 3. 2020 v 20:41 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> ne 22. 3. 2020 v 18:47 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>> cfbot reports this as failing because of missing include files.
>> Somebody please post a complete patch set?
> here it is
That set doesn't even apply.
http://cfbot.cputube.org/patch_27_1062.log
It was my mistake
regards, tom lane
Attachment
> On Mon, Mar 23, 2020 at 07:30:25AM +0100, Pavel Stehule wrote: > ne 22. 3. 2020 v 20:41 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: > > > Pavel Stehule <pavel.stehule@gmail.com> writes: > > > ne 22. 3. 2020 v 18:47 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal: > > >> cfbot reports this as failing because of missing include files. > > >> Somebody please post a complete patch set? > > > > > here it is One more rebase to prepare for 2020-07.
Attachment
I started to look through this again, and really found myself wondering why we're going to all this work to invent what are fundamentally pretty bogus "features".

The thing that particularly sticks in my craw is the 0005 patch, which tries to interpret a subscript of a JSON value as either integer or text depending on, seemingly, the phase of the moon. I don't think that will work. For example, with existing arrays you can do something like arraycol['42'] and the unknown-type literal is properly cast to an integer. The corresponding situation with a JSON subscript would have no principled resolution.

It doesn't help any that both coercion alternatives are attempted at COERCION_ASSIGNMENT level, which makes it noticeably more likely that they'll both succeed. But using ASSIGNMENT level is quite inappropriate in any context where it's not 100% certain what the intended type is.

The proposed commit message for 0005 claims that this is somehow improving our standards compliance, but I see nothing in the SQL spec suggesting that you can subscript a JSON value at all within the SQL language, so I think that claim is just false.

Maybe this could be salvaged by flushing 0005 in its current form and having the jsonb subscript executor do something like "if the current value-to-be-subscripted is a JSON array, then try to convert the textual subscript value to an integer". Not sure about what the error handling rules ought to be like, though.

regards, tom lane
so 1. 8. 2020 v 16:30 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Fri, Jul 31, 2020 at 03:35:22PM -0400, Tom Lane wrote:
>
> I started to look through this again, and really found myself wondering
> why we're going to all this work to invent what are fundamentally pretty
> bogus "features". The thing that particularly sticks in my craw is the
> 0005 patch, which tries to interpret a subscript of a JSON value as either
> integer or text depending on, seemingly, the phase of the moon. I don't
> think that will work. For example, with existing arrays you can do
> something like arraycol['42'] and the unknown-type literal is properly
> cast to an integer. The corresponding situation with a JSON subscript
> would have no principled resolution.
>
> It doesn't help any that both coercion alternatives are attempted at
> COERCION_ASSIGNMENT level, which makes it noticeably more likely that
> they'll both succeed. But using ASSIGNMENT level is quite inappropriate
> in any context where it's not 100% certain what the intended type is.
>
> The proposed commit message for 0005 claims that this is somehow improving
> our standards compliance, but I see nothing in the SQL spec suggesting
> that you can subscript a JSON value at all within the SQL language, so
> I think that claim is just false.
It's due to my lack of writing skills. As far as I can remember, the
discussion was about the JSON path part of the standard, where one is
allowed to use float indexes with implementation-defined rounding or
truncation. In this patch series I'm trying to think of JSON subscripting
as an equivalent of JSON path, hence the misleading description. Having
said that, I guess the main motivation behind 0005 is performance
improvements. Hopefully Nikita can provide more insights. Overall, back
when the 0005 patch was suggested its implementation looked reasonable to
me, but I'll review it again.
> Maybe this could be salvaged by flushing 0005 in its current form and
> having the jsonb subscript executor do something like "if the current
> value-to-be-subscripted is a JSON array, then try to convert the textual
> subscript value to an integer". Not sure about what the error handling
> rules ought to be like, though.
I'm fine with the idea of separating the 0005 patch and potentially
pursuing it as an independent item. I just need to rebase 0006, since
Pavel mentioned that it's a reasonable change he would like to see in the
final result.
+1
Pavel
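Tom's suggested salvage above - only reinterpret a textual subscript as an integer when the subscripted value turns out to be a JSON array - can be sketched roughly as follows. This is plain Python pseudologic for illustration, not the executor code, and the error handling is just a placeholder for the rules still to be decided:

```python
def resolve_subscript(container_is_array, subscript):
    # If the current value is a JSON array, try to convert the textual
    # subscript to an integer index; for objects keep it as a key.
    if container_is_array:
        try:
            return int(subscript)
        except ValueError:
            raise ValueError("array subscript must be convertible to integer")
    return str(subscript)

print(resolve_subscript(True, "42"))   # integer index 42
print(resolve_subscript(False, "42"))  # object key '42'
```

The decision is deferred to execution time, when the kind of the container is actually known, which sidesteps the parse-time ambiguity Tom describes.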
> On Sun, Aug 02, 2020 at 12:50:12PM +0200, Pavel Stehule wrote: > > > > > Maybe this could be salvaged by flushing 0005 in its current form and > > > having the jsonb subscript executor do something like "if the current > > > value-to-be-subscripted is a JSON array, then try to convert the textual > > > subscript value to an integer". Not sure about what the error handling > > > rules ought to be like, though. > > > > I'm fine with the idea of separating 0005 patch and potentially prusuing > > it as an independent item. Just need to rebase 0006, since Pavel > > mentioned that it's a reasonable change he would like to see in the > > final result. > > > > +1 Here is what I had in mind. Worth noting that, as well as the original patch, the attached implementation keeps the same behaviour for negative indices. Also, I've removed a strange inconsistency one could notice with the original implementation, when one extra gap was introduced when we append something at the beginning of an array.
Attachment
On Wed, Aug 05, 2020 at 04:04:22PM +0200, Dmitry Dolgov wrote: > > On Sun, Aug 02, 2020 at 12:50:12PM +0200, Pavel Stehule wrote: > > > > > > > Maybe this could be salvaged by flushing 0005 in its current form and > > > > having the jsonb subscript executor do something like "if the current > > > > value-to-be-subscripted is a JSON array, then try to convert the textual > > > > subscript value to an integer". Not sure about what the error handling > > > > rules ought to be like, though. > > > > > > I'm fine with the idea of separating 0005 patch and potentially prusuing > > > it as an independent item. Just need to rebase 0006, since Pavel > > > mentioned that it's a reasonable change he would like to see in the > > > final result. > > > > +1 > > Here is what I had in mind. Worth noting that, as well as the original This seems to already hit a merge conflict (8febfd185). Would you re-rebase ? -- Justin
st 9. 9. 2020 v 23:04 odesílatel Justin Pryzby <pryzby@telsasoft.com> napsal:
On Wed, Aug 05, 2020 at 04:04:22PM +0200, Dmitry Dolgov wrote:
> > On Sun, Aug 02, 2020 at 12:50:12PM +0200, Pavel Stehule wrote:
> > >
> > > > Maybe this could be salvaged by flushing 0005 in its current form and
> > > > having the jsonb subscript executor do something like "if the current
> > > > value-to-be-subscripted is a JSON array, then try to convert the textual
> > > > subscript value to an integer". Not sure about what the error handling
> > > > rules ought to be like, though.
> > >
> > > I'm fine with the idea of separating 0005 patch and potentially prusuing
> > > it as an independent item. Just need to rebase 0006, since Pavel
> > > mentioned that it's a reasonable change he would like to see in the
> > > final result.
> >
> > +1
>
> Here is what I had in mind. Worth noting that, as well as the original
This seems to already hit a merge conflict (8febfd185).
Would you re-rebase ?
This can be easily fixed. Maybe I found another issue.
create table foo(a jsonb);
postgres=# select * from foo;
┌───────────────────────────────────────────────────────────────────┐
│ a │
╞═══════════════════════════════════════════════════════════════════╡
│ [0, null, null, null, null, null, null, null, null, null, "ahoj"] │
└───────────────────────────────────────────────────────────────────┘
(1 row)
It is working like I expect
but
postgres=# truncate foo;
TRUNCATE TABLE
postgres=# insert into foo values('[]');
INSERT 0 1
postgres=# update foo set a[10] = 'ahoj';
UPDATE 1
postgres=# select * from foo;
┌──────────┐
│ a │
╞══════════╡
│ ["ahoj"] │
└──────────┘
(1 row)
Other parts look good. The plpgsql support is not part of this patch, but it can be the next step. The implemented feature is interesting enough - it is a simple, user-friendly interface for working with jsonb and, in the future, with other types.
Regards
Pavel
--
Justin
> st 9. 9. 2020 v 23:04 Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> This seems to already hit a merge conflict (8febfd185).
> Would you re-rebase ?

Thanks. Sure, will post a rebased version soon.

> On Tue, Sep 15, 2020 at 08:42:40PM +0200, Pavel Stehule wrote:
>
> Maybe I found a another issue.
>
> create table foo(a jsonb);
>
> postgres=# select * from foo;
> ┌───────────────────────────────────────────────────────────────────┐
> │ a │
> ╞═══════════════════════════════════════════════════════════════════╡
> │ [0, null, null, null, null, null, null, null, null, null, "ahoj"] │
> └───────────────────────────────────────────────────────────────────┘
> (1 row)
>
> It is working like I expect
>
> but
>
> postgres=# truncate foo;
> TRUNCATE TABLE
> postgres=# insert into foo values('[]');
> INSERT 0 1
> postgres=# update foo set a[10] = 'ahoj';
> UPDATE 1
> postgres=# select * from foo;
> ┌──────────┐
> │ a │
> ╞══════════╡
> │ ["ahoj"] │
> └──────────┘
> (1 row)

Thanks for looking at the last patch, I appreciate it! The situation
you've mentioned is an interesting edge case. If I understand correctly,
the first example is the result of some operations leading to filling gaps
between 0 and "ahoj". In the second case there is no such gap, which is
why nothing was "filled in", although one could expect the presence of a
"start position" and filling with nulls everything from it to the new
element - is that what you mean?
st 16. 9. 2020 v 11:36 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> st 9. 9. 2020 v 23:04 Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> This seems to already hit a merge conflict (8febfd185).
> Would you re-rebase ?
Thanks. Sure, will post a rebased version soon.
> On Tue, Sep 15, 2020 at 08:42:40PM +0200, Pavel Stehule wrote:
>
> Maybe I found a another issue.
>
> create table foo(a jsonb);
>
> postgres=# select * from foo;
> ┌───────────────────────────────────────────────────────────────────┐
> │ a │
> ╞═══════════════════════════════════════════════════════════════════╡
> │ [0, null, null, null, null, null, null, null, null, null, "ahoj"] │
> └───────────────────────────────────────────────────────────────────┘
> (1 row)
>
> It is working like I expect
>
> but
>
> postgres=# truncate foo;
> TRUNCATE TABLE
> postgres=# insert into foo values('[]');
> INSERT 0 1
> postgres=# update foo set a[10] = 'ahoj';
> UPDATE 1
> postgres=# select * from foo;
> ┌──────────┐
> │ a │
> ╞══════════╡
> │ ["ahoj"] │
> └──────────┘
> (1 row)
Thanks for looking at the last patch, I appreciate it! The situation
you've mentioned is an interesting edge case. If I understand correctly,
the first example is the result of some operations leading to filling gaps
between 0 and "ahoj". In the second case there is no such gap, which is
why nothing was "filled in", although one could expect the presence of a
"start position" and filling with nulls everything from it to the new
element - is that what you mean?
I expect any time
a[10] := 10;
? a[10] --> 10
===
postgres=# truncate foo;
TRUNCATE TABLE
postgres=# insert into foo values('[]');
INSERT 0 1
postgres=# update foo set a[10] = 'AHOJ';
UPDATE 1
postgres=# select (a)[10] from foo;
┌───┐
│ a │
╞═══╡
│ ∅ │
└───┘
(1 row)
There should be consistency
postgres=# create table foo2(a text[]);
CREATE TABLE
postgres=# insert into foo2 values('{}');
INSERT 0 1
postgres=# update foo set a[10] = 'AHOJ';
UPDATE 1
postgres=# select (a)[10] from foo;
┌────────┐
│ a │
╞════════╡
│ "AHOJ" │
└────────┘
(1 row)
and some natural behaviour - any special case with different behaviour is a bad thing generally.
Regards
Pavel
On Wed, Sep 16, 2020 at 01:52:27PM +0200, Pavel Stehule wrote:
> and some natural behaviour - any special case with different behaviour is a
> bad thing generally.

Please note that v33 of the patch fails to apply, speaking of at least
0001. Could you provide a rebase?
--
Michael
Attachment
> On Wed, Sep 16, 2020 at 01:52:27PM +0200, Pavel Stehule wrote:
>
> > > On Tue, Sep 15, 2020 at 08:42:40PM +0200, Pavel Stehule wrote:
> > >
> > > Maybe I found a another issue.
> > >
> > > create table foo(a jsonb);
> > >
> > > postgres=# select * from foo;
> > > ┌───────────────────────────────────────────────────────────────────┐
> > > │ a │
> > > ╞═══════════════════════════════════════════════════════════════════╡
> > > │ [0, null, null, null, null, null, null, null, null, null, "ahoj"] │
> > > └───────────────────────────────────────────────────────────────────┘
> > > (1 row)
> > >
> > > It is working like I expect
> > >
> > > but
> > >
> > > postgres=# truncate foo;
> > > TRUNCATE TABLE
> > > postgres=# insert into foo values('[]');
> > > INSERT 0 1
> > > postgres=# update foo set a[10] = 'ahoj';
> > > UPDATE 1
> > > postgres=# select * from foo;
> > > ┌──────────┐
> > > │ a │
> > > ╞══════════╡
> > > │ ["ahoj"] │
> > > └──────────┘
> > > (1 row)
> >
> > Thanks for looking at the last patch, I appreciate! The situation you've
> > mention is an interesting edge case. If I understand correctly, the
> > first example is the result of some operations leading to filling gaps
> > between 0 and "ahoj". In the second case there is no such gap that's why
> > nothing was "filled in", although one could expect presence of a "start
> > position" and fill with nulls everything from it to the new element, is
> > that what you mean?
>
> I expect any time
>
> a[10] := 10;
>
> ? a[10] --> 10
>
> ===
>
> postgres=# truncate foo;
> TRUNCATE TABLE
> postgres=# insert into foo values('[]');
> INSERT 0 1
> postgres=# update foo set a[10] = 'AHOJ';
> UPDATE 1
> postgres=# select (a)[10] from foo;
> ┌───┐
> │ a │
> ╞═══╡
> │ ∅ │
> └───┘
> (1 row)
>
> There should be consistency
>
> postgres=# create table foo2(a text[]);
> CREATE TABLE
> postgres=# insert into foo2 values('{}');
> INSERT 0 1
> postgres=# update foo set a[10] = 'AHOJ';
> UPDATE 1
> postgres=# select (a)[10] from foo;
> ┌────────┐
> │ a │
> ╞════════╡
> │ "AHOJ" │
> └────────┘
> (1 row)
>
> and some natural behaviour - any special case with different behaviour is a
> bad thing generally.

Yeah, I see your point. IIRC there is no notion of an arbitrary index in
a jsonb array, so it needs to be done within the assignment operation,
similar to how the last patch fills the gaps between elements. Taking
into account that if there is more than one element in the array all the
gaps are filled and the behaviour is already the same as you described,
what needs to change is that more nulls need to be added before the
first element, depending on the assignment index.

I have my concerns about the performance side of this implementation, as
well as how surprising this would be for users, but at the same time the
patch already does something similar and the code change should not be
that big, so why not - I can include this change in the next rebased
version. But it still can cause some confusion, as it's not going to
work for negative indices, so

update foo set a[-10] = 1;

and

select a[-10] from foo;

can return a different value from what was assigned. Otherwise, if we
try to fix a[-10] assignment in the same way, it will prepend the array
and a[10] will not return the same value.
> On Thu, Sep 17, 2020 at 01:44:48PM +0900, Michael Paquier wrote:
> On Wed, Sep 16, 2020 at 01:52:27PM +0200, Pavel Stehule wrote:
> > and some natural behaviour - any special case with different behaviour is a
> > bad thing generally.
>
> Please note that v33 of the patch fails to apply, speaking of at least
> 0001. Could you provide a rebase?

Sure, I just want to make sure what, if anything, needs to be changed or
included in the rebased version besides the rebase itself. Will post.
> There should be consistency
>
> postgres=# create table foo2(a text[]);
> CREATE TABLE
> postgres=# insert into foo2 values('{}');
> INSERT 0 1
> postgres=# update foo set a[10] = 'AHOJ';
> UPDATE 1
> postgres=# select (a)[10] from foo;
> ┌────────┐
> │ a │
> ╞════════╡
> │ "AHOJ" │
> └────────┘
> (1 row)
>
> and some natural behaviour - any special case with different behaviour is a
> bad thing generally.
Yeah, I see your point. IIRC there is no notion of an arbitrary index in
a jsonb array, so it needs to be done within the assignment operation,
similar to how the last patch fills the gaps between elements. Taking
into account that if there is more than one element in the array all the
gaps are filled and the behaviour is already the same as you described,
what needs to change is that more nulls need to be added before the
first element, depending on the assignment index.

I have my concerns about the performance side of this implementation, as
well as how surprising this would be for users, but at the same time the
patch already does something similar and the code change should not be
that big, so why not - I can include this change in the next rebased
version. But it still can cause some confusion, as it's not going to
work for negative indices, so

update foo set a[-10] = 1;

and

select a[-10] from foo;

can return a different value from what was assigned. Otherwise, if we
try to fix a[-10] assignment in the same way, it will prepend the array
and a[10] will not return the same value.
What is the semantics of a negative index? It has clean semantics in C, but what about PL/pgSQL?
> On Thu, Sep 17, 2020 at 02:47:54PM +0200, Pavel Stehule wrote:
> > I have my concerns about the performance side of this implementation as
> > well as how surprising this would be for users, but at the same time the
> > patch already does something similar and the code change should not be
> > that big, so why not - I can include this change into the next rebased
> > version. But it still can cause some confusion as it's not going to work
> > for negative indices, so
> >
> > update foo set a[-10] = 1;
> >
> > and
> >
> > select a[-10] from foo;
> >
> > can return different value from what was assigned. Otherwise, if we will
> > try to fix a[-10] assignment in the same way, it will prepend the array
> > and a[10] will not return the same value.
>
> What is semantic of negative index? It has clean semantic in C, but in
> PLpgSQL?

It's just a common pattern for jsonb, where a negative index counts from
the end of an array. I believe it was like that from the very earliest
implementations, although I can't comment on that from the semantics
point of view.
On Thu, Sep 17, 2020 at 3:56 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Thu, Sep 17, 2020 at 02:47:54PM +0200, Pavel Stehule wrote:
> > I have my concerns about the performance side of this implementation as
> > well as how surprising this would be for users, but at the same time the
> > patch already does something similar and the code change should not be
> > that big, so why not - I can include this change into the next rebased
> > version. But it still can cause some confusion as it's not going to work
> > for negative indices, so
> >
> > update foo set a[-10] = 1;
> >
> > and
> >
> > select a[-10] from foo;
> >
> > can return different value from what was assigned. Otherwise, if we will
> > try to fix a[-10] assignment in the same way, it will prepend the array
> > and a[10] will not return the same value.
>
> What is semantic of negative index? It has clean semantic in C, but in
> PLpgSQL?
It's just a common pattern for jsonb, where a negative index counts from
the end of an array. I believe it was like that from the very earliest
implementations, although I can't comment on that from the semantics
point of view.
ok, then I think we can design some workable behaviour
My first rule - there should not be any implicit action that shifts positions in the array. It can be explicit, but not implicit. It is true for positive indexes, and it should be true for negative indexes too.
then I think something like this can work
if (idx < 0)
{
if (abs(idx) > length of array)
exception("index is out of range");
array[length of array + idx] := value;
}
else
{
/* known behaviour for positive index */
}
Regards
Pavel
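The pseudocode above can be written as a runnable sketch (Python is used purely as illustration, not PostgreSQL code; note that for a negative idx the target position works out to length + idx, i.e. length - abs(idx)):

```python
def assign_checked(arr, idx, value):
    """Bounds-checked assignment following the proposal above: a negative
    index counts from the end and must not point before the first element."""
    if idx < 0:
        if abs(idx) > len(arr):
            raise IndexError("index is out of range")
        arr[len(arr) + idx] = value  # len(arr) + idx == len(arr) - abs(idx)
    else:
        # known behaviour for positive indexes (gap filling etc.) goes here;
        # only the in-range case is modeled in this sketch
        arr[idx] = value
    return arr
```

For example, assigning at index -1 of a three-element array updates the last element, while index -2 on a one-element array raises the out-of-range error.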
> On Thu, Sep 17, 2020 at 05:19:19PM +0200, Pavel Stehule wrote:
>
> ok, then I think we can design some workable behaviour
>
> My first rule - there should not be any implicit action that shifts
> positions in the array. It can be explicit, but not implicit. It is true
> for positive indexes, and it should be true for negative indexes too.
>
> then I think something like this can work
>
> if (idx < 0)
> {
>     if (abs(idx) > length of array)
>         exception("index is out of range");
>     array[length of array + idx] := value;
> }
> else
> {
>     /* known behaviour for positive index */
> }

In this way (returning an error on negative indices bigger than the
number of elements) the functionality for assigning via subscripting
will already differ significantly from the original one via jsonb_set,
which in turn could cause a new wave of something similar to "why does
assigning an SQL NULL as a value return NULL instead of jsonb?". Taking
into account that this is not an absolutely new interface, but rather a
convenient shortcut for the existing one, it probably makes sense to try
to find a balance between consistency with regular arrays and similarity
with the already existing jsonb modification functions.

Having said that, my impression is that this balance should not be fully
shifted towards consistency with the regular array type, as jsonb arrays
and regular arrays are fundamentally different in terms of
implementation. If any differences are of concern, they should be
addressed at a different level. At the same time I've already sort of
given up on this patch in the form I wanted to see it anyway, so
anything goes if it helps bring it to the finish point. If there are no
more arguments from other involved sides, I can post the next version
with your suggestion included.
On Fri, Sep 18, 2020 at 1:01 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Thu, Sep 17, 2020 at 05:19:19PM +0200, Pavel Stehule wrote:
>
> ok, then I think we can design some workable behaviour
>
> My first rule - there should not be any implicit action that shifts
> positions in the array. It can be explicit, but not implicit. It is true
> for positive indexes, and it should be true for negative indexes too.
>
> then I think so some like this can work
>
> if (idx < 0)
> {
> if (abs(idx) > length of array)
> exception("index is out of range");
> array[length of array + idx] := value;
> }
> else
> {
> /* known behave for positive index */
> }
In this way (returning an error on negative indices bigger than the
number of elements) the functionality for assigning via subscripting
will already differ significantly from the original one via jsonb_set,
which in turn could cause a new wave of something similar to "why does
assigning an SQL NULL as a value return NULL instead of jsonb?". Taking
into account that this is not an absolutely new interface, but rather a
convenient shortcut for the existing one, it probably makes sense to try
to find a balance between consistency with regular arrays and similarity
with the already existing jsonb modification functions.

Having said that, my impression is that this balance should not be fully
shifted towards consistency with the regular array type, as jsonb arrays
and regular arrays are fundamentally different in terms of
implementation. If any differences are of concern, they should be
addressed at a different level. At the same time I've already sort of
given up on this patch in the form I wanted to see it anyway, so
anything goes if it helps bring it to the finish point. If there are no
more arguments from other involved sides, I can post the next version
with your suggestion included.
This is a relatively new interface and at this moment we can decide if it will be consistent or not. I don't have a problem with different functions having different behaviors, but I don't like one interface with slightly different behaviors for different types. I understand your argument about implementing a lighter interface on top of some existing API. But I think consistency, to the maximal possible extent (where it makes sense), should be more important.
For me "jsonb" can be a very fundamental type in PL/pgSQL development - it can bring a lot of dynamism to this environment (it can work perfectly like a PL/SQL collection or a Perl dictionary), but for this purpose the behaviour should be consistent, without surprising elements.
Regards
Pavel
> On Fri, Sep 18, 2020 at 07:23:11PM +0200, Pavel Stehule wrote:
>
> > In this way (returning an error on a negative indices bigger than the
> > number of elements) functionality for assigning via subscripting will be
> > already significantly differ from the original one via jsonb_set. Which
> > in turn could cause a new wave of something similar to "why assigning an
> > SQL NULL as a value returns NULL instead of jsonb?". Taking into account
> > that this is not absolutely new interface, but rather a convenient
> > shortcut for the existing one it probably makes sense to try to find a
> > balance between both consistency with regular array and similarity with
> > already existing jsonb modification functions.
> >
> > Having said that, my impression is that this balance should be not fully
> > shifted towards consistensy with the regular array type, as jsonb array
> > and regular array are fundamentally different in terms of
> > implementation. If any differences are of concern, they should be
> > addressed at different level. At the same time I've already sort of gave
> > up on this patch in the form I wanted to see it anyway, so anything goes
> > if it helps bring it to the finish point. In case if there would be no
> > more arguments from other involved sides, I can post the next version
> > with your suggestion included.
>
> This is a relatively new interface and at this moment we can decide if it
> will be consistent or not. I have not a problem if I have different
> functions with different behaviors, but I don't like one interface with
> slightly different behaviors for different types. I understand your
> argument about implementing a lighter interface to some existing API. But I
> think so more important should be consistency in maximall possible rate
> (where it has sense).
>
> For me "jsonb" can be a very fundamental type in PLpgSQL development - it
> can bring a lot of dynamic to this environment (it can work perfectly like
> PL/SQL collection or like Perl dictionary), but for this purpose the
> behaviour should be well consistent without surprising elements.

And here we are, the rebased version with the following changes:

insert into test_jsonb_subscript values (1, '[]');
update test_jsonb_subscript set test_json[5] = 1;
select * from test_jsonb_subscript;
 id |             test_json
----+-----------------------------------
  1 | [null, null, null, null, null, 1]
(1 row)

update test_jsonb_subscript set test_json[-8] = 1;
ERROR:  path element at position 1 is out of range: -8

Thanks for the suggestions!
Attachment
On Sun, Sep 20, 2020 at 5:46 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Fri, Sep 18, 2020 at 07:23:11PM +0200, Pavel Stehule wrote:
>
> > In this way (returning an error on a negative indices bigger than the
> > number of elements) functionality for assigning via subscripting will be
> > already significantly differ from the original one via jsonb_set. Which
> > in turn could cause a new wave of something similar to "why assigning an
> > SQL NULL as a value returns NULL instead of jsonb?". Taking into account
> > that this is not absolutely new interface, but rather a convenient
> > shortcut for the existing one it probably makes sense to try to find a
> > balance between both consistency with regular array and similarity with
> > already existing jsonb modification functions.
> >
> > Having said that, my impression is that this balance should be not fully
> > shifted towards consistensy with the regular array type, as jsonb array
> > and regular array are fundamentally different in terms of
> > implementation. If any differences are of concern, they should be
> > addressed at different level. At the same time I've already sort of gave
> > up on this patch in the form I wanted to see it anyway, so anything goes
> > if it helps bring it to the finish point. In case if there would be no
> > more arguments from other involved sides, I can post the next version
> > with your suggestion included.
> >
>
> This is a relatively new interface and at this moment we can decide if it
> will be consistent or not. I have not a problem if I have different
> functions with different behaviors, but I don't like one interface with
> slightly different behaviors for different types. I understand your
> argument about implementing a lighter interface to some existing API. But I
> think so more important should be consistency in maximall possible rate
> (where it has sense).
>
> For me "jsonb" can be a very fundamental type in PLpgSQL development - it
> can bring a lot of dynamic to this environment (it can work perfectly like
> PL/SQL collection or like Perl dictionary), but for this purpose the
> behaviour should be well consistent without surprising elements.
And here we are, the rebased version with the following changes:
insert into test_jsonb_subscript values (1, '[]');
update test_jsonb_subscript set test_json[5] = 1;
select * from test_jsonb_subscript;
id | test_json
----+-----------------------------------
1 | [null, null, null, null, null, 1]
(1 row)
update test_jsonb_subscript set test_json[-8] = 1;
ERROR: path element at position 1 is out of range: -8
Thanks for the suggestions!
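The semantics shown above can be summarized in a small model (again Python for illustration only; the raised message mirrors the ERROR text in the example): positive out-of-range indexes pad the array with SQL nulls, while negative indexes reaching before the first element raise an error.

```python
def jsonb_subscript_assign(arr, idx, value):
    """Model of the rebased patch's assignment semantics shown above."""
    if idx < 0:
        if -idx > len(arr):
            # mirrors: ERROR: path element at position 1 is out of range: -8
            raise IndexError("path element at position 1 is out of range: %d" % idx)
        arr[idx] = value                       # negative index counts from the end
    elif idx >= len(arr):
        arr.extend([None] * (idx - len(arr)))  # pad the gap with SQL nulls
        arr.append(value)
    else:
        arr[idx] = value
    return arr
```

In this model, jsonb_subscript_assign([], 5, 1) produces [None, None, None, None, None, 1], matching the test_jsonb_subscript example, and index -8 on an empty array fails just like the UPDATE above.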
Thank you for accepting my suggestions.
I checked this set of patches and it looks well.
I have only one minor comment. I understand the error message, but I am not sure it can be understood without deeper knowledge.
+update test_jsonb_subscript set test_json[-8] = 1;
+ERROR: path element at position 1 is out of range: -8
Maybe 'value of subscript "-8" is out of range'. Current error message is fully correct - but people probably have to think "what is a path element at position 1?" It doesn't look intuitive.
Do you have some idea?
My comment is minor, and I mark this patch with pleasure as ready for committer.
patching and compiling - without problems
implemented functionality - I like it
Building doc - without problems
make check-world - passed
Regards
Pavel
> On Fri, Sep 25, 2020 at 06:43:38PM +0200, Pavel Stehule wrote:
>
> I checked this set of patches and it looks well.
>
> I have only one minor comment. I understand the error message, but I am not
> sure if without deeper knowledge I can understand.
>
> +update test_jsonb_subscript set test_json[-8] = 1;
> +ERROR: path element at position 1 is out of range: -8
>
> Maybe 'value of subscript "-8" is out of range'. Current error message is
> fully correct - but people probably have to think "what is a path element
> at position 1?" It doesn't look intuitive.
>
> Do you have some idea?

Interesting question. I've borrowed this error message format from other
parts of the setPath function, where it appears a couple of times, and
unfortunately can't suggest anything better. If this patch gets lucky
enough to attract someone else, maybe we can leave it to a committer,
what do you think?

> My comment is minor, and I mark this patch with pleasure as ready for
> committer.
>
> patching and compiling - without problems
> implemented functionality - I like it
> Building doc - without problems
> make check-world - passed

Thanks!
On Mon, Sep 28, 2020 at 11:36 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Fri, Sep 25, 2020 at 06:43:38PM +0200, Pavel Stehule wrote:
>
> I checked this set of patches and it looks well.
>
> I have only one minor comment. I understand the error message, but I am not
> sure if without deeper knowledge I can understand.
>
> +update test_jsonb_subscript set test_json[-8] = 1;
> +ERROR: path element at position 1 is out of range: -8
>
> Maybe 'value of subscript "-8" is out of range'. Current error message is
> fully correct - but people probably have to think "what is a path element
> at position 1?" It doesn't look intuitive.
>
> Do you have some idea?
Interesting question. I've borrowed this error message format from other
parts of the setPath function, where it appears a couple of times, and
unfortunately can't suggest anything better. If this patch gets lucky
enough to attract someone else, maybe we can leave it to a committer,
what do you think?
ok
> My comment is minor, and I mark this patch with pleasure as ready for
> committer.
>
> patching and compiling - without problems
> implemented functionality - I like it
> Building doc - without problems
> make check-world - passed
Thanks!
Hi!

I've started to review this patch. My first question is whether we're
able to handle different subscript types differently. For instance,
one day we could handle jsonpath subscripts for jsonb. And for sure,
jsonpath subscripts are expected to be handled differently from text
subscripts. I see we can distinguish types in the prepare and
validate functions. But it seems there is no type information in the
fetch and assign functions. Should we add something like this to the
SubscriptingRefState for future usage?

Datum uppertypeoid[MAX_SUBSCRIPT_DEPTH];
Datum lowertypeoid[MAX_SUBSCRIPT_DEPTH];

------
Regards,
Alexander Korotkov
> On Fri, Nov 27, 2020 at 12:13:48PM +0300, Alexander Korotkov wrote:
>
> Hi!
>
> I've started to review this patch.

Thanks!

> My first question is whether we're
> able to handle different subscript types differently. For instance,
> one day we could handle jsonpath subscripts for jsonb. And for sure,
> jsonpath subscripts are expected to be handled differently from text
> subscripts. I see we can distinguish types during in prepare and
> validate functions. But it seems there is no type information in
> fetch and assign functions. Should we add something like this to the
> SubscriptingRefState for future usage?
>
> Datum uppertypeoid[MAX_SUBSCRIPT_DEPTH];
> Datum lowertypeoid[MAX_SUBSCRIPT_DEPTH];

Yes, makes sense. My original idea was that it could be done within the
jsonpath support patch itself, but at the same time providing these
fields in SubscriptingRefState will help other potential extensions.

Having said that, maybe it would be even better to introduce a field
with an opaque structure for both SubscriptingRefState and
SubscriptingRef, where every implementation of custom subscripting can
store any necessary information? In the case of jsonpath it could keep
type information acquired in the prepare function, which would then be
passed via SubscriptingRefState down to fetch/assign.
On Mon, Nov 30, 2020 at 2:33 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Fri, Nov 27, 2020 at 12:13:48PM +0300, Alexander Korotkov wrote:
> > I've started to review this patch.
>
> Thanks!
>
> > My first question is whether we're
> > able to handle different subscript types differently. For instance,
> > one day we could handle jsonpath subscripts for jsonb. And for sure,
> > jsonpath subscripts are expected to be handled differently from text
> > subscripts. I see we can distinguish types during in prepare and
> > validate functions. But it seems there is no type information in
> > fetch and assign functions. Should we add something like this to the
> > SubscriptingRefState for future usage?
> >
> > Datum uppertypeoid[MAX_SUBSCRIPT_DEPTH];
> > Datum lowertypeoid[MAX_SUBSCRIPT_DEPTH];
>
> Yes, makes sense. My original idea was that it could be done within the
> jsonpath support patch itself, but at the same time providing these
> fields into SubscriptingRefState will help other potential extensions.
>
> Having said that, maybe it would be even better to introduce a field
> with an opaque structure for both SubscriptingRefState and
> SubscriptingRef, where every implementation of custom subscripting can
> store any necessary information? In case of jsonpath it could keep type
> information acquired in prepare function, which would be then passed via
> SubscriptingRefState down to the fetch/assign.

The idea of an opaque field in SubscriptingRef structure is more
attractive to me. Could you please implement it?

------
Regards,
Alexander Korotkov
> On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote:
>
> > > My first question is whether we're
> > > able to handle different subscript types differently. For instance,
> > > one day we could handle jsonpath subscripts for jsonb. And for sure,
> > > jsonpath subscripts are expected to be handled differently from text
> > > subscripts. I see we can distinguish types during in prepare and
> > > validate functions. But it seems there is no type information in
> > > fetch and assign functions. Should we add something like this to the
> > > SubscriptingRefState for future usage?
> > >
> > > Datum uppertypeoid[MAX_SUBSCRIPT_DEPTH];
> > > Datum lowertypeoid[MAX_SUBSCRIPT_DEPTH];
> >
> > Yes, makes sense. My original idea was that it could be done within the
> > jsonpath support patch itself, but at the same time providing these
> > fields into SubscriptingRefState will help other potential extensions.
> >
> > Having said that, maybe it would be even better to introduce a field
> > with an opaque structure for both SubscriptingRefState and
> > SubscriptingRef, where every implementation of custom subscripting can
> > store any necessary information? In case of jsonpath it could keep type
> > information acquired in prepare function, which would be then passed via
> > SubscriptingRefState down to the fetch/assign.
>
> The idea of an opaque field in SubscriptingRef structure is more
> attractive to me. Could you please implement it?

Sure, doesn't seem to be that much work.
> On Mon, Nov 30, 2020 at 02:26:19PM +0100, Dmitry Dolgov wrote:
> > On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote:
> >
> > > > My first question is whether we're
> > > > able to handle different subscript types differently. For instance,
> > > > one day we could handle jsonpath subscripts for jsonb. And for sure,
> > > > jsonpath subscripts are expected to be handled differently from text
> > > > subscripts. I see we can distinguish types during in prepare and
> > > > validate functions. But it seems there is no type information in
> > > > fetch and assign functions. Should we add something like this to the
> > > > SubscriptingRefState for future usage?
> > > >
> > > > Datum uppertypeoid[MAX_SUBSCRIPT_DEPTH];
> > > > Datum lowertypeoid[MAX_SUBSCRIPT_DEPTH];
> > >
> > > Yes, makes sense. My original idea was that it could be done within the
> > > jsonpath support patch itself, but at the same time providing these
> > > fields into SubscriptingRefState will help other potential extensions.
> > >
> > > Having said that, maybe it would be even better to introduce a field
> > > with an opaque structure for both SubscriptingRefState and
> > > SubscriptingRef, where every implementation of custom subscripting can
> > > store any necessary information? In case of jsonpath it could keep type
> > > information acquired in prepare function, which would be then passed via
> > > SubscriptingRefState down to the fetch/assign.
> >
> > The idea of an opaque field in SubscriptingRef structure is more
> > attractive to me. Could you please implement it?
>
> Sure, doesn't seem to be that much work.

The attached implementation should be enough I guess, as fetch/assign
are executed in a child memory context of the one where prepare does
its work.
Attachment
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On Mon, Nov 30, 2020 at 02:26:19PM +0100, Dmitry Dolgov wrote:
>>> On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote:
>>> The idea of an opaque field in SubscriptingRef structure is more
>>> attractive to me. Could you please implement it?

>> Sure, doesn't seem to be that much work.

I just happened to notice this bit. This idea is a complete nonstarter.
You cannot have an "opaque" field in a parsetree node, because then the
backend/nodes code has no idea what to do with it for
copy/compare/outfuncs/readfuncs. The patch seems to be of the opinion
that "do nothing" is adequate, which it completely isn't.

Perhaps this is a good juncture at which to remind people that parse
tree nodes are read-only so far as the executor is concerned, so
storing something there only at execution time won't work either.

			regards, tom lane
So ... one of the things that's been worrying me about this patch
from day one is whether it would create a noticeable performance
penalty for existing use-cases. I did a small amount of experimentation
about that with the v35 patchset, and it didn't take long at all to
find that this:

--- cut ---
create or replace function arraytest(n int) returns void as
$$
declare
  a int[];
begin
  a := array[1, 1];
  for i in 3..n loop
    a[i] := a[i-1] - a[i-2];
  end loop;
end;
$$
language plpgsql stable;

\timing on

select arraytest(10000000);
--- cut ---

is about 15% slower with the patch than with HEAD. I'm not sure
what an acceptable penalty might be, but 15% is certainly not it.

I'm also not quite sure where the cost is going. It looks like
0001+0002 aren't doing much to the executor except introducing
one level of subroutine call, which doesn't seem like it'd account
for that.

I don't think this can be considered RFC until the performance
issue is addressed.

			regards, tom lane
> On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote:
>
> So ... one of the things that's been worrying me about this patch
> from day one is whether it would create a noticeable performance
> penalty for existing use-cases. I did a small amount of experimentation
> about that with the v35 patchset, and it didn't take long at all to
> find that this:
>
> --- cut ---
> create or replace function arraytest(n int) returns void as
> $$
> declare
>   a int[];
> begin
>   a := array[1, 1];
>   for i in 3..n loop
>     a[i] := a[i-1] - a[i-2];
>   end loop;
> end;
> $$
> language plpgsql stable;
>
> \timing on
>
> select arraytest(10000000);
> --- cut ---
>
> is about 15% slower with the patch than with HEAD. I'm not sure
> what an acceptable penalty might be, but 15% is certainly not it.
>
> I'm also not quite sure where the cost is going. It looks like
> 0001+0002 aren't doing much to the executor except introducing
> one level of subroutine call, which doesn't seem like it'd account
> for that.

I've tried to reproduce that, but get ~2-4% slowdown (with a pinned
backend, no turbo etc). Are there any special steps I've probably
missed? At the same time, I remember conducting this sort of test
before, when you and others raised the performance degradation question
and the main part of the patch was already more or less stable. From
what I remember, the numbers back then were also rather small.
> On Wed, Dec 02, 2020 at 11:52:54AM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> >> On Mon, Nov 30, 2020 at 02:26:19PM +0100, Dmitry Dolgov wrote:
> >>> On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote:
> >>> The idea of an opaque field in SubscriptingRef structure is more
> >>> attractive to me. Could you please implement it?
>
> >> Sure, doesn't seem to be that much work.
>
> I just happened to notice this bit.  This idea is a complete nonstarter.
> You cannot have an "opaque" field in a parsetree node, because then the
> backend/nodes code has no idea what to do with it for
> copy/compare/outfuncs/readfuncs.  The patch seems to be of the opinion
> that "do nothing" is adequate, which it completely isn't.
>
> Perhaps this is a good juncture at which to remind people that parse
> tree nodes are read-only so far as the executor is concerned, so
> storing something there only at execution time won't work either.

Oh, right, stupid of me. Then I'll just stick with Alexander's original
suggestion.
On Wed, Dec 02, 2020 at 08:18:08PM +0100, Dmitry Dolgov wrote:
> > On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote:
> > So ... one of the things that's been worrying me about this patch
> > from day one is whether it would create a noticeable performance
> > penalty for existing use-cases.  I did a small amount of experimentation
> > about that with the v35 patchset, and it didn't take long at all to
> > find that this:
> > --- cut ---
> >
> > is about 15% slower with the patch than with HEAD.  I'm not sure
> > what an acceptable penalty might be, but 15% is certainly not it.
> >
> > I'm also not quite sure where the cost is going.  It looks like
> > 0001+0002 aren't doing much to the executor except introducing
> > one level of subroutine call, which doesn't seem like it'd account
> > for that.
>
> I've tried to reproduce that, but get ~2-4% slowdown (with a pinned
> backend, no turbo etc). Are there any special steps I've probably
> missed? At the same time, I remember had conducted this sort of tests
> before when you and others raised the performance degradation question
> and the main part of the patch was already more or less stable. From
> what I remember the numbers back then were also rather small.

Are you comparing with casserts (and therefore MEMORY_CONTEXT_CHECKING)
disabled?

-- 
Justin
> On Wed, Dec 02, 2020 at 01:20:10PM -0600, Justin Pryzby wrote:
> On Wed, Dec 02, 2020 at 08:18:08PM +0100, Dmitry Dolgov wrote:
> > > On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote:
> > > So ... one of the things that's been worrying me about this patch
> > > from day one is whether it would create a noticeable performance
> > > penalty for existing use-cases.  I did a small amount of experimentation
> > > about that with the v35 patchset, and it didn't take long at all to
> > > find that this:
> > > --- cut ---
> > >
> > > is about 15% slower with the patch than with HEAD.  I'm not sure
> > > what an acceptable penalty might be, but 15% is certainly not it.
> > >
> > > I'm also not quite sure where the cost is going.  It looks like
> > > 0001+0002 aren't doing much to the executor except introducing
> > > one level of subroutine call, which doesn't seem like it'd account
> > > for that.
> >
> > I've tried to reproduce that, but get ~2-4% slowdown (with a pinned
> > backend, no turbo etc). Are there any special steps I've probably
> > missed? At the same time, I remember had conducted this sort of tests
> > before when you and others raised the performance degradation question
> > and the main part of the patch was already more or less stable. From
> > what I remember the numbers back then were also rather small.
>
> Are you comparing with casserts (and therefor MEMORY_CONTEXT_CHECKING)
> disabled?

Yep, they're disabled.
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote:
>> So ... one of the things that's been worrying me about this patch
>> from day one is whether it would create a noticeable performance
>> penalty for existing use-cases.  I did a small amount of experimentation
>> about that with the v35 patchset, and it didn't take long at all to
>> find that this:
>> ...
>> is about 15% slower with the patch than with HEAD.  I'm not sure
>> what an acceptable penalty might be, but 15% is certainly not it.

> I've tried to reproduce that, but get ~2-4% slowdown (with a pinned
> backend, no turbo etc). Are there any special steps I've probably
> missed?

Hmm, no, I just built with --disable-cassert and otherwise my usual
development options.

I had experimented with some other variants of the test case,
where the repeated statement is

	a[i] := i;			-- about the same
	a[i] := a[i-1] + 1;		-- 7% slower
	a[i] := a[i-1] - a[i-2];	-- 15% slower

so it seems clear that the penalty is on the array fetch not array
assign side.  This isn't too surprising now that I think about it,
because plpgsql's array assignment code is untouched by the patch
(which is a large feature omission BTW: you still can't write
"jsonb['x'] := y;" in plpgsql).

			regards, tom lane
On Wed, Dec 2, 2020 at 9:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> >> On Wed, Dec 02, 2020 at 12:58:51PM -0500, Tom Lane wrote:
> >> So ... one of the things that's been worrying me about this patch
> >> from day one is whether it would create a noticeable performance
> >> penalty for existing use-cases.  I did a small amount of experimentation
> >> about that with the v35 patchset, and it didn't take long at all to
> >> find that this:
> >> ...
> >> is about 15% slower with the patch than with HEAD.  I'm not sure
> >> what an acceptable penalty might be, but 15% is certainly not it.
>
> > I've tried to reproduce that, but get ~2-4% slowdown (with a pinned
> > backend, no turbo etc). Are there any special steps I've probably
> > missed?
>
> Hmm, no, I just built with --disable-cassert and otherwise my usual
> development options.
>
> I had experimented with some other variants of the test case,
> where the repeated statement is
>
> 	a[i] := i;			-- about the same
> 	a[i] := a[i-1] + 1;		-- 7% slower
> 	a[i] := a[i-1] - a[i-2];	-- 15% slower
>
> so it seems clear that the penalty is on the array fetch not array
> assign side.  This isn't too surprising now that I think about it,
> because plpgsql's array assignment code is untouched by the patch
> (which is a large feature omission BTW: you still can't write
>
> jsonb['x'] := y;
>

The refactoring of the left part of the assignment statement in plpgsql
will probably be harder work than this patch. But it should be the next
step.

> in plpgsql).
>

I tested the last patch on my FC33 Lenovo T520 (I7) and I don't see a
15% slowdown either. On my machine the slowdown is about 1.5-3%. I used
your arraytest function.

Regards

Pavel

> 			regards, tom lane
>
On Wed, Dec 2, 2020 at 10:18 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Wed, Dec 02, 2020 at 11:52:54AM -0500, Tom Lane wrote:
> > Dmitry Dolgov <9erthalion6@gmail.com> writes:
> > >> On Mon, Nov 30, 2020 at 02:26:19PM +0100, Dmitry Dolgov wrote:
> > >>> On Mon, Nov 30, 2020 at 04:12:29PM +0300, Alexander Korotkov wrote:
> > >>> The idea of an opaque field in SubscriptingRef structure is more
> > >>> attractive to me. Could you please implement it?
> >
> > >> Sure, doesn't seem to be that much work.
> >
> > I just happened to notice this bit.  This idea is a complete nonstarter.
> > You cannot have an "opaque" field in a parsetree node, because then the
> > backend/nodes code has no idea what to do with it for
> > copy/compare/outfuncs/readfuncs.  The patch seems to be of the opinion
> > that "do nothing" is adequate, which it completely isn't.
> >
> > Perhaps this is a good juncture at which to remind people that parse
> > tree nodes are read-only so far as the executor is concerned, so
> > storing something there only at execution time won't work either.
>
> Oh, right, stupid of me. Then I'll just stick with the original
> Alexanders suggestion.

Stupid me too :)  I didn't realize we can't add an opaque field to
SubscriptingRefState without adding it to SubscriptingRef, which has to
support copy/compare/outfuncs/readfuncs.

------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
> I didn't get we can't add opaque field to SubscriptingRefState without
> adding it to SubscriptingRef, which has to support
> copy/compare/outfuncs/readfuncs

Umm ... all depends on what you envision putting in there.  There
certainly can be an opaque field in SubscriptingRefState as long as
the subscript-mechanism-specific code is responsible for setting it
up.  You just can't pass such a thing through the earlier phases.

			regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
> I tested the last patch on my FC33 Lenovo T520 (I7) and I don't see 15%
> slowdown too .. On my comp there is a slowdown of about 1.5-3%. I used your
> function arraytest.

After repeating the experiment a few times, I think I was misled by
ASLR variance (ie, hot code falling within or across cache lines
depending on where the backend executable gets loaded).  I'd tried a
couple of postmaster restarts, but seemingly not enough to expose the
full variance in runtime that's due to that.  I do still see a 2% or
so penalty when comparing best-case runtimes, which is consistent with
other people's reports.

However, 2% is still more than I want to pay for this feature, and
after studying the patchset awhile I don't think it's tried hard at
all on execution efficiency.  We should eliminate the
ExecEvalSubscriptingRef* interface layer altogether, and have
ExecInterpExpr dispatch directly to container-type-specific code, so
that we end up with approximately the same call depth as before.  With
the patches shown below, we are (as best I can tell) right about on
par with the existing code's runtime.

This patch also gets rid of a few more undesirable assumptions in the
core executor --- for instance, AFAICS there is no need for *any*
hard-wired limit on the number of subscripts within the core executor.
(What a particular container type chooses to support is its business,
of course.)  We also need not back off on the specificity of error
messages, since the complaints that were in ExecEvalSubscriptingRef*
are now in container-type-specific code.

There were other things not to like about the way v35 chose to
refactor the executor support.  In particular, I don't think it
understood the point of having the EEOP_SBSREF_SUBSCRIPT steps, which
is to only transform the subscript Datums to internal form once, even
when we have to use them twice in OLD and ASSIGN steps.
Admittedly DatumGetInt32 is pretty cheap, but this cannot be said of
reading text datums as the 0003 patch wishes to do.  (BTW, 0003 is
seriously buggy in that regard, as it's failing to cope with toasted
or even short-header inputs.  We really don't want to detoast twice,
so that has to be dealt with in the SUBSCRIPT step.)

I also felt that processing the subscripts one-at-a-time wasn't
necessarily a great solution, as one can imagine container semantics
where they need to be handled more holistically.  So I replaced
EEOP_SBSREF_SUBSCRIPT with EEOP_SBSREF_SUBSCRIPTS, which is executed
just once after all the subscript Datums have been collected.  (This
does mean that we lose the optimization of short-circuiting as soon as
we've found a NULL subscript, but I'm not troubled by that.  I note in
particular that the core code shouldn't be forcing a particular view
of what to do with null subscripts onto all container types.)

The two patches attached cover the same territory as v35's 0001 and
0002, but I split it up differently because I didn't see much point in
a division that has a nonfunctional code state in the middle.  0001
below is just concerned with revising things enough so that the core
executor doesn't have any assumption about a maximum number of
subscripts.  Then 0002 incorporates what was in v35 0001+0002, revised
with what seems to me a better set of execution APIs.

There are a bunch of loose ends yet, the first three introduced by me
and the rest being pre-existing problems:

* I don't have a lot of confidence in the LLVM changes --- they seem
to work, but I don't really understand that code, and in particular I
don't understand the difference between TypeParamBool and
TypeStorageBool.  So there might be something subtly wrong with the
code generation for EEOP_SBSREF_SUBSCRIPTS.
* As things stand here, there's no difference among the expression
step types EEOP_SBSREF_OLD, EEOP_SBSREF_ASSIGN, and EEOP_SBSREF_FETCH;
they dispatch to different support routines but the core executor's
behavior is identical.  So we could fold them all into one step type,
and lose nothing except perhaps debugging visibility.  Should we do
that, or keep them separate?

* I've not rebased v35-0003 and later onto this design, and don't
intend to do so myself.

* The patchset adds a CREATE TYPE option, but fails to provide any
pg_dump support for that option.  (There's no test coverage either.
Maybe further on, we should extend hstore or another contrib type to
have subscripting support, if only to have testing of that?)

* CREATE TYPE fails to create a dependency from a type to its
subscripting function.  (Related to which, the changes to the
GenerateTypeDependencies call in TypeShellMake are surely wrong.)

* findTypeSubscriptingFunction contains dead code (not to mention
sadly incorrect comments).

* What is refnestedfunc?  That sure seems to be dead code.

* I'm not on board with including refindexprslice in the transformed
expression, either.  AFAICS that is the untransformed subscript list,
which has *no* business being included in the finished parsetree.
Probably that needs to be passed to the type-specific
transform/validate code separately.

* I've not really reviewed the parse analysis changes, but what is the
motivation for separating the prepare and validate callbacks?  It
looks like those could be merged.

* exprType (and exprTypeMod, perhaps) seem to be assuming more than
they should about subscripting semantics.  I think it should be
possible for the type-specific code to define what the result type of
a subscripting transformation is, without hard-wired rules like these.
* The new code added to arrayfuncs.c seems like it doesn't really
belong there (the fact that it forces adding a ton of new #include's
is a good sign that it doesn't fit with the existing code there).  I'm
inclined to propose that we should break that out into a new .c file,
maybe "arraysubs.c".

* The proposed documentation in 0004 is pretty poor.  You might as
well drop all of xsubscripting.sgml and just say "look at the existing
code for examples".  (Splitting the array interface code out into a
new file would help with that, too, as there'd be a well-defined set
of code to point to.)

			regards, tom lane

diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index 79b325c7cf..c8382e9381 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -2523,11 +2523,19 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref,
ExprState *state, Datum *resv, bool *resnull) { bool isAssignment = (sbsref->refassgnexpr != NULL); - SubscriptingRefState *sbsrefstate = palloc0(sizeof(SubscriptingRefState)); + int nupper = list_length(sbsref->refupperindexpr); + int nlower = list_length(sbsref->reflowerindexpr); + SubscriptingRefState *sbsrefstate; + char *ptr; List *adjust_jumps = NIL; ListCell *lc; int i; + /* Allocate enough space for per-subscript arrays too */ + sbsrefstate = palloc0(MAXALIGN(sizeof(SubscriptingRefState)) + + (nupper + nlower) * (sizeof(Datum) + + 2 * sizeof(bool))); + /* Fill constant fields of SubscriptingRefState */ sbsrefstate->isassignment = isAssignment; sbsrefstate->refelemtype = sbsref->refelemtype;
@@ -2536,6 +2544,22 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref,
&sbsrefstate->refelemlength, &sbsrefstate->refelembyval, &sbsrefstate->refelemalign); + sbsrefstate->numupper = nupper; + sbsrefstate->numlower = nlower; + /* Set up per-subscript arrays */ + ptr = ((char *) sbsrefstate) + MAXALIGN(sizeof(SubscriptingRefState)); + sbsrefstate->upperindex = (Datum *)
ptr; + ptr += nupper * sizeof(Datum); + sbsrefstate->lowerindex = (Datum *) ptr; + ptr += nlower * sizeof(Datum); + sbsrefstate->upperprovided = (bool *) ptr; + ptr += nupper * sizeof(bool); + sbsrefstate->lowerprovided = (bool *) ptr; + ptr += nlower * sizeof(bool); + sbsrefstate->upperindexnull = (bool *) ptr; + ptr += nupper * sizeof(bool); + sbsrefstate->lowerindexnull = (bool *) ptr; + /* ptr += nlower * sizeof(bool); */ /* * Evaluate array input. It's safe to do so into resv/resnull, because we @@ -2548,7 +2572,8 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, /* * If refexpr yields NULL, and it's a fetch, then result is NULL. We can * implement this with just JUMP_IF_NULL, since we evaluated the array - * into the desired target location. + * into the desired target location. (XXX is it okay to impose these + * semantics on all forms of subscripting?) */ if (!isAssignment) { @@ -2559,19 +2584,6 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, state->steps_len - 1); } - /* Verify subscript list lengths are within limit */ - if (list_length(sbsref->refupperindexpr) > MAXDIM) - ereport(ERROR, - (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", - list_length(sbsref->refupperindexpr), MAXDIM))); - - if (list_length(sbsref->reflowerindexpr) > MAXDIM) - ereport(ERROR, - (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", - list_length(sbsref->reflowerindexpr), MAXDIM))); - /* Evaluate upper subscripts */ i = 0; foreach(lc, sbsref->refupperindexpr) @@ -2582,28 +2594,18 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, if (!e) { sbsrefstate->upperprovided[i] = false; - i++; - continue; + sbsrefstate->upperindexnull[i] = true; + } + else + { + sbsrefstate->upperprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + 
ExecInitExprRec(e, state, + &sbsrefstate->upperindex[i], + &sbsrefstate->upperindexnull[i]); } - - sbsrefstate->upperprovided[i] = true; - - /* Each subscript is evaluated into subscriptvalue/subscriptnull */ - ExecInitExprRec(e, state, - &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); - - /* ... and then SBSREF_SUBSCRIPT saves it into step's workspace */ - scratch->opcode = EEOP_SBSREF_SUBSCRIPT; - scratch->d.sbsref_subscript.state = sbsrefstate; - scratch->d.sbsref_subscript.off = i; - scratch->d.sbsref_subscript.isupper = true; - scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ - ExprEvalPushStep(state, scratch); - adjust_jumps = lappend_int(adjust_jumps, - state->steps_len - 1); i++; } - sbsrefstate->numupper = i; /* Evaluate lower subscripts similarly */ i = 0; @@ -2615,33 +2617,26 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, if (!e) { sbsrefstate->lowerprovided[i] = false; - i++; - continue; + sbsrefstate->lowerindexnull[i] = true; + } + else + { + sbsrefstate->lowerprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + ExecInitExprRec(e, state, + &sbsrefstate->lowerindex[i], + &sbsrefstate->lowerindexnull[i]); } - - sbsrefstate->lowerprovided[i] = true; - - /* Each subscript is evaluated into subscriptvalue/subscriptnull */ - ExecInitExprRec(e, state, - &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); - - /* ... 
and then SBSREF_SUBSCRIPT saves it into step's workspace */ - scratch->opcode = EEOP_SBSREF_SUBSCRIPT; - scratch->d.sbsref_subscript.state = sbsrefstate; - scratch->d.sbsref_subscript.off = i; - scratch->d.sbsref_subscript.isupper = false; - scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ - ExprEvalPushStep(state, scratch); - adjust_jumps = lappend_int(adjust_jumps, - state->steps_len - 1); i++; } - sbsrefstate->numlower = i; - /* Should be impossible if parser is sane, but check anyway: */ - if (sbsrefstate->numlower != 0 && - sbsrefstate->numupper != sbsrefstate->numlower) - elog(ERROR, "upper and lower index lists are not same length"); + /* SBSREF_SUBSCRIPTS checks and converts all the subscripts at once */ + scratch->opcode = EEOP_SBSREF_SUBSCRIPTS; + scratch->d.sbsref_subscript.state = sbsrefstate; + scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ + ExprEvalPushStep(state, scratch); + adjust_jumps = lappend_int(adjust_jumps, + state->steps_len - 1); if (isAssignment) { @@ -2686,7 +2681,6 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, scratch->opcode = EEOP_SBSREF_ASSIGN; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); - } else { @@ -2694,7 +2688,6 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, scratch->opcode = EEOP_SBSREF_FETCH; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); - } /* adjust jump targets */ @@ -2702,7 +2695,7 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, { ExprEvalStep *as = &state->steps[lfirst_int(lc)]; - if (as->opcode == EEOP_SBSREF_SUBSCRIPT) + if (as->opcode == EEOP_SBSREF_SUBSCRIPTS) { Assert(as->d.sbsref_subscript.jumpdone == -1); as->d.sbsref_subscript.jumpdone = state->steps_len; diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index c09371ad58..1853405026 100644 --- a/src/backend/executor/execExprInterp.c +++ 
b/src/backend/executor/execExprInterp.c @@ -417,7 +417,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) &&CASE_EEOP_FIELDSELECT, &&CASE_EEOP_FIELDSTORE_DEFORM, &&CASE_EEOP_FIELDSTORE_FORM, - &&CASE_EEOP_SBSREF_SUBSCRIPT, + &&CASE_EEOP_SBSREF_SUBSCRIPTS, &&CASE_EEOP_SBSREF_OLD, &&CASE_EEOP_SBSREF_ASSIGN, &&CASE_EEOP_SBSREF_FETCH, @@ -1396,9 +1396,9 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) EEO_NEXT(); } - EEO_CASE(EEOP_SBSREF_SUBSCRIPT) + EEO_CASE(EEOP_SBSREF_SUBSCRIPTS) { - /* Process an array subscript */ + /* Process array subscript(s) */ /* too complex for an inline implementation */ if (ExecEvalSubscriptingRef(state, op)) @@ -3123,42 +3123,87 @@ ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext } /* - * Process a subscript in a SubscriptingRef expression. + * Process the subscripts in a SubscriptingRef expression. * - * If subscript is NULL, throw error in assignment case, or in fetch case + * If any subscript is NULL, throw error in assignment case, or in fetch case * set result to NULL and return false (instructing caller to skip the rest * of the SubscriptingRef sequence). * - * Subscript expression result is in subscriptvalue/subscriptnull. - * On success, integer subscript value has been saved in upperindex[] or - * lowerindex[] for use later. + * We convert all the subscripts to plain integers and save them in the + * sbsrefstate->workspace array. */ bool ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op) { - SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; int *indexes; - int off; - /* If any index expr yields NULL, result is NULL or error */ - if (sbsrefstate->subscriptnull) + /* + * Allocate workspace if first time through. This is also a good place to + * enforce the implementation limit on number of array subscripts. 
+ */ + if (sbsrefstate->workspace == NULL) { - if (sbsrefstate->isassignment) + if (sbsrefstate->numupper > MAXDIM) ereport(ERROR, - (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), - errmsg("array subscript in assignment must not be null"))); - *op->resnull = true; - return false; + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + sbsrefstate->numupper, MAXDIM))); + + /* Should be impossible if parser is sane, but check anyway: */ + if (sbsrefstate->numlower != 0 && + sbsrefstate->numupper != sbsrefstate->numlower) + elog(ERROR, "upper and lower index lists are not same length"); + + /* + * Workspace always has room for MAXDIM subscripts even if we don't + * have that many. This is necessary because array_get/set_slice may + * scribble on the extra entries. + */ + sbsrefstate->workspace = + MemoryContextAlloc(GetMemoryChunkContext(sbsrefstate), + 2 * MAXDIM * sizeof(int)); } - /* Convert datum to int, save in appropriate place */ - if (op->d.sbsref_subscript.isupper) - indexes = sbsrefstate->upperindex; - else - indexes = sbsrefstate->lowerindex; - off = op->d.sbsref_subscript.off; + /* Process upper subscripts */ + indexes = (int *) sbsrefstate->workspace; + for (int i = 0; i < sbsrefstate->numupper; i++) + { + if (sbsrefstate->upperprovided[i]) + { + /* If any index expr yields NULL, result is NULL or error */ + if (sbsrefstate->upperindexnull[i]) + { + if (sbsrefstate->isassignment) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("array subscript in assignment must not be null"))); + *op->resnull = true; + return false; + } + indexes[i] = DatumGetInt32(sbsrefstate->upperindex[i]); + } + } - indexes[off] = DatumGetInt32(sbsrefstate->subscriptvalue); + /* Likewise for lower subscripts */ + indexes += MAXDIM; + for (int i = 0; i < sbsrefstate->numlower; i++) + { + if (sbsrefstate->lowerprovided[i]) + { + /* If any index expr yields NULL, result is NULL or error */ + if 
(sbsrefstate->lowerindexnull[i]) + { + if (sbsrefstate->isassignment) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("array subscript in assignment must not be null"))); + *op->resnull = true; + return false; + } + indexes[i] = DatumGetInt32(sbsrefstate->lowerindex[i]); + } + } return true; } @@ -3166,12 +3211,15 @@ ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op) /* * Evaluate SubscriptingRef fetch. * - * Source container is in step's result variable. + * Source container is in step's result variable, + * and indexes have already been evaluated into workspace array. */ void ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op) { SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + int *upperindex = (int *) sbsrefstate->workspace; + int *lowerindex = upperindex + MAXDIM; /* Should not get here if source container (or any subscript) is null */ Assert(!(*op->resnull)); @@ -3181,7 +3229,7 @@ ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op) /* Scalar case */ *op->resvalue = array_get_element(*op->resvalue, sbsrefstate->numupper, - sbsrefstate->upperindex, + upperindex, sbsrefstate->refattrlength, sbsrefstate->refelemlength, sbsrefstate->refelembyval, @@ -3193,8 +3241,8 @@ ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op) /* Slice case */ *op->resvalue = array_get_slice(*op->resvalue, sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, + upperindex, + lowerindex, sbsrefstate->upperprovided, sbsrefstate->lowerprovided, sbsrefstate->refattrlength, @@ -3214,6 +3262,8 @@ void ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op) { SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + int *upperindex = (int *) sbsrefstate->workspace; + int *lowerindex = upperindex + MAXDIM; if (*op->resnull) { @@ -3226,7 +3276,7 @@ ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op) /* Scalar case */ sbsrefstate->prevvalue = array_get_element(*op->resvalue, 
sbsrefstate->numupper, - sbsrefstate->upperindex, + upperindex, sbsrefstate->refattrlength, sbsrefstate->refelemlength, sbsrefstate->refelembyval, @@ -3239,8 +3289,8 @@ ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op) /* this is currently unreachable */ sbsrefstate->prevvalue = array_get_slice(*op->resvalue, sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, + upperindex, + lowerindex, sbsrefstate->upperprovided, sbsrefstate->lowerprovided, sbsrefstate->refattrlength, @@ -3260,7 +3310,9 @@ ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op) void ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op) { - SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + int *upperindex = (int *) sbsrefstate->workspace; + int *lowerindex = upperindex + MAXDIM; /* * For an assignment to a fixed-length container type, both the original @@ -3290,7 +3342,7 @@ ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op) /* Scalar case */ *op->resvalue = array_set_element(*op->resvalue, sbsrefstate->numupper, - sbsrefstate->upperindex, + upperindex, sbsrefstate->replacevalue, sbsrefstate->replacenull, sbsrefstate->refattrlength, @@ -3303,8 +3355,8 @@ ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op) /* Slice case */ *op->resvalue = array_set_slice(*op->resvalue, sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, + upperindex, + lowerindex, sbsrefstate->upperprovided, sbsrefstate->lowerprovided, sbsrefstate->replacevalue, diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index f232397cab..bc9b6771e3 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -1746,7 +1746,7 @@ llvm_compile_expr(ExprState *state) LLVMBuildBr(b, opblocks[opno + 1]); break; - case EEOP_SBSREF_SUBSCRIPT: + case EEOP_SBSREF_SUBSCRIPTS: { int jumpdone = 
op->d.sbsref_subscript.jumpdone; LLVMValueRef v_ret; diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index a7ea7656c7..4c8a739bc4 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -2044,7 +2044,8 @@ array_get_element_expanded(Datum arraydatum, * array bound. * * NOTE: we assume it is OK to scribble on the provided subscript arrays - * lowerIndx[] and upperIndx[]. These are generally just temporaries. + * lowerIndx[] and upperIndx[]; also, these arrays must be of size MAXDIM + * even when nSubscripts is less. These are generally just temporaries. */ Datum array_get_slice(Datum arraydatum, @@ -2772,7 +2773,8 @@ array_set_element_expanded(Datum arraydatum, * (XXX TODO: allow a corresponding behavior for multidimensional arrays) * * NOTE: we assume it is OK to scribble on the provided index arrays - * lowerIndx[] and upperIndx[]. These are generally just temporaries. + * lowerIndx[] and upperIndx[]; also, these arrays must be of size MAXDIM + * even when nSubscripts is less. These are generally just temporaries. * * NOTE: For assignments, we throw an error for silly subscripts etc, * rather than returning a NULL or empty array as the fetch operations do. diff --git a/src/include/c.h b/src/include/c.h index b21e4074dd..12ea056a35 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -592,13 +592,9 @@ typedef uint32 CommandId; #define InvalidCommandId (~(CommandId)0) /* - * Array indexing support + * Maximum number of array subscripts, for regular varlena arrays */ #define MAXDIM 6 -typedef struct -{ - int indx[MAXDIM]; -} IntArray; /* ---------------- * Variable-length datatypes all share the 'struct varlena' header. 
diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h
index abb489e206..b768c30b74 100644
--- a/src/include/executor/execExpr.h
+++ b/src/include/executor/execExpr.h
@@ -185,8 +185,8 @@ typedef enum ExprEvalOp
 	 */
 	EEOP_FIELDSTORE_FORM,
 
-	/* Process a container subscript; short-circuit expression to NULL if NULL */
-	EEOP_SBSREF_SUBSCRIPT,
+	/* Process container subscripts; possibly short-circuit result to NULL */
+	EEOP_SBSREF_SUBSCRIPTS,
 
 	/*
 	 * Compute old container element/slice when a SubscriptingRef assignment
@@ -494,13 +494,11 @@ typedef struct ExprEvalStep
 			int			ncolumns;
 		}			fieldstore;
 
-		/* for EEOP_SBSREF_SUBSCRIPT */
+		/* for EEOP_SBSREF_SUBSCRIPTS */
 		struct
 		{
 			/* too big to have inline */
 			struct SubscriptingRefState *state;
-			int			off;	/* 0-based index of this subscript */
-			bool		isupper;	/* is it upper or lower subscript? */
 			int			jumpdone;	/* jump here on null */
 		}			sbsref_subscript;
@@ -646,20 +644,21 @@ typedef struct SubscriptingRefState
 	bool		refelembyval;	/* is the element type pass-by-value? */
 	char		refelemalign;	/* typalign of the element type */
 
-	/* numupper and upperprovided[] are filled at compile time */
-	/* at runtime, extracted subscript datums get stored in upperindex[] */
+	/* workspace for type-specific subscripting code */
+	void	   *workspace;
+
+	/* numupper and upperprovided[] are filled at expression compile time */
+	/* at runtime, subscripts are computed in upperindex[]/upperindexnull[] */
 	int			numupper;
-	bool		upperprovided[MAXDIM];
-	int			upperindex[MAXDIM];
+	bool	   *upperprovided;	/* indicates if this position is supplied */
+	Datum	   *upperindex;
+	bool	   *upperindexnull;
 
 	/* similarly for lower indexes, if any */
 	int			numlower;
-	bool		lowerprovided[MAXDIM];
-	int			lowerindex[MAXDIM];
-
-	/* subscript expressions get evaluated into here */
-	Datum		subscriptvalue;
-	bool		subscriptnull;
+	bool	   *lowerprovided;
+	Datum	   *lowerindex;
+	bool	   *lowerindexnull;
 
 	/* for assignment, new value to assign is evaluated into here */
 	Datum		replacevalue;
diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 70cfdb2c9d..f5835e89dd 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2858,6 +2858,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 				JumbleExpr(jstate, (Node *) sbsref->reflowerindexpr);
 				JumbleExpr(jstate, (Node *) sbsref->refexpr);
 				JumbleExpr(jstate, (Node *) sbsref->refassgnexpr);
+				APP_JUMB(sbsref->refnestedfunc);
 			}
 			break;
 		case T_FuncExpr:
diff --git a/src/backend/catalog/Catalog.pm b/src/backend/catalog/Catalog.pm
index dd39a086ce..b4dfa26518 100644
--- a/src/backend/catalog/Catalog.pm
+++ b/src/backend/catalog/Catalog.pm
@@ -384,6 +384,7 @@ sub GenerateArrayTypes
 		# Arrays require INT alignment, unless the element type requires
 		# DOUBLE alignment.
 		$array_type{typalign} = $elem_type->{typalign} eq 'd' ? 'd' : 'i';
+		$array_type{typsubshandler} = 'array_subscript_handler';
 
 		# Fill in the rest of the array entry's fields.
foreach my $column (@$pgtype_schema) diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 4cd7d76938..d157f6f5ac 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -1091,7 +1091,8 @@ AddNewRelationType(const char *typeName, -1, /* typmod */ 0, /* array dimensions for typBaseType */ false, /* Type NOT NULL */ - InvalidOid); /* rowtypes never have a collation */ + InvalidOid, /* rowtypes never have a collation */ + InvalidOid); /* typsubshandler - none */ } /* -------------------------------- @@ -1370,7 +1371,8 @@ heap_create_with_catalog(const char *relname, -1, /* typmod */ 0, /* array dimensions for typBaseType */ false, /* Type NOT NULL */ - InvalidOid); /* rowtypes never have a collation */ + InvalidOid, /* rowtypes never have a collation */ + F_ARRAY_SUBSCRIPT_HANDLER); /* array implementation */ pfree(relarrayname); } diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index aeb4a54f63..bd67512e26 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -119,6 +119,7 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId) values[Anum_pg_type_typtypmod - 1] = Int32GetDatum(-1); values[Anum_pg_type_typndims - 1] = Int32GetDatum(0); values[Anum_pg_type_typcollation - 1] = ObjectIdGetDatum(InvalidOid); + values[Anum_pg_type_typsubshandler - 1] = ObjectIdGetDatum(InvalidOid); nulls[Anum_pg_type_typdefaultbin - 1] = true; nulls[Anum_pg_type_typdefault - 1] = true; nulls[Anum_pg_type_typacl - 1] = true; @@ -159,10 +160,10 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId) GenerateTypeDependencies(tup, pg_type_desc, NULL, - NULL, 0, false, false, + InvalidOid, false); /* Post creation hook for new shell type */ @@ -220,7 +221,8 @@ TypeCreate(Oid newTypeOid, int32 typeMod, int32 typNDims, /* Array dimensions for baseType */ bool typeNotNull, - Oid typeCollation) + Oid typeCollation, + Oid subscriptingHandlerProcedure) { Relation 
pg_type_desc; Oid typeObjectId; @@ -373,6 +375,7 @@ TypeCreate(Oid newTypeOid, values[Anum_pg_type_typtypmod - 1] = Int32GetDatum(typeMod); values[Anum_pg_type_typndims - 1] = Int32GetDatum(typNDims); values[Anum_pg_type_typcollation - 1] = ObjectIdGetDatum(typeCollation); + values[Anum_pg_type_typsubshandler - 1] = ObjectIdGetDatum(subscriptingHandlerProcedure); /* * initialize the default binary value for this type. Check for nulls of diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 483bb65ddc..596c6cf3ca 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -115,6 +115,7 @@ static Oid findTypeSendFunction(List *procname, Oid typeOid); static Oid findTypeTypmodinFunction(List *procname); static Oid findTypeTypmodoutFunction(List *procname); static Oid findTypeAnalyzeFunction(List *procname, Oid typeOid); +static Oid findTypeSubscriptingFunction(List *procname, Oid typeOid, bool parseFunc); static Oid findRangeSubOpclass(List *opcname, Oid subtype); static Oid findRangeCanonicalFunction(List *procname, Oid typeOid); static Oid findRangeSubtypeDiffFunction(List *procname, Oid subtype); @@ -149,6 +150,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) List *typmodinName = NIL; List *typmodoutName = NIL; List *analyzeName = NIL; + List *subscriptingParseName = NIL; char category = TYPCATEGORY_USER; bool preferred = false; char delimiter = DEFAULT_TYPDELIM; @@ -167,6 +169,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) DefElem *typmodinNameEl = NULL; DefElem *typmodoutNameEl = NULL; DefElem *analyzeNameEl = NULL; + DefElem *subscriptingParseNameEl = NULL; DefElem *categoryEl = NULL; DefElem *preferredEl = NULL; DefElem *delimiterEl = NULL; @@ -188,6 +191,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) Oid typoid; ListCell *pl; ObjectAddress address; + Oid subscriptingParseOid = InvalidOid; /* * As of Postgres 8.4, we require superuser privilege to 
create a base @@ -288,6 +292,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) else if (strcmp(defel->defname, "analyze") == 0 || strcmp(defel->defname, "analyse") == 0) defelp = &analyzeNameEl; + else if (strcmp(defel->defname, "subscripting_handler") == 0) + defelp = &subscriptingParseNameEl; else if (strcmp(defel->defname, "category") == 0) defelp = &categoryEl; else if (strcmp(defel->defname, "preferred") == 0) @@ -358,6 +364,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) typmodoutName = defGetQualifiedName(typmodoutNameEl); if (analyzeNameEl) analyzeName = defGetQualifiedName(analyzeNameEl); + if (subscriptingParseNameEl) + subscriptingParseName = defGetQualifiedName(subscriptingParseNameEl); if (categoryEl) { char *p = defGetString(categoryEl); @@ -482,6 +490,10 @@ DefineType(ParseState *pstate, List *names, List *parameters) if (analyzeName) analyzeOid = findTypeAnalyzeFunction(analyzeName, typoid); + if (subscriptingParseName) + subscriptingParseOid = findTypeSubscriptingFunction(subscriptingParseName, + typoid, true); + /* * Check permissions on functions. We choose to require the creator/owner * of a type to also own the underlying functions. 
Since creating a type @@ -563,7 +575,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) -1, /* typMod (Domains only) */ 0, /* Array Dimensions of typbasetype */ false, /* Type NOT NULL */ - collation); /* type's collation */ + collation, /* type's collation */ + subscriptingParseOid); /* subscripting procedure */ Assert(typoid == address.objectId); /* @@ -604,7 +617,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) -1, /* typMod (Domains only) */ 0, /* Array dimensions of typbasetype */ false, /* Type NOT NULL */ - collation); /* type's collation */ + collation, /* type's collation */ + F_ARRAY_SUBSCRIPT_HANDLER); pfree(array_type); @@ -667,6 +681,7 @@ DefineDomain(CreateDomainStmt *stmt) Oid receiveProcedure; Oid sendProcedure; Oid analyzeProcedure; + Oid subscriptingHandlerProcedure; bool byValue; char category; char delimiter; @@ -800,6 +815,9 @@ DefineDomain(CreateDomainStmt *stmt) /* Analysis function */ analyzeProcedure = baseType->typanalyze; + /* Subscripting functions */ + subscriptingHandlerProcedure = baseType->typsubshandler; + /* Inherited default value */ datum = SysCacheGetAttr(TYPEOID, typeTup, Anum_pg_type_typdefault, &isnull); @@ -1005,7 +1023,8 @@ DefineDomain(CreateDomainStmt *stmt) basetypeMod, /* typeMod value */ typNDims, /* Array dimensions for base type */ typNotNull, /* Type NOT NULL */ - domaincoll); /* type's collation */ + domaincoll, /* type's collation */ + subscriptingHandlerProcedure); /* subscripting procedure */ /* * Create the array type that goes with it. 
@@ -1045,7 +1064,8 @@ DefineDomain(CreateDomainStmt *stmt) -1, /* typMod (Domains only) */ 0, /* Array dimensions of typbasetype */ false, /* Type NOT NULL */ - domaincoll); /* type's collation */ + domaincoll, /* type's collation */ + F_ARRAY_SUBSCRIPT_HANDLER); /* array subscripting implementation */ pfree(domainArrayName); @@ -1160,7 +1180,8 @@ DefineEnum(CreateEnumStmt *stmt) -1, /* typMod (Domains only) */ 0, /* Array dimensions of typbasetype */ false, /* Type NOT NULL */ - InvalidOid); /* type's collation */ + InvalidOid, /* type's collation */ + InvalidOid); /* typsubshandler - none */ /* Enter the enum's values into pg_enum */ EnumValuesCreate(enumTypeAddr.objectId, stmt->vals); @@ -1200,7 +1221,8 @@ DefineEnum(CreateEnumStmt *stmt) -1, /* typMod (Domains only) */ 0, /* Array dimensions of typbasetype */ false, /* Type NOT NULL */ - InvalidOid); /* type's collation */ + InvalidOid, /* type's collation */ + F_ARRAY_SUBSCRIPT_HANDLER); /* array subscripting implementation */ pfree(enumArrayName); @@ -1488,7 +1510,8 @@ DefineRange(CreateRangeStmt *stmt) -1, /* typMod (Domains only) */ 0, /* Array dimensions of typbasetype */ false, /* Type NOT NULL */ - InvalidOid); /* type's collation (ranges never have one) */ + InvalidOid, /* type's collation (ranges never have one) */ + InvalidOid); /* typsubshandler - none */ Assert(typoid == InvalidOid || typoid == address.objectId); typoid = address.objectId; @@ -1531,7 +1554,8 @@ DefineRange(CreateRangeStmt *stmt) -1, /* typMod (Domains only) */ 0, /* Array dimensions of typbasetype */ false, /* Type NOT NULL */ - InvalidOid); /* typcollation */ + InvalidOid, /* typcollation */ + F_ARRAY_SUBSCRIPT_HANDLER); /* array subscripting implementation */ pfree(rangeArrayName); @@ -1904,6 +1928,43 @@ findTypeAnalyzeFunction(List *procname, Oid typeOid) return procOid; } +static Oid +findTypeSubscriptingFunction(List *procname, Oid typeOid, bool parseFunc) +{ + Oid argList[2]; + Oid procOid; + int nargs; + + if (parseFunc) + { 
+		/*
+		 * Subscripting parse functions always take one INTERNAL argument and
+		 * return INTERNAL.
+		 */
+		argList[0] = INTERNALOID;
+		nargs = 1;
+	}
+	else
+	{
+		/*
+		 * Subscripting fetch/assign functions always take one typeOid
+		 * argument, one INTERNAL argument and return typeOid.
+		 */
+		argList[0] = typeOid;
+		argList[1] = INTERNALOID;
+		nargs = 2;
+	}
+
+	procOid = LookupFuncName(procname, nargs, argList, true);
+	if (!OidIsValid(procOid))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_FUNCTION),
+				 errmsg("function %s does not exist",
+						func_signature_string(procname, nargs, NIL, argList))));
+
+	return procOid;
+}
+
 /*
  * Find suitable support functions and opclasses for a range type.
  */
diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c
index c8382e9381..473e595662 100644
--- a/src/backend/executor/execExpr.c
+++ b/src/backend/executor/execExpr.c
@@ -40,6 +40,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/subscripting.h"
 #include "optimizer/optimizer.h"
 #include "pgstat.h"
 #include "utils/acl.h"
@@ -2522,6 +2523,7 @@ static void
 ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref,
 						ExprState *state, Datum *resv, bool *resnull)
 {
+	SubscriptRoutines *sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype);
 	bool		isAssignment = (sbsref->refassgnexpr != NULL);
 	int			nupper = list_length(sbsref->refupperindexpr);
 	int			nlower = list_length(sbsref->reflowerindexpr);
@@ -2538,12 +2540,6 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref,
 
 	/* Fill constant fields of SubscriptingRefState */
 	sbsrefstate->isassignment = isAssignment;
-	sbsrefstate->refelemtype = sbsref->refelemtype;
-	sbsrefstate->refattrlength = get_typlen(sbsref->refcontainertype);
-	get_typlenbyvalalign(sbsref->refelemtype,
-						 &sbsrefstate->refelemlength,
-						 &sbsrefstate->refelembyval,
-						 &sbsrefstate->refelemalign);
 	sbsrefstate->numupper = nupper;
 	sbsrefstate->numlower = nlower;
/* Set up per-subscript arrays */ @@ -2561,6 +2557,14 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, sbsrefstate->lowerindexnull = (bool *) ptr; /* ptr += nlower * sizeof(bool); */ + /* + * Let the container-type-specific code have a chance. It must fill in + * the sbs_subscripts, sbs_fetch, sbs_assign, and sbs_fetch_old function + * pointers for us to possibly use in execution steps below; and it can + * optionally set up some data pointed to by the workspace field. + */ + sbsroutines->exec_setup(sbsref, sbsrefstate); + /* * Evaluate array input. It's safe to do so into resv/resnull, because we * won't use that as target for any of the other subexpressions, and it'll @@ -2632,6 +2636,7 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, /* SBSREF_SUBSCRIPTS checks and converts all the subscripts at once */ scratch->opcode = EEOP_SBSREF_SUBSCRIPTS; + scratch->d.sbsref_subscript.subscriptfunc = sbsrefstate->sbs_subscripts; scratch->d.sbsref_subscript.state = sbsrefstate; scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ ExprEvalPushStep(state, scratch); @@ -2660,6 +2665,7 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, if (isAssignmentIndirectionExpr(sbsref->refassgnexpr)) { scratch->opcode = EEOP_SBSREF_OLD; + scratch->d.sbsref.subscriptfunc = sbsrefstate->sbs_fetch_old; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); } @@ -2679,6 +2685,7 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, /* and perform the assignment */ scratch->opcode = EEOP_SBSREF_ASSIGN; + scratch->d.sbsref.subscriptfunc = sbsrefstate->sbs_assign; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); } @@ -2686,6 +2693,7 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, { /* array fetch is much simpler */ scratch->opcode = EEOP_SBSREF_FETCH; + scratch->d.sbsref.subscriptfunc = sbsrefstate->sbs_fetch; 
scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); } diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 1853405026..a4b71fb554 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -1398,10 +1398,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) EEO_CASE(EEOP_SBSREF_SUBSCRIPTS) { - /* Process array subscript(s) */ - - /* too complex for an inline implementation */ - if (ExecEvalSubscriptingRef(state, op)) + /* Process container subscript(s) */ + if (op->d.sbsref_subscript.subscriptfunc(state, op, econtext)) { EEO_NEXT(); } @@ -1419,9 +1417,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) * referenced (via a CaseTestExpr) inside the assignment * expression. */ - - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefOld(state, op); + op->d.sbsref.subscriptfunc(state, op, econtext); EEO_NEXT(); } @@ -1431,19 +1427,17 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) */ EEO_CASE(EEOP_SBSREF_ASSIGN) { - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefAssign(state, op); + op->d.sbsref.subscriptfunc(state, op, econtext); EEO_NEXT(); } /* - * Fetch subset of an array. + * Perform SubscriptingRef fetch */ EEO_CASE(EEOP_SBSREF_FETCH) { - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefFetch(state, op); + op->d.sbsref.subscriptfunc(state, op, econtext); EEO_NEXT(); } @@ -3122,252 +3116,6 @@ ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext *op->resnull = false; } -/* - * Process the subscripts in a SubscriptingRef expression. - * - * If any subscript is NULL, throw error in assignment case, or in fetch case - * set result to NULL and return false (instructing caller to skip the rest - * of the SubscriptingRef sequence). 
- * - * We convert all the subscripts to plain integers and save them in the - * sbsrefstate->workspace array. - */ -bool -ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - int *indexes; - - /* - * Allocate workspace if first time through. This is also a good place to - * enforce the implementation limit on number of array subscripts. - */ - if (sbsrefstate->workspace == NULL) - { - if (sbsrefstate->numupper > MAXDIM) - ereport(ERROR, - (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", - sbsrefstate->numupper, MAXDIM))); - - /* Should be impossible if parser is sane, but check anyway: */ - if (sbsrefstate->numlower != 0 && - sbsrefstate->numupper != sbsrefstate->numlower) - elog(ERROR, "upper and lower index lists are not same length"); - - /* - * Workspace always has room for MAXDIM subscripts even if we don't - * have that many. This is necessary because array_get/set_slice may - * scribble on the extra entries. 
- */ - sbsrefstate->workspace = - MemoryContextAlloc(GetMemoryChunkContext(sbsrefstate), - 2 * MAXDIM * sizeof(int)); - } - - /* Process upper subscripts */ - indexes = (int *) sbsrefstate->workspace; - for (int i = 0; i < sbsrefstate->numupper; i++) - { - if (sbsrefstate->upperprovided[i]) - { - /* If any index expr yields NULL, result is NULL or error */ - if (sbsrefstate->upperindexnull[i]) - { - if (sbsrefstate->isassignment) - ereport(ERROR, - (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), - errmsg("array subscript in assignment must not be null"))); - *op->resnull = true; - return false; - } - indexes[i] = DatumGetInt32(sbsrefstate->upperindex[i]); - } - } - - /* Likewise for lower subscripts */ - indexes += MAXDIM; - for (int i = 0; i < sbsrefstate->numlower; i++) - { - if (sbsrefstate->lowerprovided[i]) - { - /* If any index expr yields NULL, result is NULL or error */ - if (sbsrefstate->lowerindexnull[i]) - { - if (sbsrefstate->isassignment) - ereport(ERROR, - (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), - errmsg("array subscript in assignment must not be null"))); - *op->resnull = true; - return false; - } - indexes[i] = DatumGetInt32(sbsrefstate->lowerindex[i]); - } - } - - return true; -} - -/* - * Evaluate SubscriptingRef fetch. - * - * Source container is in step's result variable, - * and indexes have already been evaluated into workspace array. 
- */ -void -ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - int *upperindex = (int *) sbsrefstate->workspace; - int *lowerindex = upperindex + MAXDIM; - - /* Should not get here if source container (or any subscript) is null */ - Assert(!(*op->resnull)); - - if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - *op->resvalue = array_get_element(*op->resvalue, - sbsrefstate->numupper, - upperindex, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign, - op->resnull); - } - else - { - /* Slice case */ - *op->resvalue = array_get_slice(*op->resvalue, - sbsrefstate->numupper, - upperindex, - lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } -} - -/* - * Compute old container element/slice value for a SubscriptingRef assignment - * expression. Will only be generated if the new-value subexpression - * contains SubscriptingRef or FieldStore. The value is stored into the - * SubscriptingRefState's prevvalue/prevnull fields. 
- */ -void -ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - int *upperindex = (int *) sbsrefstate->workspace; - int *lowerindex = upperindex + MAXDIM; - - if (*op->resnull) - { - /* whole array is null, so any element or slice is too */ - sbsrefstate->prevvalue = (Datum) 0; - sbsrefstate->prevnull = true; - } - else if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - sbsrefstate->prevvalue = array_get_element(*op->resvalue, - sbsrefstate->numupper, - upperindex, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign, - &sbsrefstate->prevnull); - } - else - { - /* Slice case */ - /* this is currently unreachable */ - sbsrefstate->prevvalue = array_get_slice(*op->resvalue, - sbsrefstate->numupper, - upperindex, - lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - sbsrefstate->prevnull = false; - } -} - -/* - * Evaluate SubscriptingRef assignment. - * - * Input container (possibly null) is in result area, replacement value is in - * SubscriptingRefState's replacevalue/replacenull. - */ -void -ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - int *upperindex = (int *) sbsrefstate->workspace; - int *lowerindex = upperindex + MAXDIM; - - /* - * For an assignment to a fixed-length container type, both the original - * container and the value to be assigned into it must be non-NULL, else - * we punt and return the original container. 
- */ - if (sbsrefstate->refattrlength > 0) - { - if (*op->resnull || sbsrefstate->replacenull) - return; - } - - /* - * For assignment to varlena arrays, we handle a NULL original array by - * substituting an empty (zero-dimensional) array; insertion of the new - * element will result in a singleton array value. It does not matter - * whether the new element is NULL. - */ - if (*op->resnull) - { - *op->resvalue = PointerGetDatum(construct_empty_array(sbsrefstate->refelemtype)); - *op->resnull = false; - } - - if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - *op->resvalue = array_set_element(*op->resvalue, - sbsrefstate->numupper, - upperindex, - sbsrefstate->replacevalue, - sbsrefstate->replacenull, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } - else - { - /* Slice case */ - *op->resvalue = array_set_slice(*op->resvalue, - sbsrefstate->numupper, - upperindex, - lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->replacevalue, - sbsrefstate->replacenull, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } -} - /* * Evaluate a rowtype coercion operation. * This may require rearranging field positions. 
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index bc9b6771e3..e7f0d84521 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -1116,22 +1116,35 @@ llvm_compile_expr(ExprState *state) } case EEOP_SBSREF_OLD: - build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefOld", - v_state, op); - LLVMBuildBr(b, opblocks[opno + 1]); - break; - case EEOP_SBSREF_ASSIGN: - build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefAssign", - v_state, op); - LLVMBuildBr(b, opblocks[opno + 1]); - break; - case EEOP_SBSREF_FETCH: - build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefFetch", - v_state, op); - LLVMBuildBr(b, opblocks[opno + 1]); - break; + { + LLVMTypeRef param_types[3]; + LLVMValueRef v_params[3]; + LLVMTypeRef v_functype; + LLVMValueRef v_func; + + param_types[0] = l_ptr(StructExprState); + param_types[1] = l_ptr(TypeSizeT); + param_types[2] = l_ptr(StructExprContext); + + v_functype = LLVMFunctionType(LLVMVoidType(), + param_types, + lengthof(param_types), + false); + v_func = l_ptr_const(op->d.sbsref.subscriptfunc, + l_ptr(v_functype)); + + v_params[0] = v_state; + v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT)); + v_params[2] = v_econtext; + LLVMBuildCall(b, + v_func, + v_params, lengthof(v_params), ""); + + LLVMBuildBr(b, opblocks[opno + 1]); + break; + } case EEOP_CASE_TESTVAL: { @@ -1749,10 +1762,29 @@ llvm_compile_expr(ExprState *state) case EEOP_SBSREF_SUBSCRIPTS: { int jumpdone = op->d.sbsref_subscript.jumpdone; + LLVMTypeRef param_types[3]; + LLVMValueRef v_params[3]; + LLVMTypeRef v_functype; + LLVMValueRef v_func; LLVMValueRef v_ret; - v_ret = build_EvalXFunc(b, mod, "ExecEvalSubscriptingRef", - v_state, op); + param_types[0] = l_ptr(StructExprState); + param_types[1] = l_ptr(TypeSizeT); + param_types[2] = l_ptr(StructExprContext); + + v_functype = LLVMFunctionType(TypeParamBool, + param_types, + lengthof(param_types), + false); + v_func = l_ptr_const(op->d.sbsref_subscript.subscriptfunc, + 
l_ptr(v_functype)); + + v_params[0] = v_state; + v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT)); + v_params[2] = v_econtext; + v_ret = LLVMBuildCall(b, + v_func, + v_params, lengthof(v_params), ""); v_ret = LLVMBuildZExt(b, v_ret, TypeStorageBool, ""); LLVMBuildCondBr(b, diff --git a/src/backend/jit/llvm/llvmjit_types.c b/src/backend/jit/llvm/llvmjit_types.c index 1ed3cafa2f..ae3c88aad9 100644 --- a/src/backend/jit/llvm/llvmjit_types.c +++ b/src/backend/jit/llvm/llvmjit_types.c @@ -124,10 +124,6 @@ void *referenced_functions[] = ExecEvalSQLValueFunction, ExecEvalScalarArrayOp, ExecEvalSubPlan, - ExecEvalSubscriptingRef, - ExecEvalSubscriptingRefAssign, - ExecEvalSubscriptingRefFetch, - ExecEvalSubscriptingRefOld, ExecEvalSysVar, ExecEvalWholeRowVar, ExecEvalXmlExpr, diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 910906f639..90aebb270b 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -1548,8 +1548,10 @@ _copySubscriptingRef(const SubscriptingRef *from) COPY_SCALAR_FIELD(refcontainertype); COPY_SCALAR_FIELD(refelemtype); + COPY_SCALAR_FIELD(refassgntype); COPY_SCALAR_FIELD(reftypmod); COPY_SCALAR_FIELD(refcollid); + COPY_SCALAR_FIELD(refnestedfunc); COPY_NODE_FIELD(refupperindexpr); COPY_NODE_FIELD(reflowerindexpr); COPY_NODE_FIELD(refexpr); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 687609f59e..9d54830c3d 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -276,8 +276,10 @@ _equalSubscriptingRef(const SubscriptingRef *a, const SubscriptingRef *b) { COMPARE_SCALAR_FIELD(refcontainertype); COMPARE_SCALAR_FIELD(refelemtype); + COMPARE_SCALAR_FIELD(refassgntype); COMPARE_SCALAR_FIELD(reftypmod); COMPARE_SCALAR_FIELD(refcollid); + COMPARE_SCALAR_FIELD(refnestedfunc); COMPARE_NODE_FIELD(refupperindexpr); COMPARE_NODE_FIELD(reflowerindexpr); COMPARE_NODE_FIELD(refexpr); diff --git a/src/backend/nodes/nodeFuncs.c 
b/src/backend/nodes/nodeFuncs.c index 1dc873ed25..a78d3b634f 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -70,7 +70,7 @@ exprType(const Node *expr) const SubscriptingRef *sbsref = (const SubscriptingRef *) expr; /* slice and/or store operations yield the container type */ - if (sbsref->reflowerindexpr || sbsref->refassgnexpr) + if (IsAssignment(sbsref) || sbsref->reflowerindexpr) type = sbsref->refcontainertype; else type = sbsref->refelemtype; diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index 9c73c605a4..7edaa4c3d2 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1194,8 +1194,10 @@ _outSubscriptingRef(StringInfo str, const SubscriptingRef *node) WRITE_OID_FIELD(refcontainertype); WRITE_OID_FIELD(refelemtype); + WRITE_OID_FIELD(refassgntype); WRITE_INT_FIELD(reftypmod); WRITE_OID_FIELD(refcollid); + WRITE_OID_FIELD(refnestedfunc); WRITE_NODE_FIELD(refupperindexpr); WRITE_NODE_FIELD(reflowerindexpr); WRITE_NODE_FIELD(refexpr); diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c index 169d5581b9..7ece697c75 100644 --- a/src/backend/nodes/readfuncs.c +++ b/src/backend/nodes/readfuncs.c @@ -671,8 +671,10 @@ _readSubscriptingRef(void) READ_OID_FIELD(refcontainertype); READ_OID_FIELD(refelemtype); + READ_OID_FIELD(refassgntype); READ_INT_FIELD(reftypmod); READ_OID_FIELD(refcollid); + READ_OID_FIELD(refnestedfunc); READ_NODE_FIELD(refupperindexpr); READ_NODE_FIELD(reflowerindexpr); READ_NODE_FIELD(refexpr); diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 36002f059d..12e11744ec 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -20,6 +20,7 @@ #include "miscadmin.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" #include "optimizer/optimizer.h" #include "parser/analyze.h" #include "parser/parse_agg.h" @@ -431,6 +432,8 @@ 
transformIndirection(ParseState *pstate, A_Indirection *ind) { Node *last_srf = pstate->p_last_srf; Node *result = transformExprRecurse(pstate, ind->arg); + SubscriptRoutines *sbsroutines; + SubscriptingRef *sbsref; List *subscripts = NIL; int location = exprLocation(result); ListCell *i; @@ -461,13 +464,20 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) /* process subscripts before this field selection */ if (subscripts) - result = (Node *) transformContainerSubscripts(pstate, - result, - exprType(result), - InvalidOid, - exprTypmod(result), - subscripts, - NULL); + { + sbsref = transformContainerSubscripts(pstate, + result, + exprType(result), + InvalidOid, + exprTypmod(result), + subscripts, + NULL); + + sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype); + sbsref = sbsroutines->prepare(false, sbsref); + sbsroutines->validate(false, sbsref, pstate); + result = (Node *) sbsref; + } subscripts = NIL; newresult = ParseFuncOrColumn(pstate, @@ -484,13 +494,20 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) } /* process trailing subscripts, if any */ if (subscripts) - result = (Node *) transformContainerSubscripts(pstate, - result, - exprType(result), - InvalidOid, - exprTypmod(result), - subscripts, - NULL); + { + sbsref = transformContainerSubscripts(pstate, + result, + exprType(result), + InvalidOid, + exprTypmod(result), + subscripts, + NULL); + + sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype); + sbsref = sbsroutines->prepare(false, sbsref); + sbsroutines->validate(false, sbsref, pstate); + result = (Node *) sbsref; + } return result; } diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c index 6e98fe55fc..31e40acc64 100644 --- a/src/backend/parser/parse_node.c +++ b/src/backend/parser/parse_node.c @@ -184,21 +184,12 @@ pcb_error_callback(void *arg) * transformContainerType() * Identify the types involved in a subscripting operation for container * - * - * On entry, 
containerType/containerTypmod identify the type of the input value - * to be subscripted (which could be a domain type). These are modified if - * necessary to identify the actual container type and typmod, and the - * container's element type is returned. An error is thrown if the input isn't - * an array type. + * On entry, containerType/containerTypmod are modified if necessary to + * identify the actual container type and typmod. */ -Oid +void transformContainerType(Oid *containerType, int32 *containerTypmod) { - Oid origContainerType = *containerType; - Oid elementType; - HeapTuple type_tuple_container; - Form_pg_type type_struct_container; - /* * If the input is a domain, smash to base type, and extract the actual * typmod to be applied to the base type. Subscripting a domain is an @@ -219,25 +210,6 @@ transformContainerType(Oid *containerType, int32 *containerTypmod) *containerType = INT2ARRAYOID; else if (*containerType == OIDVECTOROID) *containerType = OIDARRAYOID; - - /* Get the type tuple for the container */ - type_tuple_container = SearchSysCache1(TYPEOID, ObjectIdGetDatum(*containerType)); - if (!HeapTupleIsValid(type_tuple_container)) - elog(ERROR, "cache lookup failed for type %u", *containerType); - type_struct_container = (Form_pg_type) GETSTRUCT(type_tuple_container); - - /* needn't check typisdefined since this will fail anyway */ - - elementType = type_struct_container->typelem; - if (elementType == InvalidOid) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("cannot subscript type %s because it is not an array", - format_type_be(origContainerType)))); - - ReleaseSysCache(type_tuple_container); - - return elementType; } /* @@ -254,10 +226,15 @@ transformContainerType(Oid *containerType, int32 *containerTypmod) * container. We produce an expression that represents the new container value * with the source data inserted into the right part of the container. 
 *
- * For both cases, if the source container is of a domain-over-array type,
- * the result is of the base array type or its element type; essentially,
- * we must fold a domain to its base type before applying subscripting.
- * (Note that int2vector and oidvector are treated as domains here.)
+ * For both cases, this function handles only the generic subscripting logic;
+ * type-specific logic (e.g. type verification and coercion) lives in separate
+ * procedures identified by typsubshandler.  The one exception, for now, is
+ * domains over container types: if the source container is of a
+ * domain-over-container type, the result is of the base container type or its
+ * element type; essentially, we must fold a domain to its base type before
+ * applying subscripting.  (Note that int2vector and oidvector are treated as
+ * domains here.)  An error is raised if the container type has no
+ * subscripting procedure.
 *
 * pstate		Parse state
 * containerBase	Already-transformed expression for the container as a whole
@@ -284,16 +261,12 @@ transformContainerSubscripts(ParseState *pstate,
 	bool		isSlice = false;
 	List	   *upperIndexpr = NIL;
 	List	   *lowerIndexpr = NIL;
+	List	   *indexprSlice = NIL;
 	ListCell   *idx;
 	SubscriptingRef *sbsref;
 
-	/*
-	 * Caller may or may not have bothered to determine elementType.  Note
-	 * that if the caller did do so, containerType/containerTypMod must be as
-	 * modified by transformContainerType, ie, smash domain to base type.
- */ - if (!OidIsValid(elementType)) - elementType = transformContainerType(&containerType, &containerTypMod); + /* Identify the actual container type and element type involved */ + transformContainerType(&containerType, &containerTypMod); /* * A list containing only simple subscripts refers to a single container @@ -327,29 +300,6 @@ transformContainerSubscripts(ParseState *pstate, if (ai->lidx) { subexpr = transformExpr(pstate, ai->lidx, pstate->p_expr_kind); - /* If it's not int4 already, try to coerce */ - subexpr = coerce_to_target_type(pstate, - subexpr, exprType(subexpr), - INT4OID, -1, - COERCION_ASSIGNMENT, - COERCE_IMPLICIT_CAST, - -1); - if (subexpr == NULL) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("array subscript must have type integer"), - parser_errposition(pstate, exprLocation(ai->lidx)))); - } - else if (!ai->is_slice) - { - /* Make a constant 1 */ - subexpr = (Node *) makeConst(INT4OID, - -1, - InvalidOid, - sizeof(int32), - Int32GetDatum(1), - false, - true); /* pass by value */ } else { @@ -357,63 +307,12 @@ transformContainerSubscripts(ParseState *pstate, subexpr = NULL; } lowerIndexpr = lappend(lowerIndexpr, subexpr); + indexprSlice = lappend(indexprSlice, ai); } - else - Assert(ai->lidx == NULL && !ai->is_slice); - - if (ai->uidx) - { - subexpr = transformExpr(pstate, ai->uidx, pstate->p_expr_kind); - /* If it's not int4 already, try to coerce */ - subexpr = coerce_to_target_type(pstate, - subexpr, exprType(subexpr), - INT4OID, -1, - COERCION_ASSIGNMENT, - COERCE_IMPLICIT_CAST, - -1); - if (subexpr == NULL) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("array subscript must have type integer"), - parser_errposition(pstate, exprLocation(ai->uidx)))); - } - else - { - /* Slice with omitted upper bound, put NULL into the list */ - Assert(isSlice && ai->is_slice); - subexpr = NULL; - } + subexpr = transformExpr(pstate, ai->uidx, pstate->p_expr_kind); upperIndexpr = lappend(upperIndexpr, subexpr); } - /* - 
* If doing an array store, coerce the source value to the right type. - * (This should agree with the coercion done by transformAssignedExpr.) - */ - if (assignFrom != NULL) - { - Oid typesource = exprType(assignFrom); - Oid typeneeded = isSlice ? containerType : elementType; - Node *newFrom; - - newFrom = coerce_to_target_type(pstate, - assignFrom, typesource, - typeneeded, containerTypMod, - COERCION_ASSIGNMENT, - COERCE_IMPLICIT_CAST, - -1); - if (newFrom == NULL) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("array assignment requires type %s" - " but expression is of type %s", - format_type_be(typeneeded), - format_type_be(typesource)), - errhint("You will need to rewrite or cast the expression."), - parser_errposition(pstate, exprLocation(assignFrom)))); - assignFrom = newFrom; - } - /* * Ready to build the SubscriptingRef node. */ @@ -422,13 +321,12 @@ transformContainerSubscripts(ParseState *pstate, sbsref->refassgnexpr = (Expr *) assignFrom; sbsref->refcontainertype = containerType; - sbsref->refelemtype = elementType; sbsref->reftypmod = containerTypMod; /* refcollid will be set by parse_collate.c */ sbsref->refupperindexpr = upperIndexpr; sbsref->reflowerindexpr = lowerIndexpr; + sbsref->refindexprslice = indexprSlice; sbsref->refexpr = (Expr *) containerBase; - sbsref->refassgnexpr = (Expr *) assignFrom; return sbsref; } diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c index 9de0cff833..3230661eac 100644 --- a/src/backend/parser/parse_target.c +++ b/src/backend/parser/parse_target.c @@ -20,6 +20,7 @@ #include "miscadmin.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" #include "parser/parse_coerce.h" #include "parser/parse_expr.h" #include "parser/parse_func.h" @@ -848,27 +849,21 @@ transformAssignmentIndirection(ParseState *pstate, location); } - /* base case: just coerce RHS to match target type ID */ - - result = coerce_to_target_type(pstate, - rhs, 
exprType(rhs),
-								   targetTypeId, targetTypMod,
-								   COERCION_ASSIGNMENT,
-								   COERCE_IMPLICIT_CAST,
-								   -1);
-	if (result == NULL)
+	/*
+	 * Base case: just coerce the RHS to match the target type ID.  This is
+	 * needed only for field selection; for subscripting, the type-specific
+	 * subscripting code is responsible for determining the result type.
+	 */
+	if (!targetIsSubscripting)
 	{
-		if (targetIsSubscripting)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("array assignment to \"%s\" requires type %s"
-							" but expression is of type %s",
-							targetName,
-							format_type_be(targetTypeId),
-							format_type_be(exprType(rhs))),
-					 errhint("You will need to rewrite or cast the expression."),
-					 parser_errposition(pstate, location)));
-		else
+		result = coerce_to_target_type(pstate,
+									   rhs, exprType(rhs),
+									   targetTypeId, targetTypMod,
+									   COERCION_ASSIGNMENT,
+									   COERCE_IMPLICIT_CAST,
+									   -1);
+
+		if (result == NULL)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATATYPE_MISMATCH),
 					 errmsg("subfield \"%s\" is of type %s"
@@ -879,6 +874,8 @@ transformAssignmentIndirection(ParseState *pstate,
 					 errhint("You will need to rewrite or cast the expression."),
 					 parser_errposition(pstate, location)));
 	}
+	else
+		result = rhs;
 
 	return result;
 }
@@ -903,26 +900,39 @@ transformAssignmentSubscripts(ParseState *pstate,
 	Node	   *result;
 	Oid			containerType;
 	int32		containerTypMod;
-	Oid			elementTypeId;
-	Oid			typeNeeded;
 	Oid			collationNeeded;
+	SubscriptingRef *sbsref;
+	SubscriptRoutines *sbsroutines;
 
 	Assert(subscripts != NIL);
 
 	/* Identify the actual array type and element type involved */
 	containerType = targetTypeId;
 	containerTypMod = targetTypMod;
-	elementTypeId = transformContainerType(&containerType, &containerTypMod);
 
-	/* Identify type that RHS must provide */
-	typeNeeded = isSlice ?
containerType : elementTypeId; + /* process subscripts */ + sbsref = transformContainerSubscripts(pstate, + basenode, + containerType, + exprType(rhs), + containerTypMod, + subscripts, + rhs); + + sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype); + + /* + * Let custom code provide necessary information about required types: + * refelemtype and refassgntype + */ + sbsref = sbsroutines->prepare(rhs != NULL, sbsref); /* * container normally has same collation as elements, but there's an * exception: we might be subscripting a domain over a container type. In * that case use collation of the base type. */ - if (containerType == targetTypeId) + if (sbsref->refcontainertype == containerType) collationNeeded = targetCollation; else collationNeeded = get_typcollation(containerType); @@ -932,25 +942,22 @@ transformAssignmentSubscripts(ParseState *pstate, NULL, targetName, true, - typeNeeded, - containerTypMod, + sbsref->refassgntype, + sbsref->reftypmod, collationNeeded, indirection, next_indirection, rhs, location); - /* process subscripts */ - result = (Node *) transformContainerSubscripts(pstate, - basenode, - containerType, - elementTypeId, - containerTypMod, - subscripts, - rhs); + /* Provide fully prepared subscripting information for custom validation */ + sbsref->refassgnexpr = (Expr *) rhs; + sbsroutines->validate(rhs != NULL, sbsref, pstate); + + result = (Node *) sbsref; /* If target was a domain over container, need to coerce up to the domain */ - if (containerType != targetTypeId) + if (sbsref->refcontainertype != targetTypeId) { Oid resulttype = exprType(result); diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index 4c8a739bc4..b7415e8790 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -19,19 +19,25 @@ #include "access/htup_details.h" #include "catalog/pg_type.h" +#include "executor/execExpr.h" #include "funcapi.h" #include "libpq/pqformat.h" +#include 
"nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" #include "nodes/supportnodes.h" #include "optimizer/optimizer.h" +#include "parser/parse_coerce.h" #include "port/pg_bitutils.h" #include "utils/array.h" #include "utils/arrayaccess.h" #include "utils/builtins.h" #include "utils/datum.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/memutils.h" #include "utils/selfuncs.h" +#include "utils/syscache.h" #include "utils/typcache.h" @@ -88,6 +94,25 @@ typedef struct ArrayIteratorData int current_item; /* the item # we're at in the array */ } ArrayIteratorData; +/* SubscriptingRefState.workspace for array subscripting operations */ +typedef struct ArraySubWorkspace +{ + /* Values determined during expression compilation */ + Oid refelemtype; /* OID of the array element type */ + int16 refattrlength; /* typlen of array type */ + int16 refelemlength; /* typlen of the array element type */ + bool refelembyval; /* is the element type pass-by-value? */ + char refelemalign; /* typalign of the element type */ + + /* + * Subscript values converted to integers. Note that these arrays must be + * of length MAXDIM even when dealing with fewer subscripts, because + * array_get/set_slice may scribble on the extra entries. 
+	 */
+	int			upperindex[MAXDIM];
+	int			lowerindex[MAXDIM];
+} ArraySubWorkspace;
+
 static bool array_isspace(char ch);
 static int	ArrayCount(const char *str, int *dim, char typdelim);
 static void ReadArrayStr(char *arrayStr, const char *origStr,
@@ -6628,3 +6653,518 @@ width_bucket_array_variable(Datum operand,
 
 	return left;
 }
+
+
+/*
+ * array_subscript_prepare()
+ *		Parse-time preparation for array subscripting.
+ *
+ * Fill in the element type of the subscripted container and, for an
+ * assignment, the type the RHS must be coerced to: the container type itself
+ * when assigning to a slice, else the element type.
+ */
+static SubscriptingRef *
+array_subscript_prepare(bool isAssignment, SubscriptingRef *sbsref)
+{
+	Oid			array_type = sbsref->refcontainertype;
+	HeapTuple	type_tuple_container;
+	Form_pg_type type_struct_container;
+	bool		is_slice = sbsref->reflowerindexpr != NIL;
+
+	/* Get the type tuple for the container */
+	type_tuple_container = SearchSysCache1(TYPEOID, ObjectIdGetDatum(array_type));
+	if (!HeapTupleIsValid(type_tuple_container))
+		elog(ERROR, "cache lookup failed for type %u", array_type);
+	type_struct_container = (Form_pg_type) GETSTRUCT(type_tuple_container);
+
+	/* needn't check typisdefined since this will fail anyway */
+	sbsref->refelemtype = type_struct_container->typelem;
+
+	/* Identify type that RHS must provide */
+	if (isAssignment)
+		sbsref->refassgntype = is_slice ?
sbsref->refcontainertype : sbsref->refelemtype;
+
+	ReleaseSysCache(type_tuple_container);
+
+	return sbsref;
+}
+
+/*
+ * array_subscript_validate()
+ *		Parse-time validation of array subscripting.
+ *
+ * Coerce each supplied subscript expression to int4 (erroring out if that is
+ * not possible), supply a constant lower bound of 1 for non-slice subscripts,
+ * and, for an assignment, coerce the RHS to the required type.  Also verify
+ * that the number of subscripts does not exceed MAXDIM.
+ */
+static SubscriptingRef *
+array_subscript_validate(bool isAssignment, SubscriptingRef *sbsref,
+						 ParseState *pstate)
+{
+	bool		is_slice = sbsref->reflowerindexpr != NIL;
+	Oid			typeneeded = InvalidOid,
+				typesource = InvalidOid;
+	Node	   *new_from;
+	Node	   *subexpr;
+	Node	   *coerced;
+	List	   *upperIndexpr = NIL;
+	List	   *lowerIndexpr = NIL;
+	ListCell   *u,
+			   *l,
+			   *s;
+
+	foreach(u, sbsref->refupperindexpr)
+	{
+		subexpr = (Node *) lfirst(u);
+
+		if (subexpr == NULL)
+		{
+			upperIndexpr = lappend(upperIndexpr, subexpr);
+			continue;
+		}
+
+		coerced = coerce_to_target_type(pstate,
+										subexpr, exprType(subexpr),
+										INT4OID, -1,
+										COERCION_ASSIGNMENT,
+										COERCE_IMPLICIT_CAST,
+										-1);
+		if (coerced == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("array subscript must have type integer"),
+					 parser_errposition(pstate, exprLocation(subexpr))));
+
+		upperIndexpr = lappend(upperIndexpr, coerced);
+	}
+
+	sbsref->refupperindexpr = upperIndexpr;
+
+	forboth(l, sbsref->reflowerindexpr, s, sbsref->refindexprslice)
+	{
+		A_Indices  *ai = (A_Indices *) lfirst(s);
+
+		subexpr = (Node *) lfirst(l);
+
+		if (subexpr == NULL && !ai->is_slice)
+		{
+			/* Make a constant 1 */
+			subexpr = (Node *) makeConst(INT4OID,
+										 -1,
+										 InvalidOid,
+										 sizeof(int32),
+										 Int32GetDatum(1),
+										 false,
+										 true); /* pass by value */
+		}
+
+		if (subexpr == NULL)
+		{
+			lowerIndexpr = lappend(lowerIndexpr, subexpr);
+			continue;
+		}
+
+		coerced = coerce_to_target_type(pstate,
+										subexpr, exprType(subexpr),
+										INT4OID, -1,
+										COERCION_ASSIGNMENT,
+										COERCE_IMPLICIT_CAST,
+										-1);
+		if (coerced == NULL)
+			ereport(ERROR,
+					(errcode(ERRCODE_DATATYPE_MISMATCH),
+					 errmsg("array subscript must have type integer"),
+					 parser_errposition(pstate, exprLocation(subexpr))));
+
+		lowerIndexpr = lappend(lowerIndexpr, coerced);
+	}
+
+	sbsref->reflowerindexpr = lowerIndexpr;
+
+	if
(isAssignment) + { + SubscriptingRef *assignRef = (SubscriptingRef *) sbsref; + Node *assignExpr = (Node *) assignRef->refassgnexpr; + + typesource = exprType(assignExpr); + typeneeded = is_slice ? sbsref->refcontainertype : sbsref->refelemtype; + new_from = coerce_to_target_type(pstate, + assignExpr, typesource, + typeneeded, sbsref->reftypmod, + COERCION_ASSIGNMENT, + COERCE_IMPLICIT_CAST, + -1); + if (new_from == NULL) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("array assignment requires type %s" + " but expression is of type %s", + format_type_be(sbsref->refelemtype), + format_type_be(typesource)), + errhint("You will need to rewrite or cast the expression."), + parser_errposition(pstate, exprLocation(assignExpr)))); + assignRef->refassgnexpr = (Expr *) new_from; + } + + sbsref->refnestedfunc = F_ARRAY_SUBSCRIPT_HANDLER; + + /* Verify subscript list lengths are within limit */ + if (list_length(sbsref->refupperindexpr) > MAXDIM) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + list_length(sbsref->refupperindexpr), MAXDIM))); + + if (list_length(sbsref->reflowerindexpr) > MAXDIM) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + list_length(sbsref->reflowerindexpr), MAXDIM))); + + return sbsref; +} + +/* + * Process the subscripts in a SubscriptingRef expression. + * + * If any subscript is NULL, throw error in assignment case, or in fetch case + * set result to NULL and return false (instructing caller to skip the rest + * of the SubscriptingRef sequence). + * + * We convert all the subscripts to plain integers and save them in the + * sbsrefstate->workspace arrays. 
+ */ +static bool +array_subscript_subscripts(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + /* Process upper subscripts */ + for (int i = 0; i < sbsrefstate->numupper; i++) + { + if (sbsrefstate->upperprovided[i]) + { + /* If any index expr yields NULL, result is NULL or error */ + if (sbsrefstate->upperindexnull[i]) + { + if (sbsrefstate->isassignment) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("array subscript in assignment must not be null"))); + *op->resnull = true; + return false; + } + workspace->upperindex[i] = DatumGetInt32(sbsrefstate->upperindex[i]); + } + } + + /* Likewise for lower subscripts */ + for (int i = 0; i < sbsrefstate->numlower; i++) + { + if (sbsrefstate->lowerprovided[i]) + { + /* If any index expr yields NULL, result is NULL or error */ + if (sbsrefstate->lowerindexnull[i]) + { + if (sbsrefstate->isassignment) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("array subscript in assignment must not be null"))); + *op->resnull = true; + return false; + } + workspace->lowerindex[i] = DatumGetInt32(sbsrefstate->lowerindex[i]); + } + } + + return true; +} + +/* + * Evaluate SubscriptingRef fetch for an array element. + * + * Source container is in step's result variable, + * and indexes have already been evaluated into workspace array. 
+ */ +static void +array_subscript_fetch(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbstate->workspace; + + /* Should not get here if source array (or any subscript) is null */ + Assert(!(*op->resnull)); + + *op->resvalue = array_get_element(*op->resvalue, + sbstate->numupper, + workspace->upperindex, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign, + op->resnull); +} + +/* + * Evaluate SubscriptingRef fetch for an array slice. + * + * Source container is in step's result variable, + * and indexes have already been evaluated into workspace array. + */ +static void +array_subscript_fetch_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbstate->workspace; + + /* Should not get here if source array (or any subscript) is null */ + Assert(!(*op->resnull)); + + *op->resvalue = array_get_slice(*op->resvalue, + sbstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbstate->upperprovided, + sbstate->lowerprovided, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); +} + +/* + * Evaluate SubscriptingRef assignment for an array element assignment. + * + * Input container (possibly null) is in result area, replacement value is in + * SubscriptingRefState's replacevalue/replacenull. 
+ */ +static void +array_subscript_assign(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbstate->workspace; + Datum arraySource = *op->resvalue; + + /* + * For an assignment to a fixed-length array type, both the original array + * and the value to be assigned into it must be non-NULL, else we punt and + * return the original array. + */ + if (workspace->refattrlength > 0) + { + if (*op->resnull || sbstate->replacenull) + return; + } + + /* + * For assignment to varlena arrays, we handle a NULL original array by + * substituting an empty (zero-dimensional) array; insertion of the new + * element will result in a singleton array value. It does not matter + * whether the new element is NULL. + */ + if (*op->resnull) + { + arraySource = PointerGetDatum(construct_empty_array(workspace->refelemtype)); + *op->resnull = false; + } + + *op->resvalue = array_set_element(arraySource, + sbstate->numupper, + workspace->upperindex, + sbstate->replacevalue, + sbstate->replacenull, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); +} + +/* + * Evaluate SubscriptingRef assignment for an array slice assignment. + * + * Input container (possibly null) is in result area, replacement value is in + * SubscriptingRefState's replacevalue/replacenull. + */ +static void +array_subscript_assign_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbstate->workspace; + Datum arraySource = *op->resvalue; + + /* + * For an assignment to a fixed-length array type, both the original array + * and the value to be assigned into it must be non-NULL, else we punt and + * return the original array. 
+ */ + if (workspace->refattrlength > 0) + { + if (*op->resnull || sbstate->replacenull) + return; + } + + /* + * For assignment to varlena arrays, we handle a NULL original array by + * substituting an empty (zero-dimensional) array; insertion of the new + * element will result in a singleton array value. It does not matter + * whether the new element is NULL. + */ + if (*op->resnull) + { + arraySource = PointerGetDatum(construct_empty_array(workspace->refelemtype)); + *op->resnull = false; + } + + *op->resvalue = array_set_slice(arraySource, + sbstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbstate->upperprovided, + sbstate->lowerprovided, + sbstate->replacevalue, + sbstate->replacenull, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); +} + +/* + * Compute old array element value for a SubscriptingRef assignment + * expression. Will only be called if the new-value subexpression + * contains SubscriptingRef or FieldStore. The value is stored into the + * SubscriptingRefState's prevvalue/prevnull fields. + */ +static void +array_subscript_fetch_old(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbstate->workspace; + + if (*op->resnull) + { + /* whole array is null, so any element is too */ + sbstate->prevvalue = (Datum) 0; + sbstate->prevnull = true; + } + else + sbstate->prevvalue = array_get_element(*op->resvalue, + sbstate->numupper, + workspace->upperindex, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign, + &sbstate->prevnull); +} + +/* + * Compute old array slice value for a SubscriptingRef assignment + * expression. Will only be called if the new-value subexpression + * contains SubscriptingRef or FieldStore. The value is stored into the + * SubscriptingRefState's prevvalue/prevnull fields. 
+ */ +static void +array_subscript_fetch_old_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbstate->workspace; + + if (*op->resnull) + { + /* whole array is null, so any slice is too */ + sbstate->prevvalue = (Datum) 0; + sbstate->prevnull = true; + } + else + { + sbstate->prevvalue = array_get_slice(*op->resvalue, + sbstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbstate->upperprovided, + sbstate->lowerprovided, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); + /* slices of non-null arrays are never null */ + sbstate->prevnull = false; + } +} + +/* + * Set up execution state for an array subscript operation. + */ +static void +array_exec_setup(SubscriptingRef *sbsref, + SubscriptingRefState *sbsrefstate) +{ + bool is_slice = (sbsrefstate->numlower != 0); + ArraySubWorkspace *workspace; + + /* + * Allocate type-specific workspace. This is also a good place to enforce + * the implementation limit on number of array subscripts. + */ + if (sbsrefstate->numupper > MAXDIM) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + sbsrefstate->numupper, MAXDIM))); + + /* Should be impossible if parser is sane, but check anyway: */ + if (sbsrefstate->numlower != 0 && + sbsrefstate->numupper != sbsrefstate->numlower) + elog(ERROR, "upper and lower index lists are not same length"); + + workspace = (ArraySubWorkspace *) + MemoryContextAlloc(GetMemoryChunkContext(sbsrefstate), + sizeof(ArraySubWorkspace)); + sbsrefstate->workspace = workspace; + + /* + * Collect datatype details we'll need at execution. 
+ */ + workspace->refelemtype = sbsref->refelemtype; + workspace->refattrlength = get_typlen(sbsref->refcontainertype); + get_typlenbyvalalign(sbsref->refelemtype, + &workspace->refelemlength, + &workspace->refelembyval, + &workspace->refelemalign); + + /* Pass back pointers to step execution functions */ + sbsrefstate->sbs_subscripts = array_subscript_subscripts; + if (is_slice) + { + sbsrefstate->sbs_fetch = array_subscript_fetch_slice; + sbsrefstate->sbs_assign = array_subscript_assign_slice; + sbsrefstate->sbs_fetch_old = array_subscript_fetch_old_slice; + } + else + { + sbsrefstate->sbs_fetch = array_subscript_fetch; + sbsrefstate->sbs_assign = array_subscript_assign; + sbsrefstate->sbs_fetch_old = array_subscript_fetch_old; + } +} + +/* + * Handle array-type subscripting logic. + */ +Datum +array_subscript_handler(PG_FUNCTION_ARGS) +{ + SubscriptRoutines *sbsroutines = (SubscriptRoutines *) + palloc(sizeof(SubscriptRoutines)); + + sbsroutines->prepare = array_subscript_prepare; + sbsroutines->validate = array_subscript_validate; + sbsroutines->exec_setup = array_exec_setup; + + PG_RETURN_POINTER(sbsroutines); +} diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index c2c6df2a4f..f516a5011e 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -7999,7 +7999,7 @@ get_rule_expr(Node *node, deparse_context *context, * EXPLAIN tries to print the targetlist of a plan resulting * from such a statement. */ - if (sbsref->refassgnexpr) + if (IsAssignment(sbsref)) { Node *refassgnexpr; diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index ae23299162..1297695692 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -2966,6 +2966,51 @@ type_is_collatable(Oid typid) } +/* + * get_typsubshandler + * + * Given the type OID, return the type's subscripting procedure's OID, + * if it has one. 
+ */ +RegProcedure +get_typsubshandler(Oid typid) +{ + HeapTuple tp; + + tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid)); + if (HeapTupleIsValid(tp)) + { + RegProcedure handler = ((Form_pg_type) GETSTRUCT(tp))->typsubshandler; + + ReleaseSysCache(tp); + return handler; + } + else + return InvalidOid; +} + +/* + * getSubscriptingRoutines + * + * Given the type OID, fetch the type's subscripting functions. + * Fail if type is not subscriptable. + */ +struct SubscriptRoutines * +getSubscriptingRoutines(Oid typid) +{ + RegProcedure typsubshandler = get_typsubshandler(typid); + + if (!OidIsValid(typsubshandler)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("cannot subscript type %s because it does not support subscripting", + format_type_be(typid)))); + + return (struct SubscriptRoutines *) + DatumGetPointer(OidFunctionCall0(typsubshandler)); +} + + /* ---------- STATISTICS CACHE ---------- */ /* diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fc2202b843..9fa9990efe 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10936,6 +10936,13 @@ proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}', prosrc => 'pg_control_init' }, +{ oid => '6099', + descr => 'Array subscripting logic', + proname => 'array_subscript_handler', + prorettype => 'internal', + proargtypes => 'internal', + prosrc => 'array_subscript_handler' }, + # collation management functions { oid => '3445', descr => 'import collations from operating system', proname => 'pg_import_system_collations', procost => '100', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 21a467a7a7..9f29461e39 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -50,7 +50,8 @@ 
typname => 'name', typlen => 'NAMEDATALEN', typbyval => 'f', typcategory => 'S', typelem => 'char', typinput => 'namein', typoutput => 'nameout', typreceive => 'namerecv', typsend => 'namesend', - typalign => 'c', typcollation => 'C' }, + typalign => 'c', typcollation => 'C', + typsubshandler => 'array_subscript_handler' }, { oid => '20', array_type_oid => '1016', descr => '~18 digit integer, 8-byte storage', typname => 'int8', typlen => '8', typbyval => 'FLOAT8PASSBYVAL', @@ -66,7 +67,7 @@ typname => 'int2vector', typlen => '-1', typbyval => 'f', typcategory => 'A', typelem => 'int2', typinput => 'int2vectorin', typoutput => 'int2vectorout', typreceive => 'int2vectorrecv', typsend => 'int2vectorsend', - typalign => 'i' }, + typalign => 'i', typsubshandler => 'array_subscript_handler' }, { oid => '23', array_type_oid => '1007', descr => '-2 billion to 2 billion integer, 4-byte storage', typname => 'int4', typlen => '4', typbyval => 't', typcategory => 'N', @@ -105,7 +106,8 @@ descr => 'array of oids, used in system tables', typname => 'oidvector', typlen => '-1', typbyval => 'f', typcategory => 'A', typelem => 'oid', typinput => 'oidvectorin', typoutput => 'oidvectorout', - typreceive => 'oidvectorrecv', typsend => 'oidvectorsend', typalign => 'i' }, + typreceive => 'oidvectorrecv', typsend => 'oidvectorsend', typalign => 'i', + typsubshandler => 'array_subscript_handler' }, # hand-built rowtype entries for bootstrapped catalogs # NB: OIDs assigned here must match the BKI_ROWTYPE_OID declarations @@ -179,32 +181,37 @@ descr => 'geometric point \'(x, y)\'', typname => 'point', typlen => '16', typbyval => 'f', typcategory => 'G', typelem => 'float8', typinput => 'point_in', typoutput => 'point_out', - typreceive => 'point_recv', typsend => 'point_send', typalign => 'd' }, + typreceive => 'point_recv', typsend => 'point_send', typalign => 'd', + typsubshandler => 'array_subscript_handler' }, { oid => '601', array_type_oid => '1018', descr => 'geometric line segment 
\'(pt1,pt2)\'', typname => 'lseg', typlen => '32', typbyval => 'f', typcategory => 'G', typelem => 'point', typinput => 'lseg_in', typoutput => 'lseg_out', - typreceive => 'lseg_recv', typsend => 'lseg_send', typalign => 'd' }, + typreceive => 'lseg_recv', typsend => 'lseg_send', typalign => 'd', + typsubshandler => 'array_subscript_handler' }, { oid => '602', array_type_oid => '1019', descr => 'geometric path \'(pt1,...)\'', typname => 'path', typlen => '-1', typbyval => 'f', typcategory => 'G', typinput => 'path_in', typoutput => 'path_out', typreceive => 'path_recv', - typsend => 'path_send', typalign => 'd', typstorage => 'x' }, + typsend => 'path_send', typalign => 'd', typstorage => 'x', + typsubshandler => 'array_subscript_handler' }, { oid => '603', array_type_oid => '1020', descr => 'geometric box \'(lower left,upper right)\'', typname => 'box', typlen => '32', typbyval => 'f', typcategory => 'G', typdelim => ';', typelem => 'point', typinput => 'box_in', typoutput => 'box_out', typreceive => 'box_recv', typsend => 'box_send', - typalign => 'd' }, + typalign => 'd', typsubshandler => 'array_subscript_handler' }, { oid => '604', array_type_oid => '1027', descr => 'geometric polygon \'(pt1,...)\'', typname => 'polygon', typlen => '-1', typbyval => 'f', typcategory => 'G', typinput => 'poly_in', typoutput => 'poly_out', typreceive => 'poly_recv', - typsend => 'poly_send', typalign => 'd', typstorage => 'x' }, + typsend => 'poly_send', typalign => 'd', typstorage => 'x', + typsubshandler => 'array_subscript_handler' }, { oid => '628', array_type_oid => '629', descr => 'geometric line', typname => 'line', typlen => '24', typbyval => 'f', typcategory => 'G', typelem => 'float8', typinput => 'line_in', typoutput => 'line_out', - typreceive => 'line_recv', typsend => 'line_send', typalign => 'd' }, + typreceive => 'line_recv', typsend => 'line_send', typalign => 'd', + typsubshandler => 'array_subscript_handler' }, # OIDS 700 - 799 @@ -263,7 +270,7 @@ { oid => 
'1033', array_type_oid => '1034', descr => 'access control list', typname => 'aclitem', typlen => '12', typbyval => 'f', typcategory => 'U', typinput => 'aclitemin', typoutput => 'aclitemout', typreceive => '-', - typsend => '-', typalign => 'i' }, + typsend => '-', typalign => 'i', typsubshandler => 'array_subscript_handler' }, { oid => '1042', array_type_oid => '1014', descr => 'char(length), blank-padded string, fixed storage length', typname => 'bpchar', typlen => '-1', typbyval => 'f', typcategory => 'S', @@ -302,7 +309,8 @@ typcategory => 'D', typispreferred => 't', typinput => 'timestamptz_in', typoutput => 'timestamptz_out', typreceive => 'timestamptz_recv', typsend => 'timestamptz_send', typmodin => 'timestamptztypmodin', - typmodout => 'timestamptztypmodout', typalign => 'd' }, + typmodout => 'timestamptztypmodout', typalign => 'd', + typsubshandler => 'array_subscript_handler' }, { oid => '1186', array_type_oid => '1187', descr => '@ <number> <units>, time interval', typname => 'interval', typlen => '16', typbyval => 'f', typcategory => 'T', diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h index 6099e5f57c..3e10179e1c 100644 --- a/src/include/catalog/pg_type.h +++ b/src/include/catalog/pg_type.h @@ -221,6 +221,12 @@ CATALOG(pg_type,1247,TypeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(71,TypeRelati */ Oid typcollation BKI_DEFAULT(0) BKI_LOOKUP(pg_collation); + /* + * Type specific subscripting logic. If typsubshandler is NULL, it means + * that this type doesn't support subscripting. 
+ */ + regproc typsubshandler BKI_DEFAULT(-) BKI_LOOKUP(pg_proc); + #ifdef CATALOG_VARLEN /* variable-length fields start here */ /* @@ -363,7 +369,8 @@ extern ObjectAddress TypeCreate(Oid newTypeOid, int32 typeMod, int32 typNDims, bool typeNotNull, - Oid typeCollation); + Oid typeCollation, + Oid subscriptingParseProcedure); extern void GenerateTypeDependencies(HeapTuple typeTuple, Relation typeCatalog, diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index b768c30b74..baa2363bed 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -32,6 +32,11 @@ typedef void (*ExecEvalSubroutine) (ExprState *state, struct ExprEvalStep *op, ExprContext *econtext); +/* API for out-of-line evaluation subroutines returning bool */ +typedef bool (*ExecEvalBoolSubroutine) (ExprState *state, + struct ExprEvalStep *op, + ExprContext *econtext); + /* * Discriminator for ExprEvalSteps. * @@ -497,6 +502,7 @@ typedef struct ExprEvalStep /* for EEOP_SBSREF_SUBSCRIPTS */ struct { + ExecEvalBoolSubroutine subscriptfunc; /* evaluation subroutine */ /* too big to have inline */ struct SubscriptingRefState *state; int jumpdone; /* jump here on null */ @@ -505,6 +511,7 @@ typedef struct ExprEvalStep /* for EEOP_SBSREF_OLD / ASSIGN / FETCH */ struct { + ExecEvalSubroutine subscriptfunc; /* evaluation subroutine */ /* too big to have inline */ struct SubscriptingRefState *state; } sbsref; @@ -638,12 +645,6 @@ typedef struct SubscriptingRefState { bool isassignment; /* is it assignment, or just fetch? */ - Oid refelemtype; /* OID of the container element type */ - int16 refattrlength; /* typlen of container type */ - int16 refelemlength; /* typlen of the container element type */ - bool refelembyval; /* is the element type pass-by-value? 
*/ - char refelemalign; /* typalign of the element type */ - /* workspace for type-specific subscripting code */ void *workspace; @@ -667,6 +668,17 @@ typedef struct SubscriptingRefState /* if we have a nested assignment, SBSREF_OLD puts old value here */ Datum prevvalue; bool prevnull; + + /* + * Step execution function pointers returned by exec_setup method. These + * are not needed at runtime, only during expression compilation; but it's + * not worth complicating exec_setup's API by making an additional struct + * to hold them. + */ + ExecEvalBoolSubroutine sbs_subscripts; /* process subscripts */ + ExecEvalSubroutine sbs_fetch; /* function to fetch an element */ + ExecEvalSubroutine sbs_assign; /* function to assign an element */ + ExecEvalSubroutine sbs_fetch_old; /* fetch old value for assignment */ } SubscriptingRefState; @@ -711,10 +723,6 @@ extern void ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext); extern void ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext); -extern bool ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op); -extern void ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op); -extern void ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op); -extern void ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op); extern void ExecEvalConvertRowtype(ExprState *state, ExprEvalStep *op, ExprContext *econtext); extern void ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op); diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index cdbe781c73..36ceac084e 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -425,13 +425,18 @@ typedef struct SubscriptingRef Expr xpr; Oid refcontainertype; /* type of the container proper */ Oid refelemtype; /* type of the container elements */ + Oid refassgntype; /* type of assignment expr that is required */ int32 reftypmod; /* typmod of the container 
(and elements too) */ Oid refcollid; /* OID of collation, or InvalidOid if none */ + Oid refnestedfunc; /* OID of type-specific function to handle + * nested assignment */ List *refupperindexpr; /* expressions that evaluate to upper * container indexes */ List *reflowerindexpr; /* expressions that evaluate to lower * container indexes, or NIL for single * container element */ + List *refindexprslice; /* whether or not related indexpr from + * reflowerindexpr is a slice */ Expr *refexpr; /* the expression that evaluates to a * container value */ @@ -439,6 +444,8 @@ typedef struct SubscriptingRef * fetch */ } SubscriptingRef; +#define IsAssignment(expr) ( ((SubscriptingRef*) expr)->refassgnexpr != NULL ) + /* * CoercionContext - distinguishes the allowed set of type casts * diff --git a/src/include/nodes/subscripting.h b/src/include/nodes/subscripting.h new file mode 100644 index 0000000000..1aba2f80bc --- /dev/null +++ b/src/include/nodes/subscripting.h @@ -0,0 +1,38 @@ +/*------------------------------------------------------------------------- + * + * subscripting.h + * API for generic type subscripting + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/nodes/subscripting.h + * + *------------------------------------------------------------------------- + */ +#ifndef SUBSCRIPTING_H +#define SUBSCRIPTING_H + +#include "parser/parse_node.h" +#include "nodes/primnodes.h" + +struct ParseState; +struct SubscriptingRefState; + +/* Callback function signatures --- see xsubscripting.sgml for more info. 
*/ +typedef SubscriptingRef *(*SubscriptingPrepare) (bool isAssignment, SubscriptingRef *sbsef); + +typedef SubscriptingRef *(*SubscriptingValidate) (bool isAssignment, SubscriptingRef *sbsef, + struct ParseState *pstate); + +typedef void (*SubscriptingExecSetup) (SubscriptingRef *sbsref, + struct SubscriptingRefState *sbsrefstate); + +typedef struct SubscriptRoutines +{ + SubscriptingPrepare prepare; + SubscriptingValidate validate; + SubscriptingExecSetup exec_setup; +} SubscriptRoutines; + +#endif /* SUBSCRIPTING_H */ diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h index d25819aa28..fcc6c426e7 100644 --- a/src/include/parser/parse_node.h +++ b/src/include/parser/parse_node.h @@ -313,7 +313,7 @@ extern void setup_parser_errposition_callback(ParseCallbackState *pcbstate, ParseState *pstate, int location); extern void cancel_parser_errposition_callback(ParseCallbackState *pcbstate); -extern Oid transformContainerType(Oid *containerType, int32 *containerTypmod); +extern void transformContainerType(Oid *containerType, int32 *containerTypmod); extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate, Node *containerBase, @@ -322,6 +322,7 @@ extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate, int32 containerTypMod, List *indirection, Node *assignFrom); + extern Const *make_const(ParseState *pstate, Value *value, int location); #endif /* PARSE_NODE_H */ diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index fecfe1f4f6..38cd2940df 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -17,6 +17,9 @@ #include "access/htup.h" #include "nodes/pg_list.h" +/* avoid including subscripting.h here */ +struct SubscriptRoutines; + /* Result list element for get_op_btree_interpretation */ typedef struct OpBtreeInterpretation { @@ -172,6 +175,8 @@ extern void getTypeBinaryOutputInfo(Oid type, Oid *typSend, bool *typIsVarlena); extern Oid get_typmodin(Oid typid); 
extern Oid get_typcollation(Oid typid); extern bool type_is_collatable(Oid typid); +extern RegProcedure get_typsubshandler(Oid typid); +extern struct SubscriptRoutines *getSubscriptingRoutines(Oid typid); extern Oid getBaseType(Oid typid); extern Oid getBaseTypeAndTypmod(Oid typid, int32 *typmod); extern int32 get_typavgwidth(Oid typid, int32 typmod);
BTW, I had a thought about this patch, which I wanted to write down before it disappears again (I'm not offering to code it right now). I think we should split array_subscript_handler into two functions, one for "regular" varlena arrays and one for the fixed-length pseudo-array types like "name" and "point". This needn't have a lot of impact on the execution code. In fact, for the first version both handlers could just return the same set of method pointers, and then if we feel like it we could tease apart the code paths later. The value of doing this is that then typsubshandler could be used as a bulletproof designator of the type's semantics. Instead of weird implementation-dependent tests on typlen and so on, the rule could be "it's a regular array if typsubshandler == F_ARRAY_SUBSCRIPT_HANDLER". Later on, we could perhaps even allow the "fixed length" array semantics to be applied to varlena types, so long as their contents are just N copies of some fixed-size type. The point here is that we now have a tool for recognizing the subscripting semantics reliably, instead of having to back into an understanding of what they are. But for the tool to be useful, we don't want the same handler implementing significantly different behaviors. regards, tom lane
I decided that the way to get this moved forward is to ignore the jsonb parts for the moment and focus on getting the core feature into committable shape. It's possible that the lack of a concrete use-case other than arrays will cause us to miss a detail or two, but if so we can fix it later, I think. (We should make sure to get the jsonb parts in for v14, though, before we ship this.) Accordingly, I went through all of the core and array code and dealt with a lot of details that hadn't gotten seen to, including pg_dump support and dependency considerations. I ended up rewriting the parser code pretty heavily, because I didn't like the original either from an invasiveness or usability standpoint. I also did the thing I suggested earlier of using separate handler functions for varlena and fixed-length arrays, though I made no effort to separate the code paths. I think the attached v37 is committable or nearly so, though there remain a few loose ends: 1. I'm still wondering if TypeParamBool is the right thing to pass to LLVMFunctionType() to describe a function-returning-bool. It does seem to work on x86_64 and aarch64, for what that's worth. 2. I haven't pulled the trigger on merging the three now-identical execution step types. That could be done separately, of course. 3. There are some semantic details that probably need debating. As I wrote in subscripting.h: * There are some general restrictions on what subscripting can do. The * planner expects subscripting fetches to be strict (i.e., return NULL for * any null input), immutable (same inputs always give same results), and * leakproof (data-value-dependent errors must not be thrown; in other * words, you must silently return NULL for any bad subscript value). * Subscripting assignment need not be, and usually isn't, strict; it need * not be leakproof either; but it must be immutable. I doubt that there's anything too wrong with assuming immutability of SubscriptingRef.
And perhaps strictness is OK too, though I worry about somebody deciding that it'd be cute to define a NULL subscript value as doing something special. But the leakproofness assumption seems like a really dangerous one. All it takes is somebody deciding that they should throw an error for a bad subscript, and presto, we have a security hole. What I'm slightly inclined to do here, but did not do in the attached patch, is to have check_functions_in_node look at the leakproofness marking of the subscripting support function; that's a bit of a hack, but since the support function can't be called from SQL there is no other use for its proleakproof flag. Alternatively we could extend the SubscriptingRoutine return struct to include some flags saying whether fetch and/or assignment is leakproof. BTW, right now check_functions_in_node() effectively treats SubscriptingRef as unconditionally leakproof. That's okay for the array "fetch" code, but the array "store" code does throw errors for bad subscripts, meaning it's not leakproof. The only reason this isn't a live security bug is that "store" SubscriptingRefs can only occur in the targetlist of an INSERT or UPDATE, which is not a place that could access any variables we are trying to protect with the leakproofness mechanism. (If it could, you could just store the variable's value directly into another table.) Still, that's a shaky chain of reasoning. The patch's change to check_functions_in_node() would work in the back branches, so I'm inclined to back-patch it that way even if we end up doing something smarter in HEAD. Comments? regards, tom lane diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c index 2d44df19fe..ca2f9f3215 100644 --- a/contrib/postgres_fdw/deparse.c +++ b/contrib/postgres_fdw/deparse.c @@ -426,23 +426,28 @@ foreign_expr_walker(Node *node, return false; /* - * Recurse to remaining subexpressions. 
Since the container - * subscripts must yield (noncollatable) integers, they won't - * affect the inner_cxt state. + * Recurse into the remaining subexpressions. The container + * subscripts will not affect collation of the SubscriptingRef + * result, so do those first and reset inner_cxt afterwards. */ if (!foreign_expr_walker((Node *) sr->refupperindexpr, glob_cxt, &inner_cxt)) return false; + inner_cxt.collation = InvalidOid; + inner_cxt.state = FDW_COLLATE_NONE; if (!foreign_expr_walker((Node *) sr->reflowerindexpr, glob_cxt, &inner_cxt)) return false; + inner_cxt.collation = InvalidOid; + inner_cxt.state = FDW_COLLATE_NONE; if (!foreign_expr_walker((Node *) sr->refexpr, glob_cxt, &inner_cxt)) return false; /* - * Container subscripting should yield same collation as - * input, but for safety use same logic as for function nodes. + * Container subscripting typically yields same collation as + * refexpr's, but in case it doesn't, use same logic as for + * function nodes. */ collation = sr->refcollid; if (collation == InvalidOid) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 79069ddfab..583a5ce3b9 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8740,6 +8740,21 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l </para></entry> </row> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>typsubscript</structfield> <type>regproc</type> + (references <link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.<structfield>oid</structfield>) + </para> + <para> + Subscripting handler function's OID, or zero if this type doesn't + support subscripting. Types that are <quote>true</quote> array + types have <structfield>typsubscript</structfield> + = <function>array_subscript_handler</function>, but other types may + have other handler functions to implement specialized subscripting + behavior. 
+ </para></entry> + </row> + <row> <entry role="catalog_table_entry"><para role="column_definition"> <structfield>typelem</structfield> <type>oid</type> @@ -8747,19 +8762,12 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l </para> <para> If <structfield>typelem</structfield> is not 0 then it - identifies another row in <structname>pg_type</structname>. - The current type can then be subscripted like an array yielding - values of type <structfield>typelem</structfield>. A - <quote>true</quote> array type is variable length - (<structfield>typlen</structfield> = -1), - but some fixed-length (<structfield>typlen</structfield> > 0) types - also have nonzero <structfield>typelem</structfield>, for example - <type>name</type> and <type>point</type>. - If a fixed-length type has a <structfield>typelem</structfield> then - its internal representation must be some number of values of the - <structfield>typelem</structfield> data type with no other data. - Variable-length array types have a header defined by the array - subroutines. + identifies another row in <structname>pg_type</structname>, + defining the type yielded by subscripting. This should be 0 + if <structfield>typsubscript</structfield> is 0. However, it can + be 0 when <structfield>typsubscript</structfield> isn't 0, if the + handler doesn't need <structfield>typelem</structfield> to + determine the subscripting result type. 
</para></entry> </row> diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml index 970b517db9..6177290a4b 100644 --- a/doc/src/sgml/ref/create_type.sgml +++ b/doc/src/sgml/ref/create_type.sgml @@ -43,6 +43,7 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> ( [ , TYPMOD_IN = <replaceable class="parameter">type_modifier_input_function</replaceable> ] [ , TYPMOD_OUT = <replaceable class="parameter">type_modifier_output_function</replaceable> ] [ , ANALYZE = <replaceable class="parameter">analyze_function</replaceable> ] + [ , SUBSCRIPT = <replaceable class="parameter">subscript_function</replaceable> ] [ , INTERNALLENGTH = { <replaceable class="parameter">internallength</replaceable> | VARIABLE } ] [ , PASSEDBYVALUE ] [ , ALIGNMENT = <replaceable class="parameter">alignment</replaceable> ] @@ -196,8 +197,9 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> <replaceable class="parameter">receive_function</replaceable>, <replaceable class="parameter">send_function</replaceable>, <replaceable class="parameter">type_modifier_input_function</replaceable>, - <replaceable class="parameter">type_modifier_output_function</replaceable> and - <replaceable class="parameter">analyze_function</replaceable> + <replaceable class="parameter">type_modifier_output_function</replaceable>, + <replaceable class="parameter">analyze_function</replaceable>, and + <replaceable class="parameter">subscript_function</replaceable> are optional. Generally these functions have to be coded in C or another low-level language. </para> @@ -318,6 +320,26 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> in <filename>src/include/commands/vacuum.h</filename>. </para> + <para> + The optional <replaceable class="parameter">subscript_function</replaceable> + allows the data type to be subscripted in SQL commands. 
Specifying this + function does not cause the type to be considered a <quote>true</quote> + array type; for example, it will not be a candidate for the result type + of <literal>ARRAY[]</literal> constructs. But if subscripting a value + of the type is a natural notation for extracting data from it, then + a <replaceable class="parameter">subscript_function</replaceable> can + be written to define what that means. The subscript function must be + declared to take a single argument of type <type>internal</type>, and + return an <type>internal</type> result, which is a pointer to a struct + of methods (functions) that implement subscripting. + The detailed API for subscript functions appears + in <filename>src/include/nodes/subscripting.h</filename>; + it may also be useful to read the array implementation in + in <filename>src/backend/utils/adt/arraysubs.c</filename>. + Additional information appears in + <xref linkend="sql-createtype-array"/> below. + </para> + <para> While the details of the new type's internal representation are only known to the I/O functions and other functions you create to work with @@ -428,11 +450,12 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> </para> <para> - To indicate that a type is an array, specify the type of the array + To indicate that a type is a fixed-length subscriptable type, + specify the type of the array elements using the <literal>ELEMENT</literal> key word. For example, to define an array of 4-byte integers (<type>int4</type>), specify - <literal>ELEMENT = int4</literal>. More details about array types - appear below. + <literal>ELEMENT = int4</literal>. For more details, + see <xref linkend="sql-createtype-array"/> below. 
</para> <para> @@ -456,7 +479,7 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> </para> </refsect2> - <refsect2> + <refsect2 id="sql-createtype-array" xreflabel="Array Types"> <title>Array Types</title> <para> @@ -469,7 +492,9 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> repeated until a non-colliding name is found.) This implicitly-created array type is variable length and uses the built-in input and output functions <literal>array_in</literal> and - <literal>array_out</literal>. The array type tracks any changes in its + <literal>array_out</literal>. Furthermore, this type is what the system + uses for constructs such as <literal>ARRAY[]</literal> over the + user-defined type. The array type tracks any changes in its element type's owner or schema, and is dropped if the element type is. </para> @@ -485,13 +510,27 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> using <literal>point[0]</literal> and <literal>point[1]</literal>. Note that this facility only works for fixed-length types whose internal form - is exactly a sequence of identical fixed-length fields. A subscriptable - variable-length type must have the generalized internal representation - used by <literal>array_in</literal> and <literal>array_out</literal>. + is exactly a sequence of identical fixed-length fields. For historical reasons (i.e., this is clearly wrong but it's far too late to change it), subscripting of fixed-length array types starts from zero, rather than from one as for variable-length arrays. </para> + + <para> + Specifying the <option>SUBSCRIPT</option> option allows a data type to + be subscripted, even though the system does not otherwise regard it as + an array type. 
The behavior just described for fixed-length arrays is + actually implemented by the <option>SUBSCRIPT</option> handler + function <function>raw_array_subscript_handler</function>, which is + used automatically if you specify <option>ELEMENT</option> for a + fixed-length type without also writing <option>SUBSCRIPT</option>. + When specifying a custom <option>SUBSCRIPT</option> function, it is + not necessary to specify <option>ELEMENT</option> unless + the <option>SUBSCRIPT</option> handler function needs to + consult <structfield>typelem</structfield> to find out what to return, + or if you want an explicit dependency from the new type to the + subscripting output type. + </para> </refsect2> </refsect1> @@ -654,6 +693,16 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> </listitem> </varlistentry> + <varlistentry> + <term><replaceable class="parameter">subscript_function</replaceable></term> + <listitem> + <para> + The name of a function that defines what subscripting a value of the + data type does. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><replaceable class="parameter">internallength</replaceable></term> <listitem> diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index c626161408..c4594b0b09 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -3114,7 +3114,7 @@ ExecGrant_Type(InternalGrant *istmt) pg_type_tuple = (Form_pg_type) GETSTRUCT(tuple); - if (pg_type_tuple->typelem != 0 && pg_type_tuple->typlen == -1) + if (IsTrueArrayType(pg_type_tuple)) ereport(ERROR, (errcode(ERRCODE_INVALID_GRANT_OPERATION), errmsg("cannot set privileges of array types"), @@ -4392,7 +4392,7 @@ pg_type_aclmask(Oid type_oid, Oid roleid, AclMode mask, AclMaskHow how) * "True" array types don't manage permissions of their own; consult the * element type instead. 
*/ - if (OidIsValid(typeForm->typelem) && typeForm->typlen == -1) + if (IsTrueArrayType(typeForm)) { Oid elttype_oid = typeForm->typelem; diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 245c2f4fc8..119006159b 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -2074,6 +2074,22 @@ find_expr_references_walker(Node *node, context->addrs); /* fall through to examine arguments */ } + else if (IsA(node, SubscriptingRef)) + { + SubscriptingRef *sbsref = (SubscriptingRef *) node; + + /* + * The refexpr should provide adequate dependency on refcontainertype, + * and that type in turn depends on refelemtype. However, a custom + * subscripting handler might set refrestype to something different + * from either of those, in which case we'd better record it. + */ + if (sbsref->refrestype != sbsref->refcontainertype && + sbsref->refrestype != sbsref->refelemtype) + add_object_address(OCLASS_TYPE, sbsref->refrestype, 0, + context->addrs); + /* fall through to examine arguments */ + } else if (IsA(node, SubPlan)) { /* Extra work needed here if we ever need this case */ diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 4cd7d76938..51b5c4f7f6 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -1079,6 +1079,7 @@ AddNewRelationType(const char *typeName, InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ InvalidOid, /* analyze procedure - default */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* array element type - irrelevant */ false, /* this is not an array type */ new_array_type, /* array type if any */ @@ -1358,6 +1359,7 @@ heap_create_with_catalog(const char *relname, InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_ARRAY_TYPANALYZE, /* array analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ new_type_oid, /* array element type - the 
rowtype */ true, /* yes, this is an array type */ InvalidOid, /* this has no array type */ diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index aeb4a54f63..4252875ef5 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -103,6 +103,7 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId) values[Anum_pg_type_typisdefined - 1] = BoolGetDatum(false); values[Anum_pg_type_typdelim - 1] = CharGetDatum(DEFAULT_TYPDELIM); values[Anum_pg_type_typrelid - 1] = ObjectIdGetDatum(InvalidOid); + values[Anum_pg_type_typsubscript - 1] = ObjectIdGetDatum(InvalidOid); values[Anum_pg_type_typelem - 1] = ObjectIdGetDatum(InvalidOid); values[Anum_pg_type_typarray - 1] = ObjectIdGetDatum(InvalidOid); values[Anum_pg_type_typinput - 1] = ObjectIdGetDatum(F_SHELL_IN); @@ -208,6 +209,7 @@ TypeCreate(Oid newTypeOid, Oid typmodinProcedure, Oid typmodoutProcedure, Oid analyzeProcedure, + Oid subscriptProcedure, Oid elementType, bool isImplicitArray, Oid arrayType, @@ -357,6 +359,7 @@ TypeCreate(Oid newTypeOid, values[Anum_pg_type_typisdefined - 1] = BoolGetDatum(true); values[Anum_pg_type_typdelim - 1] = CharGetDatum(typDelim); values[Anum_pg_type_typrelid - 1] = ObjectIdGetDatum(relationOid); + values[Anum_pg_type_typsubscript - 1] = ObjectIdGetDatum(subscriptProcedure); values[Anum_pg_type_typelem - 1] = ObjectIdGetDatum(elementType); values[Anum_pg_type_typarray - 1] = ObjectIdGetDatum(arrayType); values[Anum_pg_type_typinput - 1] = ObjectIdGetDatum(inputProcedure); @@ -667,7 +670,7 @@ GenerateTypeDependencies(HeapTuple typeTuple, recordDependencyOnCurrentExtension(&myself, rebuild); } - /* Normal dependencies on the I/O functions */ + /* Normal dependencies on the I/O and support functions */ if (OidIsValid(typeForm->typinput)) { ObjectAddressSet(referenced, ProcedureRelationId, typeForm->typinput); @@ -710,6 +713,12 @@ GenerateTypeDependencies(HeapTuple typeTuple, add_exact_object_address(&referenced, addrs_normal); } 
+ if (OidIsValid(typeForm->typsubscript)) + { + ObjectAddressSet(referenced, ProcedureRelationId, typeForm->typsubscript); + add_exact_object_address(&referenced, addrs_normal); + } + /* Normal dependency from a domain to its base type. */ if (OidIsValid(typeForm->typbasetype)) { diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 483bb65ddc..29fe52d2ce 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -115,6 +115,7 @@ static Oid findTypeSendFunction(List *procname, Oid typeOid); static Oid findTypeTypmodinFunction(List *procname); static Oid findTypeTypmodoutFunction(List *procname); static Oid findTypeAnalyzeFunction(List *procname, Oid typeOid); +static Oid findTypeSubscriptingFunction(List *procname, Oid typeOid); static Oid findRangeSubOpclass(List *opcname, Oid subtype); static Oid findRangeCanonicalFunction(List *procname, Oid typeOid); static Oid findRangeSubtypeDiffFunction(List *procname, Oid subtype); @@ -149,6 +150,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) List *typmodinName = NIL; List *typmodoutName = NIL; List *analyzeName = NIL; + List *subscriptName = NIL; char category = TYPCATEGORY_USER; bool preferred = false; char delimiter = DEFAULT_TYPDELIM; @@ -167,6 +169,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) DefElem *typmodinNameEl = NULL; DefElem *typmodoutNameEl = NULL; DefElem *analyzeNameEl = NULL; + DefElem *subscriptNameEl = NULL; DefElem *categoryEl = NULL; DefElem *preferredEl = NULL; DefElem *delimiterEl = NULL; @@ -183,6 +186,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) Oid typmodinOid = InvalidOid; Oid typmodoutOid = InvalidOid; Oid analyzeOid = InvalidOid; + Oid subscriptOid = InvalidOid; char *array_type; Oid array_oid; Oid typoid; @@ -288,6 +292,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) else if (strcmp(defel->defname, "analyze") == 0 || strcmp(defel->defname, "analyse") == 0) 
defelp = &analyzeNameEl; + else if (strcmp(defel->defname, "subscript") == 0) + defelp = &subscriptNameEl; else if (strcmp(defel->defname, "category") == 0) defelp = &categoryEl; else if (strcmp(defel->defname, "preferred") == 0) @@ -358,6 +364,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) typmodoutName = defGetQualifiedName(typmodoutNameEl); if (analyzeNameEl) analyzeName = defGetQualifiedName(analyzeNameEl); + if (subscriptNameEl) + subscriptName = defGetQualifiedName(subscriptNameEl); if (categoryEl) { char *p = defGetString(categoryEl); @@ -482,6 +490,24 @@ DefineType(ParseState *pstate, List *names, List *parameters) if (analyzeName) analyzeOid = findTypeAnalyzeFunction(analyzeName, typoid); + /* + * Likewise look up the subscripting procedure if any. If it is not + * specified, but a typelem is specified, allow that if + * raw_array_subscript_handler can be used. (This is for backwards + * compatibility; maybe someday we should throw an error instead.) + */ + if (subscriptName) + subscriptOid = findTypeSubscriptingFunction(subscriptName, typoid); + else if (OidIsValid(elemType)) + { + if (internalLength > 0 && !byValue && get_typlen(elemType) > 0) + subscriptOid = F_RAW_ARRAY_SUBSCRIPT_HANDLER; + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("element type cannot be specified without a valid subscripting procedure"))); + } + /* * Check permissions on functions. We choose to require the creator/owner * of a type to also own the underlying functions. 
Since creating a type @@ -516,6 +542,9 @@ DefineType(ParseState *pstate, List *names, List *parameters) if (analyzeOid && !pg_proc_ownercheck(analyzeOid, GetUserId())) aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION, NameListToString(analyzeName)); + if (subscriptOid && !pg_proc_ownercheck(subscriptOid, GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION, + NameListToString(subscriptName)); #endif /* @@ -551,8 +580,9 @@ DefineType(ParseState *pstate, List *names, List *parameters) typmodinOid, /* typmodin procedure */ typmodoutOid, /* typmodout procedure */ analyzeOid, /* analyze procedure */ + subscriptOid, /* subscript procedure */ elemType, /* element type ID */ - false, /* this is not an array type */ + false, /* this is not an implicit array type */ array_oid, /* array type we are about to create */ InvalidOid, /* base type ID (only for domains) */ defaultValue, /* default type value */ @@ -592,6 +622,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) typmodinOid, /* typmodin procedure */ typmodoutOid, /* typmodout procedure */ F_ARRAY_TYPANALYZE, /* analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ typoid, /* element type ID */ true, /* yes this is an array type */ InvalidOid, /* no further array type */ @@ -800,6 +831,12 @@ DefineDomain(CreateDomainStmt *stmt) /* Analysis function */ analyzeProcedure = baseType->typanalyze; + /* + * Domains don't need a subscript procedure, since they are not + * subscriptable on their own. If the base type is subscriptable, the + * parser will reduce the type to the base type before subscripting. 
+ */ + /* Inherited default value */ datum = SysCacheGetAttr(TYPEOID, typeTup, Anum_pg_type_typdefault, &isnull); @@ -993,6 +1030,7 @@ DefineDomain(CreateDomainStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ analyzeProcedure, /* analyze procedure */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* no array element type */ false, /* this isn't an array */ domainArrayOid, /* array type we are about to create */ @@ -1033,6 +1071,7 @@ DefineDomain(CreateDomainStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_ARRAY_TYPANALYZE, /* analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ address.objectId, /* element type ID */ true, /* yes this is an array type */ InvalidOid, /* no further array type */ @@ -1148,6 +1187,7 @@ DefineEnum(CreateEnumStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ InvalidOid, /* analyze procedure - default */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* element type ID */ false, /* this is not an array type */ enumArrayOid, /* array type we are about to create */ @@ -1188,6 +1228,7 @@ DefineEnum(CreateEnumStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_ARRAY_TYPANALYZE, /* analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ enumTypeAddr.objectId, /* element type ID */ true, /* yes this is an array type */ InvalidOid, /* no further array type */ @@ -1476,6 +1517,7 @@ DefineRange(CreateRangeStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_RANGE_TYPANALYZE, /* analyze procedure */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* element type ID - none */ false, /* this is not an array type */ rangeArrayOid, /* array type we are about to create */ @@ -1519,6 +1561,7 @@ 
DefineRange(CreateRangeStmt *stmt)
 						   InvalidOid,	/* typmodin procedure - none */
 						   InvalidOid,	/* typmodout procedure - none */
 						   F_ARRAY_TYPANALYZE,	/* analyze procedure */
+						   F_ARRAY_SUBSCRIPT_HANDLER,	/* array subscript procedure */
 						   typoid,		/* element type ID */
 						   true,		/* yes this is an array type */
 						   InvalidOid,	/* no further array type */
@@ -1616,7 +1659,7 @@ makeRangeConstructors(const char *name, Oid namespace,
 
 /*
- * Find suitable I/O functions for a type.
+ * Find suitable I/O and other support functions for a type.
  *
  * typeOid is the type's OID (which will already exist, if only as a shell
  * type).
@@ -1904,6 +1947,45 @@ findTypeAnalyzeFunction(List *procname, Oid typeOid)
 	return procOid;
 }
 
+static Oid
+findTypeSubscriptingFunction(List *procname, Oid typeOid)
+{
+	Oid			argList[1];
+	Oid			procOid;
+
+	/*
+	 * Subscripting support functions always take one INTERNAL argument and
+	 * return INTERNAL.  (The argument is not used, but we must have it to
+	 * maintain type safety.)
+	 */
+	argList[0] = INTERNALOID;
+
+	procOid = LookupFuncName(procname, 1, argList, true);
+	if (!OidIsValid(procOid))
+		ereport(ERROR,
+				(errcode(ERRCODE_UNDEFINED_FUNCTION),
+				 errmsg("function %s does not exist",
+						func_signature_string(procname, 1, NIL, argList))));
+
+	if (get_func_rettype(procOid) != INTERNALOID)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				 errmsg("type subscripting function %s must return type %s",
+						NameListToString(procname), "internal")));
+
+	/*
+	 * We disallow array_subscript_handler() from being selected explicitly,
+	 * since that must only be applied to autogenerated array types.
+	 */
+	if (procOid == F_ARRAY_SUBSCRIPT_HANDLER)
+		ereport(ERROR,
+				(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
+				 errmsg("user-defined types cannot use subscripting function %s",
						NameListToString(procname))));
+
+	return procOid;
+}
+
 /*
  * Find suitable support functions and opclasses for a range type.
*/ @@ -3221,8 +3303,7 @@ RenameType(RenameStmt *stmt) errhint("Use ALTER TABLE instead."))); /* don't allow direct alteration of array types, either */ - if (OidIsValid(typTup->typelem) && - get_array_type(typTup->typelem) == typeOid) + if (IsTrueArrayType(typTup)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot alter array type %s", @@ -3303,8 +3384,7 @@ AlterTypeOwner(List *names, Oid newOwnerId, ObjectType objecttype) errhint("Use ALTER TABLE instead."))); /* don't allow direct alteration of array types, either */ - if (OidIsValid(typTup->typelem) && - get_array_type(typTup->typelem) == typeOid) + if (IsTrueArrayType(typTup)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot alter array type %s", @@ -3869,8 +3949,7 @@ AlterType(AlterTypeStmt *stmt) /* * For the same reasons, don't allow direct alteration of array types. */ - if (OidIsValid(typForm->typelem) && - get_array_type(typForm->typelem) == typeOid) + if (IsTrueArrayType(typForm)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("%s is not a base type", diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 79b325c7cf..44a5d4d70c 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -40,6 +40,7 @@ #include "miscadmin.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" #include "optimizer/optimizer.h" #include "pgstat.h" #include "utils/acl.h" @@ -2523,19 +2524,51 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, ExprState *state, Datum *resv, bool *resnull) { bool isAssignment = (sbsref->refassgnexpr != NULL); - SubscriptingRefState *sbsrefstate = palloc0(sizeof(SubscriptingRefState)); + int nupper = list_length(sbsref->refupperindexpr); + int nlower = list_length(sbsref->reflowerindexpr); + const SubscriptRoutines *sbsroutines; + SubscriptingRefState *sbsrefstate; + SubscriptExecSteps methods; + char *ptr; List *adjust_jumps = NIL; 
	ListCell   *lc;
 	int			i;
 
+	/* Look up the subscripting support methods */
+	sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype, NULL);
+
+	/* Allocate sbsrefstate, with enough space for per-subscript arrays too */
+	sbsrefstate = palloc0(MAXALIGN(sizeof(SubscriptingRefState)) +
+						  (nupper + nlower) * (sizeof(Datum) +
+											   2 * sizeof(bool)));
+
 	/* Fill constant fields of SubscriptingRefState */
 	sbsrefstate->isassignment = isAssignment;
-	sbsrefstate->refelemtype = sbsref->refelemtype;
-	sbsrefstate->refattrlength = get_typlen(sbsref->refcontainertype);
-	get_typlenbyvalalign(sbsref->refelemtype,
-						 &sbsrefstate->refelemlength,
-						 &sbsrefstate->refelembyval,
-						 &sbsrefstate->refelemalign);
+	sbsrefstate->numupper = nupper;
+	sbsrefstate->numlower = nlower;
+	/* Set up per-subscript arrays */
+	ptr = ((char *) sbsrefstate) + MAXALIGN(sizeof(SubscriptingRefState));
+	sbsrefstate->upperindex = (Datum *) ptr;
+	ptr += nupper * sizeof(Datum);
+	sbsrefstate->lowerindex = (Datum *) ptr;
+	ptr += nlower * sizeof(Datum);
+	sbsrefstate->upperprovided = (bool *) ptr;
+	ptr += nupper * sizeof(bool);
+	sbsrefstate->lowerprovided = (bool *) ptr;
+	ptr += nlower * sizeof(bool);
+	sbsrefstate->upperindexnull = (bool *) ptr;
+	ptr += nupper * sizeof(bool);
+	sbsrefstate->lowerindexnull = (bool *) ptr;
+	/* ptr += nlower * sizeof(bool); */
+
+	/*
+	 * Let the container-type-specific code have a chance.  It must fill the
+	 * "methods" struct with function pointers for us to possibly use in
+	 * execution steps below; and it can optionally set up some data pointed
+	 * to by the workspace field.
+	 */
+	memset(&methods, 0, sizeof(methods));
+	sbsroutines->exec_setup(sbsref, sbsrefstate, &methods);
 
 	/*
 	 * Evaluate array input.  It's safe to do so into resv/resnull, because we
@@ -2548,7 +2581,9 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref,
 
 	/*
 	 * If refexpr yields NULL, and it's a fetch, then result is NULL.
We can * implement this with just JUMP_IF_NULL, since we evaluated the array - * into the desired target location. + * into the desired target location. (Caution: if you think you'd like to + * relax this, note that contain_nonstrict_functions() believes that + * non-assignment SubscriptingRef is strict.) */ if (!isAssignment) { @@ -2559,19 +2594,6 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, state->steps_len - 1); } - /* Verify subscript list lengths are within limit */ - if (list_length(sbsref->refupperindexpr) > MAXDIM) - ereport(ERROR, - (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", - list_length(sbsref->refupperindexpr), MAXDIM))); - - if (list_length(sbsref->reflowerindexpr) > MAXDIM) - ereport(ERROR, - (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", - list_length(sbsref->reflowerindexpr), MAXDIM))); - /* Evaluate upper subscripts */ i = 0; foreach(lc, sbsref->refupperindexpr) @@ -2582,28 +2604,18 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, if (!e) { sbsrefstate->upperprovided[i] = false; - i++; - continue; + sbsrefstate->upperindexnull[i] = true; + } + else + { + sbsrefstate->upperprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + ExecInitExprRec(e, state, + &sbsrefstate->upperindex[i], + &sbsrefstate->upperindexnull[i]); } - - sbsrefstate->upperprovided[i] = true; - - /* Each subscript is evaluated into subscriptvalue/subscriptnull */ - ExecInitExprRec(e, state, - &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); - - /* ... 
and then SBSREF_SUBSCRIPT saves it into step's workspace */ - scratch->opcode = EEOP_SBSREF_SUBSCRIPT; - scratch->d.sbsref_subscript.state = sbsrefstate; - scratch->d.sbsref_subscript.off = i; - scratch->d.sbsref_subscript.isupper = true; - scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ - ExprEvalPushStep(state, scratch); - adjust_jumps = lappend_int(adjust_jumps, - state->steps_len - 1); i++; } - sbsrefstate->numupper = i; /* Evaluate lower subscripts similarly */ i = 0; @@ -2615,39 +2627,43 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, if (!e) { sbsrefstate->lowerprovided[i] = false; - i++; - continue; + sbsrefstate->lowerindexnull[i] = true; } + else + { + sbsrefstate->lowerprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + ExecInitExprRec(e, state, + &sbsrefstate->lowerindex[i], + &sbsrefstate->lowerindexnull[i]); + } + i++; + } - sbsrefstate->lowerprovided[i] = true; - - /* Each subscript is evaluated into subscriptvalue/subscriptnull */ - ExecInitExprRec(e, state, - &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); - - /* ... 
and then SBSREF_SUBSCRIPT saves it into step's workspace */ - scratch->opcode = EEOP_SBSREF_SUBSCRIPT; + /* SBSREF_SUBSCRIPTS checks and converts all the subscripts at once */ + if (methods.sbs_check_subscripts) + { + scratch->opcode = EEOP_SBSREF_SUBSCRIPTS; + scratch->d.sbsref_subscript.subscriptfunc = methods.sbs_check_subscripts; scratch->d.sbsref_subscript.state = sbsrefstate; - scratch->d.sbsref_subscript.off = i; - scratch->d.sbsref_subscript.isupper = false; scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ ExprEvalPushStep(state, scratch); adjust_jumps = lappend_int(adjust_jumps, state->steps_len - 1); - i++; } - sbsrefstate->numlower = i; - - /* Should be impossible if parser is sane, but check anyway: */ - if (sbsrefstate->numlower != 0 && - sbsrefstate->numupper != sbsrefstate->numlower) - elog(ERROR, "upper and lower index lists are not same length"); if (isAssignment) { Datum *save_innermost_caseval; bool *save_innermost_casenull; + /* Check for unimplemented methods */ + if (!methods.sbs_assign) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("type %s does not support subscripted assignment", + format_type_be(sbsref->refcontainertype)))); + /* * We might have a nested-assignment situation, in which the * refassgnexpr is itself a FieldStore or SubscriptingRef that needs @@ -2664,7 +2680,13 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, */ if (isAssignmentIndirectionExpr(sbsref->refassgnexpr)) { + if (!methods.sbs_fetch_old) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("type %s does not support subscripted assignment", + format_type_be(sbsref->refcontainertype)))); scratch->opcode = EEOP_SBSREF_OLD; + scratch->d.sbsref.subscriptfunc = methods.sbs_fetch_old; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); } @@ -2684,17 +2706,17 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, /* and perform the assignment */ 
scratch->opcode = EEOP_SBSREF_ASSIGN; + scratch->d.sbsref.subscriptfunc = methods.sbs_assign; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); - } else { /* array fetch is much simpler */ scratch->opcode = EEOP_SBSREF_FETCH; + scratch->d.sbsref.subscriptfunc = methods.sbs_fetch; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); - } /* adjust jump targets */ @@ -2702,7 +2724,7 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, { ExprEvalStep *as = &state->steps[lfirst_int(lc)]; - if (as->opcode == EEOP_SBSREF_SUBSCRIPT) + if (as->opcode == EEOP_SBSREF_SUBSCRIPTS) { Assert(as->d.sbsref_subscript.jumpdone == -1); as->d.sbsref_subscript.jumpdone = state->steps_len; diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index c09371ad58..a4b71fb554 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -417,7 +417,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) &&CASE_EEOP_FIELDSELECT, &&CASE_EEOP_FIELDSTORE_DEFORM, &&CASE_EEOP_FIELDSTORE_FORM, - &&CASE_EEOP_SBSREF_SUBSCRIPT, + &&CASE_EEOP_SBSREF_SUBSCRIPTS, &&CASE_EEOP_SBSREF_OLD, &&CASE_EEOP_SBSREF_ASSIGN, &&CASE_EEOP_SBSREF_FETCH, @@ -1396,12 +1396,10 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) EEO_NEXT(); } - EEO_CASE(EEOP_SBSREF_SUBSCRIPT) + EEO_CASE(EEOP_SBSREF_SUBSCRIPTS) { - /* Process an array subscript */ - - /* too complex for an inline implementation */ - if (ExecEvalSubscriptingRef(state, op)) + /* Process container subscript(s) */ + if (op->d.sbsref_subscript.subscriptfunc(state, op, econtext)) { EEO_NEXT(); } @@ -1419,9 +1417,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) * referenced (via a CaseTestExpr) inside the assignment * expression. 
*/ - - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefOld(state, op); + op->d.sbsref.subscriptfunc(state, op, econtext); EEO_NEXT(); } @@ -1431,19 +1427,17 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) */ EEO_CASE(EEOP_SBSREF_ASSIGN) { - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefAssign(state, op); + op->d.sbsref.subscriptfunc(state, op, econtext); EEO_NEXT(); } /* - * Fetch subset of an array. + * Perform SubscriptingRef fetch */ EEO_CASE(EEOP_SBSREF_FETCH) { - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefFetch(state, op); + op->d.sbsref.subscriptfunc(state, op, econtext); EEO_NEXT(); } @@ -3122,200 +3116,6 @@ ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext *op->resnull = false; } -/* - * Process a subscript in a SubscriptingRef expression. - * - * If subscript is NULL, throw error in assignment case, or in fetch case - * set result to NULL and return false (instructing caller to skip the rest - * of the SubscriptingRef sequence). - * - * Subscript expression result is in subscriptvalue/subscriptnull. - * On success, integer subscript value has been saved in upperindex[] or - * lowerindex[] for use later. 
- */ -bool -ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; - int *indexes; - int off; - - /* If any index expr yields NULL, result is NULL or error */ - if (sbsrefstate->subscriptnull) - { - if (sbsrefstate->isassignment) - ereport(ERROR, - (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), - errmsg("array subscript in assignment must not be null"))); - *op->resnull = true; - return false; - } - - /* Convert datum to int, save in appropriate place */ - if (op->d.sbsref_subscript.isupper) - indexes = sbsrefstate->upperindex; - else - indexes = sbsrefstate->lowerindex; - off = op->d.sbsref_subscript.off; - - indexes[off] = DatumGetInt32(sbsrefstate->subscriptvalue); - - return true; -} - -/* - * Evaluate SubscriptingRef fetch. - * - * Source container is in step's result variable. - */ -void -ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - - /* Should not get here if source container (or any subscript) is null */ - Assert(!(*op->resnull)); - - if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - *op->resvalue = array_get_element(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign, - op->resnull); - } - else - { - /* Slice case */ - *op->resvalue = array_get_slice(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } -} - -/* - * Compute old container element/slice value for a SubscriptingRef assignment - * expression. Will only be generated if the new-value subexpression - * contains SubscriptingRef or FieldStore. 
The value is stored into the - * SubscriptingRefState's prevvalue/prevnull fields. - */ -void -ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - - if (*op->resnull) - { - /* whole array is null, so any element or slice is too */ - sbsrefstate->prevvalue = (Datum) 0; - sbsrefstate->prevnull = true; - } - else if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - sbsrefstate->prevvalue = array_get_element(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign, - &sbsrefstate->prevnull); - } - else - { - /* Slice case */ - /* this is currently unreachable */ - sbsrefstate->prevvalue = array_get_slice(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - sbsrefstate->prevnull = false; - } -} - -/* - * Evaluate SubscriptingRef assignment. - * - * Input container (possibly null) is in result area, replacement value is in - * SubscriptingRefState's replacevalue/replacenull. - */ -void -ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; - - /* - * For an assignment to a fixed-length container type, both the original - * container and the value to be assigned into it must be non-NULL, else - * we punt and return the original container. - */ - if (sbsrefstate->refattrlength > 0) - { - if (*op->resnull || sbsrefstate->replacenull) - return; - } - - /* - * For assignment to varlena arrays, we handle a NULL original array by - * substituting an empty (zero-dimensional) array; insertion of the new - * element will result in a singleton array value. 
It does not matter - * whether the new element is NULL. - */ - if (*op->resnull) - { - *op->resvalue = PointerGetDatum(construct_empty_array(sbsrefstate->refelemtype)); - *op->resnull = false; - } - - if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - *op->resvalue = array_set_element(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->replacevalue, - sbsrefstate->replacenull, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } - else - { - /* Slice case */ - *op->resvalue = array_set_slice(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->replacevalue, - sbsrefstate->replacenull, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } -} - /* * Evaluate a rowtype coercion operation. * This may require rearranging field positions. 
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index f232397cab..e7f0d84521 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -1116,22 +1116,35 @@ llvm_compile_expr(ExprState *state) } case EEOP_SBSREF_OLD: - build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefOld", - v_state, op); - LLVMBuildBr(b, opblocks[opno + 1]); - break; - case EEOP_SBSREF_ASSIGN: - build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefAssign", - v_state, op); - LLVMBuildBr(b, opblocks[opno + 1]); - break; - case EEOP_SBSREF_FETCH: - build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefFetch", - v_state, op); - LLVMBuildBr(b, opblocks[opno + 1]); - break; + { + LLVMTypeRef param_types[3]; + LLVMValueRef v_params[3]; + LLVMTypeRef v_functype; + LLVMValueRef v_func; + + param_types[0] = l_ptr(StructExprState); + param_types[1] = l_ptr(TypeSizeT); + param_types[2] = l_ptr(StructExprContext); + + v_functype = LLVMFunctionType(LLVMVoidType(), + param_types, + lengthof(param_types), + false); + v_func = l_ptr_const(op->d.sbsref.subscriptfunc, + l_ptr(v_functype)); + + v_params[0] = v_state; + v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT)); + v_params[2] = v_econtext; + LLVMBuildCall(b, + v_func, + v_params, lengthof(v_params), ""); + + LLVMBuildBr(b, opblocks[opno + 1]); + break; + } case EEOP_CASE_TESTVAL: { @@ -1746,13 +1759,32 @@ llvm_compile_expr(ExprState *state) LLVMBuildBr(b, opblocks[opno + 1]); break; - case EEOP_SBSREF_SUBSCRIPT: + case EEOP_SBSREF_SUBSCRIPTS: { int jumpdone = op->d.sbsref_subscript.jumpdone; + LLVMTypeRef param_types[3]; + LLVMValueRef v_params[3]; + LLVMTypeRef v_functype; + LLVMValueRef v_func; LLVMValueRef v_ret; - v_ret = build_EvalXFunc(b, mod, "ExecEvalSubscriptingRef", - v_state, op); + param_types[0] = l_ptr(StructExprState); + param_types[1] = l_ptr(TypeSizeT); + param_types[2] = l_ptr(StructExprContext); + + v_functype = LLVMFunctionType(TypeParamBool, + param_types, + lengthof(param_types), 
+ false); + v_func = l_ptr_const(op->d.sbsref_subscript.subscriptfunc, + l_ptr(v_functype)); + + v_params[0] = v_state; + v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT)); + v_params[2] = v_econtext; + v_ret = LLVMBuildCall(b, + v_func, + v_params, lengthof(v_params), ""); v_ret = LLVMBuildZExt(b, v_ret, TypeStorageBool, ""); LLVMBuildCondBr(b, diff --git a/src/backend/jit/llvm/llvmjit_types.c b/src/backend/jit/llvm/llvmjit_types.c index 1ed3cafa2f..ae3c88aad9 100644 --- a/src/backend/jit/llvm/llvmjit_types.c +++ b/src/backend/jit/llvm/llvmjit_types.c @@ -124,10 +124,6 @@ void *referenced_functions[] = ExecEvalSQLValueFunction, ExecEvalScalarArrayOp, ExecEvalSubPlan, - ExecEvalSubscriptingRef, - ExecEvalSubscriptingRefAssign, - ExecEvalSubscriptingRefFetch, - ExecEvalSubscriptingRefOld, ExecEvalSysVar, ExecEvalWholeRowVar, ExecEvalXmlExpr, diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c index 910906f639..70f8b718e0 100644 --- a/src/backend/nodes/copyfuncs.c +++ b/src/backend/nodes/copyfuncs.c @@ -1548,6 +1548,7 @@ _copySubscriptingRef(const SubscriptingRef *from) COPY_SCALAR_FIELD(refcontainertype); COPY_SCALAR_FIELD(refelemtype); + COPY_SCALAR_FIELD(refrestype); COPY_SCALAR_FIELD(reftypmod); COPY_SCALAR_FIELD(refcollid); COPY_NODE_FIELD(refupperindexpr); diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c index 687609f59e..541e0e6b48 100644 --- a/src/backend/nodes/equalfuncs.c +++ b/src/backend/nodes/equalfuncs.c @@ -276,6 +276,7 @@ _equalSubscriptingRef(const SubscriptingRef *a, const SubscriptingRef *b) { COMPARE_SCALAR_FIELD(refcontainertype); COMPARE_SCALAR_FIELD(refelemtype); + COMPARE_SCALAR_FIELD(refrestype); COMPARE_SCALAR_FIELD(reftypmod); COMPARE_SCALAR_FIELD(refcollid); COMPARE_NODE_FIELD(refupperindexpr); diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 1dc873ed25..b2f1f91ab0 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -66,15 +66,7 
@@ exprType(const Node *expr)
 			type = ((const WindowFunc *) expr)->wintype;
 			break;
 		case T_SubscriptingRef:
-			{
-				const SubscriptingRef *sbsref = (const SubscriptingRef *) expr;
-
-				/* slice and/or store operations yield the container type */
-				if (sbsref->reflowerindexpr || sbsref->refassgnexpr)
-					type = sbsref->refcontainertype;
-				else
-					type = sbsref->refelemtype;
-			}
+			type = ((const SubscriptingRef *) expr)->refrestype;
 			break;
 		case T_FuncExpr:
 			type = ((const FuncExpr *) expr)->funcresulttype;
@@ -286,7 +278,6 @@ exprTypmod(const Node *expr)
 		case T_Param:
 			return ((const Param *) expr)->paramtypmod;
 		case T_SubscriptingRef:
-			/* typmod is the same for container or element */
 			return ((const SubscriptingRef *) expr)->reftypmod;
 		case T_FuncExpr:
 			{
@@ -1723,6 +1714,20 @@ check_functions_in_node(Node *node, check_function_callback checker,
 					return true;
 			}
 			break;
+		case T_SubscriptingRef:
+			{
+				SubscriptingRef *expr = (SubscriptingRef *) node;
+
+				/*
+				 * We assume that subscripting assignment is leaky but
+				 * subscripting fetch is leakproof.  (This constrains
+				 * type-specific subscripting implementations; maybe we should
+				 * relax it someday.)
+				 */
+				if (expr->refassgnexpr != NULL)
+					return true;
+			}
+			break;
 		case T_FuncExpr:
 			{
 				FuncExpr   *expr = (FuncExpr *) node;
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 9c73c605a4..98c23470e6 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1194,6 +1194,7 @@ _outSubscriptingRef(StringInfo str, const SubscriptingRef *node)
 
 	WRITE_OID_FIELD(refcontainertype);
 	WRITE_OID_FIELD(refelemtype);
+	WRITE_OID_FIELD(refrestype);
 	WRITE_INT_FIELD(reftypmod);
 	WRITE_OID_FIELD(refcollid);
 	WRITE_NODE_FIELD(refupperindexpr);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 169d5581b9..0f6a77afc4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -671,6 +671,7 @@ _readSubscriptingRef(void)
 
 	READ_OID_FIELD(refcontainertype);
 	READ_OID_FIELD(refelemtype);
+	READ_OID_FIELD(refrestype);
 	READ_INT_FIELD(reftypmod);
 	READ_OID_FIELD(refcollid);
 	READ_NODE_FIELD(refupperindexpr);
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index 587d494c34..1fb9e41b9d 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -840,8 +840,9 @@ contain_nonstrict_functions_walker(Node *node, void *context)
 	if (IsA(node, SubscriptingRef))
 	{
 		/*
-		 * subscripting assignment is nonstrict, but subscripting itself is
-		 * strict
+		 * Subscripting assignment is nonstrict, but subscripting fetch is
+		 * assumed strict.  (This constrains type-specific subscripting
+		 * implementations; maybe we should relax it someday.)
 		 */
 		if (((SubscriptingRef *) node)->refassgnexpr != NULL)
 			return true;
@@ -2843,6 +2844,11 @@ eval_const_expressions_mutator(Node *node,
 				 * known to be immutable, and for which we need no smarts
 				 * beyond "simplify if all inputs are constants".
 				 *
+				 * Treating SubscriptingRef this way assumes that subscripting
+				 * fetch and assignment are both immutable.
This constrains + * type-specific subscripting implementations; maybe we should + * relax it someday. + * * Treating MinMaxExpr this way amounts to assuming that the * btree comparison function it calls is immutable; see the * reasoning in contain_mutable_functions_walker. @@ -3106,10 +3112,10 @@ eval_const_expressions_mutator(Node *node, { /* * This case could be folded into the generic handling used - * for SubscriptingRef etc. But because the simplification - * logic is so trivial, applying evaluate_expr() to perform it - * would be a heavy overhead. BooleanTest is probably common - * enough to justify keeping this bespoke implementation. + * for ArrayExpr etc. But because the simplification logic is + * so trivial, applying evaluate_expr() to perform it would be + * a heavy overhead. BooleanTest is probably common enough to + * justify keeping this bespoke implementation. */ BooleanTest *btest = (BooleanTest *) node; BooleanTest *newbtest; diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c index a2924e3d1c..da6c3ae4b5 100644 --- a/src/backend/parser/parse_coerce.c +++ b/src/backend/parser/parse_coerce.c @@ -26,6 +26,7 @@ #include "parser/parse_type.h" #include "utils/builtins.h" #include "utils/datum.h" /* needed for datumIsEqual() */ +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/syscache.h" #include "utils/typcache.h" @@ -2854,8 +2855,8 @@ find_typmod_coercion_function(Oid typeId, targetType = typeidType(typeId); typeForm = (Form_pg_type) GETSTRUCT(targetType); - /* Check for a varlena array type */ - if (typeForm->typelem != InvalidOid && typeForm->typlen == -1) + /* Check for a "true" array type */ + if (IsTrueArrayType(typeForm)) { /* Yes, switch our attention to the element type */ typeId = typeForm->typelem; diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c index bf800f5937..13e62a2015 100644 --- a/src/backend/parser/parse_collate.c +++ 
b/src/backend/parser/parse_collate.c @@ -667,6 +667,29 @@ assign_collations_walker(Node *node, assign_collations_context *context) &loccontext); } break; + case T_SubscriptingRef: + { + /* + * The subscripts are treated as independent + * expressions not contributing to the node's + * collation. Only the container, and the source + * expression if any, contribute. (This models + * the old behavior, in which the subscripts could + * be counted on to be integers and thus not + * contribute anything.) + */ + SubscriptingRef *sbsref = (SubscriptingRef *) node; + + assign_expr_collations(context->pstate, + (Node *) sbsref->refupperindexpr); + assign_expr_collations(context->pstate, + (Node *) sbsref->reflowerindexpr); + (void) assign_collations_walker((Node *) sbsref->refexpr, + &loccontext); + (void) assign_collations_walker((Node *) sbsref->refassgnexpr, + &loccontext); + } + break; default: /* diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c index 36002f059d..974168f55b 100644 --- a/src/backend/parser/parse_expr.c +++ b/src/backend/parser/parse_expr.c @@ -464,10 +464,9 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) result = (Node *) transformContainerSubscripts(pstate, result, exprType(result), - InvalidOid, exprTypmod(result), subscripts, - NULL); + false); subscripts = NIL; newresult = ParseFuncOrColumn(pstate, @@ -487,10 +486,9 @@ transformIndirection(ParseState *pstate, A_Indirection *ind) result = (Node *) transformContainerSubscripts(pstate, result, exprType(result), - InvalidOid, exprTypmod(result), subscripts, - NULL); + false); return result; } diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c index 6e98fe55fc..e90f6c9d01 100644 --- a/src/backend/parser/parse_node.c +++ b/src/backend/parser/parse_node.c @@ -20,6 +20,7 @@ #include "mb/pg_wchar.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" #include "parser/parse_coerce.h" #include 
"parser/parse_expr.h" #include "parser/parse_relation.h" @@ -182,23 +183,16 @@ pcb_error_callback(void *arg) /* * transformContainerType() - * Identify the types involved in a subscripting operation for container + * Identify the actual container type for a subscripting operation. * - * - * On entry, containerType/containerTypmod identify the type of the input value - * to be subscripted (which could be a domain type). These are modified if - * necessary to identify the actual container type and typmod, and the - * container's element type is returned. An error is thrown if the input isn't - * an array type. + * containerType/containerTypmod are modified if necessary to identify + * the actual container type and typmod. This mainly involves smashing + * any domain to its base type, but there are some special considerations. + * Note that caller still needs to check if the result type is a container. */ -Oid +void transformContainerType(Oid *containerType, int32 *containerTypmod) { - Oid origContainerType = *containerType; - Oid elementType; - HeapTuple type_tuple_container; - Form_pg_type type_struct_container; - /* * If the input is a domain, smash to base type, and extract the actual * typmod to be applied to the base type. Subscripting a domain is an @@ -209,35 +203,16 @@ transformContainerType(Oid *containerType, int32 *containerTypmod) *containerType = getBaseTypeAndTypmod(*containerType, containerTypmod); /* - * Here is an array specific code. We treat int2vector and oidvector as - * though they were domains over int2[] and oid[]. This is needed because - * array slicing could create an array that doesn't satisfy the - * dimensionality constraints of the xxxvector type; so we want the result - * of a slice operation to be considered to be of the more general type. + * We treat int2vector and oidvector as though they were domains over + * int2[] and oid[]. 
This is needed because array slicing could create an + * array that doesn't satisfy the dimensionality constraints of the + * xxxvector type; so we want the result of a slice operation to be + * considered to be of the more general type. */ if (*containerType == INT2VECTOROID) *containerType = INT2ARRAYOID; else if (*containerType == OIDVECTOROID) *containerType = OIDARRAYOID; - - /* Get the type tuple for the container */ - type_tuple_container = SearchSysCache1(TYPEOID, ObjectIdGetDatum(*containerType)); - if (!HeapTupleIsValid(type_tuple_container)) - elog(ERROR, "cache lookup failed for type %u", *containerType); - type_struct_container = (Form_pg_type) GETSTRUCT(type_tuple_container); - - /* needn't check typisdefined since this will fail anyway */ - - elementType = type_struct_container->typelem; - if (elementType == InvalidOid) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("cannot subscript type %s because it is not an array", - format_type_be(origContainerType)))); - - ReleaseSysCache(type_tuple_container); - - return elementType; } /* @@ -249,13 +224,14 @@ transformContainerType(Oid *containerType, int32 *containerTypmod) * an expression that represents the result of extracting a single container * element or a container slice. * - * In a container assignment, we are given a destination container value plus a - * source value that is to be assigned to a single element or a slice of that - * container. We produce an expression that represents the new container value - * with the source data inserted into the right part of the container. + * Container assignments are treated basically the same as container fetches + * here. The caller will modify the result node to insert the source value + * that is to be assigned to the element or slice that a fetch would have + * retrieved. The execution result will be a new container value with + * the source value inserted into the right part of the container. 
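The int2vector/oidvector smashing described in that comment is mechanical enough to model outside the backend. Below is a minimal standalone sketch (not backend code; the function name is invented, and the enum values mirror the well-known pg_type OIDs but should be treated as illustrative here):

```c
#include <assert.h>

/* Stand-ins for the relevant pg_type OIDs (illustrative values). */
enum
{
    MODEL_INT2VECTOROID = 22,
    MODEL_INT2ARRAYOID = 1005,
    MODEL_OIDVECTOROID = 30,
    MODEL_OIDARRAYOID = 1028
};

/*
 * Sketch of the special case in transformContainerType(): the
 * fixed-dimensionality vector types are treated as if they were domains
 * over the general array types, so that a slice of an int2vector is
 * typed as int2[] rather than as int2vector (whose dimensionality
 * constraints the slice might violate).
 */
static int
model_smash_vector_type(int container_type)
{
    if (container_type == MODEL_INT2VECTOROID)
        return MODEL_INT2ARRAYOID;
    if (container_type == MODEL_OIDVECTOROID)
        return MODEL_OIDARRAYOID;
    return container_type;        /* everything else passes through */
}
```

Any other container type is returned unchanged, matching the patch's behavior of only mapping these two legacy vector types.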
* - * For both cases, if the source container is of a domain-over-array type, - * the result is of the base array type or its element type; essentially, + * For both cases, if the source is of a domain-over-container type, the + * result is the same as if it had been of the container type; essentially, * we must fold a domain to its base type before applying subscripting. * (Note that int2vector and oidvector are treated as domains here.) * @@ -264,48 +240,48 @@ transformContainerType(Oid *containerType, int32 *containerTypmod) * containerType OID of container's datatype (should match type of * containerBase, or be the base type of containerBase's * domain type) - * elementType OID of container's element type (fetch with - * transformContainerType, or pass InvalidOid to do it here) - * containerTypMod typmod for the container (which is also typmod for the - * elements) + * containerTypMod typmod for the container * indirection Untransformed list of subscripts (must not be NIL) - * assignFrom NULL for container fetch, else transformed expression for - * source. + * isAssignment True if this will become a container assignment. */ SubscriptingRef * transformContainerSubscripts(ParseState *pstate, Node *containerBase, Oid containerType, - Oid elementType, int32 containerTypMod, List *indirection, - Node *assignFrom) + bool isAssignment) { + SubscriptingRef *sbsref; + const SubscriptRoutines *sbsroutines; + Oid elementType; bool isSlice = false; - List *upperIndexpr = NIL; - List *lowerIndexpr = NIL; ListCell *idx; - SubscriptingRef *sbsref; /* - * Caller may or may not have bothered to determine elementType. Note - * that if the caller did do so, containerType/containerTypMod must be as - * modified by transformContainerType, ie, smash domain to base type. + * Determine the actual container type, smashing any domain. In the + * assignment case the caller already did this, since it also needs to + * know the actual container type. 
*/ - if (!OidIsValid(elementType)) - elementType = transformContainerType(&containerType, &containerTypMod); + if (!isAssignment) + transformContainerType(&containerType, &containerTypMod); /* + * Verify that the container type is subscriptable, and get its support + * functions and typelem. + */ + sbsroutines = getSubscriptingRoutines(containerType, &elementType); + + /* + * Detect whether any of the indirection items are slice specifiers. + * * A list containing only simple subscripts refers to a single container * element. If any of the items are slice specifiers (lower:upper), then - * the subscript expression means a container slice operation. In this - * case, we convert any non-slice items to slices by treating the single - * subscript as the upper bound and supplying an assumed lower bound of 1. - * We have to prescan the list to see if there are any slice items. + * the subscript expression means a container slice operation. */ foreach(idx, indirection) { - A_Indices *ai = (A_Indices *) lfirst(idx); + A_Indices *ai = lfirst_node(A_Indices, idx); if (ai->is_slice) { @@ -314,121 +290,36 @@ transformContainerSubscripts(ParseState *pstate, } } - /* - * Transform the subscript expressions. 
- */ - foreach(idx, indirection) - { - A_Indices *ai = lfirst_node(A_Indices, idx); - Node *subexpr; - - if (isSlice) - { - if (ai->lidx) - { - subexpr = transformExpr(pstate, ai->lidx, pstate->p_expr_kind); - /* If it's not int4 already, try to coerce */ - subexpr = coerce_to_target_type(pstate, - subexpr, exprType(subexpr), - INT4OID, -1, - COERCION_ASSIGNMENT, - COERCE_IMPLICIT_CAST, - -1); - if (subexpr == NULL) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("array subscript must have type integer"), - parser_errposition(pstate, exprLocation(ai->lidx)))); - } - else if (!ai->is_slice) - { - /* Make a constant 1 */ - subexpr = (Node *) makeConst(INT4OID, - -1, - InvalidOid, - sizeof(int32), - Int32GetDatum(1), - false, - true); /* pass by value */ - } - else - { - /* Slice with omitted lower bound, put NULL into the list */ - subexpr = NULL; - } - lowerIndexpr = lappend(lowerIndexpr, subexpr); - } - else - Assert(ai->lidx == NULL && !ai->is_slice); - - if (ai->uidx) - { - subexpr = transformExpr(pstate, ai->uidx, pstate->p_expr_kind); - /* If it's not int4 already, try to coerce */ - subexpr = coerce_to_target_type(pstate, - subexpr, exprType(subexpr), - INT4OID, -1, - COERCION_ASSIGNMENT, - COERCE_IMPLICIT_CAST, - -1); - if (subexpr == NULL) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("array subscript must have type integer"), - parser_errposition(pstate, exprLocation(ai->uidx)))); - } - else - { - /* Slice with omitted upper bound, put NULL into the list */ - Assert(isSlice && ai->is_slice); - subexpr = NULL; - } - upperIndexpr = lappend(upperIndexpr, subexpr); - } - - /* - * If doing an array store, coerce the source value to the right type. - * (This should agree with the coercion done by transformAssignedExpr.) - */ - if (assignFrom != NULL) - { - Oid typesource = exprType(assignFrom); - Oid typeneeded = isSlice ? 
containerType : elementType; - Node *newFrom; - - newFrom = coerce_to_target_type(pstate, - assignFrom, typesource, - typeneeded, containerTypMod, - COERCION_ASSIGNMENT, - COERCE_IMPLICIT_CAST, - -1); - if (newFrom == NULL) - ereport(ERROR, - (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("array assignment requires type %s" - " but expression is of type %s", - format_type_be(typeneeded), - format_type_be(typesource)), - errhint("You will need to rewrite or cast the expression."), - parser_errposition(pstate, exprLocation(assignFrom)))); - assignFrom = newFrom; - } - /* * Ready to build the SubscriptingRef node. */ - sbsref = (SubscriptingRef *) makeNode(SubscriptingRef); - if (assignFrom != NULL) - sbsref->refassgnexpr = (Expr *) assignFrom; + sbsref = makeNode(SubscriptingRef); sbsref->refcontainertype = containerType; sbsref->refelemtype = elementType; + /* refrestype is to be set by container-specific logic */ sbsref->reftypmod = containerTypMod; /* refcollid will be set by parse_collate.c */ - sbsref->refupperindexpr = upperIndexpr; - sbsref->reflowerindexpr = lowerIndexpr; + /* refupperindexpr, reflowerindexpr are to be set by container logic */ sbsref->refexpr = (Expr *) containerBase; - sbsref->refassgnexpr = (Expr *) assignFrom; + sbsref->refassgnexpr = NULL; /* caller will fill if it's an assignment */ + + /* + * Call the container-type-specific logic to transform the subscripts and + * determine the subscripting result type. + */ + sbsroutines->transform(sbsref, indirection, pstate, + isSlice, isAssignment); + + /* + * Verify we got a valid type (this defends, for example, against someone + * using array_subscript_handler as typsubscript without setting typelem). 
+ */ + if (!OidIsValid(sbsref->refrestype)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("cannot subscript type %s because it does not support subscripting", + format_type_be(containerType)))); return sbsref; } diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c index 9de0cff833..df3d405ca9 100644 --- a/src/backend/parser/parse_target.c +++ b/src/backend/parser/parse_target.c @@ -861,7 +861,7 @@ transformAssignmentIndirection(ParseState *pstate, if (targetIsSubscripting) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), - errmsg("array assignment to \"%s\" requires type %s" + errmsg("subscripted assignment to \"%s\" requires type %s" " but expression is of type %s", targetName, format_type_be(targetTypeId), @@ -901,26 +901,37 @@ transformAssignmentSubscripts(ParseState *pstate, int location) { Node *result; + SubscriptingRef *sbsref; Oid containerType; int32 containerTypMod; - Oid elementTypeId; Oid typeNeeded; + int32 typmodNeeded; Oid collationNeeded; Assert(subscripts != NIL); - /* Identify the actual array type and element type involved */ + /* Identify the actual container type involved */ containerType = targetTypeId; containerTypMod = targetTypMod; - elementTypeId = transformContainerType(&containerType, &containerTypMod); + transformContainerType(&containerType, &containerTypMod); - /* Identify type that RHS must provide */ - typeNeeded = isSlice ? containerType : elementTypeId; + /* Process subscripts and identify required type for RHS */ + sbsref = transformContainerSubscripts(pstate, + basenode, + containerType, + containerTypMod, + subscripts, + true); + + typeNeeded = sbsref->refrestype; + typmodNeeded = sbsref->reftypmod; /* - * container normally has same collation as elements, but there's an - * exception: we might be subscripting a domain over a container type. In - * that case use collation of the base type. 
+ * Container normally has same collation as its elements, but there's an + * exception: we might be subscripting a domain over a container type. In + * that case use collation of the base type. (This is shaky for arbitrary + * subscripting semantics, but it doesn't matter all that much since we + * only use this to label the collation of a possible CaseTestExpr.) */ if (containerType == targetTypeId) collationNeeded = targetCollation; @@ -933,21 +944,22 @@ transformAssignmentSubscripts(ParseState *pstate, targetName, true, typeNeeded, - containerTypMod, + typmodNeeded, collationNeeded, indirection, next_indirection, rhs, location); - /* process subscripts */ - result = (Node *) transformContainerSubscripts(pstate, - basenode, - containerType, - elementTypeId, - containerTypMod, - subscripts, - rhs); + /* + * Insert the already-properly-coerced RHS into the SubscriptingRef. Then + * set refrestype and reftypmod back to the container type's values. + */ + sbsref->refassgnexpr = (Expr *) rhs; + sbsref->refrestype = containerType; + sbsref->reftypmod = containerTypMod; + + result = (Node *) sbsref; /* If target was a domain over container, need to coerce up to the domain */ if (containerType != targetTypeId) diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index f6ec7b64cd..ce09ad7375 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -17,6 +17,7 @@ OBJS = \ array_typanalyze.o \ array_userfuncs.o \ arrayfuncs.o \ + arraysubs.o \ arrayutils.o \ ascii.o \ bool.o \ diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c index a7ea7656c7..4c8a739bc4 100644 --- a/src/backend/utils/adt/arrayfuncs.c +++ b/src/backend/utils/adt/arrayfuncs.c @@ -2044,7 +2044,8 @@ array_get_element_expanded(Datum arraydatum, * array bound. * * NOTE: we assume it is OK to scribble on the provided subscript arrays - * lowerIndx[] and upperIndx[]. These are generally just temporaries. 
+ * lowerIndx[] and upperIndx[]; also, these arrays must be of size MAXDIM + * even when nSubscripts is less. These are generally just temporaries. */ Datum array_get_slice(Datum arraydatum, @@ -2772,7 +2773,8 @@ array_set_element_expanded(Datum arraydatum, * (XXX TODO: allow a corresponding behavior for multidimensional arrays) * * NOTE: we assume it is OK to scribble on the provided index arrays - * lowerIndx[] and upperIndx[]. These are generally just temporaries. + * lowerIndx[] and upperIndx[]; also, these arrays must be of size MAXDIM + * even when nSubscripts is less. These are generally just temporaries. * * NOTE: For assignments, we throw an error for silly subscripts etc, * rather than returning a NULL or empty array as the fetch operations do. diff --git a/src/backend/utils/adt/arraysubs.c b/src/backend/utils/adt/arraysubs.c new file mode 100644 index 0000000000..8982bdba49 --- /dev/null +++ b/src/backend/utils/adt/arraysubs.c @@ -0,0 +1,569 @@ +/*------------------------------------------------------------------------- + * + * arraysubs.c + * Subscripting support functions for arrays. 
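The core pattern that arraysubs.c introduces, a typsubscript handler that returns a static table of function pointers which the parser and executor then call through without knowing the container type, can be sketched standalone. This is a toy model, not backend code; every name below is invented, and the dimension limit of 6 merely stands in for MAXDIM:

```c
#include <assert.h>

#define MODEL_MAXDIM 6

/*
 * Simplified model of SubscriptRoutines: a dispatch table of callbacks.
 * transform() stands in for parse analysis of the subscript list;
 * exec_setup() stands in for installing execution-step functions.
 */
typedef struct ModelSubscriptRoutines
{
    int  (*transform)(int nsubscripts);
    void (*exec_setup)(void);
} ModelSubscriptRoutines;

/* Model parse analysis: reject too many dimensions, else accept. */
static int
model_array_transform(int nsubscripts)
{
    return (nsubscripts <= MODEL_MAXDIM) ? nsubscripts : -1;
}

static void
model_array_exec_setup(void)
{
    /* would fill in step-function pointers here */
}

/*
 * Analogue of array_subscript_handler(): hand back a pointer to a
 * static, constant routines struct.  Callers dispatch through it.
 */
static const ModelSubscriptRoutines *
model_array_handler(void)
{
    static const ModelSubscriptRoutines sbsroutines = {
        .transform = model_array_transform,
        .exec_setup = model_array_exec_setup,
    };
    return &sbsroutines;
}
```

In the real patch the handler is looked up via pg_type.typsubscript and invoked through the fmgr, but the shape, a function returning a pointer to a static const struct of callbacks, is the same.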
+ * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * src/backend/utils/adt/arraysubs.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "executor/execExpr.h" +#include "nodes/makefuncs.h" +#include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" +#include "parser/parse_coerce.h" +#include "parser/parse_expr.h" +#include "utils/array.h" +#include "utils/builtins.h" +#include "utils/lsyscache.h" + + +/* SubscriptingRefState.workspace for array subscripting execution */ +typedef struct ArraySubWorkspace +{ + /* Values determined during expression compilation */ + Oid refelemtype; /* OID of the array element type */ + int16 refattrlength; /* typlen of array type */ + int16 refelemlength; /* typlen of the array element type */ + bool refelembyval; /* is the element type pass-by-value? */ + char refelemalign; /* typalign of the element type */ + + /* + * Subscript values converted to integers. Note that these arrays must be + * of length MAXDIM even when dealing with fewer subscripts, because + * array_get/set_slice may scribble on the extra entries. + */ + int upperindex[MAXDIM]; + int lowerindex[MAXDIM]; +} ArraySubWorkspace; + + +/* + * Finish parse analysis of a SubscriptingRef expression for an array. + * + * Transform the subscript expressions, coerce them to integers, + * and determine the result type of the SubscriptingRef node. + */ +static void +array_subscript_transform(SubscriptingRef *sbsref, + List *indirection, + ParseState *pstate, + bool isSlice, + bool isAssignment) +{ + List *upperIndexpr = NIL; + List *lowerIndexpr = NIL; + ListCell *idx; + + /* + * Transform the subscript expressions, and separate upper and lower + * bounds into two lists. 
+ * + * If we have a container slice expression, we convert any non-slice + * indirection items to slices by treating the single subscript as the + * upper bound and supplying an assumed lower bound of 1. + */ + foreach(idx, indirection) + { + A_Indices *ai = lfirst_node(A_Indices, idx); + Node *subexpr; + + if (isSlice) + { + if (ai->lidx) + { + subexpr = transformExpr(pstate, ai->lidx, pstate->p_expr_kind); + /* If it's not int4 already, try to coerce */ + subexpr = coerce_to_target_type(pstate, + subexpr, exprType(subexpr), + INT4OID, -1, + COERCION_ASSIGNMENT, + COERCE_IMPLICIT_CAST, + -1); + if (subexpr == NULL) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("array subscript must have type integer"), + parser_errposition(pstate, exprLocation(ai->lidx)))); + } + else if (!ai->is_slice) + { + /* Make a constant 1 */ + subexpr = (Node *) makeConst(INT4OID, + -1, + InvalidOid, + sizeof(int32), + Int32GetDatum(1), + false, + true); /* pass by value */ + } + else + { + /* Slice with omitted lower bound, put NULL into the list */ + subexpr = NULL; + } + lowerIndexpr = lappend(lowerIndexpr, subexpr); + } + else + Assert(ai->lidx == NULL && !ai->is_slice); + + if (ai->uidx) + { + subexpr = transformExpr(pstate, ai->uidx, pstate->p_expr_kind); + /* If it's not int4 already, try to coerce */ + subexpr = coerce_to_target_type(pstate, + subexpr, exprType(subexpr), + INT4OID, -1, + COERCION_ASSIGNMENT, + COERCE_IMPLICIT_CAST, + -1); + if (subexpr == NULL) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("array subscript must have type integer"), + parser_errposition(pstate, exprLocation(ai->uidx)))); + } + else + { + /* Slice with omitted upper bound, put NULL into the list */ + Assert(isSlice && ai->is_slice); + subexpr = NULL; + } + upperIndexpr = lappend(upperIndexpr, subexpr); + } + + /* ... 
and store the transformed lists into the SubscriptRef node */ + sbsref->refupperindexpr = upperIndexpr; + sbsref->reflowerindexpr = lowerIndexpr; + + /* Verify subscript list lengths are within implementation limit */ + if (list_length(upperIndexpr) > MAXDIM) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + list_length(upperIndexpr), MAXDIM))); + /* We need not check lowerIndexpr separately */ + + /* + * Determine the result type of the subscripting operation. It's the same + * as the array type if we're slicing, else it's the element type. In + * either case, the typmod is the same as the array's, so we need not + * change reftypmod. + */ + if (isSlice) + sbsref->refrestype = sbsref->refcontainertype; + else + sbsref->refrestype = sbsref->refelemtype; +} + +/* + * During execution, process the subscripts in a SubscriptingRef expression. + * + * The subscript expressions are already evaluated in Datum form in the + * SubscriptingRefState's arrays. Check and convert them as necessary. + * + * If any subscript is NULL, we throw error in assignment cases, or in fetch + * cases set result to NULL and return false (instructing caller to skip the + * rest of the SubscriptingRef sequence). + * + * We convert all the subscripts to plain integers and save them in the + * sbsrefstate->workspace arrays. 
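The null-subscript policy spelled out in that comment, error in assignment cases, NULL result in fetch cases, reduces to a small decision function. A standalone sketch (names invented, not backend code):

```c
#include <assert.h>
#include <stdbool.h>

/* Outcomes of the simplified subscript check. */
enum
{
    MODEL_SBS_OK,            /* subscript usable, continue evaluation */
    MODEL_SBS_RESULT_NULL,   /* fetch: whole result is NULL, stop */
    MODEL_SBS_ERROR          /* assignment: would raise
                              * "array subscript in assignment must not
                              * be null" */
};

/*
 * Model of the per-subscript logic in
 * array_subscript_check_subscripts(): a NULL subscript is fatal for an
 * assignment (there is no cell to store into) but merely nulls out the
 * result of a fetch.
 */
static int
model_check_subscript(bool subscript_is_null, bool is_assignment)
{
    if (!subscript_is_null)
        return MODEL_SBS_OK;
    return is_assignment ? MODEL_SBS_ERROR : MODEL_SBS_RESULT_NULL;
}
```

The real function additionally converts each non-null subscript Datum to a plain integer in the workspace arrays; the sketch keeps only the control flow.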
+ */ +static bool +array_subscript_check_subscripts(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + /* Process upper subscripts */ + for (int i = 0; i < sbsrefstate->numupper; i++) + { + if (sbsrefstate->upperprovided[i]) + { + /* If any index expr yields NULL, result is NULL or error */ + if (sbsrefstate->upperindexnull[i]) + { + if (sbsrefstate->isassignment) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("array subscript in assignment must not be null"))); + *op->resnull = true; + return false; + } + workspace->upperindex[i] = DatumGetInt32(sbsrefstate->upperindex[i]); + } + } + + /* Likewise for lower subscripts */ + for (int i = 0; i < sbsrefstate->numlower; i++) + { + if (sbsrefstate->lowerprovided[i]) + { + /* If any index expr yields NULL, result is NULL or error */ + if (sbsrefstate->lowerindexnull[i]) + { + if (sbsrefstate->isassignment) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("array subscript in assignment must not be null"))); + *op->resnull = true; + return false; + } + workspace->lowerindex[i] = DatumGetInt32(sbsrefstate->lowerindex[i]); + } + } + + return true; +} + +/* + * Evaluate SubscriptingRef fetch for an array element. + * + * Source container is in step's result variable (it's known not NULL), + * and indexes have already been evaluated into workspace array. 
+ */ +static void +array_subscript_fetch(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + /* Should not get here if source array (or any subscript) is null */ + Assert(!(*op->resnull)); + + *op->resvalue = array_get_element(*op->resvalue, + sbsrefstate->numupper, + workspace->upperindex, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign, + op->resnull); +} + +/* + * Evaluate SubscriptingRef fetch for an array slice. + * + * Source container is in step's result variable (it's known not NULL), + * and indexes have already been evaluated into workspace array. + */ +static void +array_subscript_fetch_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + /* Should not get here if source array (or any subscript) is null */ + Assert(!(*op->resnull)); + + *op->resvalue = array_get_slice(*op->resvalue, + sbsrefstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbsrefstate->upperprovided, + sbsrefstate->lowerprovided, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); + /* The slice is never NULL, so no need to change *op->resnull */ +} + +/* + * Evaluate SubscriptingRef assignment for an array element assignment. + * + * Input container (possibly null) is in result area, replacement value is in + * SubscriptingRefState's replacevalue/replacenull. 
+ */ +static void +array_subscript_assign(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + Datum arraySource = *op->resvalue; + + /* + * For an assignment to a fixed-length array type, both the original array + * and the value to be assigned into it must be non-NULL, else we punt and + * return the original array. + */ + if (workspace->refattrlength > 0) + { + if (*op->resnull || sbsrefstate->replacenull) + return; + } + + /* + * For assignment to varlena arrays, we handle a NULL original array by + * substituting an empty (zero-dimensional) array; insertion of the new + * element will result in a singleton array value. It does not matter + * whether the new element is NULL. + */ + if (*op->resnull) + { + arraySource = PointerGetDatum(construct_empty_array(workspace->refelemtype)); + *op->resnull = false; + } + + *op->resvalue = array_set_element(arraySource, + sbsrefstate->numupper, + workspace->upperindex, + sbsrefstate->replacevalue, + sbsrefstate->replacenull, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); + /* The result is never NULL, so no need to change *op->resnull */ +} + +/* + * Evaluate SubscriptingRef assignment for an array slice assignment. + * + * Input container (possibly null) is in result area, replacement value is in + * SubscriptingRefState's replacevalue/replacenull. 
+ */ +static void +array_subscript_assign_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + Datum arraySource = *op->resvalue; + + /* + * For an assignment to a fixed-length array type, both the original array + * and the value to be assigned into it must be non-NULL, else we punt and + * return the original array. + */ + if (workspace->refattrlength > 0) + { + if (*op->resnull || sbsrefstate->replacenull) + return; + } + + /* + * For assignment to varlena arrays, we handle a NULL original array by + * substituting an empty (zero-dimensional) array; insertion of the new + * element will result in a singleton array value. It does not matter + * whether the new element is NULL. + */ + if (*op->resnull) + { + arraySource = PointerGetDatum(construct_empty_array(workspace->refelemtype)); + *op->resnull = false; + } + + *op->resvalue = array_set_slice(arraySource, + sbsrefstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbsrefstate->upperprovided, + sbsrefstate->lowerprovided, + sbsrefstate->replacevalue, + sbsrefstate->replacenull, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); + /* The result is never NULL, so no need to change *op->resnull */ +} + +/* + * Compute old array element value for a SubscriptingRef assignment + * expression. Will only be called if the new-value subexpression + * contains SubscriptingRef or FieldStore. This is the same as the + * regular fetch case, except that we have to handle a null array, + * and the value should be stored into the SubscriptingRefState's + * prevvalue/prevnull fields. 
+ */ +static void +array_subscript_fetch_old(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + if (*op->resnull) + { + /* whole array is null, so any element is too */ + sbsrefstate->prevvalue = (Datum) 0; + sbsrefstate->prevnull = true; + } + else + sbsrefstate->prevvalue = array_get_element(*op->resvalue, + sbsrefstate->numupper, + workspace->upperindex, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign, + &sbsrefstate->prevnull); +} + +/* + * Compute old array slice value for a SubscriptingRef assignment + * expression. Will only be called if the new-value subexpression + * contains SubscriptingRef or FieldStore. This is the same as the + * regular fetch case, except that we have to handle a null array, + * and the value should be stored into the SubscriptingRefState's + * prevvalue/prevnull fields. + * + * Note: this is presently dead code, because the new value for a + * slice would have to be an array, so it couldn't directly contain a + * FieldStore; nor could it contain a SubscriptingRef assignment, since + * we consider adjacent subscripts to index one multidimensional array + * not nested array types. Future generalizations might make this + * reachable, however. 
+ */ +static void +array_subscript_fetch_old_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + if (*op->resnull) + { + /* whole array is null, so any slice is too */ + sbsrefstate->prevvalue = (Datum) 0; + sbsrefstate->prevnull = true; + } + else + { + sbsrefstate->prevvalue = array_get_slice(*op->resvalue, + sbsrefstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbsrefstate->upperprovided, + sbsrefstate->lowerprovided, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); + /* slices of non-null arrays are never null */ + sbsrefstate->prevnull = false; + } +} + +/* + * Set up execution state for an array subscript operation. + */ +static void +array_exec_setup(const SubscriptingRef *sbsref, + SubscriptingRefState *sbsrefstate, + SubscriptExecSteps *methods) +{ + bool is_slice = (sbsrefstate->numlower != 0); + ArraySubWorkspace *workspace; + + /* + * Enforce the implementation limit on number of array subscripts. This + * check isn't entirely redundant with checking at parse time; conceivably + * the expression was stored by a backend with a different MAXDIM value. + */ + if (sbsrefstate->numupper > MAXDIM) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + sbsrefstate->numupper, MAXDIM))); + + /* Should be impossible if parser is sane, but check anyway: */ + if (sbsrefstate->numlower != 0 && + sbsrefstate->numupper != sbsrefstate->numlower) + elog(ERROR, "upper and lower index lists are not same length"); + + /* + * Allocate type-specific workspace. + */ + workspace = (ArraySubWorkspace *) palloc(sizeof(ArraySubWorkspace)); + sbsrefstate->workspace = workspace; + + /* + * Collect datatype details we'll need at execution. 
+ */ + workspace->refelemtype = sbsref->refelemtype; + workspace->refattrlength = get_typlen(sbsref->refcontainertype); + get_typlenbyvalalign(sbsref->refelemtype, + &workspace->refelemlength, + &workspace->refelembyval, + &workspace->refelemalign); + + /* + * Pass back pointers to appropriate step execution functions. + */ + methods->sbs_check_subscripts = array_subscript_check_subscripts; + if (is_slice) + { + methods->sbs_fetch = array_subscript_fetch_slice; + methods->sbs_assign = array_subscript_assign_slice; + methods->sbs_fetch_old = array_subscript_fetch_old_slice; + } + else + { + methods->sbs_fetch = array_subscript_fetch; + methods->sbs_assign = array_subscript_assign; + methods->sbs_fetch_old = array_subscript_fetch_old; + } +} + +/* + * array_subscript_handler + * Subscripting handler for standard varlena arrays. + * + * This should be used only for "true" array types, which have array headers + * as understood by the varlena array routines, and are referenced by the + * element type's pg_type.typarray field. + */ +Datum +array_subscript_handler(PG_FUNCTION_ARGS) +{ + static const SubscriptRoutines sbsroutines = { + .transform = array_subscript_transform, + .exec_setup = array_exec_setup + }; + + PG_RETURN_POINTER(&sbsroutines); +} + +/* + * raw_array_subscript_handler + * Subscripting handler for "raw" arrays. + * + * A "raw" array just contains N independent instances of the element type. + * Currently we require both the element type and the array type to be fixed + * length, but it wouldn't be too hard to relax that for the array type. + * + * As of now, all the support code is shared with standard varlena arrays. + * We may split those into separate code paths, but probably that would yield + * only marginal speedups. The main point of having a separate handler is + * so that pg_type.typsubscript clearly indicates the type's semantics. 
+ */ +Datum +raw_array_subscript_handler(PG_FUNCTION_ARGS) +{ + static const SubscriptRoutines sbsroutines = { + .transform = array_subscript_transform, + .exec_setup = array_exec_setup + }; + + PG_RETURN_POINTER(&sbsroutines); +} diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c index f2816e4f37..013409aee7 100644 --- a/src/backend/utils/adt/format_type.c +++ b/src/backend/utils/adt/format_type.c @@ -22,6 +22,7 @@ #include "catalog/pg_type.h" #include "mb/pg_wchar.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/numeric.h" #include "utils/syscache.h" @@ -138,15 +139,14 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags) typeform = (Form_pg_type) GETSTRUCT(tuple); /* - * Check if it's a regular (variable length) array type. Fixed-length - * array types such as "name" shouldn't get deconstructed. As of Postgres - * 8.1, rather than checking typlen we check the toast property, and don't + * Check if it's a "true" array type. Pseudo-array types such as "name" + * shouldn't get deconstructed. Also check the toast property, and don't * deconstruct "plain storage" array types --- this is because we don't * want to show oidvector as oid[]. 
*/ array_base_type = typeform->typelem; - if (array_base_type != InvalidOid && + if (IsTrueArrayType(typeform) && typeform->typstorage != TYPSTORAGE_PLAIN) { /* Switch our attention to the array element type */ diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index d370348a1c..12557ce3af 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c @@ -26,6 +26,7 @@ #include "miscadmin.h" #include "utils/array.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/hsearch.h" #include "utils/json.h" #include "utils/jsonb.h" @@ -3011,7 +3012,7 @@ prepare_column_cache(ColumnIOData *column, column->io.composite.base_typmod = typmod; column->io.composite.domain_info = NULL; } - else if (type->typlen == -1 && OidIsValid(type->typelem)) + else if (IsTrueArrayType(type)) { column->typcat = TYPECAT_ARRAY; column->io.array.element_info = MemoryContextAllocZero(mcxt, diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index ae23299162..6e5c7379e2 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -2634,8 +2634,9 @@ get_typ_typrelid(Oid typid) * * Given the type OID, get the typelem (InvalidOid if not an array type). * - * NB: this only considers varlena arrays to be true arrays; InvalidOid is - * returned if the input is a fixed-length array type. + * NB: this only succeeds for "true" arrays having array_subscript_handler + * as typsubscript. For other types, InvalidOid is returned independently + * of whether they have typelem or typsubscript set. 
*/ Oid get_element_type(Oid typid) @@ -2648,7 +2649,7 @@ get_element_type(Oid typid) Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp); Oid result; - if (typtup->typlen == -1) + if (IsTrueArrayType(typtup)) result = typtup->typelem; else result = InvalidOid; @@ -2731,7 +2732,7 @@ get_base_element_type(Oid typid) Oid result; /* This test must match get_element_type */ - if (typTup->typlen == -1) + if (IsTrueArrayType(typTup)) result = typTup->typelem; else result = InvalidOid; @@ -2966,6 +2967,64 @@ type_is_collatable(Oid typid) } +/* + * get_typsubscript + * + * Given the type OID, return the type's subscripting handler's OID, + * if it has one. + * + * If typelemp isn't NULL, we also store the type's typelem value there. + * This saves some callers an extra catalog lookup. + */ +RegProcedure +get_typsubscript(Oid typid, Oid *typelemp) +{ + HeapTuple tp; + + tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid)); + if (HeapTupleIsValid(tp)) + { + Form_pg_type typform = (Form_pg_type) GETSTRUCT(tp); + RegProcedure handler = typform->typsubscript; + + if (typelemp) + *typelemp = typform->typelem; + ReleaseSysCache(tp); + return handler; + } + else + { + if (typelemp) + *typelemp = InvalidOid; + return InvalidOid; + } +} + +/* + * getSubscriptingRoutines + * + * Given the type OID, fetch the type's subscripting methods struct. + * Fail if type is not subscriptable. + * + * If typelemp isn't NULL, we also store the type's typelem value there. + * This saves some callers an extra catalog lookup. 
+ */ +const struct SubscriptRoutines * +getSubscriptingRoutines(Oid typid, Oid *typelemp) +{ + RegProcedure typsubscript = get_typsubscript(typid, typelemp); + + if (!OidIsValid(typsubscript)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("cannot subscript type %s because it does not support subscripting", + format_type_be(typid)))); + + return (const struct SubscriptRoutines *) + DatumGetPointer(OidFunctionCall0(typsubscript)); +} + + /* ---------- STATISTICS CACHE ---------- */ /* diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index dca1d48e89..5883fde367 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -406,6 +406,7 @@ lookup_type_cache(Oid type_id, int flags) typentry->typstorage = typtup->typstorage; typentry->typtype = typtup->typtype; typentry->typrelid = typtup->typrelid; + typentry->typsubscript = typtup->typsubscript; typentry->typelem = typtup->typelem; typentry->typcollation = typtup->typcollation; typentry->flags |= TCFLAGS_HAVE_PG_TYPE_DATA; @@ -450,6 +451,7 @@ lookup_type_cache(Oid type_id, int flags) typentry->typstorage = typtup->typstorage; typentry->typtype = typtup->typtype; typentry->typrelid = typtup->typrelid; + typentry->typsubscript = typtup->typsubscript; typentry->typelem = typtup->typelem; typentry->typcollation = typtup->typcollation; typentry->flags |= TCFLAGS_HAVE_PG_TYPE_DATA; diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 3b36335aa6..673a670347 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -10794,11 +10794,13 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) char *typmodin; char *typmodout; char *typanalyze; + char *typsubscript; Oid typreceiveoid; Oid typsendoid; Oid typmodinoid; Oid typmodoutoid; Oid typanalyzeoid; + Oid typsubscriptoid; char *typcategory; char *typispreferred; char *typdelim; @@ -10840,6 +10842,14 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) else 
appendPQExpBufferStr(query, "false AS typcollatable, "); + if (fout->remoteVersion >= 140000) + appendPQExpBufferStr(query, + "typsubscript, " + "typsubscript::pg_catalog.oid AS typsubscriptoid, "); + else + appendPQExpBufferStr(query, + "'-' AS typsubscript, 0 AS typsubscriptoid, "); + /* Before 8.4, pg_get_expr does not allow 0 for its second arg */ if (fout->remoteVersion >= 80400) appendPQExpBufferStr(query, @@ -10862,11 +10872,13 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) typmodin = PQgetvalue(res, 0, PQfnumber(res, "typmodin")); typmodout = PQgetvalue(res, 0, PQfnumber(res, "typmodout")); typanalyze = PQgetvalue(res, 0, PQfnumber(res, "typanalyze")); + typsubscript = PQgetvalue(res, 0, PQfnumber(res, "typsubscript")); typreceiveoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typreceiveoid"))); typsendoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typsendoid"))); typmodinoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typmodinoid"))); typmodoutoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typmodoutoid"))); typanalyzeoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typanalyzeoid"))); + typsubscriptoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typsubscriptoid"))); typcategory = PQgetvalue(res, 0, PQfnumber(res, "typcategory")); typispreferred = PQgetvalue(res, 0, PQfnumber(res, "typispreferred")); typdelim = PQgetvalue(res, 0, PQfnumber(res, "typdelim")); @@ -10935,6 +10947,9 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) appendPQExpBufferStr(q, typdefault); } + if (OidIsValid(typsubscriptoid)) + appendPQExpBuffer(q, ",\n SUBSCRIPT = %s", typsubscript); + if (OidIsValid(tyinfo->typelem)) { char *elemType; diff --git a/src/include/c.h b/src/include/c.h index b21e4074dd..12ea056a35 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -592,13 +592,9 @@ typedef uint32 CommandId; #define InvalidCommandId (~(CommandId)0) /* - * Array indexing support + * Maximum number of array subscripts, for regular varlena arrays */ #define MAXDIM 6 -typedef struct -{ 
- int indx[MAXDIM]; -} IntArray; /* ---------------- * Variable-length datatypes all share the 'struct varlena' header. diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fc2202b843..e6c7b070f6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10936,6 +10936,14 @@ proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}', prosrc => 'pg_control_init' }, +# subscripting support for built-in types +{ oid => '9255', descr => 'standard array subscripting support', + proname => 'array_subscript_handler', prorettype => 'internal', + proargtypes => 'internal', prosrc => 'array_subscript_handler' }, +{ oid => '9256', descr => 'raw array subscripting support', + proname => 'raw_array_subscript_handler', prorettype => 'internal', + proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' }, + # collation management functions { oid => '3445', descr => 'import collations from operating system', proname => 'pg_import_system_collations', procost => '100', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 21a467a7a7..28240bdce3 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -48,9 +48,10 @@ { oid => '19', array_type_oid => '1003', descr => '63-byte type for storing system identifiers', typname => 'name', typlen => 'NAMEDATALEN', typbyval => 'f', - typcategory => 'S', typelem => 'char', typinput => 'namein', - typoutput => 'nameout', typreceive => 'namerecv', typsend => 'namesend', - typalign => 'c', typcollation => 'C' }, + typcategory => 'S', typsubscript => 'raw_array_subscript_handler', + typelem => 'char', typinput => 'namein', typoutput => 'nameout', + typreceive => 'namerecv', typsend => 'namesend', typalign => 'c', + typcollation => 'C' 
}, { oid => '20', array_type_oid => '1016', descr => '~18 digit integer, 8-byte storage', typname => 'int8', typlen => '8', typbyval => 'FLOAT8PASSBYVAL', @@ -64,7 +65,8 @@ { oid => '22', array_type_oid => '1006', descr => 'array of int2, used in system tables', typname => 'int2vector', typlen => '-1', typbyval => 'f', typcategory => 'A', - typelem => 'int2', typinput => 'int2vectorin', typoutput => 'int2vectorout', + typsubscript => 'array_subscript_handler', typelem => 'int2', + typinput => 'int2vectorin', typoutput => 'int2vectorout', typreceive => 'int2vectorrecv', typsend => 'int2vectorsend', typalign => 'i' }, { oid => '23', array_type_oid => '1007', @@ -104,7 +106,8 @@ { oid => '30', array_type_oid => '1013', descr => 'array of oids, used in system tables', typname => 'oidvector', typlen => '-1', typbyval => 'f', typcategory => 'A', - typelem => 'oid', typinput => 'oidvectorin', typoutput => 'oidvectorout', + typsubscript => 'array_subscript_handler', typelem => 'oid', + typinput => 'oidvectorin', typoutput => 'oidvectorout', typreceive => 'oidvectorrecv', typsend => 'oidvectorsend', typalign => 'i' }, # hand-built rowtype entries for bootstrapped catalogs @@ -178,13 +181,15 @@ { oid => '600', array_type_oid => '1017', descr => 'geometric point \'(x, y)\'', typname => 'point', typlen => '16', typbyval => 'f', typcategory => 'G', - typelem => 'float8', typinput => 'point_in', typoutput => 'point_out', - typreceive => 'point_recv', typsend => 'point_send', typalign => 'd' }, + typsubscript => 'raw_array_subscript_handler', typelem => 'float8', + typinput => 'point_in', typoutput => 'point_out', typreceive => 'point_recv', + typsend => 'point_send', typalign => 'd' }, { oid => '601', array_type_oid => '1018', descr => 'geometric line segment \'(pt1,pt2)\'', typname => 'lseg', typlen => '32', typbyval => 'f', typcategory => 'G', - typelem => 'point', typinput => 'lseg_in', typoutput => 'lseg_out', - typreceive => 'lseg_recv', typsend => 'lseg_send', typalign => 
'd' }, + typsubscript => 'raw_array_subscript_handler', typelem => 'point', + typinput => 'lseg_in', typoutput => 'lseg_out', typreceive => 'lseg_recv', + typsend => 'lseg_send', typalign => 'd' }, { oid => '602', array_type_oid => '1019', descr => 'geometric path \'(pt1,...)\'', typname => 'path', typlen => '-1', typbyval => 'f', typcategory => 'G', @@ -193,9 +198,9 @@ { oid => '603', array_type_oid => '1020', descr => 'geometric box \'(lower left,upper right)\'', typname => 'box', typlen => '32', typbyval => 'f', typcategory => 'G', - typdelim => ';', typelem => 'point', typinput => 'box_in', - typoutput => 'box_out', typreceive => 'box_recv', typsend => 'box_send', - typalign => 'd' }, + typdelim => ';', typsubscript => 'raw_array_subscript_handler', + typelem => 'point', typinput => 'box_in', typoutput => 'box_out', + typreceive => 'box_recv', typsend => 'box_send', typalign => 'd' }, { oid => '604', array_type_oid => '1027', descr => 'geometric polygon \'(pt1,...)\'', typname => 'polygon', typlen => '-1', typbyval => 'f', typcategory => 'G', @@ -203,8 +208,9 @@ typsend => 'poly_send', typalign => 'd', typstorage => 'x' }, { oid => '628', array_type_oid => '629', descr => 'geometric line', typname => 'line', typlen => '24', typbyval => 'f', typcategory => 'G', - typelem => 'float8', typinput => 'line_in', typoutput => 'line_out', - typreceive => 'line_recv', typsend => 'line_send', typalign => 'd' }, + typsubscript => 'raw_array_subscript_handler', typelem => 'float8', + typinput => 'line_in', typoutput => 'line_out', typreceive => 'line_recv', + typsend => 'line_send', typalign => 'd' }, # OIDS 700 - 799 @@ -507,8 +513,9 @@ # Arrays of records have typcategory P, so they can't be autogenerated. 
{ oid => '2287', typname => '_record', typlen => '-1', typbyval => 'f', typtype => 'p', - typcategory => 'P', typelem => 'record', typinput => 'array_in', - typoutput => 'array_out', typreceive => 'array_recv', typsend => 'array_send', + typcategory => 'P', typsubscript => 'array_subscript_handler', + typelem => 'record', typinput => 'array_in', typoutput => 'array_out', + typreceive => 'array_recv', typsend => 'array_send', typanalyze => 'array_typanalyze', typalign => 'd', typstorage => 'x' }, { oid => '2275', array_type_oid => '1263', descr => 'C-style string', typname => 'cstring', typlen => '-2', typbyval => 'f', typtype => 'p', diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h index 6099e5f57c..15f2514a14 100644 --- a/src/include/catalog/pg_type.h +++ b/src/include/catalog/pg_type.h @@ -101,15 +101,18 @@ CATALOG(pg_type,1247,TypeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(71,TypeRelati Oid typrelid BKI_DEFAULT(0) BKI_ARRAY_DEFAULT(0) BKI_LOOKUP(pg_class); /* - * If typelem is not 0 then it identifies another row in pg_type. The - * current type can then be subscripted like an array yielding values of - * type typelem. A non-zero typelem does not guarantee this type to be a - * "real" array type; some ordinary fixed-length types can also be - * subscripted (e.g., name, point). Variable-length types can *not* be - * turned into pseudo-arrays like that. Hence, the way to determine - * whether a type is a "true" array type is if: - * - * typelem != 0 and typlen == -1. + * Type-specific subscripting handler. If typsubscript is 0, it means + * that this type doesn't support subscripting. Note that various parts + * of the system deem types to be "true" array types only if their + * typsubscript is array_subscript_handler. 
+ */ + regproc typsubscript BKI_DEFAULT(-) BKI_ARRAY_DEFAULT(array_subscript_handler) BKI_LOOKUP(pg_proc); + + /* + * If typelem is not 0 then it identifies another row in pg_type, defining + * the type yielded by subscripting. This should be 0 if typsubscript is + * 0. However, it can be 0 when typsubscript isn't 0, if the handler + * doesn't need typelem to determine the subscripting result type. */ Oid typelem BKI_DEFAULT(0) BKI_LOOKUP(pg_type); @@ -319,6 +322,11 @@ DECLARE_UNIQUE_INDEX(pg_type_typname_nsp_index, 2704, on pg_type using btree(typ (typid) == ANYCOMPATIBLENONARRAYOID || \ (typid) == ANYCOMPATIBLERANGEOID) +/* Is this a "true" array type? (Requires fmgroids.h) */ +#define IsTrueArrayType(typeForm) \ + (OidIsValid((typeForm)->typelem) && \ + (typeForm)->typsubscript == F_ARRAY_SUBSCRIPT_HANDLER) + /* * Backwards compatibility for ancient random spellings of pg_type OID macros. * Don't use these names in new code. @@ -351,6 +359,7 @@ extern ObjectAddress TypeCreate(Oid newTypeOid, Oid typmodinProcedure, Oid typmodoutProcedure, Oid analyzeProcedure, + Oid subscriptProcedure, Oid elementType, bool isImplicitArray, Oid arrayType, diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index abb489e206..b4e0a9b7d3 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -32,6 +32,11 @@ typedef void (*ExecEvalSubroutine) (ExprState *state, struct ExprEvalStep *op, ExprContext *econtext); +/* API for out-of-line evaluation subroutines returning bool */ +typedef bool (*ExecEvalBoolSubroutine) (ExprState *state, + struct ExprEvalStep *op, + ExprContext *econtext); + /* * Discriminator for ExprEvalSteps. 
* @@ -185,8 +190,8 @@ typedef enum ExprEvalOp */ EEOP_FIELDSTORE_FORM, - /* Process a container subscript; short-circuit expression to NULL if NULL */ - EEOP_SBSREF_SUBSCRIPT, + /* Process container subscripts; possibly short-circuit result to NULL */ + EEOP_SBSREF_SUBSCRIPTS, /* * Compute old container element/slice when a SubscriptingRef assignment @@ -494,19 +499,19 @@ typedef struct ExprEvalStep int ncolumns; } fieldstore; - /* for EEOP_SBSREF_SUBSCRIPT */ + /* for EEOP_SBSREF_SUBSCRIPTS */ struct { + ExecEvalBoolSubroutine subscriptfunc; /* evaluation subroutine */ /* too big to have inline */ struct SubscriptingRefState *state; - int off; /* 0-based index of this subscript */ - bool isupper; /* is it upper or lower subscript? */ int jumpdone; /* jump here on null */ } sbsref_subscript; /* for EEOP_SBSREF_OLD / ASSIGN / FETCH */ struct { + ExecEvalSubroutine subscriptfunc; /* evaluation subroutine */ /* too big to have inline */ struct SubscriptingRefState *state; } sbsref; @@ -640,36 +645,41 @@ typedef struct SubscriptingRefState { bool isassignment; /* is it assignment, or just fetch? */ - Oid refelemtype; /* OID of the container element type */ - int16 refattrlength; /* typlen of container type */ - int16 refelemlength; /* typlen of the container element type */ - bool refelembyval; /* is the element type pass-by-value? 
*/ - char refelemalign; /* typalign of the element type */ + /* workspace for type-specific subscripting code */ + void *workspace; - /* numupper and upperprovided[] are filled at compile time */ - /* at runtime, extracted subscript datums get stored in upperindex[] */ + /* numupper and upperprovided[] are filled at expression compile time */ + /* at runtime, subscripts are computed in upperindex[]/upperindexnull[] */ int numupper; - bool upperprovided[MAXDIM]; - int upperindex[MAXDIM]; + bool *upperprovided; /* indicates if this position is supplied */ + Datum *upperindex; + bool *upperindexnull; /* similarly for lower indexes, if any */ int numlower; - bool lowerprovided[MAXDIM]; - int lowerindex[MAXDIM]; - - /* subscript expressions get evaluated into here */ - Datum subscriptvalue; - bool subscriptnull; + bool *lowerprovided; + Datum *lowerindex; + bool *lowerindexnull; /* for assignment, new value to assign is evaluated into here */ Datum replacevalue; bool replacenull; - /* if we have a nested assignment, SBSREF_OLD puts old value here */ + /* if we have a nested assignment, sbs_fetch_old puts old value here */ Datum prevvalue; bool prevnull; } SubscriptingRefState; +/* Execution step methods used for SubscriptingRef */ +typedef struct SubscriptExecSteps +{ + /* See nodes/subscripting.h for more detail about these */ + ExecEvalBoolSubroutine sbs_check_subscripts; /* process subscripts */ + ExecEvalSubroutine sbs_fetch; /* fetch an element */ + ExecEvalSubroutine sbs_assign; /* assign to an element */ + ExecEvalSubroutine sbs_fetch_old; /* fetch old value for assignment */ +} SubscriptExecSteps; + /* functions in execExpr.c */ extern void ExprEvalPushStep(ExprState *es, const ExprEvalStep *s); @@ -712,10 +722,6 @@ extern void ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext); extern void ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext); -extern bool ExecEvalSubscriptingRef(ExprState *state, 
ExprEvalStep *op); -extern void ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op); -extern void ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op); -extern void ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op); extern void ExecEvalConvertRowtype(ExprState *state, ExprEvalStep *op, ExprContext *econtext); extern void ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op); diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h index cdbe781c73..dd85908fe2 100644 --- a/src/include/nodes/primnodes.h +++ b/src/include/nodes/primnodes.h @@ -390,14 +390,14 @@ typedef struct WindowFunc int location; /* token location, or -1 if unknown */ } WindowFunc; -/* ---------------- - * SubscriptingRef: describes a subscripting operation over a container - * (array, etc). +/* + * SubscriptingRef: describes a subscripting operation over a container + * (array, etc). * * A SubscriptingRef can describe fetching a single element from a container, - * fetching a part of container (e.g. array slice), storing a single element into - * a container, or storing a slice. The "store" cases work with an - * initial container value and a source value that is inserted into the + * fetching a part of a container (e.g. an array slice), storing a single + * element into a container, or storing a slice. The "store" cases work with + * an initial container value and a source value that is inserted into the * appropriate part of the container; the result of the operation is an * entire new modified container value. * @@ -410,23 +410,32 @@ typedef struct WindowFunc * * In the slice case, individual expressions in the subscript lists can be * NULL, meaning "substitute the array's current lower or upper bound". - * - * Note: the result datatype is the element type when fetching a single - * element; but it is the array type when doing subarray fetch or either - * type of store. + * (Non-array containers may or may not support this.) 
+ * + * refcontainertype is the actual container type that determines the + * subscripting semantics. (This will generally be either the exposed type of + * refexpr, or the base type if that is a domain.) refelemtype is the type of + * the container's elements; this is saved for the use of the subscripting + * functions, but is not used by the core code. refrestype, reftypmod, and + * refcollid describe the type of the SubscriptingRef's result. In a store + * expression, refrestype will always match refcontainertype; in a fetch, + * it could be refelemtype for an element fetch, or refcontainertype for a + * slice fetch, or possibly something else as determined by type-specific + * subscripting logic. Likewise, reftypmod and refcollid will match the + * container's properties in a store, but could be different in a fetch. * * Note: for the cases where a container is returned, if refexpr yields a R/W - * expanded container, then the implementation is allowed to modify that object - * in-place and return the same object.) - * ---------------- + * expanded container, then the implementation is allowed to modify that + * object in-place and return the same object. 
*/ typedef struct SubscriptingRef { Expr xpr; Oid refcontainertype; /* type of the container proper */ - Oid refelemtype; /* type of the container elements */ - int32 reftypmod; /* typmod of the container (and elements too) */ - Oid refcollid; /* OID of collation, or InvalidOid if none */ + Oid refelemtype; /* the container type's pg_type.typelem */ + Oid refrestype; /* type of the SubscriptingRef's result */ + int32 reftypmod; /* typmod of the result */ + Oid refcollid; /* collation of result, or InvalidOid if none */ List *refupperindexpr; /* expressions that evaluate to upper * container indexes */ List *reflowerindexpr; /* expressions that evaluate to lower @@ -434,7 +443,6 @@ typedef struct SubscriptingRef * container element */ Expr *refexpr; /* the expression that evaluates to a * container value */ - Expr *refassgnexpr; /* expression for the source value, or NULL if * fetch */ } SubscriptingRef; diff --git a/src/include/nodes/subscripting.h b/src/include/nodes/subscripting.h new file mode 100644 index 0000000000..aea7197218 --- /dev/null +++ b/src/include/nodes/subscripting.h @@ -0,0 +1,146 @@ +/*------------------------------------------------------------------------- + * + * subscripting.h + * API for generic type subscripting + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * src/include/nodes/subscripting.h + * + *------------------------------------------------------------------------- + */ +#ifndef SUBSCRIPTING_H +#define SUBSCRIPTING_H + +#include "nodes/primnodes.h" + +/* Forward declarations, to avoid including other headers */ +struct ParseState; +struct SubscriptingRefState; +struct SubscriptExecSteps; + +/* + * The SQL-visible function that defines a subscripting method is declared + * subscripting_function(internal) returns internal + * but it actually is not passed any parameter. 
It must return a pointer + * to a "struct SubscriptRoutines" that provides pointers to the individual + * subscript parsing and execution methods. Typically the pointer will point + * to a "static const" variable, but at need it can point to palloc'd space. + * The type (after domain-flattening) of the head variable or expression + * of a subscripting construct determines which subscripting function is + * called for that construct. + * + * There are some general restrictions on what subscripting can do. The + * planner expects subscripting fetches to be strict (i.e., return NULL for + * any null input), immutable (same inputs always give same results), and + * leakproof (data-value-dependent errors must not be thrown; in other + * words, you must silently return NULL for any bad subscript value). + * Subscripting assignment need not be, and usually isn't, strict; it need + * not be leakproof either; but it must be immutable. + */ + +/* + * The transform method is called during parse analysis of a subscripting + * construct. The SubscriptingRef node has been constructed, but some of + * its fields still need to be filled in, and the subscript expression(s) + * are still in raw form. The transform method is responsible for doing + * parse analysis of each subscript expression (using transformExpr), + * coercing the subscripts to whatever type it needs, and building the + * refupperindexpr and reflowerindexpr lists from those results. The + * reflowerindexpr list must be empty for an element operation, or the + * same length as refupperindexpr for a slice operation. Insert NULLs + * (that is, an empty parse tree, not a null Const node) for any omitted + * subscripts in a slice operation. (Of course, if the transform method + * does not care to support slicing, it can just throw an error if isSlice.) + * See array_subscript_transform() for sample code. + * + * The transform method is also responsible for identifying the result type + * of the subscripting operation. 
At call, refcontainertype and reftypmod + * describe the container type (this will be a base type not a domain), and + * refelemtype is set to the container type's pg_type.typelem value. The + * transform method must set refrestype and reftypmod to describe the result + * of subscripting. For arrays, refrestype is set to refelemtype for an + * element operation or refcontainertype for a slice, while reftypmod stays + * the same in either case; but other types might use other rules. The + * transform method should ignore refcollid, as that's determined later on + * during parsing. + * + * At call, refassgnexpr has not been filled in, so the SubscriptingRef node + * always looks like a fetch; refrestype should be set as though for a + * fetch, too. (The isAssignment parameter is typically only useful if the + * transform method wishes to throw an error for not supporting assignment.) + * To complete processing of an assignment, the core parser will coerce the + * element/slice source expression to the returned refrestype and reftypmod + * before putting it into refassgnexpr. It will then set refrestype and + * reftypmod to again describe the container type, since that's what an + * assignment must return. + */ +typedef void (*SubscriptTransform) (SubscriptingRef *sbsref, + List *indirection, + struct ParseState *pstate, + bool isSlice, + bool isAssignment); + +/* + * The exec_setup method is called during executor-startup compilation of a + * SubscriptingRef node in an expression. It must fill *methods with pointers + * to functions that can be called for execution of the node. Optionally, + * exec_setup can initialize sbsrefstate->workspace to point to some palloc'd + * workspace for execution. (Typically, such workspace is used to hold + * looked-up catalog data and/or provide space for the check_subscripts step + * to pass data forward to the other step functions.) 
See executor/execExpr.h + * for the definitions of these structs and other ones used in expression + * execution. + * + * The methods to be provided are: + * + * sbs_check_subscripts: examine the just-computed subscript values available + * in sbsrefstate's arrays, and possibly convert them into another form + * (stored in sbsrefstate->workspace). Return TRUE to continue with + * evaluation of the subscripting construct, or FALSE to skip it and + * return an overall NULL result. If there are any NULL subscripts, + * sbs_check_subscripts must return FALSE in a fetch case, because subscript + * fetch is assumed to be a strict operation. In an assignment case it can + * choose to throw an error, or return FALSE, or let sbs_assign deal with the + * null subscripts. + * + * sbs_fetch: perform a subscripting fetch, using the container value in + * *op->resvalue and the subscripts from sbs_check_subscripts. All of these + * inputs will be non-NULL. Place the result in *op->resvalue / *op->resnull. + * + * sbs_assign: perform a subscripting assignment, using the original + * container value in *op->resvalue / *op->resnull, the subscripts from + * sbs_check_subscripts, and the new element/slice value in + * sbsrefstate->replacevalue/replacenull. Any of these inputs might be NULL + * (unless sbs_check_subscripts rejected null subscripts). Place the result + * (an entire new container value) in *op->resvalue / *op->resnull. + * + * sbs_fetch_old: this is only used in cases where an element or slice + * assignment involves an assignment to a sub-field or sub-element + * (i.e., nested containers are involved). It must fetch the existing + * value of the target element or slice. This is exactly the same as + * sbs_fetch except that (a) it must cope with a NULL container (typically, + * returning NULL is good enough); and (b) the result must be placed in + * sbsrefstate->prevvalue/prevnull, without overwriting *op->resvalue. 
+ * + * Subscripting implementations that do not support assignment need not + * provide sbs_assign or sbs_fetch_old methods. It might be reasonable + * to also omit sbs_check_subscripts, in which case the sbs_fetch method must + * combine the functionality of sbs_check_subscripts and sbs_fetch. (The + * main reason to have a separate sbs_check_subscripts method is so that + * sbs_fetch_old and sbs_assign need not duplicate subscript processing.) + * Set the relevant pointers to NULL for any omitted methods. + */ +typedef void (*SubscriptExecSetup) (const SubscriptingRef *sbsref, + struct SubscriptingRefState *sbsrefstate, + struct SubscriptExecSteps *methods); + +/* Struct returned by the SQL-visible subscript handler function */ +typedef struct SubscriptRoutines +{ + SubscriptTransform transform; + SubscriptExecSetup exec_setup; +} SubscriptRoutines; + +#endif /* SUBSCRIPTING_H */ diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h index d25819aa28..beb56fec87 100644 --- a/src/include/parser/parse_node.h +++ b/src/include/parser/parse_node.h @@ -313,15 +313,15 @@ extern void setup_parser_errposition_callback(ParseCallbackState *pcbstate, ParseState *pstate, int location); extern void cancel_parser_errposition_callback(ParseCallbackState *pcbstate); -extern Oid transformContainerType(Oid *containerType, int32 *containerTypmod); +extern void transformContainerType(Oid *containerType, int32 *containerTypmod); extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate, Node *containerBase, Oid containerType, - Oid elementType, int32 containerTypMod, List *indirection, - Node *assignFrom); + bool isAssignment); + extern Const *make_const(ParseState *pstate, Value *value, int location); #endif /* PARSE_NODE_H */ diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index fecfe1f4f6..475b842b09 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -17,6 +17,9 @@ #include 
"access/htup.h" #include "nodes/pg_list.h" +/* avoid including subscripting.h here */ +struct SubscriptRoutines; + /* Result list element for get_op_btree_interpretation */ typedef struct OpBtreeInterpretation { @@ -172,6 +175,9 @@ extern void getTypeBinaryOutputInfo(Oid type, Oid *typSend, bool *typIsVarlena); extern Oid get_typmodin(Oid typid); extern Oid get_typcollation(Oid typid); extern bool type_is_collatable(Oid typid); +extern RegProcedure get_typsubscript(Oid typid, Oid *typelemp); +extern const struct SubscriptRoutines *getSubscriptingRoutines(Oid typid, + Oid *typelemp); extern Oid getBaseType(Oid typid); extern Oid getBaseTypeAndTypmod(Oid typid, int32 *typmod); extern int32 get_typavgwidth(Oid typid, int32 typmod); diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h index cdd20e56d7..38c8fe0192 100644 --- a/src/include/utils/typcache.h +++ b/src/include/utils/typcache.h @@ -42,6 +42,7 @@ typedef struct TypeCacheEntry char typstorage; char typtype; Oid typrelid; + Oid typsubscript; Oid typelem; Oid typcollation; diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c index 7844c500ee..4de756455d 100644 --- a/src/pl/plperl/plperl.c +++ b/src/pl/plperl/plperl.c @@ -2853,9 +2853,7 @@ compile_plperl_function(Oid fn_oid, bool is_trigger, bool is_event_trigger) prodesc->result_oid = rettype; prodesc->fn_retisset = procStruct->proretset; prodesc->fn_retistuple = type_is_rowtype(rettype); - - prodesc->fn_retisarray = - (typeStruct->typlen == -1 && typeStruct->typelem); + prodesc->fn_retisarray = IsTrueArrayType(typeStruct); fmgr_info_cxt(typeStruct->typinput, &(prodesc->result_in_func), @@ -2901,7 +2899,7 @@ compile_plperl_function(Oid fn_oid, bool is_trigger, bool is_event_trigger) } /* Identify array-type arguments */ - if (typeStruct->typelem != 0 && typeStruct->typlen == -1) + if (IsTrueArrayType(typeStruct)) prodesc->arg_arraytype[i] = argtype; else prodesc->arg_arraytype[i] = InvalidOid; diff --git a/src/pl/plpgsql/src/pl_comp.c 
b/src/pl/plpgsql/src/pl_comp.c index 6df8e14629..b610b28d70 100644 --- a/src/pl/plpgsql/src/pl_comp.c +++ b/src/pl/plpgsql/src/pl_comp.c @@ -26,6 +26,7 @@ #include "parser/parse_type.h" #include "plpgsql.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/guc.h" #include "utils/lsyscache.h" #include "utils/memutils.h" @@ -2144,8 +2145,7 @@ build_datatype(HeapTuple typeTup, int32 typmod, * This test should include what get_element_type() checks. We also * disallow non-toastable array types (i.e. oidvector and int2vector). */ - typ->typisarray = (typeStruct->typlen == -1 && - OidIsValid(typeStruct->typelem) && + typ->typisarray = (IsTrueArrayType(typeStruct) && typeStruct->typstorage != TYPSTORAGE_PLAIN); } else if (typeStruct->typtype == TYPTYPE_DOMAIN) diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c index b4aeb7fd59..5e807b139f 100644 --- a/src/pl/plpython/plpy_typeio.c +++ b/src/pl/plpython/plpy_typeio.c @@ -352,9 +352,9 @@ PLy_output_setup_func(PLyObToDatum *arg, MemoryContext arg_mcxt, proc); } else if (typentry && - OidIsValid(typentry->typelem) && typentry->typlen == -1) + IsTrueArrayType(typentry)) { - /* Standard varlena array (cf. get_element_type) */ + /* Standard array */ arg->func = PLySequence_ToArray; /* Get base type OID to insert into constructed array */ /* (note this might not be the same as the immediate child type) */ @@ -470,9 +470,9 @@ PLy_input_setup_func(PLyDatumToOb *arg, MemoryContext arg_mcxt, proc); } else if (typentry && - OidIsValid(typentry->typelem) && typentry->typlen == -1) + IsTrueArrayType(typentry)) { - /* Standard varlena array (cf. 
get_element_type) */ + /* Standard array */ arg->func = PLyList_FromArray; /* Recursively set up conversion info for the element type */ arg->u.array.elm = (PLyDatumToOb *) diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out index c03ac65ff8..448b3ee526 100644 --- a/src/test/regress/expected/arrays.out +++ b/src/test/regress/expected/arrays.out @@ -27,12 +27,12 @@ INSERT INTO arrtest (a, b[1:2][1:2], c, d, e, f, g) INSERT INTO arrtest (a, b[1:2], c, d[1:2]) VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}'); INSERT INTO arrtest (b[2]) VALUES(now()); -- error, type mismatch -ERROR: array assignment to "b" requires type integer but expression is of type timestamp with time zone +ERROR: subscripted assignment to "b" requires type integer but expression is of type timestamp with time zone LINE 1: INSERT INTO arrtest (b[2]) VALUES(now()); ^ HINT: You will need to rewrite or cast the expression. INSERT INTO arrtest (b[1:2]) VALUES(now()); -- error, type mismatch -ERROR: array assignment to "b" requires type integer[] but expression is of type timestamp with time zone +ERROR: subscripted assignment to "b" requires type integer[] but expression is of type timestamp with time zone LINE 1: INSERT INTO arrtest (b[1:2]) VALUES(now()); ^ HINT: You will need to rewrite or cast the expression. 
@@ -237,7 +237,7 @@ UPDATE arrtest ERROR: array subscript in assignment must not be null -- Un-subscriptable type SELECT (now())[1]; -ERROR: cannot subscript type timestamp with time zone because it is not an array +ERROR: cannot subscript type timestamp with time zone because it does not support subscripting -- test slices with empty lower and/or upper index CREATE TEMP TABLE arrtest_s ( a int2[], diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out index 3b39137400..507b474b1b 100644 --- a/src/test/regress/expected/opr_sanity.out +++ b/src/test/regress/expected/opr_sanity.out @@ -31,7 +31,8 @@ begin if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if; if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then if EXISTS(select 1 from pg_catalog.pg_type where - oid = $1 and typelem != 0 and typlen = -1) + oid = $1 and typelem != 0 and + typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc) then return true; end if; end if; if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then @@ -55,7 +56,8 @@ begin if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if; if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then if EXISTS(select 1 from pg_catalog.pg_type where - oid = $1 and typelem != 0 and typlen = -1) + oid = $1 and typelem != 0 and + typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc) then return true; end if; end if; if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then diff --git a/src/test/regress/expected/type_sanity.out b/src/test/regress/expected/type_sanity.out index ec1cd47623..13567ddf84 100644 --- a/src/test/regress/expected/type_sanity.out +++ b/src/test/regress/expected/type_sanity.out @@ -75,14 +75,15 @@ ORDER BY p1.oid; 5017 | pg_mcv_list (4 rows) --- Make sure typarray points to a varlena array type of our own base +-- Make sure typarray points to a "true" array type of our own base SELECT p1.oid, p1.typname as basetype, p2.typname as 
arraytype, - p2.typelem, p2.typlen + p2.typsubscript FROM pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid) WHERE p1.typarray <> 0 AND - (p2.oid IS NULL OR p2.typelem <> p1.oid OR p2.typlen <> -1); - oid | basetype | arraytype | typelem | typlen ------+----------+-----------+---------+-------- + (p2.oid IS NULL OR + p2.typsubscript <> 'array_subscript_handler'::regproc); + oid | basetype | arraytype | typsubscript +-----+----------+-----------+-------------- (0 rows) -- Look for range types that do not have a pg_range entry @@ -448,6 +449,33 @@ WHERE p1.typarray = p2.oid AND -----+---------+----------+---------+---------- (0 rows) +-- Check for typelem set without a handler +SELECT p1.oid, p1.typname, p1.typelem +FROM pg_type AS p1 +WHERE p1.typelem != 0 AND p1.typsubscript = 0; + oid | typname | typelem +-----+---------+--------- +(0 rows) + +-- Check for misuse of standard subscript handlers +SELECT p1.oid, p1.typname, + p1.typelem, p1.typlen, p1.typbyval +FROM pg_type AS p1 +WHERE p1.typsubscript = 'array_subscript_handler'::regproc AND NOT + (p1.typelem != 0 AND p1.typlen = -1 AND NOT p1.typbyval); + oid | typname | typelem | typlen | typbyval +-----+---------+---------+--------+---------- +(0 rows) + +SELECT p1.oid, p1.typname, + p1.typelem, p1.typlen, p1.typbyval +FROM pg_type AS p1 +WHERE p1.typsubscript = 'raw_array_subscript_handler'::regproc AND NOT + (p1.typelem != 0 AND p1.typlen > 0 AND NOT p1.typbyval); + oid | typname | typelem | typlen | typbyval +-----+---------+---------+--------+---------- +(0 rows) + -- Check for bogus typanalyze routines SELECT p1.oid, p1.typname, p2.oid, p2.proname FROM pg_type AS p1, pg_proc AS p2 @@ -485,7 +513,7 @@ SELECT t.oid, t.typname, t.typanalyze FROM pg_type t WHERE t.typbasetype = 0 AND (t.typanalyze = 'array_typanalyze'::regproc) != - (typelem != 0 AND typlen < 0) + (t.typsubscript = 'array_subscript_handler'::regproc) ORDER BY 1; oid | typname | typanalyze -----+------------+------------ @@ -608,7 +636,8 
@@ WHERE o.opcmethod != 403 OR ((o.opcintype != p1.rngsubtype) AND NOT (o.opcintype = 'pg_catalog.anyarray'::regtype AND EXISTS(select 1 from pg_catalog.pg_type where - oid = p1.rngsubtype and typelem != 0 and typlen = -1))); + oid = p1.rngsubtype and typelem != 0 and + typsubscript = 'array_subscript_handler'::regproc))); rngtypid | rngsubtype | opcmethod | opcname ----------+------------+-----------+--------- (0 rows) diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql index 307aab1deb..4189a5a4e0 100644 --- a/src/test/regress/sql/opr_sanity.sql +++ b/src/test/regress/sql/opr_sanity.sql @@ -34,7 +34,8 @@ begin if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if; if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then if EXISTS(select 1 from pg_catalog.pg_type where - oid = $1 and typelem != 0 and typlen = -1) + oid = $1 and typelem != 0 and + typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc) then return true; end if; end if; if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then @@ -59,7 +60,8 @@ begin if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if; if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then if EXISTS(select 1 from pg_catalog.pg_type where - oid = $1 and typelem != 0 and typlen = -1) + oid = $1 and typelem != 0 and + typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc) then return true; end if; end if; if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then diff --git a/src/test/regress/sql/type_sanity.sql b/src/test/regress/sql/type_sanity.sql index 5e433388cd..8c6e614f20 100644 --- a/src/test/regress/sql/type_sanity.sql +++ b/src/test/regress/sql/type_sanity.sql @@ -63,12 +63,13 @@ WHERE p1.typtype not in ('p') AND p1.typname NOT LIKE E'\\_%' p2.typelem = p1.oid and p1.typarray = p2.oid) ORDER BY p1.oid; --- Make sure typarray points to a varlena array type of our own base +-- Make sure typarray points to a "true" array type of our 
own base SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype, - p2.typelem, p2.typlen + p2.typsubscript FROM pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid) WHERE p1.typarray <> 0 AND - (p2.oid IS NULL OR p2.typelem <> p1.oid OR p2.typlen <> -1); + (p2.oid IS NULL OR + p2.typsubscript <> 'array_subscript_handler'::regproc); -- Look for range types that do not have a pg_range entry SELECT p1.oid, p1.typname @@ -323,6 +324,26 @@ WHERE p1.typarray = p2.oid AND p2.typalign != (CASE WHEN p1.typalign = 'd' THEN 'd'::"char" ELSE 'i'::"char" END); +-- Check for typelem set without a handler + +SELECT p1.oid, p1.typname, p1.typelem +FROM pg_type AS p1 +WHERE p1.typelem != 0 AND p1.typsubscript = 0; + +-- Check for misuse of standard subscript handlers + +SELECT p1.oid, p1.typname, + p1.typelem, p1.typlen, p1.typbyval +FROM pg_type AS p1 +WHERE p1.typsubscript = 'array_subscript_handler'::regproc AND NOT + (p1.typelem != 0 AND p1.typlen = -1 AND NOT p1.typbyval); + +SELECT p1.oid, p1.typname, + p1.typelem, p1.typlen, p1.typbyval +FROM pg_type AS p1 +WHERE p1.typsubscript = 'raw_array_subscript_handler'::regproc AND NOT + (p1.typelem != 0 AND p1.typlen > 0 AND NOT p1.typbyval); + -- Check for bogus typanalyze routines SELECT p1.oid, p1.typname, p2.oid, p2.proname @@ -356,7 +377,7 @@ SELECT t.oid, t.typname, t.typanalyze FROM pg_type t WHERE t.typbasetype = 0 AND (t.typanalyze = 'array_typanalyze'::regproc) != - (typelem != 0 AND typlen < 0) + (t.typsubscript = 'array_subscript_handler'::regproc) ORDER BY 1; -- **************** pg_class **************** @@ -452,7 +473,8 @@ WHERE o.opcmethod != 403 OR ((o.opcintype != p1.rngsubtype) AND NOT (o.opcintype = 'pg_catalog.anyarray'::regtype AND EXISTS(select 1 from pg_catalog.pg_type where - oid = p1.rngsubtype and typelem != 0 and typlen = -1))); + oid = p1.rngsubtype and typelem != 0 and + typsubscript = 'array_subscript_handler'::regproc))); -- canonical function, if any, had better match the range type
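[Editor's illustration] The method-table design documented in the subscripting.h comments above (sbs_check_subscripts as a separate validation step, sbs_fetch dispatched through a struct of function pointers, omitted methods set to NULL) can be hard to see through the diff. Below is a simplified, standalone C analogue of that pattern. All names (DemoState, DemoSteps, demo_*) are hypothetical; this is a sketch of the dispatch structure, not actual PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for SubscriptingRefState / SubscriptExecSteps */
typedef struct DemoState
{
	int		subscript;			/* just-computed subscript value */
	bool	subscript_null;		/* was the subscript NULL? */
	int		workspace;			/* validated subscript, set by check step */
} DemoState;

typedef struct DemoSteps
{
	/* returning false => skip evaluation, yield an overall NULL result */
	bool	(*check_subscripts) (DemoState *state, int nelems);
	int		(*fetch) (DemoState *state, const int *elems);
} DemoSteps;

/* "Fetch is strict": a NULL or out-of-range subscript yields NULL */
static bool
demo_check_subscripts(DemoState *state, int nelems)
{
	if (state->subscript_null)
		return false;
	if (state->subscript < 0 || state->subscript >= nelems)
		return false;
	state->workspace = state->subscript;	/* possibly-converted form */
	return true;
}

static int
demo_fetch(DemoState *state, const int *elems)
{
	return elems[state->workspace];
}

/* The "handler" fills in the method table; omitted methods would be NULL */
static void
demo_exec_setup(DemoSteps *methods)
{
	methods->check_subscripts = demo_check_subscripts;
	methods->fetch = demo_fetch;
}

/* Caller-side dispatch, loosely analogous to the executor's step loop */
static bool
demo_eval(const DemoSteps *m, DemoState *state,
		  const int *elems, int nelems, int *result)
{
	if (m->check_subscripts && !m->check_subscripts(state, nelems))
		return false;			/* overall NULL result */
	*result = m->fetch(state, elems);
	return true;
}
```

The point of the separate check step, as the comments above explain, is that an assignment path (sbs_fetch_old followed by sbs_assign) can reuse the validated subscripts without repeating the validation.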
Hi,

On 2020-12-07 14:08:35 -0500, Tom Lane wrote:
> 1. I'm still wondering if TypeParamBool is the right thing to pass to
> LLVMFunctionType() to describe a function-returning-bool.  It does
> seem to work on x86_64 and aarch64, for what that's worth.

> -		v_ret = build_EvalXFunc(b, mod, "ExecEvalSubscriptingRef",
> -								v_state, op);
> +		param_types[0] = l_ptr(StructExprState);
> +		param_types[1] = l_ptr(TypeSizeT);
> +		param_types[2] = l_ptr(StructExprContext);
> +
> +		v_functype = LLVMFunctionType(TypeParamBool,
> +									  param_types,
> +									  lengthof(param_types),
> +									  false);
> +		v_func = l_ptr_const(op->d.sbsref_subscript.subscriptfunc,
> +							 l_ptr(v_functype));
> +
> +		v_params[0] = v_state;
> +		v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT));
> +		v_params[2] = v_econtext;
> +		v_ret = LLVMBuildCall(b,
> +							  v_func,
> +							  v_params, lengthof(v_params), "");
> 		v_ret = LLVMBuildZExt(b, v_ret, TypeStorageBool, "");

The TypeParamBool stuff here is ok. Basically LLVM uses a '1bit' integer
to represent booleans in the IR. But when it comes to storing such a
value in memory, it uses 1 byte, for obvious reasons. Hence the two
types.

We infer it like this:

> /*
>  * Clang represents stdbool.h style booleans that are returned by functions
>  * differently (as i1) than stored ones (as i8). Therefore we do not just need
>  * TypeBool (above), but also a way to determine the width of a returned
>  * integer. This allows us to keep compatible with non-stdbool using
>  * architectures.
>  */
> extern bool FunctionReturningBool(void);
> bool
> FunctionReturningBool(void)
> {
> 	return false;
> }

so you should be good.

I think it'd be better to rely on the backend's definition of
ExecEvalBoolSubroutine etc. For the functions implementing expression
steps I've found that far easier to work with over time (because you can
get LLVM to issue type mismatch errors when the signature changes,
instead of seeing compile failures).

I've attached a prototype conversion for two other such places.
Which immediately pointed to a bug. And one harmless issue (using a
pointer to size_t instead of ExprEvalOp* to represent the 'op'
parameter), which you promptly copied...

If I pushed a slightly cleaned up version of that, it should be fairly
easy to adapt your code to it, I think?

WRT the prototype, I think it may be worth removing most of the types
from llvmjit.h. Worth keeping the most common ones, but most aren't used
all the time so terseness doesn't matter that much, and the
llvm_pg_var_type() would suffice.

Greetings,

Andres Freund
Attachment
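[Editor's illustration] The i1-vs-i8 distinction Andres describes has a plain-C counterpart: a stdbool.h `bool` occupies a whole byte when stored in memory (LLVM's i8 / TypeStorageBool), while the value a function logically returns is just 0 or 1 (LLVM's i1 / TypeParamBool) — hence the `LLVMBuildZExt` widening in the quoted code. A minimal standalone sketch of that, not taken from the patch:

```c
#include <assert.h>
#include <stdbool.h>

/* A returned bool is logically one bit wide (i1 in LLVM IR)... */
static bool
function_returning_bool(void)
{
	return false;
}

/* ...but a bool stored in memory occupies a full byte (i8). */
static unsigned char
store_and_reload(bool b)
{
	bool	storage = b;					/* one-byte store, normalized to 0/1 */

	return *(unsigned char *) &storage;		/* read back the byte representation */
}
```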
Andres Freund <andres@anarazel.de> writes:
> The TypeParamBool stuff here is ok. Basically LLVM uses a '1bit' integer
> to represent booleans in the IR. But when it comes to storing such a
> value in memory, it uses 1 byte, for obvious reasons. Hence the two
> types.

Cool, thanks for taking a look.

> I think it'd be better to rely on the backend's definition of
> ExecEvalBoolSubroutine etc. For the functions implementing expression
> steps I've found that far easier to work with over time (because you can
> get LLVM to issue type mismatch errors when the signature changes,
> instead of seeing compile failures).

I'm a little unclear on what you mean here?  There wasn't such a
thing as ExecEvalBoolSubroutine until I added it in this patch.

> I've attached a prototype conversion for two other such places.  Which
> immediately pointed to a bug.  And one harmless issue (using a pointer to
> size_t instead of ExprEvalOp* to represent the 'op' parameter), which
> you promptly copied...

> If I pushed a slightly cleaned up version of that, it should be fairly
> easy to adapt your code to it, I think?

Sure.  I just copied the existing code for EEOP_PARAM_CALLBACK;
if that changes, I'll just copy the new code.

What did you think of the idea of merging EEOP_SBSREF_OLD / ASSIGN / FETCH
into a single step type distinguished only by the callback function?

			regards, tom lane
Hi,

On 2020-12-07 16:32:32 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think it'd be better to rely on the backend's definition of
> > ExecEvalBoolSubroutine etc. For the functions implementing expression
> > steps I've found that far easier to work with over time (because you can
> > get LLVM to issue type mismatch errors when the signature changes,
> > instead of seeing compile failures).
>
> I'm a little unclear on what you mean here?  There wasn't such a
> thing as ExecEvalBoolSubroutine until I added it in this patch.

Basically that I suggest doing what I did in the prototype patch I
attached, mirroring what it did with TypeExecEvalSubroutine for the new
ExecEvalBoolSubroutine case.

> What did you think of the idea of merging EEOP_SBSREF_OLD / ASSIGN / FETCH
> into a single step type distinguished only by the callback function?

I don't have a strong opinion on this. I guess I find it a bit easier to
understand the generated "program" if the opcodes are distinct (I've a
pending patch printing the opcode sequence). Especially as the payload
is just function pointers.

So I think I'd just merge the *implementation* of the steps, but leave
the different opcodes around?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2020-12-07 16:32:32 -0500, Tom Lane wrote:
>> What did you think of the idea of merging EEOP_SBSREF_OLD / ASSIGN / FETCH
>> into a single step type distinguished only by the callback function?

> I don't have a strong opinion on this. I guess I find it a bit easier to
> understand the generated "program" if the opcodes are distinct (I've a
> pending patch printing the opcode sequence). Especially as the payload
> is just function pointers.

> So I think I'd just merge the *implementation* of the steps, but leave
> the different opcodes around?

Fair enough.  It wasn't entirely clear to me whether it'd be kosher to
write

	EEO_CASE(EEOP_SBSREF_OLD)
	EEO_CASE(EEOP_SBSREF_ASSIGN)
	EEO_CASE(EEOP_SBSREF_FETCH)
	{
		// do something
		EEO_NEXT();
	}

I can see that that should work for the two existing implementations
of EEO_CASE, but I wasn't sure if you wanted to wire in an assumption
that it'll always work.

			regards, tom lane
Hi,

On 2020-12-07 17:25:41 -0500, Tom Lane wrote:
> Fair enough.  It wasn't entirely clear to me whether it'd be kosher to
> write
> 	EEO_CASE(EEOP_SBSREF_OLD)
> 	EEO_CASE(EEOP_SBSREF_ASSIGN)
> 	EEO_CASE(EEOP_SBSREF_FETCH)
> 	{
> 		// do something
> 		EEO_NEXT();
> 	}
>
> I can see that that should work for the two existing implementations
> of EEO_CASE, but I wasn't sure if you wanted to wire in an assumption
> that it'll always work.

I don't think it's likely to be a problem, and if it ends up being one,
we can still deduplicate the ops at that point...

Greetings,

Andres Freund
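[Editor's illustration] The label-stacking pattern under discussion is ordinary C when EEO_CASE expands to a switch case: adjacent case labels share one body, which can then dispatch through a per-step function pointer. A minimal standalone sketch (hypothetical opcode and function names, not the real executor):

```c
#include <assert.h>

typedef enum DemoOpcode
{
	OP_SBSREF_OLD,
	OP_SBSREF_ASSIGN,
	OP_SBSREF_FETCH,
	OP_DONE
} DemoOpcode;

/*
 * With switch-based dispatch, three distinct opcodes can share a single
 * body by stacking case labels; the shared body calls through a function
 * pointer carried in the step, so the opcodes stay distinct for anyone
 * inspecting the generated "program".  (The computed-goto flavor of
 * EEO_CASE achieves the same effect with adjacent goto labels.)
 */
static int
demo_dispatch(DemoOpcode op, int (*subscriptfunc) (int), int arg)
{
	switch (op)
	{
		case OP_SBSREF_OLD:
		case OP_SBSREF_ASSIGN:
		case OP_SBSREF_FETCH:
			return subscriptfunc(arg);	/* one shared implementation */
		case OP_DONE:
			return arg;
	}
	return -1;					/* not reached */
}

static int
demo_negate(int x)
{
	return -x;
}
```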
Andres Freund <andres@anarazel.de> writes:
> I've attached a prototype conversion for two other such places.  Which
> immediately pointed to a bug.  And one harmless issue (using a pointer to
> size_t instead of ExprEvalOp* to represent the 'op' parameter), which
> you promptly copied...

> If I pushed a slightly cleaned up version of that, it should be fairly
> easy to adapt your code to it, I think?

I've now studied this patch and it seems sane to me, although I wondered
why you wrote "extern"s here:

@@ -48,6 +48,10 @@ PGFunction TypePGFunction;
 size_t		TypeSizeT;
 bool		TypeStorageBool;
 
+extern ExprStateEvalFunc TypeExprStateEvalFunc;
+ExprStateEvalFunc TypeExprStateEvalFunc;
+extern ExecEvalSubroutine TypeExecEvalSubroutine;
+ExecEvalSubroutine TypeExecEvalSubroutine;
 NullableDatum StructNullableDatum;
 
 AggState StructAggState;

The other variables in that file don't have that.  Other than that nit,
please finish this up and push it so I can finish the
generic-subscripting patch.

> WRT the prototype, I think it may be worth removing most of the types
> from llvmjit.h. Worth keeping the most common ones, but most aren't used
> all the time so terseness doesn't matter that much, and
> the llvm_pg_var_type() would suffice.

Hm, that would mean redoing llvm_pg_var_type() often, wouldn't it?
I don't have a very good feeling for how expensive that is, so I'm not
sure if this seems like a good idea or not.

			regards, tom lane
Andres Freund <andres@anarazel.de> writes:
> On 2020-12-07 17:25:41 -0500, Tom Lane wrote:
>> I can see that that should work for the two existing implementations
>> of EEO_CASE, but I wasn't sure if you wanted to wire in an assumption
>> that it'll always work.

> I don't think it's likely to be a problem, and if it ends up being one,
> we can still deduplicate the ops at that point...

Seems reasonable.

Here's a v38 that addresses the semantic loose ends I was worried about.
I decided that it's worth allowing subscripting functions to dictate
whether they should be considered strict or not, at least for the fetch
side (store is still assumed nonstrict always) and whether they should
be considered leakproof or not.  That requires only a minimal amount of
extra code.  While the planner does have to do extra catalog lookups to
check strictness and leakproofness, those are not common things to need
to check, so I don't think we're paying anything in performance for the
flexibility.

I left out the option of "strict store" because that *would* have
required extra code (to generate a nullness test on the replacement
value) and the potential use-case seems too narrow to justify that.
I also left out any option to control volatility or parallel safety,
again on the grounds of lack of use-case; plus, planner checks for
those properties would have been in significantly hotter code paths.

I'm waiting on your earlier patch to rewrite the llvmjit_expr.c code,
but otherwise I think this is ready to go.

			regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df19fe..ca2f9f3215 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -426,23 +426,28 @@ foreign_expr_walker(Node *node,
 				return false;
 
 			/*
-			 * Recurse to remaining subexpressions.  Since the container
-			 * subscripts must yield (noncollatable) integers, they won't
-			 * affect the inner_cxt state.
+			 * Recurse into the remaining subexpressions.
The container + * subscripts will not affect collation of the SubscriptingRef + * result, so do those first and reset inner_cxt afterwards. */ if (!foreign_expr_walker((Node *) sr->refupperindexpr, glob_cxt, &inner_cxt)) return false; + inner_cxt.collation = InvalidOid; + inner_cxt.state = FDW_COLLATE_NONE; if (!foreign_expr_walker((Node *) sr->reflowerindexpr, glob_cxt, &inner_cxt)) return false; + inner_cxt.collation = InvalidOid; + inner_cxt.state = FDW_COLLATE_NONE; if (!foreign_expr_walker((Node *) sr->refexpr, glob_cxt, &inner_cxt)) return false; /* - * Container subscripting should yield same collation as - * input, but for safety use same logic as for function nodes. + * Container subscripting typically yields same collation as + * refexpr's, but in case it doesn't, use same logic as for + * function nodes. */ collation = sr->refcollid; if (collation == InvalidOid) diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index 79069ddfab..583a5ce3b9 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8740,6 +8740,21 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l </para></entry> </row> + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>typsubscript</structfield> <type>regproc</type> + (references <link linkend="catalog-pg-proc"><structname>pg_proc</structname></link>.<structfield>oid</structfield>) + </para> + <para> + Subscripting handler function's OID, or zero if this type doesn't + support subscripting. Types that are <quote>true</quote> array + types have <structfield>typsubscript</structfield> + = <function>array_subscript_handler</function>, but other types may + have other handler functions to implement specialized subscripting + behavior. 
+ </para></entry> + </row> + <row> <entry role="catalog_table_entry"><para role="column_definition"> <structfield>typelem</structfield> <type>oid</type> @@ -8747,19 +8762,12 @@ SCRAM-SHA-256$<replaceable><iteration count></replaceable>:<replaceable>&l </para> <para> If <structfield>typelem</structfield> is not 0 then it - identifies another row in <structname>pg_type</structname>. - The current type can then be subscripted like an array yielding - values of type <structfield>typelem</structfield>. A - <quote>true</quote> array type is variable length - (<structfield>typlen</structfield> = -1), - but some fixed-length (<structfield>typlen</structfield> > 0) types - also have nonzero <structfield>typelem</structfield>, for example - <type>name</type> and <type>point</type>. - If a fixed-length type has a <structfield>typelem</structfield> then - its internal representation must be some number of values of the - <structfield>typelem</structfield> data type with no other data. - Variable-length array types have a header defined by the array - subroutines. + identifies another row in <structname>pg_type</structname>, + defining the type yielded by subscripting. This should be 0 + if <structfield>typsubscript</structfield> is 0. However, it can + be 0 when <structfield>typsubscript</structfield> isn't 0, if the + handler doesn't need <structfield>typelem</structfield> to + determine the subscripting result type. 
</para></entry> </row> diff --git a/doc/src/sgml/ref/create_type.sgml b/doc/src/sgml/ref/create_type.sgml index 970b517db9..fc09282db7 100644 --- a/doc/src/sgml/ref/create_type.sgml +++ b/doc/src/sgml/ref/create_type.sgml @@ -43,6 +43,7 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> ( [ , TYPMOD_IN = <replaceable class="parameter">type_modifier_input_function</replaceable> ] [ , TYPMOD_OUT = <replaceable class="parameter">type_modifier_output_function</replaceable> ] [ , ANALYZE = <replaceable class="parameter">analyze_function</replaceable> ] + [ , SUBSCRIPT = <replaceable class="parameter">subscript_function</replaceable> ] [ , INTERNALLENGTH = { <replaceable class="parameter">internallength</replaceable> | VARIABLE } ] [ , PASSEDBYVALUE ] [ , ALIGNMENT = <replaceable class="parameter">alignment</replaceable> ] @@ -196,8 +197,9 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> <replaceable class="parameter">receive_function</replaceable>, <replaceable class="parameter">send_function</replaceable>, <replaceable class="parameter">type_modifier_input_function</replaceable>, - <replaceable class="parameter">type_modifier_output_function</replaceable> and - <replaceable class="parameter">analyze_function</replaceable> + <replaceable class="parameter">type_modifier_output_function</replaceable>, + <replaceable class="parameter">analyze_function</replaceable>, and + <replaceable class="parameter">subscript_function</replaceable> are optional. Generally these functions have to be coded in C or another low-level language. </para> @@ -318,6 +320,26 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> in <filename>src/include/commands/vacuum.h</filename>. </para> + <para> + The optional <replaceable class="parameter">subscript_function</replaceable> + allows the data type to be subscripted in SQL commands. 
Specifying this + function does not cause the type to be considered a <quote>true</quote> + array type; for example, it will not be a candidate for the result type + of <literal>ARRAY[]</literal> constructs. But if subscripting a value + of the type is a natural notation for extracting data from it, then + a <replaceable class="parameter">subscript_function</replaceable> can + be written to define what that means. The subscript function must be + declared to take a single argument of type <type>internal</type>, and + return an <type>internal</type> result, which is a pointer to a struct + of methods (functions) that implement subscripting. + The detailed API for subscript functions appears + in <filename>src/include/nodes/subscripting.h</filename>; + it may also be useful to read the array implementation + in <filename>src/backend/utils/adt/arraysubs.c</filename>. + Additional information appears in + <xref linkend="sql-createtype-array"/> below. + </para> + <para> While the details of the new type's internal representation are only known to the I/O functions and other functions you create to work with @@ -428,11 +450,12 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> </para> <para> - To indicate that a type is an array, specify the type of the array + To indicate that a type is a fixed-length subscriptable type, + specify the type of the array elements using the <literal>ELEMENT</literal> key word. For example, to define an array of 4-byte integers (<type>int4</type>), specify - <literal>ELEMENT = int4</literal>. More details about array types - appear below. + <literal>ELEMENT = int4</literal>. For more details, + see <xref linkend="sql-createtype-array"/> below. 
</para> <para> @@ -456,7 +479,7 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> </para> </refsect2> - <refsect2> + <refsect2 id="sql-createtype-array" xreflabel="Array Types"> <title>Array Types</title> <para> @@ -469,7 +492,9 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> repeated until a non-colliding name is found.) This implicitly-created array type is variable length and uses the built-in input and output functions <literal>array_in</literal> and - <literal>array_out</literal>. The array type tracks any changes in its + <literal>array_out</literal>. Furthermore, this type is what the system + uses for constructs such as <literal>ARRAY[]</literal> over the + user-defined type. The array type tracks any changes in its element type's owner or schema, and is dropped if the element type is. </para> @@ -485,13 +510,27 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> using <literal>point[0]</literal> and <literal>point[1]</literal>. Note that this facility only works for fixed-length types whose internal form - is exactly a sequence of identical fixed-length fields. A subscriptable - variable-length type must have the generalized internal representation - used by <literal>array_in</literal> and <literal>array_out</literal>. + is exactly a sequence of identical fixed-length fields. For historical reasons (i.e., this is clearly wrong but it's far too late to change it), subscripting of fixed-length array types starts from zero, rather than from one as for variable-length arrays. </para> + + <para> + Specifying the <option>SUBSCRIPT</option> option allows a data type to + be subscripted, even though the system does not otherwise regard it as + an array type. 
The behavior just described for fixed-length arrays is + actually implemented by the <option>SUBSCRIPT</option> handler + function <function>raw_array_subscript_handler</function>, which is + used automatically if you specify <option>ELEMENT</option> for a + fixed-length type without also writing <option>SUBSCRIPT</option>. + When specifying a custom <option>SUBSCRIPT</option> function, it is + not necessary to specify <option>ELEMENT</option> unless + the <option>SUBSCRIPT</option> handler function needs to + consult <structfield>typelem</structfield> to find out what to return, + or if you want an explicit dependency from the new type to the + subscripting output type. + </para> </refsect2> </refsect1> @@ -654,6 +693,16 @@ CREATE TYPE <replaceable class="parameter">name</replaceable> </listitem> </varlistentry> + <varlistentry> + <term><replaceable class="parameter">subscript_function</replaceable></term> + <listitem> + <para> + The name of a function that defines what subscripting a value of the + data type does. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><replaceable class="parameter">internallength</replaceable></term> <listitem> diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c index c626161408..c4594b0b09 100644 --- a/src/backend/catalog/aclchk.c +++ b/src/backend/catalog/aclchk.c @@ -3114,7 +3114,7 @@ ExecGrant_Type(InternalGrant *istmt) pg_type_tuple = (Form_pg_type) GETSTRUCT(tuple); - if (pg_type_tuple->typelem != 0 && pg_type_tuple->typlen == -1) + if (IsTrueArrayType(pg_type_tuple)) ereport(ERROR, (errcode(ERRCODE_INVALID_GRANT_OPERATION), errmsg("cannot set privileges of array types"), @@ -4392,7 +4392,7 @@ pg_type_aclmask(Oid type_oid, Oid roleid, AclMode mask, AclMaskHow how) * "True" array types don't manage permissions of their own; consult the * element type instead. 
*/ - if (OidIsValid(typeForm->typelem) && typeForm->typlen == -1) + if (IsTrueArrayType(typeForm)) { Oid elttype_oid = typeForm->typelem; diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c index 245c2f4fc8..119006159b 100644 --- a/src/backend/catalog/dependency.c +++ b/src/backend/catalog/dependency.c @@ -2074,6 +2074,22 @@ find_expr_references_walker(Node *node, context->addrs); /* fall through to examine arguments */ } + else if (IsA(node, SubscriptingRef)) + { + SubscriptingRef *sbsref = (SubscriptingRef *) node; + + /* + * The refexpr should provide adequate dependency on refcontainertype, + * and that type in turn depends on refelemtype. However, a custom + * subscripting handler might set refrestype to something different + * from either of those, in which case we'd better record it. + */ + if (sbsref->refrestype != sbsref->refcontainertype && + sbsref->refrestype != sbsref->refelemtype) + add_object_address(OCLASS_TYPE, sbsref->refrestype, 0, + context->addrs); + /* fall through to examine arguments */ + } else if (IsA(node, SubPlan)) { /* Extra work needed here if we ever need this case */ diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c index 4cd7d76938..51b5c4f7f6 100644 --- a/src/backend/catalog/heap.c +++ b/src/backend/catalog/heap.c @@ -1079,6 +1079,7 @@ AddNewRelationType(const char *typeName, InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ InvalidOid, /* analyze procedure - default */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* array element type - irrelevant */ false, /* this is not an array type */ new_array_type, /* array type if any */ @@ -1358,6 +1359,7 @@ heap_create_with_catalog(const char *relname, InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_ARRAY_TYPANALYZE, /* array analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ new_type_oid, /* array element type - the 
rowtype */ true, /* yes, this is an array type */ InvalidOid, /* this has no array type */ diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c index aeb4a54f63..4252875ef5 100644 --- a/src/backend/catalog/pg_type.c +++ b/src/backend/catalog/pg_type.c @@ -103,6 +103,7 @@ TypeShellMake(const char *typeName, Oid typeNamespace, Oid ownerId) values[Anum_pg_type_typisdefined - 1] = BoolGetDatum(false); values[Anum_pg_type_typdelim - 1] = CharGetDatum(DEFAULT_TYPDELIM); values[Anum_pg_type_typrelid - 1] = ObjectIdGetDatum(InvalidOid); + values[Anum_pg_type_typsubscript - 1] = ObjectIdGetDatum(InvalidOid); values[Anum_pg_type_typelem - 1] = ObjectIdGetDatum(InvalidOid); values[Anum_pg_type_typarray - 1] = ObjectIdGetDatum(InvalidOid); values[Anum_pg_type_typinput - 1] = ObjectIdGetDatum(F_SHELL_IN); @@ -208,6 +209,7 @@ TypeCreate(Oid newTypeOid, Oid typmodinProcedure, Oid typmodoutProcedure, Oid analyzeProcedure, + Oid subscriptProcedure, Oid elementType, bool isImplicitArray, Oid arrayType, @@ -357,6 +359,7 @@ TypeCreate(Oid newTypeOid, values[Anum_pg_type_typisdefined - 1] = BoolGetDatum(true); values[Anum_pg_type_typdelim - 1] = CharGetDatum(typDelim); values[Anum_pg_type_typrelid - 1] = ObjectIdGetDatum(relationOid); + values[Anum_pg_type_typsubscript - 1] = ObjectIdGetDatum(subscriptProcedure); values[Anum_pg_type_typelem - 1] = ObjectIdGetDatum(elementType); values[Anum_pg_type_typarray - 1] = ObjectIdGetDatum(arrayType); values[Anum_pg_type_typinput - 1] = ObjectIdGetDatum(inputProcedure); @@ -667,7 +670,7 @@ GenerateTypeDependencies(HeapTuple typeTuple, recordDependencyOnCurrentExtension(&myself, rebuild); } - /* Normal dependencies on the I/O functions */ + /* Normal dependencies on the I/O and support functions */ if (OidIsValid(typeForm->typinput)) { ObjectAddressSet(referenced, ProcedureRelationId, typeForm->typinput); @@ -710,6 +713,12 @@ GenerateTypeDependencies(HeapTuple typeTuple, add_exact_object_address(&referenced, addrs_normal); } 
+ if (OidIsValid(typeForm->typsubscript)) + { + ObjectAddressSet(referenced, ProcedureRelationId, typeForm->typsubscript); + add_exact_object_address(&referenced, addrs_normal); + } + /* Normal dependency from a domain to its base type. */ if (OidIsValid(typeForm->typbasetype)) { diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c index 483bb65ddc..29fe52d2ce 100644 --- a/src/backend/commands/typecmds.c +++ b/src/backend/commands/typecmds.c @@ -115,6 +115,7 @@ static Oid findTypeSendFunction(List *procname, Oid typeOid); static Oid findTypeTypmodinFunction(List *procname); static Oid findTypeTypmodoutFunction(List *procname); static Oid findTypeAnalyzeFunction(List *procname, Oid typeOid); +static Oid findTypeSubscriptingFunction(List *procname, Oid typeOid); static Oid findRangeSubOpclass(List *opcname, Oid subtype); static Oid findRangeCanonicalFunction(List *procname, Oid typeOid); static Oid findRangeSubtypeDiffFunction(List *procname, Oid subtype); @@ -149,6 +150,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) List *typmodinName = NIL; List *typmodoutName = NIL; List *analyzeName = NIL; + List *subscriptName = NIL; char category = TYPCATEGORY_USER; bool preferred = false; char delimiter = DEFAULT_TYPDELIM; @@ -167,6 +169,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) DefElem *typmodinNameEl = NULL; DefElem *typmodoutNameEl = NULL; DefElem *analyzeNameEl = NULL; + DefElem *subscriptNameEl = NULL; DefElem *categoryEl = NULL; DefElem *preferredEl = NULL; DefElem *delimiterEl = NULL; @@ -183,6 +186,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) Oid typmodinOid = InvalidOid; Oid typmodoutOid = InvalidOid; Oid analyzeOid = InvalidOid; + Oid subscriptOid = InvalidOid; char *array_type; Oid array_oid; Oid typoid; @@ -288,6 +292,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) else if (strcmp(defel->defname, "analyze") == 0 || strcmp(defel->defname, "analyse") == 0) 
defelp = &analyzeNameEl; + else if (strcmp(defel->defname, "subscript") == 0) + defelp = &subscriptNameEl; else if (strcmp(defel->defname, "category") == 0) defelp = &categoryEl; else if (strcmp(defel->defname, "preferred") == 0) @@ -358,6 +364,8 @@ DefineType(ParseState *pstate, List *names, List *parameters) typmodoutName = defGetQualifiedName(typmodoutNameEl); if (analyzeNameEl) analyzeName = defGetQualifiedName(analyzeNameEl); + if (subscriptNameEl) + subscriptName = defGetQualifiedName(subscriptNameEl); if (categoryEl) { char *p = defGetString(categoryEl); @@ -482,6 +490,24 @@ DefineType(ParseState *pstate, List *names, List *parameters) if (analyzeName) analyzeOid = findTypeAnalyzeFunction(analyzeName, typoid); + /* + * Likewise look up the subscripting procedure if any. If it is not + * specified, but a typelem is specified, allow that if + * raw_array_subscript_handler can be used. (This is for backwards + * compatibility; maybe someday we should throw an error instead.) + */ + if (subscriptName) + subscriptOid = findTypeSubscriptingFunction(subscriptName, typoid); + else if (OidIsValid(elemType)) + { + if (internalLength > 0 && !byValue && get_typlen(elemType) > 0) + subscriptOid = F_RAW_ARRAY_SUBSCRIPT_HANDLER; + else + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("element type cannot be specified without a valid subscripting procedure"))); + } + /* * Check permissions on functions. We choose to require the creator/owner * of a type to also own the underlying functions. 
Since creating a type @@ -516,6 +542,9 @@ DefineType(ParseState *pstate, List *names, List *parameters) if (analyzeOid && !pg_proc_ownercheck(analyzeOid, GetUserId())) aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION, NameListToString(analyzeName)); + if (subscriptOid && !pg_proc_ownercheck(subscriptOid, GetUserId())) + aclcheck_error(ACLCHECK_NOT_OWNER, OBJECT_FUNCTION, + NameListToString(subscriptName)); #endif /* @@ -551,8 +580,9 @@ DefineType(ParseState *pstate, List *names, List *parameters) typmodinOid, /* typmodin procedure */ typmodoutOid, /* typmodout procedure */ analyzeOid, /* analyze procedure */ + subscriptOid, /* subscript procedure */ elemType, /* element type ID */ - false, /* this is not an array type */ + false, /* this is not an implicit array type */ array_oid, /* array type we are about to create */ InvalidOid, /* base type ID (only for domains) */ defaultValue, /* default type value */ @@ -592,6 +622,7 @@ DefineType(ParseState *pstate, List *names, List *parameters) typmodinOid, /* typmodin procedure */ typmodoutOid, /* typmodout procedure */ F_ARRAY_TYPANALYZE, /* analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ typoid, /* element type ID */ true, /* yes this is an array type */ InvalidOid, /* no further array type */ @@ -800,6 +831,12 @@ DefineDomain(CreateDomainStmt *stmt) /* Analysis function */ analyzeProcedure = baseType->typanalyze; + /* + * Domains don't need a subscript procedure, since they are not + * subscriptable on their own. If the base type is subscriptable, the + * parser will reduce the type to the base type before subscripting. 
+ */ + /* Inherited default value */ datum = SysCacheGetAttr(TYPEOID, typeTup, Anum_pg_type_typdefault, &isnull); @@ -993,6 +1030,7 @@ DefineDomain(CreateDomainStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ analyzeProcedure, /* analyze procedure */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* no array element type */ false, /* this isn't an array */ domainArrayOid, /* array type we are about to create */ @@ -1033,6 +1071,7 @@ DefineDomain(CreateDomainStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_ARRAY_TYPANALYZE, /* analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ address.objectId, /* element type ID */ true, /* yes this is an array type */ InvalidOid, /* no further array type */ @@ -1148,6 +1187,7 @@ DefineEnum(CreateEnumStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ InvalidOid, /* analyze procedure - default */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* element type ID */ false, /* this is not an array type */ enumArrayOid, /* array type we are about to create */ @@ -1188,6 +1228,7 @@ DefineEnum(CreateEnumStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_ARRAY_TYPANALYZE, /* analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ enumTypeAddr.objectId, /* element type ID */ true, /* yes this is an array type */ InvalidOid, /* no further array type */ @@ -1476,6 +1517,7 @@ DefineRange(CreateRangeStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_RANGE_TYPANALYZE, /* analyze procedure */ + InvalidOid, /* subscript procedure - none */ InvalidOid, /* element type ID - none */ false, /* this is not an array type */ rangeArrayOid, /* array type we are about to create */ @@ -1519,6 +1561,7 @@ 
DefineRange(CreateRangeStmt *stmt) InvalidOid, /* typmodin procedure - none */ InvalidOid, /* typmodout procedure - none */ F_ARRAY_TYPANALYZE, /* analyze procedure */ + F_ARRAY_SUBSCRIPT_HANDLER, /* array subscript procedure */ typoid, /* element type ID */ true, /* yes this is an array type */ InvalidOid, /* no further array type */ @@ -1616,7 +1659,7 @@ makeRangeConstructors(const char *name, Oid namespace, /* - * Find suitable I/O functions for a type. + * Find suitable I/O and other support functions for a type. * * typeOid is the type's OID (which will already exist, if only as a shell * type). @@ -1904,6 +1947,45 @@ findTypeAnalyzeFunction(List *procname, Oid typeOid) return procOid; } +static Oid +findTypeSubscriptingFunction(List *procname, Oid typeOid) +{ + Oid argList[1]; + Oid procOid; + + /* + * Subscripting support functions always take one INTERNAL argument and + * return INTERNAL. (The argument is not used, but we must have it to + * maintain type safety.) + */ + argList[0] = INTERNALOID; + + procOid = LookupFuncName(procname, 1, argList, true); + if (!OidIsValid(procOid)) + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_FUNCTION), + errmsg("function %s does not exist", + func_signature_string(procname, 1, NIL, argList)))); + + if (get_func_rettype(procOid) != INTERNALOID) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("type subscripting function %s must return type %s", + NameListToString(procname), "internal"))); + + /* + * We disallow array_subscript_handler() from being selected explicitly, + * since that must only be applied to autogenerated array types. + */ + if (procOid == F_ARRAY_SUBSCRIPT_HANDLER) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("user-defined types cannot use subscripting function %s", + NameListToString(procname)))); + + return procOid; +} + /* * Find suitable support functions and opclasses for a range type. 
*/ @@ -3221,8 +3303,7 @@ RenameType(RenameStmt *stmt) errhint("Use ALTER TABLE instead."))); /* don't allow direct alteration of array types, either */ - if (OidIsValid(typTup->typelem) && - get_array_type(typTup->typelem) == typeOid) + if (IsTrueArrayType(typTup)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot alter array type %s", @@ -3303,8 +3384,7 @@ AlterTypeOwner(List *names, Oid newOwnerId, ObjectType objecttype) errhint("Use ALTER TABLE instead."))); /* don't allow direct alteration of array types, either */ - if (OidIsValid(typTup->typelem) && - get_array_type(typTup->typelem) == typeOid) + if (IsTrueArrayType(typTup)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("cannot alter array type %s", @@ -3869,8 +3949,7 @@ AlterType(AlterTypeStmt *stmt) /* * For the same reasons, don't allow direct alteration of array types. */ - if (OidIsValid(typForm->typelem) && - get_array_type(typForm->typelem) == typeOid) + if (IsTrueArrayType(typForm)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("%s is not a base type", diff --git a/src/backend/executor/execExpr.c b/src/backend/executor/execExpr.c index 79b325c7cf..0134ecc261 100644 --- a/src/backend/executor/execExpr.c +++ b/src/backend/executor/execExpr.c @@ -40,6 +40,7 @@ #include "miscadmin.h" #include "nodes/makefuncs.h" #include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" #include "optimizer/optimizer.h" #include "pgstat.h" #include "utils/acl.h" @@ -2523,19 +2524,51 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, ExprState *state, Datum *resv, bool *resnull) { bool isAssignment = (sbsref->refassgnexpr != NULL); - SubscriptingRefState *sbsrefstate = palloc0(sizeof(SubscriptingRefState)); + int nupper = list_length(sbsref->refupperindexpr); + int nlower = list_length(sbsref->reflowerindexpr); + const SubscriptRoutines *sbsroutines; + SubscriptingRefState *sbsrefstate; + SubscriptExecSteps methods; + char *ptr; List *adjust_jumps = NIL; 
ListCell *lc; int i; + /* Look up the subscripting support methods */ + sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype, NULL); + + /* Allocate sbsrefstate, with enough space for per-subscript arrays too */ + sbsrefstate = palloc0(MAXALIGN(sizeof(SubscriptingRefState)) + + (nupper + nlower) * (sizeof(Datum) + + 2 * sizeof(bool))); + /* Fill constant fields of SubscriptingRefState */ sbsrefstate->isassignment = isAssignment; - sbsrefstate->refelemtype = sbsref->refelemtype; - sbsrefstate->refattrlength = get_typlen(sbsref->refcontainertype); - get_typlenbyvalalign(sbsref->refelemtype, - &sbsrefstate->refelemlength, - &sbsrefstate->refelembyval, - &sbsrefstate->refelemalign); + sbsrefstate->numupper = nupper; + sbsrefstate->numlower = nlower; + /* Set up per-subscript arrays */ + ptr = ((char *) sbsrefstate) + MAXALIGN(sizeof(SubscriptingRefState)); + sbsrefstate->upperindex = (Datum *) ptr; + ptr += nupper * sizeof(Datum); + sbsrefstate->lowerindex = (Datum *) ptr; + ptr += nlower * sizeof(Datum); + sbsrefstate->upperprovided = (bool *) ptr; + ptr += nupper * sizeof(bool); + sbsrefstate->lowerprovided = (bool *) ptr; + ptr += nlower * sizeof(bool); + sbsrefstate->upperindexnull = (bool *) ptr; + ptr += nupper * sizeof(bool); + sbsrefstate->lowerindexnull = (bool *) ptr; + /* ptr += nlower * sizeof(bool); */ + + /* + * Let the container-type-specific code have a chance. It must fill the + * "methods" struct with function pointers for us to possibly use in + * execution steps below; and it can optionally set up some data pointed + * to by the workspace field. + */ + memset(&methods, 0, sizeof(methods)); + sbsroutines->exec_setup(sbsref, sbsrefstate, &methods); /* * Evaluate array input. It's safe to do so into resv/resnull, because we @@ -2546,11 +2579,11 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, ExecInitExprRec(sbsref->refexpr, state, resv, resnull); /* - * If refexpr yields NULL, and it's a fetch, then result is NULL. 
We can - * implement this with just JUMP_IF_NULL, since we evaluated the array - * into the desired target location. + * If refexpr yields NULL, and the operation should be strict, then result + * is NULL. We can implement this with just JUMP_IF_NULL, since we + * evaluated the array into the desired target location. */ - if (!isAssignment) + if (!isAssignment && sbsroutines->fetch_strict) { scratch->opcode = EEOP_JUMP_IF_NULL; scratch->d.jump.jumpdone = -1; /* adjust later */ @@ -2559,19 +2592,6 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, state->steps_len - 1); } - /* Verify subscript list lengths are within limit */ - if (list_length(sbsref->refupperindexpr) > MAXDIM) - ereport(ERROR, - (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", - list_length(sbsref->refupperindexpr), MAXDIM))); - - if (list_length(sbsref->reflowerindexpr) > MAXDIM) - ereport(ERROR, - (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), - errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", - list_length(sbsref->reflowerindexpr), MAXDIM))); - /* Evaluate upper subscripts */ i = 0; foreach(lc, sbsref->refupperindexpr) @@ -2582,28 +2602,18 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, if (!e) { sbsrefstate->upperprovided[i] = false; - i++; - continue; + sbsrefstate->upperindexnull[i] = true; + } + else + { + sbsrefstate->upperprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + ExecInitExprRec(e, state, + &sbsrefstate->upperindex[i], + &sbsrefstate->upperindexnull[i]); } - - sbsrefstate->upperprovided[i] = true; - - /* Each subscript is evaluated into subscriptvalue/subscriptnull */ - ExecInitExprRec(e, state, - &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); - - /* ... 
and then SBSREF_SUBSCRIPT saves it into step's workspace */ - scratch->opcode = EEOP_SBSREF_SUBSCRIPT; - scratch->d.sbsref_subscript.state = sbsrefstate; - scratch->d.sbsref_subscript.off = i; - scratch->d.sbsref_subscript.isupper = true; - scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ - ExprEvalPushStep(state, scratch); - adjust_jumps = lappend_int(adjust_jumps, - state->steps_len - 1); i++; } - sbsrefstate->numupper = i; /* Evaluate lower subscripts similarly */ i = 0; @@ -2615,39 +2625,43 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, if (!e) { sbsrefstate->lowerprovided[i] = false; - i++; - continue; + sbsrefstate->lowerindexnull[i] = true; } + else + { + sbsrefstate->lowerprovided[i] = true; + /* Each subscript is evaluated into appropriate array entry */ + ExecInitExprRec(e, state, + &sbsrefstate->lowerindex[i], + &sbsrefstate->lowerindexnull[i]); + } + i++; + } - sbsrefstate->lowerprovided[i] = true; - - /* Each subscript is evaluated into subscriptvalue/subscriptnull */ - ExecInitExprRec(e, state, - &sbsrefstate->subscriptvalue, &sbsrefstate->subscriptnull); - - /* ... 
and then SBSREF_SUBSCRIPT saves it into step's workspace */ - scratch->opcode = EEOP_SBSREF_SUBSCRIPT; + /* SBSREF_SUBSCRIPTS checks and converts all the subscripts at once */ + if (methods.sbs_check_subscripts) + { + scratch->opcode = EEOP_SBSREF_SUBSCRIPTS; + scratch->d.sbsref_subscript.subscriptfunc = methods.sbs_check_subscripts; scratch->d.sbsref_subscript.state = sbsrefstate; - scratch->d.sbsref_subscript.off = i; - scratch->d.sbsref_subscript.isupper = false; scratch->d.sbsref_subscript.jumpdone = -1; /* adjust later */ ExprEvalPushStep(state, scratch); adjust_jumps = lappend_int(adjust_jumps, state->steps_len - 1); - i++; } - sbsrefstate->numlower = i; - - /* Should be impossible if parser is sane, but check anyway: */ - if (sbsrefstate->numlower != 0 && - sbsrefstate->numupper != sbsrefstate->numlower) - elog(ERROR, "upper and lower index lists are not same length"); if (isAssignment) { Datum *save_innermost_caseval; bool *save_innermost_casenull; + /* Check for unimplemented methods */ + if (!methods.sbs_assign) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("type %s does not support subscripted assignment", + format_type_be(sbsref->refcontainertype)))); + /* * We might have a nested-assignment situation, in which the * refassgnexpr is itself a FieldStore or SubscriptingRef that needs @@ -2664,7 +2678,13 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, */ if (isAssignmentIndirectionExpr(sbsref->refassgnexpr)) { + if (!methods.sbs_fetch_old) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("type %s does not support subscripted assignment", + format_type_be(sbsref->refcontainertype)))); scratch->opcode = EEOP_SBSREF_OLD; + scratch->d.sbsref.subscriptfunc = methods.sbs_fetch_old; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); } @@ -2684,17 +2704,17 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, /* and perform the assignment */ 
scratch->opcode = EEOP_SBSREF_ASSIGN; + scratch->d.sbsref.subscriptfunc = methods.sbs_assign; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); - } else { /* array fetch is much simpler */ scratch->opcode = EEOP_SBSREF_FETCH; + scratch->d.sbsref.subscriptfunc = methods.sbs_fetch; scratch->d.sbsref.state = sbsrefstate; ExprEvalPushStep(state, scratch); - } /* adjust jump targets */ @@ -2702,7 +2722,7 @@ ExecInitSubscriptingRef(ExprEvalStep *scratch, SubscriptingRef *sbsref, { ExprEvalStep *as = &state->steps[lfirst_int(lc)]; - if (as->opcode == EEOP_SBSREF_SUBSCRIPT) + if (as->opcode == EEOP_SBSREF_SUBSCRIPTS) { Assert(as->d.sbsref_subscript.jumpdone == -1); as->d.sbsref_subscript.jumpdone = state->steps_len; diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index c09371ad58..6b9fc38134 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -417,7 +417,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) &&CASE_EEOP_FIELDSELECT, &&CASE_EEOP_FIELDSTORE_DEFORM, &&CASE_EEOP_FIELDSTORE_FORM, - &&CASE_EEOP_SBSREF_SUBSCRIPT, + &&CASE_EEOP_SBSREF_SUBSCRIPTS, &&CASE_EEOP_SBSREF_OLD, &&CASE_EEOP_SBSREF_ASSIGN, &&CASE_EEOP_SBSREF_FETCH, @@ -1396,12 +1396,10 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) EEO_NEXT(); } - EEO_CASE(EEOP_SBSREF_SUBSCRIPT) + EEO_CASE(EEOP_SBSREF_SUBSCRIPTS) { - /* Process an array subscript */ - - /* too complex for an inline implementation */ - if (ExecEvalSubscriptingRef(state, op)) + /* Precheck SubscriptingRef subscript(s) */ + if (op->d.sbsref_subscript.subscriptfunc(state, op, econtext)) { EEO_NEXT(); } @@ -1413,37 +1411,11 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull) } EEO_CASE(EEOP_SBSREF_OLD) + EEO_CASE(EEOP_SBSREF_ASSIGN) + EEO_CASE(EEOP_SBSREF_FETCH) { - /* - * Fetch the old value in an sbsref assignment, in case it's - * referenced (via a CaseTestExpr) 
inside the assignment - * expression. - */ - - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefOld(state, op); - - EEO_NEXT(); - } - - /* - * Perform SubscriptingRef assignment - */ - EEO_CASE(EEOP_SBSREF_ASSIGN) - { - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefAssign(state, op); - - EEO_NEXT(); - } - - /* - * Fetch subset of an array. - */ - EEO_CASE(EEOP_SBSREF_FETCH) - { - /* too complex for an inline implementation */ - ExecEvalSubscriptingRefFetch(state, op); + /* Perform a SubscriptingRef fetch or assignment */ + op->d.sbsref.subscriptfunc(state, op, econtext); EEO_NEXT(); } @@ -3122,200 +3094,6 @@ ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext *op->resnull = false; } -/* - * Process a subscript in a SubscriptingRef expression. - * - * If subscript is NULL, throw error in assignment case, or in fetch case - * set result to NULL and return false (instructing caller to skip the rest - * of the SubscriptingRef sequence). - * - * Subscript expression result is in subscriptvalue/subscriptnull. - * On success, integer subscript value has been saved in upperindex[] or - * lowerindex[] for use later. 
- */ -bool -ExecEvalSubscriptingRef(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; - int *indexes; - int off; - - /* If any index expr yields NULL, result is NULL or error */ - if (sbsrefstate->subscriptnull) - { - if (sbsrefstate->isassignment) - ereport(ERROR, - (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), - errmsg("array subscript in assignment must not be null"))); - *op->resnull = true; - return false; - } - - /* Convert datum to int, save in appropriate place */ - if (op->d.sbsref_subscript.isupper) - indexes = sbsrefstate->upperindex; - else - indexes = sbsrefstate->lowerindex; - off = op->d.sbsref_subscript.off; - - indexes[off] = DatumGetInt32(sbsrefstate->subscriptvalue); - - return true; -} - -/* - * Evaluate SubscriptingRef fetch. - * - * Source container is in step's result variable. - */ -void -ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - - /* Should not get here if source container (or any subscript) is null */ - Assert(!(*op->resnull)); - - if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - *op->resvalue = array_get_element(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign, - op->resnull); - } - else - { - /* Slice case */ - *op->resvalue = array_get_slice(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } -} - -/* - * Compute old container element/slice value for a SubscriptingRef assignment - * expression. Will only be generated if the new-value subexpression - * contains SubscriptingRef or FieldStore. 
The value is stored into the - * SubscriptingRefState's prevvalue/prevnull fields. - */ -void -ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref.state; - - if (*op->resnull) - { - /* whole array is null, so any element or slice is too */ - sbsrefstate->prevvalue = (Datum) 0; - sbsrefstate->prevnull = true; - } - else if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - sbsrefstate->prevvalue = array_get_element(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign, - &sbsrefstate->prevnull); - } - else - { - /* Slice case */ - /* this is currently unreachable */ - sbsrefstate->prevvalue = array_get_slice(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - sbsrefstate->prevnull = false; - } -} - -/* - * Evaluate SubscriptingRef assignment. - * - * Input container (possibly null) is in result area, replacement value is in - * SubscriptingRefState's replacevalue/replacenull. - */ -void -ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op) -{ - SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state; - - /* - * For an assignment to a fixed-length container type, both the original - * container and the value to be assigned into it must be non-NULL, else - * we punt and return the original container. - */ - if (sbsrefstate->refattrlength > 0) - { - if (*op->resnull || sbsrefstate->replacenull) - return; - } - - /* - * For assignment to varlena arrays, we handle a NULL original array by - * substituting an empty (zero-dimensional) array; insertion of the new - * element will result in a singleton array value. 
It does not matter - * whether the new element is NULL. - */ - if (*op->resnull) - { - *op->resvalue = PointerGetDatum(construct_empty_array(sbsrefstate->refelemtype)); - *op->resnull = false; - } - - if (sbsrefstate->numlower == 0) - { - /* Scalar case */ - *op->resvalue = array_set_element(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->replacevalue, - sbsrefstate->replacenull, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } - else - { - /* Slice case */ - *op->resvalue = array_set_slice(*op->resvalue, - sbsrefstate->numupper, - sbsrefstate->upperindex, - sbsrefstate->lowerindex, - sbsrefstate->upperprovided, - sbsrefstate->lowerprovided, - sbsrefstate->replacevalue, - sbsrefstate->replacenull, - sbsrefstate->refattrlength, - sbsrefstate->refelemlength, - sbsrefstate->refelembyval, - sbsrefstate->refelemalign); - } -} - /* * Evaluate a rowtype coercion operation. * This may require rearranging field positions. 
diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c
index da5e3a2c1d..9a6af90914 100644
--- a/src/backend/jit/llvm/llvmjit_expr.c
+++ b/src/backend/jit/llvm/llvmjit_expr.c
@@ -1113,23 +1113,72 @@ llvm_compile_expr(ExprState *state)
 					break;
 				}

-			case EEOP_SBSREF_OLD:
-				build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefOld",
-								v_state, op);
-				LLVMBuildBr(b, opblocks[opno + 1]);
-				break;
+			case EEOP_SBSREF_SUBSCRIPTS:
+				{
+					int			jumpdone = op->d.sbsref_subscript.jumpdone;
+					LLVMTypeRef param_types[3];
+					LLVMValueRef v_params[3];
+					LLVMTypeRef v_functype;
+					LLVMValueRef v_func;
+					LLVMValueRef v_ret;

-			case EEOP_SBSREF_ASSIGN:
-				build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefAssign",
-								v_state, op);
-				LLVMBuildBr(b, opblocks[opno + 1]);
-				break;
+					param_types[0] = l_ptr(StructExprState);
+					param_types[1] = l_ptr(TypeSizeT);
+					param_types[2] = l_ptr(StructExprContext);
+					v_functype = LLVMFunctionType(TypeParamBool,
+												  param_types,
+												  lengthof(param_types),
+												  false);
+					v_func = l_ptr_const(op->d.sbsref_subscript.subscriptfunc,
+										 l_ptr(v_functype));
+
+					v_params[0] = v_state;
+					v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT));
+					v_params[2] = v_econtext;
+					v_ret = LLVMBuildCall(b,
+										  v_func,
+										  v_params, lengthof(v_params), "");
+					v_ret = LLVMBuildZExt(b, v_ret, TypeStorageBool, "");
+
+					LLVMBuildCondBr(b,
+									LLVMBuildICmp(b, LLVMIntEQ, v_ret,
+												  l_sbool_const(1), ""),
+									opblocks[opno + 1],
+									opblocks[jumpdone]);
+					break;
+				}
+
+			case EEOP_SBSREF_OLD:
+			case EEOP_SBSREF_ASSIGN:
 			case EEOP_SBSREF_FETCH:
-				build_EvalXFunc(b, mod, "ExecEvalSubscriptingRefFetch",
-								v_state, op);
-				LLVMBuildBr(b, opblocks[opno + 1]);
-				break;
+				{
+					LLVMTypeRef param_types[3];
+					LLVMValueRef v_params[3];
+					LLVMTypeRef v_functype;
+					LLVMValueRef v_func;
+
+					param_types[0] = l_ptr(StructExprState);
+					param_types[1] = l_ptr(TypeSizeT);
+					param_types[2] = l_ptr(StructExprContext);
+
+					v_functype = LLVMFunctionType(LLVMVoidType(),
+												  param_types,
+												  lengthof(param_types),
+												  false);
+					v_func = l_ptr_const(op->d.sbsref.subscriptfunc,
+										 l_ptr(v_functype));
+
+					v_params[0] = v_state;
+					v_params[1] = l_ptr_const(op, l_ptr(TypeSizeT));
+					v_params[2] = v_econtext;
+					LLVMBuildCall(b,
+								  v_func,
+								  v_params, lengthof(v_params), "");
+
+					LLVMBuildBr(b, opblocks[opno + 1]);
+					break;
+				}

 			case EEOP_CASE_TESTVAL:
 				{
@@ -1744,23 +1793,6 @@ llvm_compile_expr(ExprState *state)
 					LLVMBuildBr(b, opblocks[opno + 1]);
 					break;

-			case EEOP_SBSREF_SUBSCRIPT:
-				{
-					int			jumpdone = op->d.sbsref_subscript.jumpdone;
-					LLVMValueRef v_ret;
-
-					v_ret = build_EvalXFunc(b, mod, "ExecEvalSubscriptingRef",
-											v_state, op);
-					v_ret = LLVMBuildZExt(b, v_ret, TypeStorageBool, "");
-
-					LLVMBuildCondBr(b,
-									LLVMBuildICmp(b, LLVMIntEQ, v_ret,
-												  l_sbool_const(1), ""),
-									opblocks[opno + 1],
-									opblocks[jumpdone]);
-					break;
-				}
-
 			case EEOP_DOMAIN_TESTVAL:
 				{
 					LLVMBasicBlockRef b_avail,
diff --git a/src/backend/jit/llvm/llvmjit_types.c b/src/backend/jit/llvm/llvmjit_types.c
index 1ed3cafa2f..ae3c88aad9 100644
--- a/src/backend/jit/llvm/llvmjit_types.c
+++ b/src/backend/jit/llvm/llvmjit_types.c
@@ -124,10 +124,6 @@ void	   *referenced_functions[] =
 	ExecEvalSQLValueFunction,
 	ExecEvalScalarArrayOp,
 	ExecEvalSubPlan,
-	ExecEvalSubscriptingRef,
-	ExecEvalSubscriptingRefAssign,
-	ExecEvalSubscriptingRefFetch,
-	ExecEvalSubscriptingRefOld,
 	ExecEvalSysVar,
 	ExecEvalWholeRowVar,
 	ExecEvalXmlExpr,
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 910906f639..70f8b718e0 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -1548,6 +1548,7 @@ _copySubscriptingRef(const SubscriptingRef *from)
 	COPY_SCALAR_FIELD(refcontainertype);
 	COPY_SCALAR_FIELD(refelemtype);
+	COPY_SCALAR_FIELD(refrestype);
 	COPY_SCALAR_FIELD(reftypmod);
 	COPY_SCALAR_FIELD(refcollid);
 	COPY_NODE_FIELD(refupperindexpr);
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 687609f59e..541e0e6b48 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -276,6 +276,7 @@ _equalSubscriptingRef(const SubscriptingRef *a, const SubscriptingRef *b)
 {
 	COMPARE_SCALAR_FIELD(refcontainertype);
 	COMPARE_SCALAR_FIELD(refelemtype);
+	COMPARE_SCALAR_FIELD(refrestype);
 	COMPARE_SCALAR_FIELD(reftypmod);
 	COMPARE_SCALAR_FIELD(refcollid);
 	COMPARE_NODE_FIELD(refupperindexpr);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 1dc873ed25..963f71e99d 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -66,15 +66,7 @@ exprType(const Node *expr)
 			type = ((const WindowFunc *) expr)->wintype;
 			break;
 		case T_SubscriptingRef:
-			{
-				const SubscriptingRef *sbsref = (const SubscriptingRef *) expr;
-
-				/* slice and/or store operations yield the container type */
-				if (sbsref->reflowerindexpr || sbsref->refassgnexpr)
-					type = sbsref->refcontainertype;
-				else
-					type = sbsref->refelemtype;
-			}
+			type = ((const SubscriptingRef *) expr)->refrestype;
 			break;
 		case T_FuncExpr:
 			type = ((const FuncExpr *) expr)->funcresulttype;
@@ -286,7 +278,6 @@ exprTypmod(const Node *expr)
 		case T_Param:
 			return ((const Param *) expr)->paramtypmod;
 		case T_SubscriptingRef:
-			/* typmod is the same for container or element */
 			return ((const SubscriptingRef *) expr)->reftypmod;
 		case T_FuncExpr:
 			{
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 8f5e4e71b2..d78b16ed1d 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -1194,6 +1194,7 @@ _outSubscriptingRef(StringInfo str, const SubscriptingRef *node)
 	WRITE_OID_FIELD(refcontainertype);
 	WRITE_OID_FIELD(refelemtype);
+	WRITE_OID_FIELD(refrestype);
 	WRITE_INT_FIELD(reftypmod);
 	WRITE_OID_FIELD(refcollid);
 	WRITE_NODE_FIELD(refupperindexpr);
diff --git a/src/backend/nodes/readfuncs.c b/src/backend/nodes/readfuncs.c
index 169d5581b9..0f6a77afc4 100644
--- a/src/backend/nodes/readfuncs.c
+++ b/src/backend/nodes/readfuncs.c
@@ -671,6 +671,7 @@ _readSubscriptingRef(void)
 	READ_OID_FIELD(refcontainertype);
 	READ_OID_FIELD(refelemtype);
+	READ_OID_FIELD(refrestype);
 	READ_INT_FIELD(reftypmod);
 	READ_OID_FIELD(refcollid);
 	READ_NODE_FIELD(refupperindexpr);
diff --git a/src/backend/optimizer/util/clauses.c b/src/backend/optimizer/util/clauses.c
index cb7fa66180..e3a81a7a02 100644
--- a/src/backend/optimizer/util/clauses.c
+++ b/src/backend/optimizer/util/clauses.c
@@ -32,6 +32,7 @@
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/subscripting.h"
 #include "nodes/supportnodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/cost.h"
@@ -839,13 +840,16 @@ contain_nonstrict_functions_walker(Node *node, void *context)
 	}
 	if (IsA(node, SubscriptingRef))
 	{
-		/*
-		 * subscripting assignment is nonstrict, but subscripting itself is
-		 * strict
-		 */
-		if (((SubscriptingRef *) node)->refassgnexpr != NULL)
-			return true;
+		SubscriptingRef *sbsref = (SubscriptingRef *) node;
+		const SubscriptRoutines *sbsroutines;
+
+		/* Subscripting assignment is always presumed nonstrict */
+		if (sbsref->refassgnexpr != NULL)
+			return true;
+		/* Otherwise we must look up the subscripting support methods */
+		sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype, NULL);
+		if (!sbsroutines->fetch_strict)
+			return true;
 		/* else fall through to check args */
 	}
 	if (IsA(node, DistinctExpr))
@@ -1135,12 +1139,14 @@ contain_leaked_vars_walker(Node *node, void *context)
 		case T_SubscriptingRef:
 			{
 				SubscriptingRef *sbsref = (SubscriptingRef *) node;
-
-				/*
-				 * subscripting assignment is leaky, but subscripted fetches
-				 * are not
-				 */
-				if (sbsref->refassgnexpr != NULL)
+				const SubscriptRoutines *sbsroutines;
+
+				/* Consult the subscripting support method info */
+				sbsroutines = getSubscriptingRoutines(sbsref->refcontainertype,
+													  NULL);
+				if (!(sbsref->refassgnexpr != NULL ?
+					  sbsroutines->store_leakproof :
+					  sbsroutines->fetch_leakproof))
 				{
 					/* Node is leaky, so reject if it contains Vars */
 					if (contain_var_clause(node))
@@ -2859,6 +2865,11 @@ eval_const_expressions_mutator(Node *node,
 				 * known to be immutable, and for which we need no smarts
 				 * beyond "simplify if all inputs are constants".
 				 *
+				 * Treating SubscriptingRef this way assumes that subscripting
+				 * fetch and assignment are both immutable.  This constrains
+				 * type-specific subscripting implementations; maybe we should
+				 * relax it someday.
+				 *
 				 * Treating MinMaxExpr this way amounts to assuming that the
 				 * btree comparison function it calls is immutable; see the
 				 * reasoning in contain_mutable_functions_walker.
@@ -3122,10 +3133,10 @@ eval_const_expressions_mutator(Node *node,
 			{
 				/*
 				 * This case could be folded into the generic handling used
-				 * for SubscriptingRef etc.  But because the simplification
-				 * logic is so trivial, applying evaluate_expr() to perform it
-				 * would be a heavy overhead.  BooleanTest is probably common
-				 * enough to justify keeping this bespoke implementation.
+				 * for ArrayExpr etc.  But because the simplification logic is
+				 * so trivial, applying evaluate_expr() to perform it would be
+				 * a heavy overhead.  BooleanTest is probably common enough to
+				 * justify keeping this bespoke implementation.
 				 */
 				BooleanTest *btest = (BooleanTest *) node;
 				BooleanTest *newbtest;
diff --git a/src/backend/parser/parse_coerce.c b/src/backend/parser/parse_coerce.c
index a2924e3d1c..da6c3ae4b5 100644
--- a/src/backend/parser/parse_coerce.c
+++ b/src/backend/parser/parse_coerce.c
@@ -26,6 +26,7 @@
 #include "parser/parse_type.h"
 #include "utils/builtins.h"
 #include "utils/datum.h"		/* needed for datumIsEqual() */
+#include "utils/fmgroids.h"
 #include "utils/lsyscache.h"
 #include "utils/syscache.h"
 #include "utils/typcache.h"
@@ -2854,8 +2855,8 @@ find_typmod_coercion_function(Oid typeId,
 	targetType = typeidType(typeId);
 	typeForm = (Form_pg_type) GETSTRUCT(targetType);

-	/* Check for a varlena array type */
-	if (typeForm->typelem != InvalidOid && typeForm->typlen == -1)
+	/* Check for a "true" array type */
+	if (IsTrueArrayType(typeForm))
 	{
 		/* Yes, switch our attention to the element type */
 		typeId = typeForm->typelem;
diff --git a/src/backend/parser/parse_collate.c b/src/backend/parser/parse_collate.c
index bf800f5937..13e62a2015 100644
--- a/src/backend/parser/parse_collate.c
+++ b/src/backend/parser/parse_collate.c
@@ -667,6 +667,29 @@ assign_collations_walker(Node *node, assign_collations_context *context)
 											&loccontext);
 				}
 				break;
+			case T_SubscriptingRef:
+				{
+					/*
+					 * The subscripts are treated as independent
+					 * expressions not contributing to the node's
+					 * collation.  Only the container, and the source
+					 * expression if any, contribute.  (This models
+					 * the old behavior, in which the subscripts could
+					 * be counted on to be integers and thus not
+					 * contribute anything.)
+					 */
+					SubscriptingRef *sbsref = (SubscriptingRef *) node;
+
+					assign_expr_collations(context->pstate,
+										   (Node *) sbsref->refupperindexpr);
+					assign_expr_collations(context->pstate,
+										   (Node *) sbsref->reflowerindexpr);
+					(void) assign_collations_walker((Node *) sbsref->refexpr,
+													&loccontext);
+					(void) assign_collations_walker((Node *) sbsref->refassgnexpr,
+													&loccontext);
+				}
+				break;
 			default:

 				/*
diff --git a/src/backend/parser/parse_expr.c b/src/backend/parser/parse_expr.c
index 1e62d31aca..ffc96e2a6f 100644
--- a/src/backend/parser/parse_expr.c
+++ b/src/backend/parser/parse_expr.c
@@ -406,10 +406,9 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
 			result = (Node *) transformContainerSubscripts(pstate,
 														   result,
 														   exprType(result),
-														   InvalidOid,
 														   exprTypmod(result),
 														   subscripts,
-														   NULL);
+														   false);
 			subscripts = NIL;

 			newresult = ParseFuncOrColumn(pstate,
@@ -429,10 +428,9 @@ transformIndirection(ParseState *pstate, A_Indirection *ind)
 		result = (Node *) transformContainerSubscripts(pstate,
 													   result,
 													   exprType(result),
-													   InvalidOid,
 													   exprTypmod(result),
 													   subscripts,
-													   NULL);
+													   false);
 	return result;
 }
diff --git a/src/backend/parser/parse_node.c b/src/backend/parser/parse_node.c
index 6e98fe55fc..e90f6c9d01 100644
--- a/src/backend/parser/parse_node.c
+++ b/src/backend/parser/parse_node.c
@@ -20,6 +20,7 @@
 #include "mb/pg_wchar.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
+#include "nodes/subscripting.h"
 #include "parser/parse_coerce.h"
 #include "parser/parse_expr.h"
 #include "parser/parse_relation.h"
@@ -182,23 +183,16 @@ pcb_error_callback(void *arg)

 /*
  * transformContainerType()
- *		Identify the types involved in a subscripting operation for container
+ *		Identify the actual container type for a subscripting operation.
  *
- *
- * On entry, containerType/containerTypmod identify the type of the input value
- * to be subscripted (which could be a domain type).  These are modified if
- * necessary to identify the actual container type and typmod, and the
- * container's element type is returned.  An error is thrown if the input isn't
- * an array type.
+ * containerType/containerTypmod are modified if necessary to identify
+ * the actual container type and typmod.  This mainly involves smashing
+ * any domain to its base type, but there are some special considerations.
+ * Note that caller still needs to check if the result type is a container.
 */
-Oid
+void
 transformContainerType(Oid *containerType, int32 *containerTypmod)
 {
-	Oid			origContainerType = *containerType;
-	Oid			elementType;
-	HeapTuple	type_tuple_container;
-	Form_pg_type type_struct_container;
-
 	/*
 	 * If the input is a domain, smash to base type, and extract the actual
 	 * typmod to be applied to the base type.  Subscripting a domain is an
@@ -209,35 +203,16 @@ transformContainerType(Oid *containerType, int32 *containerTypmod)
 	*containerType = getBaseTypeAndTypmod(*containerType, containerTypmod);

 	/*
-	 * Here is an array specific code. We treat int2vector and oidvector as
-	 * though they were domains over int2[] and oid[]. This is needed because
-	 * array slicing could create an array that doesn't satisfy the
-	 * dimensionality constraints of the xxxvector type; so we want the result
-	 * of a slice operation to be considered to be of the more general type.
+	 * We treat int2vector and oidvector as though they were domains over
+	 * int2[] and oid[].  This is needed because array slicing could create an
+	 * array that doesn't satisfy the dimensionality constraints of the
+	 * xxxvector type; so we want the result of a slice operation to be
+	 * considered to be of the more general type.
 	 */
 	if (*containerType == INT2VECTOROID)
 		*containerType = INT2ARRAYOID;
 	else if (*containerType == OIDVECTOROID)
 		*containerType = OIDARRAYOID;
-
-	/* Get the type tuple for the container */
-	type_tuple_container = SearchSysCache1(TYPEOID, ObjectIdGetDatum(*containerType));
-	if (!HeapTupleIsValid(type_tuple_container))
-		elog(ERROR, "cache lookup failed for type %u", *containerType);
-	type_struct_container = (Form_pg_type) GETSTRUCT(type_tuple_container);
-
-	/* needn't check typisdefined since this will fail anyway */
-
-	elementType = type_struct_container->typelem;
-	if (elementType == InvalidOid)
-		ereport(ERROR,
-				(errcode(ERRCODE_DATATYPE_MISMATCH),
-				 errmsg("cannot subscript type %s because it is not an array",
-						format_type_be(origContainerType))));
-
-	ReleaseSysCache(type_tuple_container);
-
-	return elementType;
 }

 /*
@@ -249,13 +224,14 @@ transformContainerType(Oid *containerType, int32 *containerTypmod)
 * an expression that represents the result of extracting a single container
 * element or a container slice.
 *
- * In a container assignment, we are given a destination container value plus a
- * source value that is to be assigned to a single element or a slice of that
- * container.  We produce an expression that represents the new container value
- * with the source data inserted into the right part of the container.
+ * Container assignments are treated basically the same as container fetches
+ * here.  The caller will modify the result node to insert the source value
+ * that is to be assigned to the element or slice that a fetch would have
+ * retrieved.  The execution result will be a new container value with
+ * the source value inserted into the right part of the container.
 *
- * For both cases, if the source container is of a domain-over-array type,
- * the result is of the base array type or its element type; essentially,
+ * For both cases, if the source is of a domain-over-container type, the
+ * result is the same as if it had been of the container type; essentially,
 * we must fold a domain to its base type before applying subscripting.
 * (Note that int2vector and oidvector are treated as domains here.)
 *
@@ -264,48 +240,48 @@
 *	containerType	OID of container's datatype (should match type of
 *					containerBase, or be the base type of containerBase's
 *					domain type)
- *	elementType		OID of container's element type (fetch with
- *					transformContainerType, or pass InvalidOid to do it here)
- *	containerTypMod typmod for the container (which is also typmod for the
- *					elements)
+ *	containerTypMod typmod for the container
 *	indirection		Untransformed list of subscripts (must not be NIL)
- *	assignFrom		NULL for container fetch, else transformed expression for
- *					source.
+ *	isAssignment	True if this will become a container assignment.
 */
 SubscriptingRef *
 transformContainerSubscripts(ParseState *pstate,
 							 Node *containerBase,
 							 Oid containerType,
-							 Oid elementType,
 							 int32 containerTypMod,
 							 List *indirection,
-							 Node *assignFrom)
+							 bool isAssignment)
 {
+	SubscriptingRef *sbsref;
+	const SubscriptRoutines *sbsroutines;
+	Oid			elementType;
 	bool		isSlice = false;
-	List	   *upperIndexpr = NIL;
-	List	   *lowerIndexpr = NIL;
 	ListCell   *idx;
-	SubscriptingRef *sbsref;

 	/*
-	 * Caller may or may not have bothered to determine elementType.  Note
-	 * that if the caller did do so, containerType/containerTypMod must be as
-	 * modified by transformContainerType, ie, smash domain to base type.
+	 * Determine the actual container type, smashing any domain.  In the
+	 * assignment case the caller already did this, since it also needs to
+	 * know the actual container type.
 	 */
-	if (!OidIsValid(elementType))
-		elementType = transformContainerType(&containerType, &containerTypMod);
+	if (!isAssignment)
+		transformContainerType(&containerType, &containerTypMod);

 	/*
+	 * Verify that the container type is subscriptable, and get its support
+	 * functions and typelem.
+	 */
+	sbsroutines = getSubscriptingRoutines(containerType, &elementType);
+
+	/*
+	 * Detect whether any of the indirection items are slice specifiers.
+	 *
 	 * A list containing only simple subscripts refers to a single container
 	 * element.  If any of the items are slice specifiers (lower:upper), then
-	 * the subscript expression means a container slice operation.  In this
-	 * case, we convert any non-slice items to slices by treating the single
-	 * subscript as the upper bound and supplying an assumed lower bound of 1.
-	 * We have to prescan the list to see if there are any slice items.
+	 * the subscript expression means a container slice operation.
 	 */
 	foreach(idx, indirection)
 	{
-		A_Indices  *ai = (A_Indices *) lfirst(idx);
+		A_Indices  *ai = lfirst_node(A_Indices, idx);

 		if (ai->is_slice)
 		{
@@ -314,121 +290,36 @@ transformContainerSubscripts(ParseState *pstate,
 		}
 	}

-	/*
-	 * Transform the subscript expressions.
-	 */
-	foreach(idx, indirection)
-	{
-		A_Indices  *ai = lfirst_node(A_Indices, idx);
-		Node	   *subexpr;
-
-		if (isSlice)
-		{
-			if (ai->lidx)
-			{
-				subexpr = transformExpr(pstate, ai->lidx, pstate->p_expr_kind);
-				/* If it's not int4 already, try to coerce */
-				subexpr = coerce_to_target_type(pstate,
-												subexpr, exprType(subexpr),
-												INT4OID, -1,
-												COERCION_ASSIGNMENT,
-												COERCE_IMPLICIT_CAST,
-												-1);
-				if (subexpr == NULL)
-					ereport(ERROR,
-							(errcode(ERRCODE_DATATYPE_MISMATCH),
-							 errmsg("array subscript must have type integer"),
-							 parser_errposition(pstate, exprLocation(ai->lidx))));
-			}
-			else if (!ai->is_slice)
-			{
-				/* Make a constant 1 */
-				subexpr = (Node *) makeConst(INT4OID,
-											 -1,
-											 InvalidOid,
-											 sizeof(int32),
-											 Int32GetDatum(1),
-											 false,
-											 true); /* pass by value */
-			}
-			else
-			{
-				/* Slice with omitted lower bound, put NULL into the list */
-				subexpr = NULL;
-			}
-			lowerIndexpr = lappend(lowerIndexpr, subexpr);
-		}
-		else
-			Assert(ai->lidx == NULL && !ai->is_slice);
-
-		if (ai->uidx)
-		{
-			subexpr = transformExpr(pstate, ai->uidx, pstate->p_expr_kind);
-			/* If it's not int4 already, try to coerce */
-			subexpr = coerce_to_target_type(pstate,
-											subexpr, exprType(subexpr),
-											INT4OID, -1,
-											COERCION_ASSIGNMENT,
-											COERCE_IMPLICIT_CAST,
-											-1);
-			if (subexpr == NULL)
-				ereport(ERROR,
-						(errcode(ERRCODE_DATATYPE_MISMATCH),
-						 errmsg("array subscript must have type integer"),
-						 parser_errposition(pstate, exprLocation(ai->uidx))));
-		}
-		else
-		{
-			/* Slice with omitted upper bound, put NULL into the list */
-			Assert(isSlice && ai->is_slice);
-			subexpr = NULL;
-		}
-		upperIndexpr = lappend(upperIndexpr, subexpr);
-	}
-
-	/*
-	 * If doing an array store, coerce the source value to the right type.
-	 * (This should agree with the coercion done by transformAssignedExpr.)
-	 */
-	if (assignFrom != NULL)
-	{
-		Oid			typesource = exprType(assignFrom);
-		Oid			typeneeded = isSlice ? containerType : elementType;
-		Node	   *newFrom;
-
-		newFrom = coerce_to_target_type(pstate,
-										assignFrom, typesource,
-										typeneeded, containerTypMod,
-										COERCION_ASSIGNMENT,
-										COERCE_IMPLICIT_CAST,
-										-1);
-		if (newFrom == NULL)
-			ereport(ERROR,
-					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("array assignment requires type %s"
-							" but expression is of type %s",
-							format_type_be(typeneeded),
-							format_type_be(typesource)),
-					 errhint("You will need to rewrite or cast the expression."),
-					 parser_errposition(pstate, exprLocation(assignFrom))));
-		assignFrom = newFrom;
-	}
-
 	/*
 	 * Ready to build the SubscriptingRef node.
 	 */
-	sbsref = (SubscriptingRef *) makeNode(SubscriptingRef);
-
-	if (assignFrom != NULL)
-		sbsref->refassgnexpr = (Expr *) assignFrom;
+	sbsref = makeNode(SubscriptingRef);

 	sbsref->refcontainertype = containerType;
 	sbsref->refelemtype = elementType;
+	/* refrestype is to be set by container-specific logic */
 	sbsref->reftypmod = containerTypMod;
 	/* refcollid will be set by parse_collate.c */
-	sbsref->refupperindexpr = upperIndexpr;
-	sbsref->reflowerindexpr = lowerIndexpr;
+	/* refupperindexpr, reflowerindexpr are to be set by container logic */
 	sbsref->refexpr = (Expr *) containerBase;
-	sbsref->refassgnexpr = (Expr *) assignFrom;
+	sbsref->refassgnexpr = NULL;	/* caller will fill if it's an assignment */
+
+	/*
+	 * Call the container-type-specific logic to transform the subscripts and
+	 * determine the subscripting result type.
+	 */
+	sbsroutines->transform(sbsref, indirection, pstate,
+						   isSlice, isAssignment);
+
+	/*
+	 * Verify we got a valid type (this defends, for example, against someone
+	 * using array_subscript_handler as typsubscript without setting typelem).
+	 */
+	if (!OidIsValid(sbsref->refrestype))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("cannot subscript type %s because it does not support subscripting",
+						format_type_be(containerType))));

 	return sbsref;
 }
diff --git a/src/backend/parser/parse_target.c b/src/backend/parser/parse_target.c
index ce68663cc2..3dda8e2847 100644
--- a/src/backend/parser/parse_target.c
+++ b/src/backend/parser/parse_target.c
@@ -861,7 +861,7 @@ transformAssignmentIndirection(ParseState *pstate,
 		if (targetIsSubscripting)
 			ereport(ERROR,
 					(errcode(ERRCODE_DATATYPE_MISMATCH),
-					 errmsg("array assignment to \"%s\" requires type %s"
+					 errmsg("subscripted assignment to \"%s\" requires type %s"
 							" but expression is of type %s",
 							targetName,
 							format_type_be(targetTypeId),
@@ -901,26 +901,37 @@ transformAssignmentSubscripts(ParseState *pstate,
 							  int location)
 {
 	Node	   *result;
+	SubscriptingRef *sbsref;
 	Oid			containerType;
 	int32		containerTypMod;
-	Oid			elementTypeId;
 	Oid			typeNeeded;
+	int32		typmodNeeded;
 	Oid			collationNeeded;

 	Assert(subscripts != NIL);

-	/* Identify the actual array type and element type involved */
+	/* Identify the actual container type involved */
 	containerType = targetTypeId;
 	containerTypMod = targetTypMod;
-	elementTypeId = transformContainerType(&containerType, &containerTypMod);
+	transformContainerType(&containerType, &containerTypMod);

-	/* Identify type that RHS must provide */
-	typeNeeded = isSlice ? containerType : elementTypeId;
+	/* Process subscripts and identify required type for RHS */
+	sbsref = transformContainerSubscripts(pstate,
+										  basenode,
+										  containerType,
+										  containerTypMod,
+										  subscripts,
+										  true);
+
+	typeNeeded = sbsref->refrestype;
+	typmodNeeded = sbsref->reftypmod;

 	/*
-	 * container normally has same collation as elements, but there's an
-	 * exception: we might be subscripting a domain over a container type. In
-	 * that case use collation of the base type.
+	 * Container normally has same collation as its elements, but there's an
+	 * exception: we might be subscripting a domain over a container type.  In
+	 * that case use collation of the base type.  (This is shaky for arbitrary
+	 * subscripting semantics, but it doesn't matter all that much since we
+	 * only use this to label the collation of a possible CaseTestExpr.)
 	 */
 	if (containerType == targetTypeId)
 		collationNeeded = targetCollation;
@@ -933,21 +944,22 @@ transformAssignmentSubscripts(ParseState *pstate,
 									  targetName,
 									  true,
 									  typeNeeded,
-									  containerTypMod,
+									  typmodNeeded,
 									  collationNeeded,
 									  indirection,
 									  next_indirection,
 									  rhs,
 									  location);

-	/* process subscripts */
-	result = (Node *) transformContainerSubscripts(pstate,
-												   basenode,
-												   containerType,
-												   elementTypeId,
-												   containerTypMod,
-												   subscripts,
-												   rhs);
+	/*
+	 * Insert the already-properly-coerced RHS into the SubscriptingRef.  Then
+	 * set refrestype and reftypmod back to the container type's values.
+	 */
+	sbsref->refassgnexpr = (Expr *) rhs;
+	sbsref->refrestype = containerType;
+	sbsref->reftypmod = containerTypMod;
+
+	result = (Node *) sbsref;

 	/* If target was a domain over container, need to coerce up to the domain */
 	if (containerType != targetTypeId)
diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index f6ec7b64cd..ce09ad7375 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -17,6 +17,7 @@ OBJS = \
 	array_typanalyze.o \
 	array_userfuncs.o \
 	arrayfuncs.o \
+	arraysubs.o \
 	arrayutils.o \
 	ascii.o \
 	bool.o \
diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index a7ea7656c7..4c8a739bc4 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -2044,7 +2044,8 @@ array_get_element_expanded(Datum arraydatum,
 * array bound.
 *
 * NOTE: we assume it is OK to scribble on the provided subscript arrays
- * lowerIndx[] and upperIndx[].  These are generally just temporaries.
+ * lowerIndx[] and upperIndx[]; also, these arrays must be of size MAXDIM
+ * even when nSubscripts is less.  These are generally just temporaries.
 */
Datum
array_get_slice(Datum arraydatum,
@@ -2772,7 +2773,8 @@ array_set_element_expanded(Datum arraydatum,
 * (XXX TODO: allow a corresponding behavior for multidimensional arrays)
 *
 * NOTE: we assume it is OK to scribble on the provided index arrays
- * lowerIndx[] and upperIndx[].  These are generally just temporaries.
+ * lowerIndx[] and upperIndx[]; also, these arrays must be of size MAXDIM
+ * even when nSubscripts is less.  These are generally just temporaries.
 *
 * NOTE: For assignments, we throw an error for silly subscripts etc,
 * rather than returning a NULL or empty array as the fetch operations do.
diff --git a/src/backend/utils/adt/arraysubs.c b/src/backend/utils/adt/arraysubs.c
new file mode 100644
index 0000000000..a081288f42
--- /dev/null
+++ b/src/backend/utils/adt/arraysubs.c
@@ -0,0 +1,577 @@
+/*-------------------------------------------------------------------------
+ *
+ * arraysubs.c
+ *	  Subscripting support functions for arrays.
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *	  src/backend/utils/adt/arraysubs.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/execExpr.h"
+#include "nodes/makefuncs.h"
+#include "nodes/nodeFuncs.h"
+#include "nodes/subscripting.h"
+#include "parser/parse_coerce.h"
+#include "parser/parse_expr.h"
+#include "utils/array.h"
+#include "utils/builtins.h"
+#include "utils/lsyscache.h"
+
+
+/* SubscriptingRefState.workspace for array subscripting execution */
+typedef struct ArraySubWorkspace
+{
+	/* Values determined during expression compilation */
+	Oid			refelemtype;	/* OID of the array element type */
+	int16		refattrlength;	/* typlen of array type */
+	int16		refelemlength;	/* typlen of the array element type */
+	bool		refelembyval;	/* is the element type pass-by-value? */
+	char		refelemalign;	/* typalign of the element type */
+
+	/*
+	 * Subscript values converted to integers.  Note that these arrays must
+	 * be of length MAXDIM even when dealing with fewer subscripts, because
+	 * array_get/set_slice may scribble on the extra entries.
+	 */
+	int			upperindex[MAXDIM];
+	int			lowerindex[MAXDIM];
+} ArraySubWorkspace;
+
+
+/*
+ * Finish parse analysis of a SubscriptingRef expression for an array.
+ *
+ * Transform the subscript expressions, coerce them to integers,
+ * and determine the result type of the SubscriptingRef node.
+ */
+static void
+array_subscript_transform(SubscriptingRef *sbsref,
+						  List *indirection,
+						  ParseState *pstate,
+						  bool isSlice,
+						  bool isAssignment)
+{
+	List	   *upperIndexpr = NIL;
+	List	   *lowerIndexpr = NIL;
+	ListCell   *idx;
+
+	/*
+	 * Transform the subscript expressions, and separate upper and lower
+	 * bounds into two lists.
+	 *
+	 * If we have a container slice expression, we convert any non-slice
+	 * indirection items to slices by treating the single subscript as the
+	 * upper bound and supplying an assumed lower bound of 1.
+	 */
+	foreach(idx, indirection)
+	{
+		A_Indices  *ai = lfirst_node(A_Indices, idx);
+		Node	   *subexpr;
+
+		if (isSlice)
+		{
+			if (ai->lidx)
+			{
+				subexpr = transformExpr(pstate, ai->lidx, pstate->p_expr_kind);
+				/* If it's not int4 already, try to coerce */
+				subexpr = coerce_to_target_type(pstate,
+												subexpr, exprType(subexpr),
+												INT4OID, -1,
+												COERCION_ASSIGNMENT,
+												COERCE_IMPLICIT_CAST,
+												-1);
+				if (subexpr == NULL)
+					ereport(ERROR,
+							(errcode(ERRCODE_DATATYPE_MISMATCH),
+							 errmsg("array subscript must have type integer"),
+							 parser_errposition(pstate, exprLocation(ai->lidx))));
+			}
+			else if (!ai->is_slice)
+			{
+				/* Make a constant 1 */
+				subexpr = (Node *) makeConst(INT4OID,
+											 -1,
+											 InvalidOid,
+											 sizeof(int32),
+											 Int32GetDatum(1),
+											 false,
+											 true); /* pass by value */
+			}
+			else
+			{
+				/* Slice with omitted lower bound, put NULL into the list */
+				subexpr = NULL;
+			}
+			lowerIndexpr = lappend(lowerIndexpr, subexpr);
+		}
+		else
+			Assert(ai->lidx == NULL && !ai->is_slice);
+
+		if (ai->uidx)
+		{
+			subexpr = transformExpr(pstate, ai->uidx, pstate->p_expr_kind);
+			/* If it's not int4 already, try to coerce */
+			subexpr = coerce_to_target_type(pstate,
+											subexpr, exprType(subexpr),
+											INT4OID, -1,
+											COERCION_ASSIGNMENT,
+											COERCE_IMPLICIT_CAST,
+											-1);
+			if (subexpr == NULL)
+				ereport(ERROR,
+						(errcode(ERRCODE_DATATYPE_MISMATCH),
+						 errmsg("array subscript must have type integer"),
+						 parser_errposition(pstate, exprLocation(ai->uidx))));
+		}
+		else
+		{
+			/* Slice with omitted upper bound, put NULL into the list */
+			Assert(isSlice && ai->is_slice);
+			subexpr = NULL;
+		}
+		upperIndexpr = lappend(upperIndexpr, subexpr);
+	}
+
+	/* ... and store the transformed lists into the SubscriptingRef node */
+	sbsref->refupperindexpr = upperIndexpr;
+	sbsref->reflowerindexpr = lowerIndexpr;
+
+	/* Verify subscript list lengths are within implementation limit */
+	if (list_length(upperIndexpr) > MAXDIM)
+		ereport(ERROR,
+				(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
+				 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
+						list_length(upperIndexpr), MAXDIM)));
+	/* We need not check lowerIndexpr separately */
+
+	/*
+	 * Determine the result type of the subscripting operation.  It's the
+	 * same as the array type if we're slicing, else it's the element type.
+	 * In either case, the typmod is the same as the array's, so we need not
+	 * change reftypmod.
+	 */
+	if (isSlice)
+		sbsref->refrestype = sbsref->refcontainertype;
+	else
+		sbsref->refrestype = sbsref->refelemtype;
+}
+
+/*
+ * During execution, process the subscripts in a SubscriptingRef expression.
+ *
+ * The subscript expressions are already evaluated in Datum form in the
+ * SubscriptingRefState's arrays.  Check and convert them as necessary.
+ *
+ * If any subscript is NULL, we throw error in assignment cases, or in fetch
+ * cases set result to NULL and return false (instructing caller to skip the
+ * rest of the SubscriptingRef sequence).
+ *
+ * We convert all the subscripts to plain integers and save them in the
+ * sbsrefstate->workspace arrays.
+ */
+static bool
+array_subscript_check_subscripts(ExprState *state,
+								 ExprEvalStep *op,
+								 ExprContext *econtext)
+{
+	SubscriptingRefState *sbsrefstate = op->d.sbsref_subscript.state;
+	ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace;
+
+	/* Process upper subscripts */
+	for (int i = 0; i < sbsrefstate->numupper; i++)
+	{
+		if (sbsrefstate->upperprovided[i])
+		{
+			/* If any index expr yields NULL, result is NULL or error */
+			if (sbsrefstate->upperindexnull[i])
+			{
+				if (sbsrefstate->isassignment)
+					ereport(ERROR,
+							(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+							 errmsg("array subscript in assignment must not be null")));
+				*op->resnull = true;
+				return false;
+			}
+			workspace->upperindex[i] = DatumGetInt32(sbsrefstate->upperindex[i]);
+		}
+	}
+
+	/* Likewise for lower subscripts */
+	for (int i = 0; i < sbsrefstate->numlower; i++)
+	{
+		if (sbsrefstate->lowerprovided[i])
+		{
+			/* If any index expr yields NULL, result is NULL or error */
+			if (sbsrefstate->lowerindexnull[i])
+			{
+				if (sbsrefstate->isassignment)
+					ereport(ERROR,
+							(errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED),
+							 errmsg("array subscript in assignment must not be null")));
+				*op->resnull = true;
+				return false;
+			}
+			workspace->lowerindex[i] = DatumGetInt32(sbsrefstate->lowerindex[i]);
+		}
+	}
+
+	return true;
+}
+
+/*
+ * Evaluate SubscriptingRef fetch for an array element.
+ *
+ * Source container is in step's result variable (it's known not NULL, since
+ * we set fetch_strict to true), and indexes have already been evaluated into
+ * workspace array.
+ */
+static void
+array_subscript_fetch(ExprState *state,
+					  ExprEvalStep *op,
+					  ExprContext *econtext)
+{
+	SubscriptingRefState *sbsrefstate = op->d.sbsref.state;
+	ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace;
+
+	/* Should not get here if source array (or any subscript) is null */
+	Assert(!(*op->resnull));
+
+	*op->resvalue = array_get_element(*op->resvalue,
+									  sbsrefstate->numupper,
+									  workspace->upperindex,
+									  workspace->refattrlength,
+									  workspace->refelemlength,
+									  workspace->refelembyval,
+									  workspace->refelemalign,
+									  op->resnull);
+}
+
+/*
+ * Evaluate SubscriptingRef fetch for an array slice.
+ *
+ * Source container is in step's result variable (it's known not NULL, since
+ * we set fetch_strict to true), and indexes have already been evaluated into
+ * workspace array.
+ */
+static void
+array_subscript_fetch_slice(ExprState *state,
+							ExprEvalStep *op,
+							ExprContext *econtext)
+{
+	SubscriptingRefState *sbsrefstate = op->d.sbsref.state;
+	ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace;
+
+	/* Should not get here if source array (or any subscript) is null */
+	Assert(!(*op->resnull));
+
+	*op->resvalue = array_get_slice(*op->resvalue,
+									sbsrefstate->numupper,
+									workspace->upperindex,
+									workspace->lowerindex,
+									sbsrefstate->upperprovided,
+									sbsrefstate->lowerprovided,
+									workspace->refattrlength,
+									workspace->refelemlength,
+									workspace->refelembyval,
+									workspace->refelemalign);
+	/* The slice is never NULL, so no need to change *op->resnull */
+}
+
+/*
+ * Evaluate SubscriptingRef assignment for an array element assignment.
+ *
+ * Input container (possibly null) is in result area, replacement value is in
+ * SubscriptingRefState's replacevalue/replacenull.
+ */
+static void
+array_subscript_assign(ExprState *state,
+					   ExprEvalStep *op,
+					   ExprContext *econtext)
+{
+	SubscriptingRefState *sbsrefstate = op->d.sbsref.state;
+	ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace;
+	Datum		arraySource = *op->resvalue;
+
+	/*
+	 * For an assignment to a fixed-length array type, both the original
+	 * array and the value to be assigned into it must be non-NULL, else we
+	 * punt and return the original array.
+	 */
+	if (workspace->refattrlength > 0)
+	{
+		if (*op->resnull || sbsrefstate->replacenull)
+			return;
+	}
+
+	/*
+	 * For assignment to varlena arrays, we handle a NULL original array by
+	 * substituting an empty (zero-dimensional) array; insertion of the new
+	 * element will result in a singleton array value.  It does not matter
+	 * whether the new element is NULL.
+	 */
+	if (*op->resnull)
+	{
+		arraySource = PointerGetDatum(construct_empty_array(workspace->refelemtype));
+		*op->resnull = false;
+	}
+
+	*op->resvalue = array_set_element(arraySource,
+									  sbsrefstate->numupper,
+									  workspace->upperindex,
+									  sbsrefstate->replacevalue,
+									  sbsrefstate->replacenull,
+									  workspace->refattrlength,
+									  workspace->refelemlength,
+									  workspace->refelembyval,
+									  workspace->refelemalign);
+	/* The result is never NULL, so no need to change *op->resnull */
+}
+
+/*
+ * Evaluate SubscriptingRef assignment for an array slice assignment.
+ *
+ * Input container (possibly null) is in result area, replacement value is in
+ * SubscriptingRefState's replacevalue/replacenull.
+ */ +static void +array_subscript_assign_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + Datum arraySource = *op->resvalue; + + /* + * For an assignment to a fixed-length array type, both the original array + * and the value to be assigned into it must be non-NULL, else we punt and + * return the original array. + */ + if (workspace->refattrlength > 0) + { + if (*op->resnull || sbsrefstate->replacenull) + return; + } + + /* + * For assignment to varlena arrays, we handle a NULL original array by + * substituting an empty (zero-dimensional) array; insertion of the new + * element will result in a singleton array value. It does not matter + * whether the new element is NULL. + */ + if (*op->resnull) + { + arraySource = PointerGetDatum(construct_empty_array(workspace->refelemtype)); + *op->resnull = false; + } + + *op->resvalue = array_set_slice(arraySource, + sbsrefstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbsrefstate->upperprovided, + sbsrefstate->lowerprovided, + sbsrefstate->replacevalue, + sbsrefstate->replacenull, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); + /* The result is never NULL, so no need to change *op->resnull */ +} + +/* + * Compute old array element value for a SubscriptingRef assignment + * expression. Will only be called if the new-value subexpression + * contains SubscriptingRef or FieldStore. This is the same as the + * regular fetch case, except that we have to handle a null array, + * and the value should be stored into the SubscriptingRefState's + * prevvalue/prevnull fields. 
+ */ +static void +array_subscript_fetch_old(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + if (*op->resnull) + { + /* whole array is null, so any element is too */ + sbsrefstate->prevvalue = (Datum) 0; + sbsrefstate->prevnull = true; + } + else + sbsrefstate->prevvalue = array_get_element(*op->resvalue, + sbsrefstate->numupper, + workspace->upperindex, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign, + &sbsrefstate->prevnull); +} + +/* + * Compute old array slice value for a SubscriptingRef assignment + * expression. Will only be called if the new-value subexpression + * contains SubscriptingRef or FieldStore. This is the same as the + * regular fetch case, except that we have to handle a null array, + * and the value should be stored into the SubscriptingRefState's + * prevvalue/prevnull fields. + * + * Note: this is presently dead code, because the new value for a + * slice would have to be an array, so it couldn't directly contain a + * FieldStore; nor could it contain a SubscriptingRef assignment, since + * we consider adjacent subscripts to index one multidimensional array + * not nested array types. Future generalizations might make this + * reachable, however. 
+ */ +static void +array_subscript_fetch_old_slice(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + ArraySubWorkspace *workspace = (ArraySubWorkspace *) sbsrefstate->workspace; + + if (*op->resnull) + { + /* whole array is null, so any slice is too */ + sbsrefstate->prevvalue = (Datum) 0; + sbsrefstate->prevnull = true; + } + else + { + sbsrefstate->prevvalue = array_get_slice(*op->resvalue, + sbsrefstate->numupper, + workspace->upperindex, + workspace->lowerindex, + sbsrefstate->upperprovided, + sbsrefstate->lowerprovided, + workspace->refattrlength, + workspace->refelemlength, + workspace->refelembyval, + workspace->refelemalign); + /* slices of non-null arrays are never null */ + sbsrefstate->prevnull = false; + } +} + +/* + * Set up execution state for an array subscript operation. + */ +static void +array_exec_setup(const SubscriptingRef *sbsref, + SubscriptingRefState *sbsrefstate, + SubscriptExecSteps *methods) +{ + bool is_slice = (sbsrefstate->numlower != 0); + ArraySubWorkspace *workspace; + + /* + * Enforce the implementation limit on number of array subscripts. This + * check isn't entirely redundant with checking at parse time; conceivably + * the expression was stored by a backend with a different MAXDIM value. + */ + if (sbsrefstate->numupper > MAXDIM) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)", + sbsrefstate->numupper, MAXDIM))); + + /* Should be impossible if parser is sane, but check anyway: */ + if (sbsrefstate->numlower != 0 && + sbsrefstate->numupper != sbsrefstate->numlower) + elog(ERROR, "upper and lower index lists are not same length"); + + /* + * Allocate type-specific workspace. + */ + workspace = (ArraySubWorkspace *) palloc(sizeof(ArraySubWorkspace)); + sbsrefstate->workspace = workspace; + + /* + * Collect datatype details we'll need at execution. 
+ */ + workspace->refelemtype = sbsref->refelemtype; + workspace->refattrlength = get_typlen(sbsref->refcontainertype); + get_typlenbyvalalign(sbsref->refelemtype, + &workspace->refelemlength, + &workspace->refelembyval, + &workspace->refelemalign); + + /* + * Pass back pointers to appropriate step execution functions. + */ + methods->sbs_check_subscripts = array_subscript_check_subscripts; + if (is_slice) + { + methods->sbs_fetch = array_subscript_fetch_slice; + methods->sbs_assign = array_subscript_assign_slice; + methods->sbs_fetch_old = array_subscript_fetch_old_slice; + } + else + { + methods->sbs_fetch = array_subscript_fetch; + methods->sbs_assign = array_subscript_assign; + methods->sbs_fetch_old = array_subscript_fetch_old; + } +} + +/* + * array_subscript_handler + * Subscripting handler for standard varlena arrays. + * + * This should be used only for "true" array types, which have array headers + * as understood by the varlena array routines, and are referenced by the + * element type's pg_type.typarray field. + */ +Datum +array_subscript_handler(PG_FUNCTION_ARGS) +{ + static const SubscriptRoutines sbsroutines = { + .transform = array_subscript_transform, + .exec_setup = array_exec_setup, + .fetch_strict = true, /* fetch returns NULL for NULL inputs */ + .fetch_leakproof = true, /* fetch returns NULL for bad subscript */ + .store_leakproof = false /* ... but assignment throws error */ + }; + + PG_RETURN_POINTER(&sbsroutines); +} + +/* + * raw_array_subscript_handler + * Subscripting handler for "raw" arrays. + * + * A "raw" array just contains N independent instances of the element type. + * Currently we require both the element type and the array type to be fixed + * length, but it wouldn't be too hard to relax that for the array type. + * + * As of now, all the support code is shared with standard varlena arrays. + * We may split those into separate code paths, but probably that would yield + * only marginal speedups. 
The main point of having a separate handler is + * so that pg_type.typsubscript clearly indicates the type's semantics. + */ +Datum +raw_array_subscript_handler(PG_FUNCTION_ARGS) +{ + static const SubscriptRoutines sbsroutines = { + .transform = array_subscript_transform, + .exec_setup = array_exec_setup, + .fetch_strict = true, /* fetch returns NULL for NULL inputs */ + .fetch_leakproof = true, /* fetch returns NULL for bad subscript */ + .store_leakproof = false /* ... but assignment throws error */ + }; + + PG_RETURN_POINTER(&sbsroutines); +} diff --git a/src/backend/utils/adt/format_type.c b/src/backend/utils/adt/format_type.c index f2816e4f37..013409aee7 100644 --- a/src/backend/utils/adt/format_type.c +++ b/src/backend/utils/adt/format_type.c @@ -22,6 +22,7 @@ #include "catalog/pg_type.h" #include "mb/pg_wchar.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/lsyscache.h" #include "utils/numeric.h" #include "utils/syscache.h" @@ -138,15 +139,14 @@ format_type_extended(Oid type_oid, int32 typemod, bits16 flags) typeform = (Form_pg_type) GETSTRUCT(tuple); /* - * Check if it's a regular (variable length) array type. Fixed-length - * array types such as "name" shouldn't get deconstructed. As of Postgres - * 8.1, rather than checking typlen we check the toast property, and don't + * Check if it's a "true" array type. Pseudo-array types such as "name" + * shouldn't get deconstructed. Also check the toast property, and don't * deconstruct "plain storage" array types --- this is because we don't * want to show oidvector as oid[]. 
*/ array_base_type = typeform->typelem; - if (array_base_type != InvalidOid && + if (IsTrueArrayType(typeform) && typeform->typstorage != TYPSTORAGE_PLAIN) { /* Switch our attention to the array element type */ diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c index d370348a1c..12557ce3af 100644 --- a/src/backend/utils/adt/jsonfuncs.c +++ b/src/backend/utils/adt/jsonfuncs.c @@ -26,6 +26,7 @@ #include "miscadmin.h" #include "utils/array.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" #include "utils/hsearch.h" #include "utils/json.h" #include "utils/jsonb.h" @@ -3011,7 +3012,7 @@ prepare_column_cache(ColumnIOData *column, column->io.composite.base_typmod = typmod; column->io.composite.domain_info = NULL; } - else if (type->typlen == -1 && OidIsValid(type->typelem)) + else if (IsTrueArrayType(type)) { column->typcat = TYPECAT_ARRAY; column->io.array.element_info = MemoryContextAllocZero(mcxt, diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index ae23299162..6e5c7379e2 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -2634,8 +2634,9 @@ get_typ_typrelid(Oid typid) * * Given the type OID, get the typelem (InvalidOid if not an array type). * - * NB: this only considers varlena arrays to be true arrays; InvalidOid is - * returned if the input is a fixed-length array type. + * NB: this only succeeds for "true" arrays having array_subscript_handler + * as typsubscript. For other types, InvalidOid is returned independently + * of whether they have typelem or typsubscript set. 
*/ Oid get_element_type(Oid typid) @@ -2648,7 +2649,7 @@ get_element_type(Oid typid) Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp); Oid result; - if (typtup->typlen == -1) + if (IsTrueArrayType(typtup)) result = typtup->typelem; else result = InvalidOid; @@ -2731,7 +2732,7 @@ get_base_element_type(Oid typid) Oid result; /* This test must match get_element_type */ - if (typTup->typlen == -1) + if (IsTrueArrayType(typTup)) result = typTup->typelem; else result = InvalidOid; @@ -2966,6 +2967,64 @@ type_is_collatable(Oid typid) } +/* + * get_typsubscript + * + * Given the type OID, return the type's subscripting handler's OID, + * if it has one. + * + * If typelemp isn't NULL, we also store the type's typelem value there. + * This saves some callers an extra catalog lookup. + */ +RegProcedure +get_typsubscript(Oid typid, Oid *typelemp) +{ + HeapTuple tp; + + tp = SearchSysCache1(TYPEOID, ObjectIdGetDatum(typid)); + if (HeapTupleIsValid(tp)) + { + Form_pg_type typform = (Form_pg_type) GETSTRUCT(tp); + RegProcedure handler = typform->typsubscript; + + if (typelemp) + *typelemp = typform->typelem; + ReleaseSysCache(tp); + return handler; + } + else + { + if (typelemp) + *typelemp = InvalidOid; + return InvalidOid; + } +} + +/* + * getSubscriptingRoutines + * + * Given the type OID, fetch the type's subscripting methods struct. + * Fail if type is not subscriptable. + * + * If typelemp isn't NULL, we also store the type's typelem value there. + * This saves some callers an extra catalog lookup. 
+ */ +const struct SubscriptRoutines * +getSubscriptingRoutines(Oid typid, Oid *typelemp) +{ + RegProcedure typsubscript = get_typsubscript(typid, typelemp); + + if (!OidIsValid(typsubscript)) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("cannot subscript type %s because it does not support subscripting", + format_type_be(typid)))); + + return (const struct SubscriptRoutines *) + DatumGetPointer(OidFunctionCall0(typsubscript)); +} + + /* ---------- STATISTICS CACHE ---------- */ /* diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c index dca1d48e89..5883fde367 100644 --- a/src/backend/utils/cache/typcache.c +++ b/src/backend/utils/cache/typcache.c @@ -406,6 +406,7 @@ lookup_type_cache(Oid type_id, int flags) typentry->typstorage = typtup->typstorage; typentry->typtype = typtup->typtype; typentry->typrelid = typtup->typrelid; + typentry->typsubscript = typtup->typsubscript; typentry->typelem = typtup->typelem; typentry->typcollation = typtup->typcollation; typentry->flags |= TCFLAGS_HAVE_PG_TYPE_DATA; @@ -450,6 +451,7 @@ lookup_type_cache(Oid type_id, int flags) typentry->typstorage = typtup->typstorage; typentry->typtype = typtup->typtype; typentry->typrelid = typtup->typrelid; + typentry->typsubscript = typtup->typsubscript; typentry->typelem = typtup->typelem; typentry->typcollation = typtup->typcollation; typentry->flags |= TCFLAGS_HAVE_PG_TYPE_DATA; diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index 3b36335aa6..673a670347 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -10794,11 +10794,13 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) char *typmodin; char *typmodout; char *typanalyze; + char *typsubscript; Oid typreceiveoid; Oid typsendoid; Oid typmodinoid; Oid typmodoutoid; Oid typanalyzeoid; + Oid typsubscriptoid; char *typcategory; char *typispreferred; char *typdelim; @@ -10840,6 +10842,14 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) else 
appendPQExpBufferStr(query, "false AS typcollatable, "); + if (fout->remoteVersion >= 140000) + appendPQExpBufferStr(query, + "typsubscript, " + "typsubscript::pg_catalog.oid AS typsubscriptoid, "); + else + appendPQExpBufferStr(query, + "'-' AS typsubscript, 0 AS typsubscriptoid, "); + /* Before 8.4, pg_get_expr does not allow 0 for its second arg */ if (fout->remoteVersion >= 80400) appendPQExpBufferStr(query, @@ -10862,11 +10872,13 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) typmodin = PQgetvalue(res, 0, PQfnumber(res, "typmodin")); typmodout = PQgetvalue(res, 0, PQfnumber(res, "typmodout")); typanalyze = PQgetvalue(res, 0, PQfnumber(res, "typanalyze")); + typsubscript = PQgetvalue(res, 0, PQfnumber(res, "typsubscript")); typreceiveoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typreceiveoid"))); typsendoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typsendoid"))); typmodinoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typmodinoid"))); typmodoutoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typmodoutoid"))); typanalyzeoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typanalyzeoid"))); + typsubscriptoid = atooid(PQgetvalue(res, 0, PQfnumber(res, "typsubscriptoid"))); typcategory = PQgetvalue(res, 0, PQfnumber(res, "typcategory")); typispreferred = PQgetvalue(res, 0, PQfnumber(res, "typispreferred")); typdelim = PQgetvalue(res, 0, PQfnumber(res, "typdelim")); @@ -10935,6 +10947,9 @@ dumpBaseType(Archive *fout, TypeInfo *tyinfo) appendPQExpBufferStr(q, typdefault); } + if (OidIsValid(typsubscriptoid)) + appendPQExpBuffer(q, ",\n SUBSCRIPT = %s", typsubscript); + if (OidIsValid(tyinfo->typelem)) { char *elemType; diff --git a/src/include/c.h b/src/include/c.h index b21e4074dd..12ea056a35 100644 --- a/src/include/c.h +++ b/src/include/c.h @@ -592,13 +592,9 @@ typedef uint32 CommandId; #define InvalidCommandId (~(CommandId)0) /* - * Array indexing support + * Maximum number of array subscripts, for regular varlena arrays */ #define MAXDIM 6 -typedef struct -{ 
- int indx[MAXDIM]; -} IntArray; /* ---------------- * Variable-length datatypes all share the 'struct varlena' header. diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index fc2202b843..e6c7b070f6 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -10936,6 +10936,14 @@ proargnames => '{max_data_alignment,database_block_size,blocks_per_segment,wal_block_size,bytes_per_wal_segment,max_identifier_length,max_index_columns,max_toast_chunk_size,large_object_chunk_size,float8_pass_by_value,data_page_checksum_version}', prosrc => 'pg_control_init' }, +# subscripting support for built-in types +{ oid => '9255', descr => 'standard array subscripting support', + proname => 'array_subscript_handler', prorettype => 'internal', + proargtypes => 'internal', prosrc => 'array_subscript_handler' }, +{ oid => '9256', descr => 'raw array subscripting support', + proname => 'raw_array_subscript_handler', prorettype => 'internal', + proargtypes => 'internal', prosrc => 'raw_array_subscript_handler' }, + # collation management functions { oid => '3445', descr => 'import collations from operating system', proname => 'pg_import_system_collations', procost => '100', diff --git a/src/include/catalog/pg_type.dat b/src/include/catalog/pg_type.dat index 21a467a7a7..28240bdce3 100644 --- a/src/include/catalog/pg_type.dat +++ b/src/include/catalog/pg_type.dat @@ -48,9 +48,10 @@ { oid => '19', array_type_oid => '1003', descr => '63-byte type for storing system identifiers', typname => 'name', typlen => 'NAMEDATALEN', typbyval => 'f', - typcategory => 'S', typelem => 'char', typinput => 'namein', - typoutput => 'nameout', typreceive => 'namerecv', typsend => 'namesend', - typalign => 'c', typcollation => 'C' }, + typcategory => 'S', typsubscript => 'raw_array_subscript_handler', + typelem => 'char', typinput => 'namein', typoutput => 'nameout', + typreceive => 'namerecv', typsend => 'namesend', typalign => 'c', + typcollation => 'C' 
}, { oid => '20', array_type_oid => '1016', descr => '~18 digit integer, 8-byte storage', typname => 'int8', typlen => '8', typbyval => 'FLOAT8PASSBYVAL', @@ -64,7 +65,8 @@ { oid => '22', array_type_oid => '1006', descr => 'array of int2, used in system tables', typname => 'int2vector', typlen => '-1', typbyval => 'f', typcategory => 'A', - typelem => 'int2', typinput => 'int2vectorin', typoutput => 'int2vectorout', + typsubscript => 'array_subscript_handler', typelem => 'int2', + typinput => 'int2vectorin', typoutput => 'int2vectorout', typreceive => 'int2vectorrecv', typsend => 'int2vectorsend', typalign => 'i' }, { oid => '23', array_type_oid => '1007', @@ -104,7 +106,8 @@ { oid => '30', array_type_oid => '1013', descr => 'array of oids, used in system tables', typname => 'oidvector', typlen => '-1', typbyval => 'f', typcategory => 'A', - typelem => 'oid', typinput => 'oidvectorin', typoutput => 'oidvectorout', + typsubscript => 'array_subscript_handler', typelem => 'oid', + typinput => 'oidvectorin', typoutput => 'oidvectorout', typreceive => 'oidvectorrecv', typsend => 'oidvectorsend', typalign => 'i' }, # hand-built rowtype entries for bootstrapped catalogs @@ -178,13 +181,15 @@ { oid => '600', array_type_oid => '1017', descr => 'geometric point \'(x, y)\'', typname => 'point', typlen => '16', typbyval => 'f', typcategory => 'G', - typelem => 'float8', typinput => 'point_in', typoutput => 'point_out', - typreceive => 'point_recv', typsend => 'point_send', typalign => 'd' }, + typsubscript => 'raw_array_subscript_handler', typelem => 'float8', + typinput => 'point_in', typoutput => 'point_out', typreceive => 'point_recv', + typsend => 'point_send', typalign => 'd' }, { oid => '601', array_type_oid => '1018', descr => 'geometric line segment \'(pt1,pt2)\'', typname => 'lseg', typlen => '32', typbyval => 'f', typcategory => 'G', - typelem => 'point', typinput => 'lseg_in', typoutput => 'lseg_out', - typreceive => 'lseg_recv', typsend => 'lseg_send', typalign => 
'd' }, + typsubscript => 'raw_array_subscript_handler', typelem => 'point', + typinput => 'lseg_in', typoutput => 'lseg_out', typreceive => 'lseg_recv', + typsend => 'lseg_send', typalign => 'd' }, { oid => '602', array_type_oid => '1019', descr => 'geometric path \'(pt1,...)\'', typname => 'path', typlen => '-1', typbyval => 'f', typcategory => 'G', @@ -193,9 +198,9 @@ { oid => '603', array_type_oid => '1020', descr => 'geometric box \'(lower left,upper right)\'', typname => 'box', typlen => '32', typbyval => 'f', typcategory => 'G', - typdelim => ';', typelem => 'point', typinput => 'box_in', - typoutput => 'box_out', typreceive => 'box_recv', typsend => 'box_send', - typalign => 'd' }, + typdelim => ';', typsubscript => 'raw_array_subscript_handler', + typelem => 'point', typinput => 'box_in', typoutput => 'box_out', + typreceive => 'box_recv', typsend => 'box_send', typalign => 'd' }, { oid => '604', array_type_oid => '1027', descr => 'geometric polygon \'(pt1,...)\'', typname => 'polygon', typlen => '-1', typbyval => 'f', typcategory => 'G', @@ -203,8 +208,9 @@ typsend => 'poly_send', typalign => 'd', typstorage => 'x' }, { oid => '628', array_type_oid => '629', descr => 'geometric line', typname => 'line', typlen => '24', typbyval => 'f', typcategory => 'G', - typelem => 'float8', typinput => 'line_in', typoutput => 'line_out', - typreceive => 'line_recv', typsend => 'line_send', typalign => 'd' }, + typsubscript => 'raw_array_subscript_handler', typelem => 'float8', + typinput => 'line_in', typoutput => 'line_out', typreceive => 'line_recv', + typsend => 'line_send', typalign => 'd' }, # OIDS 700 - 799 @@ -507,8 +513,9 @@ # Arrays of records have typcategory P, so they can't be autogenerated. 
{ oid => '2287', typname => '_record', typlen => '-1', typbyval => 'f', typtype => 'p', - typcategory => 'P', typelem => 'record', typinput => 'array_in', - typoutput => 'array_out', typreceive => 'array_recv', typsend => 'array_send', + typcategory => 'P', typsubscript => 'array_subscript_handler', + typelem => 'record', typinput => 'array_in', typoutput => 'array_out', + typreceive => 'array_recv', typsend => 'array_send', typanalyze => 'array_typanalyze', typalign => 'd', typstorage => 'x' }, { oid => '2275', array_type_oid => '1263', descr => 'C-style string', typname => 'cstring', typlen => '-2', typbyval => 'f', typtype => 'p', diff --git a/src/include/catalog/pg_type.h b/src/include/catalog/pg_type.h index 6099e5f57c..15f2514a14 100644 --- a/src/include/catalog/pg_type.h +++ b/src/include/catalog/pg_type.h @@ -101,15 +101,18 @@ CATALOG(pg_type,1247,TypeRelationId) BKI_BOOTSTRAP BKI_ROWTYPE_OID(71,TypeRelati Oid typrelid BKI_DEFAULT(0) BKI_ARRAY_DEFAULT(0) BKI_LOOKUP(pg_class); /* - * If typelem is not 0 then it identifies another row in pg_type. The - * current type can then be subscripted like an array yielding values of - * type typelem. A non-zero typelem does not guarantee this type to be a - * "real" array type; some ordinary fixed-length types can also be - * subscripted (e.g., name, point). Variable-length types can *not* be - * turned into pseudo-arrays like that. Hence, the way to determine - * whether a type is a "true" array type is if: - * - * typelem != 0 and typlen == -1. + * Type-specific subscripting handler. If typsubscript is 0, it means + * that this type doesn't support subscripting. Note that various parts + * of the system deem types to be "true" array types only if their + * typsubscript is array_subscript_handler. 
+ */ + regproc typsubscript BKI_DEFAULT(-) BKI_ARRAY_DEFAULT(array_subscript_handler) BKI_LOOKUP(pg_proc); + + /* + * If typelem is not 0 then it identifies another row in pg_type, defining + * the type yielded by subscripting. This should be 0 if typsubscript is + * 0. However, it can be 0 when typsubscript isn't 0, if the handler + * doesn't need typelem to determine the subscripting result type. */ Oid typelem BKI_DEFAULT(0) BKI_LOOKUP(pg_type); @@ -319,6 +322,11 @@ DECLARE_UNIQUE_INDEX(pg_type_typname_nsp_index, 2704, on pg_type using btree(typ (typid) == ANYCOMPATIBLENONARRAYOID || \ (typid) == ANYCOMPATIBLERANGEOID) +/* Is this a "true" array type? (Requires fmgroids.h) */ +#define IsTrueArrayType(typeForm) \ + (OidIsValid((typeForm)->typelem) && \ + (typeForm)->typsubscript == F_ARRAY_SUBSCRIPT_HANDLER) + /* * Backwards compatibility for ancient random spellings of pg_type OID macros. * Don't use these names in new code. @@ -351,6 +359,7 @@ extern ObjectAddress TypeCreate(Oid newTypeOid, Oid typmodinProcedure, Oid typmodoutProcedure, Oid analyzeProcedure, + Oid subscriptProcedure, Oid elementType, bool isImplicitArray, Oid arrayType, diff --git a/src/include/executor/execExpr.h b/src/include/executor/execExpr.h index abb489e206..b4e0a9b7d3 100644 --- a/src/include/executor/execExpr.h +++ b/src/include/executor/execExpr.h @@ -32,6 +32,11 @@ typedef void (*ExecEvalSubroutine) (ExprState *state, struct ExprEvalStep *op, ExprContext *econtext); +/* API for out-of-line evaluation subroutines returning bool */ +typedef bool (*ExecEvalBoolSubroutine) (ExprState *state, + struct ExprEvalStep *op, + ExprContext *econtext); + /* * Discriminator for ExprEvalSteps. 
* @@ -185,8 +190,8 @@ typedef enum ExprEvalOp */ EEOP_FIELDSTORE_FORM, - /* Process a container subscript; short-circuit expression to NULL if NULL */ - EEOP_SBSREF_SUBSCRIPT, + /* Process container subscripts; possibly short-circuit result to NULL */ + EEOP_SBSREF_SUBSCRIPTS, /* * Compute old container element/slice when a SubscriptingRef assignment @@ -494,19 +499,19 @@ typedef struct ExprEvalStep int ncolumns; } fieldstore; - /* for EEOP_SBSREF_SUBSCRIPT */ + /* for EEOP_SBSREF_SUBSCRIPTS */ struct { + ExecEvalBoolSubroutine subscriptfunc; /* evaluation subroutine */ /* too big to have inline */ struct SubscriptingRefState *state; - int off; /* 0-based index of this subscript */ - bool isupper; /* is it upper or lower subscript? */ int jumpdone; /* jump here on null */ } sbsref_subscript; /* for EEOP_SBSREF_OLD / ASSIGN / FETCH */ struct { + ExecEvalSubroutine subscriptfunc; /* evaluation subroutine */ /* too big to have inline */ struct SubscriptingRefState *state; } sbsref; @@ -640,36 +645,41 @@ typedef struct SubscriptingRefState { bool isassignment; /* is it assignment, or just fetch? */ - Oid refelemtype; /* OID of the container element type */ - int16 refattrlength; /* typlen of container type */ - int16 refelemlength; /* typlen of the container element type */ - bool refelembyval; /* is the element type pass-by-value? 
 */
-	char		refelemalign;	/* typalign of the element type */
+	/* workspace for type-specific subscripting code */
+	void	   *workspace;
 
-	/* numupper and upperprovided[] are filled at compile time */
-	/* at runtime, extracted subscript datums get stored in upperindex[] */
+	/* numupper and upperprovided[] are filled at expression compile time */
+	/* at runtime, subscripts are computed in upperindex[]/upperindexnull[] */
 	int			numupper;
-	bool		upperprovided[MAXDIM];
-	int			upperindex[MAXDIM];
+	bool	   *upperprovided;	/* indicates if this position is supplied */
+	Datum	   *upperindex;
+	bool	   *upperindexnull;
 
 	/* similarly for lower indexes, if any */
 	int			numlower;
-	bool		lowerprovided[MAXDIM];
-	int			lowerindex[MAXDIM];
-
-	/* subscript expressions get evaluated into here */
-	Datum		subscriptvalue;
-	bool		subscriptnull;
+	bool	   *lowerprovided;
+	Datum	   *lowerindex;
+	bool	   *lowerindexnull;
 
 	/* for assignment, new value to assign is evaluated into here */
 	Datum		replacevalue;
 	bool		replacenull;
 
-	/* if we have a nested assignment, SBSREF_OLD puts old value here */
+	/* if we have a nested assignment, sbs_fetch_old puts old value here */
 	Datum		prevvalue;
 	bool		prevnull;
 } SubscriptingRefState;
 
+/* Execution step methods used for SubscriptingRef */
+typedef struct SubscriptExecSteps
+{
+	/* See nodes/subscripting.h for more detail about these */
+	ExecEvalBoolSubroutine sbs_check_subscripts;	/* process subscripts */
+	ExecEvalSubroutine sbs_fetch;	/* fetch an element */
+	ExecEvalSubroutine sbs_assign;	/* assign to an element */
+	ExecEvalSubroutine sbs_fetch_old;	/* fetch old value for assignment */
+} SubscriptExecSteps;
+
 /* functions in execExpr.c */
 extern void ExprEvalPushStep(ExprState *es, const ExprEvalStep *s);
@@ -712,10 +722,6 @@ extern void ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op,
 									 ExprContext *econtext);
 extern void ExecEvalFieldStoreForm(ExprState *state, ExprEvalStep *op,
 								   ExprContext *econtext);
-extern bool ExecEvalSubscriptingRef(ExprState *state,
-									ExprEvalStep *op);
-extern void ExecEvalSubscriptingRefFetch(ExprState *state, ExprEvalStep *op);
-extern void ExecEvalSubscriptingRefOld(ExprState *state, ExprEvalStep *op);
-extern void ExecEvalSubscriptingRefAssign(ExprState *state, ExprEvalStep *op);
 extern void ExecEvalConvertRowtype(ExprState *state, ExprEvalStep *op,
 								   ExprContext *econtext);
 extern void ExecEvalScalarArrayOp(ExprState *state, ExprEvalStep *op);
diff --git a/src/include/nodes/primnodes.h b/src/include/nodes/primnodes.h
index cdbe781c73..dd85908fe2 100644
--- a/src/include/nodes/primnodes.h
+++ b/src/include/nodes/primnodes.h
@@ -390,14 +390,14 @@ typedef struct WindowFunc
 	int			location;		/* token location, or -1 if unknown */
 } WindowFunc;
 
-/* ----------------
- *	SubscriptingRef: describes a subscripting operation over a container
- *	(array, etc).
+/*
+ * SubscriptingRef: describes a subscripting operation over a container
+ * (array, etc).
  *
  * A SubscriptingRef can describe fetching a single element from a container,
- * fetching a part of container (e.g. array slice), storing a single element into
- * a container, or storing a slice.  The "store" cases work with an
- * initial container value and a source value that is inserted into the
+ * fetching a part of a container (e.g. an array slice), storing a single
+ * element into a container, or storing a slice.  The "store" cases work with
+ * an initial container value and a source value that is inserted into the
  * appropriate part of the container; the result of the operation is an
  * entire new modified container value.
  *
@@ -410,23 +410,32 @@ typedef struct WindowFunc
  *
  * In the slice case, individual expressions in the subscript lists can be
  * NULL, meaning "substitute the array's current lower or upper bound".
- *
- * Note: the result datatype is the element type when fetching a single
- * element; but it is the array type when doing subarray fetch or either
- * type of store.
+ * (Non-array containers may or may not support this.)
+ *
+ * refcontainertype is the actual container type that determines the
+ * subscripting semantics.  (This will generally be either the exposed type of
+ * refexpr, or the base type if that is a domain.)  refelemtype is the type of
+ * the container's elements; this is saved for the use of the subscripting
+ * functions, but is not used by the core code.  refrestype, reftypmod, and
+ * refcollid describe the type of the SubscriptingRef's result.  In a store
+ * expression, refrestype will always match refcontainertype; in a fetch,
+ * it could be refelemtype for an element fetch, or refcontainertype for a
+ * slice fetch, or possibly something else as determined by type-specific
+ * subscripting logic.  Likewise, reftypmod and refcollid will match the
+ * container's properties in a store, but could be different in a fetch.
  *
  * Note: for the cases where a container is returned, if refexpr yields a R/W
- * expanded container, then the implementation is allowed to modify that object
- * in-place and return the same object.)
- * ----------------
+ * expanded container, then the implementation is allowed to modify that
+ * object in-place and return the same object.
  */
 typedef struct SubscriptingRef
 {
 	Expr		xpr;
 	Oid			refcontainertype;	/* type of the container proper */
-	Oid			refelemtype;	/* type of the container elements */
-	int32		reftypmod;		/* typmod of the container (and elements too) */
-	Oid			refcollid;		/* OID of collation, or InvalidOid if none */
+	Oid			refelemtype;	/* the container type's pg_type.typelem */
+	Oid			refrestype;		/* type of the SubscriptingRef's result */
+	int32		reftypmod;		/* typmod of the result */
+	Oid			refcollid;		/* collation of result, or InvalidOid if none */
 	List	   *refupperindexpr;	/* expressions that evaluate to upper
 									 * container indexes */
 	List	   *reflowerindexpr;	/* expressions that evaluate to lower
@@ -434,7 +443,6 @@ typedef struct SubscriptingRef
 									 * container element */
 	Expr	   *refexpr;		/* the expression that evaluates to a
 								 * container value */
-
 	Expr	   *refassgnexpr;	/* expression for the source value, or NULL if
 								 * fetch */
 } SubscriptingRef;
diff --git a/src/include/nodes/subscripting.h b/src/include/nodes/subscripting.h
new file mode 100644
index 0000000000..3b0a60773d
--- /dev/null
+++ b/src/include/nodes/subscripting.h
@@ -0,0 +1,167 @@
+/*-------------------------------------------------------------------------
+ *
+ * subscripting.h
+ *		API for generic type subscripting
+ *
+ * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ * src/include/nodes/subscripting.h
+ *
+ *-------------------------------------------------------------------------
+ */
+#ifndef SUBSCRIPTING_H
+#define SUBSCRIPTING_H
+
+#include "nodes/primnodes.h"
+
+/* Forward declarations, to avoid including other headers */
+struct ParseState;
+struct SubscriptingRefState;
+struct SubscriptExecSteps;
+
+/*
+ * The SQL-visible function that defines a subscripting method is declared
+ *		subscripting_function(internal) returns internal
+ * but it actually is not passed any parameter.  It must return a pointer
+ * to a "struct SubscriptRoutines" that provides pointers to the individual
+ * subscript parsing and execution methods.  Typically the pointer will point
+ * to a "static const" variable, but at need it can point to palloc'd space.
+ * The type (after domain-flattening) of the head variable or expression
+ * of a subscripting construct determines which subscripting function is
+ * called for that construct.
+ *
+ * In addition to the method pointers, struct SubscriptRoutines includes
+ * several bool flags that specify properties of the subscripting actions
+ * this data type can perform:
+ *
+ * fetch_strict indicates that a fetch SubscriptRef is strict, i.e., returns
+ * NULL if any input (either the container or any subscript) is NULL.
+ *
+ * fetch_leakproof indicates that a fetch SubscriptRef is leakproof, i.e.,
+ * will not throw any data-value-dependent errors.  Typically this requires
+ * silently returning NULL for invalid subscripts.
+ *
+ * store_leakproof similarly indicates whether an assignment SubscriptRef is
+ * leakproof.  (It is common to prefer throwing errors for invalid subscripts
+ * in assignments; that's fine, but it makes the operation not leakproof.
+ * In current usage there is no advantage in making assignments leakproof.)
+ *
+ * There is no store_strict flag.  Such behavior would generally be
+ * undesirable, since for example a null subscript in an assignment would
+ * cause the entire container to become NULL.
+ *
+ * Regardless of these flags, all SubscriptRefs are expected to be immutable,
+ * that is they must always give the same results for the same inputs.
+ * They are expected to always be parallel-safe, as well.
+ */
+
+/*
+ * The transform method is called during parse analysis of a subscripting
+ * construct.  The SubscriptingRef node has been constructed, but some of
+ * its fields still need to be filled in, and the subscript expression(s)
+ * are still in raw form.  The transform method is responsible for doing
+ * parse analysis of each subscript expression (using transformExpr),
+ * coercing the subscripts to whatever type it needs, and building the
+ * refupperindexpr and reflowerindexpr lists from those results.  The
+ * reflowerindexpr list must be empty for an element operation, or the
+ * same length as refupperindexpr for a slice operation.  Insert NULLs
+ * (that is, an empty parse tree, not a null Const node) for any omitted
+ * subscripts in a slice operation.  (Of course, if the transform method
+ * does not care to support slicing, it can just throw an error if isSlice.)
+ * See array_subscript_transform() for sample code.
+ *
+ * The transform method is also responsible for identifying the result type
+ * of the subscripting operation.  At call, refcontainertype and reftypmod
+ * describe the container type (this will be a base type not a domain), and
+ * refelemtype is set to the container type's pg_type.typelem value.  The
+ * transform method must set refrestype and reftypmod to describe the result
+ * of subscripting.  For arrays, refrestype is set to refelemtype for an
+ * element operation or refcontainertype for a slice, while reftypmod stays
+ * the same in either case; but other types might use other rules.  The
+ * transform method should ignore refcollid, as that's determined later on
+ * during parsing.
+ *
+ * At call, refassgnexpr has not been filled in, so the SubscriptingRef node
+ * always looks like a fetch; refrestype should be set as though for a
+ * fetch, too.  (The isAssignment parameter is typically only useful if the
+ * transform method wishes to throw an error for not supporting assignment.)
+ * To complete processing of an assignment, the core parser will coerce the
+ * element/slice source expression to the returned refrestype and reftypmod
+ * before putting it into refassgnexpr.  It will then set refrestype and
+ * reftypmod to again describe the container type, since that's what an
+ * assignment must return.
+ */
+typedef void (*SubscriptTransform) (SubscriptingRef *sbsref,
+									List *indirection,
+									struct ParseState *pstate,
+									bool isSlice,
+									bool isAssignment);
+
+/*
+ * The exec_setup method is called during executor-startup compilation of a
+ * SubscriptingRef node in an expression.  It must fill *methods with
+ * pointers to functions that can be called for execution of the node.
+ * Optionally, exec_setup can initialize sbsrefstate->workspace to point to
+ * some palloc'd workspace for execution.  (Typically, such workspace is used
+ * to hold looked-up catalog data and/or provide space for the
+ * check_subscripts step to pass data forward to the other step functions.)
+ * See executor/execExpr.h for the definitions of these structs and other
+ * ones used in expression execution.
+ *
+ * The methods to be provided are:
+ *
+ * sbs_check_subscripts: examine the just-computed subscript values available
+ * in sbsrefstate's arrays, and possibly convert them into another form
+ * (stored in sbsrefstate->workspace).  Return TRUE to continue with
+ * evaluation of the subscripting construct, or FALSE to skip it and return
+ * an overall NULL result.  If this is a fetch and the data type's
+ * fetch_strict flag is true, then sbs_check_subscripts must return FALSE if
+ * there are any NULL subscripts.  Otherwise it can choose to throw an error,
+ * or return FALSE, or let sbs_fetch or sbs_assign deal with the null
+ * subscripts.
+ *
+ * sbs_fetch: perform a subscripting fetch, using the container value in
+ * *op->resvalue and the subscripts from sbs_check_subscripts.  If
+ * fetch_strict is true then all these inputs can be assumed non-NULL,
+ * otherwise sbs_fetch must check for null inputs.  Place the result in
+ * *op->resvalue / *op->resnull.
+ *
+ * sbs_assign: perform a subscripting assignment, using the original
+ * container value in *op->resvalue / *op->resnull, the subscripts from
+ * sbs_check_subscripts, and the new element/slice value in
+ * sbsrefstate->replacevalue/replacenull.  Any of these inputs might be NULL
+ * (unless sbs_check_subscripts rejected null subscripts).  Place the result
+ * (an entire new container value) in *op->resvalue / *op->resnull.
+ *
+ * sbs_fetch_old: this is only used in cases where an element or slice
+ * assignment involves an assignment to a sub-field or sub-element
+ * (i.e., nested containers are involved).  It must fetch the existing
+ * value of the target element or slice.  This is exactly the same as
+ * sbs_fetch except that (a) it must cope with a NULL container, and
+ * with NULL subscripts if sbs_check_subscripts allows them (typically,
+ * returning NULL is good enough); and (b) the result must be placed in
+ * sbsrefstate->prevvalue/prevnull, without overwriting *op->resvalue.
+ *
+ * Subscripting implementations that do not support assignment need not
+ * provide sbs_assign or sbs_fetch_old methods.  It might be reasonable
+ * to also omit sbs_check_subscripts, in which case the sbs_fetch method
+ * must combine the functionality of sbs_check_subscripts and sbs_fetch.
+ * (The main reason to have a separate sbs_check_subscripts method is so
+ * that sbs_fetch_old and sbs_assign need not duplicate subscript
+ * processing.)  Set the relevant pointers to NULL for any omitted methods.
+ */
+typedef void (*SubscriptExecSetup) (const SubscriptingRef *sbsref,
+									struct SubscriptingRefState *sbsrefstate,
+									struct SubscriptExecSteps *methods);
+
+/* Struct returned by the SQL-visible subscript handler function */
+typedef struct SubscriptRoutines
+{
+	SubscriptTransform transform;	/* parse analysis function */
+	SubscriptExecSetup exec_setup;	/* expression compilation function */
+	bool		fetch_strict;	/* is fetch SubscriptRef strict? */
+	bool		fetch_leakproof;	/* is fetch SubscriptRef leakproof? */
+	bool		store_leakproof;	/* is assignment SubscriptRef leakproof? */
+} SubscriptRoutines;
+
+#endif							/* SUBSCRIPTING_H */
diff --git a/src/include/parser/parse_node.h b/src/include/parser/parse_node.h
index d25819aa28..beb56fec87 100644
--- a/src/include/parser/parse_node.h
+++ b/src/include/parser/parse_node.h
@@ -313,15 +313,15 @@ extern void setup_parser_errposition_callback(ParseCallbackState *pcbstate,
 											  ParseState *pstate, int location);
 extern void cancel_parser_errposition_callback(ParseCallbackState *pcbstate);
 
-extern Oid	transformContainerType(Oid *containerType, int32 *containerTypmod);
+extern void transformContainerType(Oid *containerType, int32 *containerTypmod);
 
 extern SubscriptingRef *transformContainerSubscripts(ParseState *pstate,
 													 Node *containerBase,
 													 Oid containerType,
-													 Oid elementType,
 													 int32 containerTypMod,
 													 List *indirection,
-													 Node *assignFrom);
+													 bool isAssignment);
+
 extern Const *make_const(ParseState *pstate, Value *value, int location);
 
 #endif							/* PARSE_NODE_H */
diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h
index fecfe1f4f6..475b842b09 100644
--- a/src/include/utils/lsyscache.h
+++ b/src/include/utils/lsyscache.h
@@ -17,6 +17,9 @@
 #include "access/htup.h"
 #include "nodes/pg_list.h"
 
+/* avoid including subscripting.h here */
+struct SubscriptRoutines;
+
 /* Result list element for get_op_btree_interpretation */
 typedef struct OpBtreeInterpretation
 {
@@ -172,6 +175,9 @@ extern void getTypeBinaryOutputInfo(Oid type, Oid *typSend, bool *typIsVarlena);
 extern Oid	get_typmodin(Oid typid);
 extern Oid	get_typcollation(Oid typid);
 extern bool type_is_collatable(Oid typid);
+extern RegProcedure get_typsubscript(Oid typid, Oid *typelemp);
+extern const struct SubscriptRoutines *getSubscriptingRoutines(Oid typid,
+															   Oid *typelemp);
 extern Oid	getBaseType(Oid typid);
 extern Oid	getBaseTypeAndTypmod(Oid typid, int32 *typmod);
 extern int32 get_typavgwidth(Oid typid, int32 typmod);
diff --git a/src/include/utils/typcache.h b/src/include/utils/typcache.h
index cdd20e56d7..38c8fe0192 100644
--- a/src/include/utils/typcache.h
+++ b/src/include/utils/typcache.h
@@ -42,6 +42,7 @@ typedef struct TypeCacheEntry
 	char		typstorage;
 	char		typtype;
 	Oid			typrelid;
+	Oid			typsubscript;
 	Oid			typelem;
 	Oid			typcollation;
diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 7844c500ee..4de756455d 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -2853,9 +2853,7 @@ compile_plperl_function(Oid fn_oid, bool is_trigger, bool is_event_trigger)
 		prodesc->result_oid = rettype;
 		prodesc->fn_retisset = procStruct->proretset;
 		prodesc->fn_retistuple = type_is_rowtype(rettype);
-
-		prodesc->fn_retisarray =
-			(typeStruct->typlen == -1 && typeStruct->typelem);
+		prodesc->fn_retisarray = IsTrueArrayType(typeStruct);
 
 		fmgr_info_cxt(typeStruct->typinput,
 					  &(prodesc->result_in_func),
@@ -2901,7 +2899,7 @@ compile_plperl_function(Oid fn_oid, bool is_trigger, bool is_event_trigger)
 			}
 
 			/* Identify array-type arguments */
-			if (typeStruct->typelem != 0 && typeStruct->typlen == -1)
+			if (IsTrueArrayType(typeStruct))
 				prodesc->arg_arraytype[i] = argtype;
 			else
 				prodesc->arg_arraytype[i] = InvalidOid;
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index 6df8e14629..b610b28d70 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -26,6 +26,7 @@
 #include "parser/parse_type.h"
 #include "plpgsql.h"
 #include "utils/builtins.h"
+#include "utils/fmgroids.h"
 #include "utils/guc.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
@@ -2144,8 +2145,7 @@ build_datatype(HeapTuple typeTup, int32 typmod,
 		 * This test should include what get_element_type() checks.  We also
 		 * disallow non-toastable array types (i.e. oidvector and int2vector).
 		 */
-		typ->typisarray = (typeStruct->typlen == -1 &&
-						   OidIsValid(typeStruct->typelem) &&
+		typ->typisarray = (IsTrueArrayType(typeStruct) &&
 						   typeStruct->typstorage != TYPSTORAGE_PLAIN);
 	}
 	else if (typeStruct->typtype == TYPTYPE_DOMAIN)
diff --git a/src/pl/plpython/plpy_typeio.c b/src/pl/plpython/plpy_typeio.c
index b4aeb7fd59..5e807b139f 100644
--- a/src/pl/plpython/plpy_typeio.c
+++ b/src/pl/plpython/plpy_typeio.c
@@ -352,9 +352,9 @@ PLy_output_setup_func(PLyObToDatum *arg, MemoryContext arg_mcxt,
 							  proc);
 	}
 	else if (typentry &&
-			 OidIsValid(typentry->typelem) && typentry->typlen == -1)
+			 IsTrueArrayType(typentry))
 	{
-		/* Standard varlena array (cf. get_element_type) */
+		/* Standard array */
 		arg->func = PLySequence_ToArray;
 		/* Get base type OID to insert into constructed array */
 		/* (note this might not be the same as the immediate child type) */
@@ -470,9 +470,9 @@ PLy_input_setup_func(PLyDatumToOb *arg, MemoryContext arg_mcxt,
 							 proc);
 	}
 	else if (typentry &&
-			 OidIsValid(typentry->typelem) && typentry->typlen == -1)
+			 IsTrueArrayType(typentry))
 	{
-		/* Standard varlena array (cf. get_element_type) */
+		/* Standard array */
 		arg->func = PLyList_FromArray;
 		/* Recursively set up conversion info for the element type */
 		arg->u.array.elm = (PLyDatumToOb *)
diff --git a/src/test/regress/expected/arrays.out b/src/test/regress/expected/arrays.out
index c03ac65ff8..448b3ee526 100644
--- a/src/test/regress/expected/arrays.out
+++ b/src/test/regress/expected/arrays.out
@@ -27,12 +27,12 @@ INSERT INTO arrtest (a, b[1:2][1:2], c, d, e, f, g)
 INSERT INTO arrtest (a, b[1:2], c, d[1:2])
    VALUES ('{}', '{3,4}', '{foo,bar}', '{bar,foo}');
 INSERT INTO arrtest (b[2]) VALUES(now());  -- error, type mismatch
-ERROR:  array assignment to "b" requires type integer but expression is of type timestamp with time zone
+ERROR:  subscripted assignment to "b" requires type integer but expression is of type timestamp with time zone
 LINE 1: INSERT INTO arrtest (b[2]) VALUES(now());
                              ^
 HINT:  You will need to rewrite or cast the expression.
 INSERT INTO arrtest (b[1:2]) VALUES(now());  -- error, type mismatch
-ERROR:  array assignment to "b" requires type integer[] but expression is of type timestamp with time zone
+ERROR:  subscripted assignment to "b" requires type integer[] but expression is of type timestamp with time zone
 LINE 1: INSERT INTO arrtest (b[1:2]) VALUES(now());
                              ^
 HINT:  You will need to rewrite or cast the expression.
@@ -237,7 +237,7 @@ UPDATE arrtest
 ERROR:  array subscript in assignment must not be null
 -- Un-subscriptable type
 SELECT (now())[1];
-ERROR:  cannot subscript type timestamp with time zone because it is not an array
+ERROR:  cannot subscript type timestamp with time zone because it does not support subscripting
 -- test slices with empty lower and/or upper index
 CREATE TEMP TABLE arrtest_s (
   a       int2[],
diff --git a/src/test/regress/expected/opr_sanity.out b/src/test/regress/expected/opr_sanity.out
index 3b39137400..507b474b1b 100644
--- a/src/test/regress/expected/opr_sanity.out
+++ b/src/test/regress/expected/opr_sanity.out
@@ -31,7 +31,8 @@ begin
   if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if;
   if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then
     if EXISTS(select 1 from pg_catalog.pg_type where
-              oid = $1 and typelem != 0 and typlen = -1)
+              oid = $1 and typelem != 0 and
+              typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc)
     then return true; end if;
   end if;
   if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then
@@ -55,7 +56,8 @@ begin
   if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if;
   if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then
     if EXISTS(select 1 from pg_catalog.pg_type where
-              oid = $1 and typelem != 0 and typlen = -1)
+              oid = $1 and typelem != 0 and
+              typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc)
     then return true; end if;
   end if;
   if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then
diff --git a/src/test/regress/expected/type_sanity.out b/src/test/regress/expected/type_sanity.out
index ec1cd47623..13567ddf84 100644
--- a/src/test/regress/expected/type_sanity.out
+++ b/src/test/regress/expected/type_sanity.out
@@ -75,14 +75,15 @@ ORDER BY p1.oid;
 5017 | pg_mcv_list
 (4 rows)
 
--- Make sure typarray points to a varlena array type of our own base
+-- Make sure typarray points to a "true" array type of our own base
 SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype,
-       p2.typelem, p2.typlen
+       p2.typsubscript
 FROM   pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid)
 WHERE  p1.typarray <> 0 AND
-       (p2.oid IS NULL OR p2.typelem <> p1.oid OR p2.typlen <> -1);
- oid | basetype | arraytype | typelem | typlen
------+----------+-----------+---------+--------
+       (p2.oid IS NULL OR
+        p2.typsubscript <> 'array_subscript_handler'::regproc);
+ oid | basetype | arraytype | typsubscript
+-----+----------+-----------+--------------
 (0 rows)
 
 -- Look for range types that do not have a pg_range entry
@@ -448,6 +449,33 @@ WHERE p1.typarray = p2.oid AND
 -----+---------+----------+---------+----------
 (0 rows)
 
+-- Check for typelem set without a handler
+SELECT p1.oid, p1.typname, p1.typelem
+FROM pg_type AS p1
+WHERE p1.typelem != 0 AND p1.typsubscript = 0;
+ oid | typname | typelem
+-----+---------+---------
+(0 rows)
+
+-- Check for misuse of standard subscript handlers
+SELECT p1.oid, p1.typname,
+       p1.typelem, p1.typlen, p1.typbyval
+FROM pg_type AS p1
+WHERE p1.typsubscript = 'array_subscript_handler'::regproc AND NOT
+      (p1.typelem != 0 AND p1.typlen = -1 AND NOT p1.typbyval);
+ oid | typname | typelem | typlen | typbyval
+-----+---------+---------+--------+----------
+(0 rows)
+
+SELECT p1.oid, p1.typname,
+       p1.typelem, p1.typlen, p1.typbyval
+FROM pg_type AS p1
+WHERE p1.typsubscript = 'raw_array_subscript_handler'::regproc AND NOT
+      (p1.typelem != 0 AND p1.typlen > 0 AND NOT p1.typbyval);
+ oid | typname | typelem | typlen | typbyval
+-----+---------+---------+--------+----------
+(0 rows)
+
 -- Check for bogus typanalyze routines
 SELECT p1.oid, p1.typname, p2.oid, p2.proname
 FROM pg_type AS p1, pg_proc AS p2
@@ -485,7 +513,7 @@ SELECT t.oid, t.typname, t.typanalyze
 FROM pg_type t WHERE t.typbasetype = 0 AND
     (t.typanalyze = 'array_typanalyze'::regproc) !=
-    (typelem != 0 AND typlen < 0)
+    (t.typsubscript = 'array_subscript_handler'::regproc)
 ORDER BY 1;
  oid |  typname   | typanalyze
 -----+------------+------------
@@ -608,7 +636,8 @@ WHERE o.opcmethod != 403 OR
     ((o.opcintype != p1.rngsubtype) AND NOT
      (o.opcintype = 'pg_catalog.anyarray'::regtype AND
       EXISTS(select 1 from pg_catalog.pg_type where
-             oid = p1.rngsubtype and typelem != 0 and typlen = -1)));
+             oid = p1.rngsubtype and typelem != 0 and
+             typsubscript = 'array_subscript_handler'::regproc)));
 rngtypid | rngsubtype | opcmethod | opcname
----------+------------+-----------+---------
(0 rows)
diff --git a/src/test/regress/sql/opr_sanity.sql b/src/test/regress/sql/opr_sanity.sql
index 307aab1deb..4189a5a4e0 100644
--- a/src/test/regress/sql/opr_sanity.sql
+++ b/src/test/regress/sql/opr_sanity.sql
@@ -34,7 +34,8 @@ begin
   if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if;
   if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then
    if EXISTS(select 1 from pg_catalog.pg_type where
-              oid = $1 and typelem != 0 and typlen = -1)
+              oid = $1 and typelem != 0 and
+              typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc)
     then return true; end if;
   end if;
   if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then
@@ -59,7 +60,8 @@ begin
   if $2 = 'pg_catalog.any'::pg_catalog.regtype then return true; end if;
   if $2 = 'pg_catalog.anyarray'::pg_catalog.regtype then
    if EXISTS(select 1 from pg_catalog.pg_type where
-              oid = $1 and typelem != 0 and typlen = -1)
+              oid = $1 and typelem != 0 and
+              typsubscript = 'pg_catalog.array_subscript_handler'::pg_catalog.regproc)
     then return true; end if;
   end if;
   if $2 = 'pg_catalog.anyrange'::pg_catalog.regtype then
diff --git a/src/test/regress/sql/type_sanity.sql b/src/test/regress/sql/type_sanity.sql
index 5e433388cd..8c6e614f20 100644
--- a/src/test/regress/sql/type_sanity.sql
+++ b/src/test/regress/sql/type_sanity.sql
@@ -63,12 +63,13 @@ WHERE p1.typtype not in ('p') AND p1.typname NOT LIKE E'\\_%'
        p2.typelem = p1.oid and p1.typarray = p2.oid)
 ORDER BY p1.oid;
 
--- Make sure typarray points to a varlena array type of our own base
+-- Make sure typarray points to a "true" array type of our own base
 SELECT p1.oid, p1.typname as basetype, p2.typname as arraytype,
-       p2.typelem, p2.typlen
+       p2.typsubscript
 FROM   pg_type p1 LEFT JOIN pg_type p2 ON (p1.typarray = p2.oid)
 WHERE  p1.typarray <> 0 AND
-       (p2.oid IS NULL OR p2.typelem <> p1.oid OR p2.typlen <> -1);
+       (p2.oid IS NULL OR
+        p2.typsubscript <> 'array_subscript_handler'::regproc);
 
 -- Look for range types that do not have a pg_range entry
 SELECT p1.oid, p1.typname
@@ -323,6 +324,26 @@ WHERE p1.typarray = p2.oid AND
     p2.typalign != (CASE WHEN p1.typalign = 'd' THEN 'd'::"char"
                          ELSE 'i'::"char" END);
 
+-- Check for typelem set without a handler
+
+SELECT p1.oid, p1.typname, p1.typelem
+FROM pg_type AS p1
+WHERE p1.typelem != 0 AND p1.typsubscript = 0;
+
+-- Check for misuse of standard subscript handlers
+
+SELECT p1.oid, p1.typname,
+       p1.typelem, p1.typlen, p1.typbyval
+FROM pg_type AS p1
+WHERE p1.typsubscript = 'array_subscript_handler'::regproc AND NOT
+      (p1.typelem != 0 AND p1.typlen = -1 AND NOT p1.typbyval);
+
+SELECT p1.oid, p1.typname,
+       p1.typelem, p1.typlen, p1.typbyval
+FROM pg_type AS p1
+WHERE p1.typsubscript = 'raw_array_subscript_handler'::regproc AND NOT
+      (p1.typelem != 0 AND p1.typlen > 0 AND NOT p1.typbyval);
+
 -- Check for bogus typanalyze routines
 
 SELECT p1.oid, p1.typname, p2.oid, p2.proname
@@ -356,7 +377,7 @@ SELECT t.oid, t.typname, t.typanalyze
 FROM pg_type t WHERE t.typbasetype = 0 AND
     (t.typanalyze = 'array_typanalyze'::regproc) !=
-    (typelem != 0 AND typlen < 0)
+    (t.typsubscript = 'array_subscript_handler'::regproc)
 ORDER BY 1;
 
 -- **************** pg_class ****************
@@ -452,7 +473,8 @@ WHERE o.opcmethod != 403 OR
     ((o.opcintype != p1.rngsubtype) AND NOT
      (o.opcintype = 'pg_catalog.anyarray'::regtype AND
       EXISTS(select 1 from pg_catalog.pg_type where
-             oid = p1.rngsubtype and typelem != 0 and typlen = -1)));
+             oid = p1.rngsubtype and typelem != 0 and
+             typsubscript = 'array_subscript_handler'::regproc)));
 
 -- canonical function, if any, had better match the range type
Hi,

On 2020-12-08 11:05:05 -0500, Tom Lane wrote:
> I've now studied this patch and it seems sane to me, although
> I wondered why you wrote "extern"s here:

Brainfart...

> Other than that nit, please finish this up and push it so I can finish
> the generic-subscripting patch.

Done.

> > WRT the prototype, I think it may be worth removing most of the types
> > from llvmjit.h. Worth keeping the most common ones, but most aren't used
> > all the time so terseness doesn't matter that much, and
> > the llvm_pg_var_type() would suffice.
>
> Hm, that would mean redoing llvm_pg_var_type() often wouldn't it?
> I don't have a very good feeling for how expensive that is, so I'm
> not sure if this seems like a good idea or not.

It should be fairly cheap - it's basically one hashtable lookup.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2020-12-08 11:05:05 -0500, Tom Lane wrote:
>> Other than that nit, please finish this up and push it so I can finish
>> the generic-subscripting patch.

> Done.

Cool, thanks.  I'll fix that part of the generic-subscript patch and
push it tomorrow, unless somebody objects to it before then.

			regards, tom lane
BTW, I observe that with MAXDIM gone from execExpr.h, there are
no widely-visible uses of MAXDIM except for array.h.  I therefore
suggest that we should pull "#define MAXDIM" out of c.h and put it
into array.h, as attached.  I was slightly surprised to find that
this seems to entail *no* new inclusions of array.h ... I expected
there would be one or two.  But the main point here is we want to
restrict use of that symbol to stuff that's tightly integrated with
varlena-array handling, so it ought not be in c.h.

			regards, tom lane

diff --git a/src/include/c.h b/src/include/c.h
index 12ea056a35..7bc4b8a001 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -591,10 +591,6 @@ typedef uint32 CommandId;
 #define FirstCommandId	((CommandId) 0)
 #define InvalidCommandId	(~(CommandId)0)
 
-/*
- * Maximum number of array subscripts, for regular varlena arrays
- */
-#define MAXDIM 6
 
 /* ----------------
  *		Variable-length datatypes all share the 'struct varlena' header.
diff --git a/src/include/utils/array.h b/src/include/utils/array.h
index 2809dfee93..16925880a1 100644
--- a/src/include/utils/array.h
+++ b/src/include/utils/array.h
@@ -69,6 +69,11 @@
 struct ExprState;
 struct ExprContext;
 
+/*
+ * Maximum number of array subscripts (arbitrary limit)
+ */
+#define MAXDIM 6
+
 /*
  * Arrays are varlena objects, so must meet the varlena convention that
  * the first int32 of the object contains the total object size in bytes.
On 2020-12-08 22:02:24 -0500, Tom Lane wrote:
> BTW, I observe that with MAXDIM gone from execExpr.h, there are
> no widely-visible uses of MAXDIM except for array.h.  I therefore
> suggest that we should pull "#define MAXDIM" out of c.h and put it
> into array.h, as attached.  I was slightly surprised to find that
> this seems to entail *no* new inclusions of array.h ... I expected
> there would be one or two.  But the main point here is we want to
> restrict use of that symbol to stuff that's tightly integrated with
> varlena-array handling, so it ought not be in c.h.

+1
I've pushed the core patch now.

The jsonb parts now have to be rebased onto this design, which
I'm assuming Dmitry will tackle (I do not intend to).  It's not
quite clear to me whether we have a meeting of the minds on what
the jsonb functionality should be, anyway.  Alexander seemed to be
thinking about offering an option to let the subscript be a jsonpath,
but how would we distinguish that from a plain-text field name?

BTW, while reviewing the thread to write the commit message,
I was reminded of my concerns around the "is it a container"
business.  As things stand, if type A has a typelem link to
type B, then the system supposes that A contains B physically;
this has implications for what's allowed in DDL, for example
(cf find_composite_type_dependencies() and other places).
We now have a feature whereby subscripting can yield a type
that is not contained in the source type in that sense.
I'd be happier if the "container" terminology were reserved for
that sort of physical containment, which means that I think a lot
of the commentary around SubscriptingRef is misleading.  But I do
not have a better word to suggest offhand.  Thoughts?

			regards, tom lane
> On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote:
>
> I've pushed the core patch now.

Thanks a lot!

> The jsonb parts now have to be rebased onto this design, which I'm
> assuming Dmitry will tackle

Yes, I'm already on it, just couldn't keep up with the changes in this
thread.

> BTW, while reviewing the thread to write the commit message,
> I was reminded of my concerns around the "is it a container"
> business.  As things stand, if type A has a typelem link to
> type B, then the system supposes that A contains B physically;
> this has implications for what's allowed in DDL, for example
> (cf find_composite_type_dependencies() and other places).
> We now have a feature whereby subscripting can yield a type
> that is not contained in the source type in that sense.
> I'd be happier if the "container" terminology were reserved for
> that sort of physical containment, which means that I think a lot
> of the commentary around SubscriptingRef is misleading.  But I do
> not have a better word to suggest offhand.  Thoughts?

I have only 'a composite'/'a compound' alternative in mind, but it's
probably just as confusing as 'a container'.
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote:
>> I'd be happier if the "container" terminology were reserved for
>> that sort of physical containment, which means that I think a lot
>> of the commentary around SubscriptingRef is misleading.  But I do
>> not have a better word to suggest offhand.  Thoughts?

> I have only 'a composite'/'a compound' alternative in mind, but it's
> probably the same confusing as a container.

Yeah, in fact likely worse, since we tend to use those words for
rowtypes.  Naming is the hardest problem in CS :-(

			regards, tom lane
Here's a couple of little finger exercises to move this along a bit.

0001 adds the ability to attach a subscript handler to an existing
data type with ALTER TYPE.  This is clearly going to be necessary
if we want extension types to be able to use this facility.  The
only thing that I think might be controversial here is that I did
not add the ability to set pg_type.typelem.  While that'd be easy
enough so far as ALTER TYPE is concerned, I'm not sure that we want
to encourage people to change it.  The dependency rules mean that
the semantics of typelem aren't something you really want to change
after-the-fact on an existing type.  Also, if we did allow it, any
existing SubscriptingRef.refelemtype values in stored views would
fail to be updated.

0002 makes use of that to support subscripting of hstore.  I'm not
sure how much we care about that from a functionality standpoint,
but it seems like it might be good to have a contrib module testing
that extensions can use this.  Also, I thought possibly an example
showing what's basically the minimum possible amount of complexity
would be good to have.  If people like this, I'll finish it up
(it lacks docs) and add it.

			regards, tom lane

diff --git a/doc/src/sgml/ref/alter_type.sgml b/doc/src/sgml/ref/alter_type.sgml
index 64bf266373..21887e88a0 100644
--- a/doc/src/sgml/ref/alter_type.sgml
+++ b/doc/src/sgml/ref/alter_type.sgml
@@ -194,6 +194,14 @@ ALTER TYPE <replaceable class="parameter">name</replaceable> SET ( <replaceable
       requires superuser privilege.
      </para>
     </listitem>
+    <listitem>
+     <para>
+      <literal>SUBSCRIPT</literal> can be set to the name of a type-specific
+      subscripting handler function, or <literal>NONE</literal> to remove
+      the type's subscripting handler function.  Using this option
+      requires superuser privilege.
+     </para>
+    </listitem>
     <listitem>
      <para>
       <literal>STORAGE</literal><indexterm>
diff --git a/src/backend/commands/typecmds.c b/src/backend/commands/typecmds.c
index 29fe52d2ce..7c0b2c3bf0 100644
--- a/src/backend/commands/typecmds.c
+++ b/src/backend/commands/typecmds.c
@@ -94,6 +94,7 @@ typedef struct
 	bool		updateTypmodin;
 	bool		updateTypmodout;
 	bool		updateAnalyze;
+	bool		updateSubscript;
 	/* New values for relevant attributes */
 	char		storage;
 	Oid			receiveOid;
@@ -101,6 +102,7 @@ typedef struct
 	Oid			typmodinOid;
 	Oid			typmodoutOid;
 	Oid			analyzeOid;
+	Oid			subscriptOid;
 } AlterTypeRecurseParams;
 
 /* Potentially set by pg_upgrade_support functions */
@@ -3885,6 +3887,18 @@ AlterType(AlterTypeStmt *stmt)
 			/* Replacing an analyze function requires superuser. */
 			requireSuper = true;
 		}
+		else if (strcmp(defel->defname, "subscript") == 0)
+		{
+			if (defel->arg != NULL)
+				atparams.subscriptOid =
+					findTypeSubscriptingFunction(defGetQualifiedName(defel),
+												 typeOid);
+			else
+				atparams.subscriptOid = InvalidOid; /* NONE, remove function */
+			atparams.updateSubscript = true;
+			/* Replacing a subscript function requires superuser. */
+			requireSuper = true;
+		}
 
 		/*
 		 * The rest of the options that CREATE accepts cannot be changed.
@@ -4042,6 +4056,11 @@ AlterTypeRecurse(Oid typeOid, bool isImplicitArray,
 		replaces[Anum_pg_type_typanalyze - 1] = true;
 		values[Anum_pg_type_typanalyze - 1] = ObjectIdGetDatum(atparams->analyzeOid);
 	}
+	if (atparams->updateSubscript)
+	{
+		replaces[Anum_pg_type_typsubscript - 1] = true;
+		values[Anum_pg_type_typsubscript - 1] = ObjectIdGetDatum(atparams->subscriptOid);
+	}
 
 	newtup = heap_modify_tuple(tup, RelationGetDescr(catalog),
 							   values, nulls, replaces);
@@ -4098,6 +4117,7 @@ AlterTypeRecurse(Oid typeOid, bool isImplicitArray,
 	atparams->updateReceive = false;	/* domains use F_DOMAIN_RECV */
 	atparams->updateTypmodin = false;	/* domains don't have typmods */
 	atparams->updateTypmodout = false;
+	atparams->updateSubscript = false;	/* domains don't have subscriptors */
 
 	/* Skip the scan if nothing remains to be done */
 	if (!(atparams->updateStorage ||
diff --git a/src/test/regress/expected/create_type.out b/src/test/regress/expected/create_type.out
index f85afcb31e..14394cc95c 100644
--- a/src/test/regress/expected/create_type.out
+++ b/src/test/regress/expected/create_type.out
@@ -260,38 +260,40 @@ ALTER TYPE myvarchar SET (
     receive = myvarcharrecv,
     typmod_in = varchartypmodin,
     typmod_out = varchartypmodout,
-    analyze = array_typanalyze  -- bogus, but it doesn't matter
+    -- these are bogus, but it's safe as long as we don't use the type:
+    analyze = ts_typanalyze,
+    subscript = raw_array_subscript_handler
 );
 SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout,
-       typanalyze, typstorage
+       typanalyze, typsubscript, typstorage
 FROM pg_type WHERE typname = 'myvarchar';
-  typinput   |  typoutput   |  typreceive   |    typsend    |    typmodin     |    typmodout     |    typanalyze    | typstorage
--------------+--------------+---------------+---------------+-----------------+------------------+------------------+------------
- myvarcharin | myvarcharout | myvarcharrecv | myvarcharsend | varchartypmodin | varchartypmodout | array_typanalyze | x
+  typinput   |  typoutput   |  typreceive   |    typsend    |
typmodin | typmodout | typanalyze | typsubscript | typstorage +-------------+--------------+---------------+---------------+-----------------+------------------+---------------+-----------------------------+------------ + myvarcharin | myvarcharout | myvarcharrecv | myvarcharsend | varchartypmodin | varchartypmodout | ts_typanalyze | raw_array_subscript_handler| x (1 row) SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout, - typanalyze, typstorage + typanalyze, typsubscript, typstorage FROM pg_type WHERE typname = '_myvarchar'; - typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typstorage -----------+-----------+------------+------------+-----------------+------------------+------------------+------------ - array_in | array_out | array_recv | array_send | varchartypmodin | varchartypmodout | array_typanalyze | x + typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typsubscript | typstorage +----------+-----------+------------+------------+-----------------+------------------+------------------+-------------------------+------------ + array_in | array_out | array_recv | array_send | varchartypmodin | varchartypmodout | array_typanalyze | array_subscript_handler| x (1 row) SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout, - typanalyze, typstorage + typanalyze, typsubscript, typstorage FROM pg_type WHERE typname = 'myvarchardom'; - typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typstorage ------------+--------------+-------------+---------------+----------+-----------+------------------+------------ - domain_in | myvarcharout | domain_recv | myvarcharsend | - | - | array_typanalyze | x + typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typsubscript | typstorage +-----------+--------------+-------------+---------------+----------+-----------+---------------+--------------+------------ + domain_in | 
myvarcharout | domain_recv | myvarcharsend | - | - | ts_typanalyze | - | x (1 row) SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout, - typanalyze, typstorage + typanalyze, typsubscript, typstorage FROM pg_type WHERE typname = '_myvarchardom'; - typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typstorage -----------+-----------+------------+------------+----------+-----------+------------------+------------ - array_in | array_out | array_recv | array_send | - | - | array_typanalyze | x + typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typsubscript | typstorage +----------+-----------+------------+------------+----------+-----------+------------------+-------------------------+------------ + array_in | array_out | array_recv | array_send | - | - | array_typanalyze | array_subscript_handler | x (1 row) -- ensure dependencies are straight diff --git a/src/test/regress/sql/create_type.sql b/src/test/regress/sql/create_type.sql index 584ece0670..a32a9e6795 100644 --- a/src/test/regress/sql/create_type.sql +++ b/src/test/regress/sql/create_type.sql @@ -207,23 +207,25 @@ ALTER TYPE myvarchar SET ( receive = myvarcharrecv, typmod_in = varchartypmodin, typmod_out = varchartypmodout, - analyze = array_typanalyze -- bogus, but it doesn't matter + -- these are bogus, but it's safe as long as we don't use the type: + analyze = ts_typanalyze, + subscript = raw_array_subscript_handler ); SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout, - typanalyze, typstorage + typanalyze, typsubscript, typstorage FROM pg_type WHERE typname = 'myvarchar'; SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout, - typanalyze, typstorage + typanalyze, typsubscript, typstorage FROM pg_type WHERE typname = '_myvarchar'; SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout, - typanalyze, typstorage + typanalyze, typsubscript, typstorage FROM pg_type WHERE typname = 
'myvarchardom'; SELECT typinput, typoutput, typreceive, typsend, typmodin, typmodout, - typanalyze, typstorage + typanalyze, typsubscript, typstorage FROM pg_type WHERE typname = '_myvarchardom'; -- ensure dependencies are straight diff --git a/contrib/hstore/Makefile b/contrib/hstore/Makefile index 72376d9007..c4e339b57c 100644 --- a/contrib/hstore/Makefile +++ b/contrib/hstore/Makefile @@ -7,10 +7,12 @@ OBJS = \ hstore_gin.o \ hstore_gist.o \ hstore_io.o \ - hstore_op.o + hstore_op.o \ + hstore_subs.o EXTENSION = hstore DATA = hstore--1.4.sql \ + hstore--1.7--1.8.sql \ hstore--1.6--1.7.sql \ hstore--1.5--1.6.sql \ hstore--1.4--1.5.sql \ diff --git a/contrib/hstore/expected/hstore.out b/contrib/hstore/expected/hstore.out index 8901079438..fdcc3920ce 100644 --- a/contrib/hstore/expected/hstore.out +++ b/contrib/hstore/expected/hstore.out @@ -1560,6 +1560,29 @@ select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_jso {"f1":"rec2","f2":{"b": false, "c": "null", "d": -12345, "e": "012345.6", "f": -1.234, "g": 0.345e-4, "a key": 2}}] (1 row) +-- Test subscripting +insert into test_json_agg default values; +select f2['d'], f2['x'] is null as x_isnull from test_json_agg; + f2 | x_isnull +--------+---------- + 12345 | t + -12345 | t + | t +(3 rows) + +select f2['d']['e'] from test_json_agg; -- error +ERROR: hstore allows only one subscript +select f2['d':'e'] from test_json_agg; -- error +ERROR: hstore allows only one subscript +update test_json_agg set f2['d'] = f2['e'], f2['x'] = 'xyzzy'; +select f2 from test_json_agg; + f2 +--------------------------------------------------------------------------------------------------------------------- + "b"=>"t", "c"=>NULL, "d"=>"012345", "e"=>"012345", "f"=>"1.234", "g"=>"2.345e+4", "x"=>"xyzzy", "a key"=>"1" + "b"=>"f", "c"=>"null", "d"=>"012345.6", "e"=>"012345.6", "f"=>"-1.234", "g"=>"0.345e-4", "x"=>"xyzzy", "a key"=>"2" + "d"=>NULL, "x"=>"xyzzy" +(3 rows) + -- Check the hstore_hash() and 
hstore_hash_extended() function explicitly. SELECT v as value, hstore_hash(v)::bit(32) as standard, hstore_hash_extended(v, 0)::bit(32) as extended0, diff --git a/contrib/hstore/hstore--1.7--1.8.sql b/contrib/hstore/hstore--1.7--1.8.sql new file mode 100644 index 0000000000..d80a138465 --- /dev/null +++ b/contrib/hstore/hstore--1.7--1.8.sql @@ -0,0 +1,13 @@ +/* contrib/hstore/hstore--1.7--1.8.sql */ + +-- complain if script is sourced in psql, rather than via ALTER EXTENSION +\echo Use "ALTER EXTENSION hstore UPDATE TO '1.8'" to load this file. \quit + +CREATE FUNCTION hstore_subscript_handler(internal) +RETURNS internal +AS 'MODULE_PATHNAME', 'hstore_subscript_handler' +LANGUAGE C STRICT IMMUTABLE PARALLEL SAFE; + +ALTER TYPE hstore SET ( + SUBSCRIPT = hstore_subscript_handler +); diff --git a/contrib/hstore/hstore.control b/contrib/hstore/hstore.control index f0da772429..89e3c746c4 100644 --- a/contrib/hstore/hstore.control +++ b/contrib/hstore/hstore.control @@ -1,6 +1,6 @@ # hstore extension comment = 'data type for storing sets of (key, value) pairs' -default_version = '1.7' +default_version = '1.8' module_pathname = '$libdir/hstore' relocatable = true trusted = true diff --git a/contrib/hstore/hstore_subs.c b/contrib/hstore/hstore_subs.c new file mode 100644 index 0000000000..e52de04f1a --- /dev/null +++ b/contrib/hstore/hstore_subs.c @@ -0,0 +1,297 @@ +/*------------------------------------------------------------------------- + * + * hstore_subs.c + * Subscripting support functions for hstore. + * + * This is a great deal simpler than array_subs.c, because the result of + * subscripting an hstore is just a text string (the value for the key). + * We do not need to support array slicing notation, nor multiple subscripts. + * Less obviously, because the subscript result is never a SQL container + * type, there will never be any nested-assignment scenarios, so we do not + * need a fetch_old function. 
In turn, that means we can drop the + * check_subscripts function and just let the fetch and assign functions + * do everything. + * + * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group + * Portions Copyright (c) 1994, Regents of the University of California + * + * + * IDENTIFICATION + * contrib/hstore/hstore_subs.c + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "executor/execExpr.h" +#include "hstore.h" +#include "nodes/nodeFuncs.h" +#include "nodes/subscripting.h" +#include "parser/parse_coerce.h" +#include "parser/parse_expr.h" +#include "utils/builtins.h" + + +/* + * Finish parse analysis of a SubscriptingRef expression for hstore. + * + * Verify there's just one subscript, coerce it to text, + * and set the result type of the SubscriptingRef node. + */ +static void +hstore_subscript_transform(SubscriptingRef *sbsref, + List *indirection, + ParseState *pstate, + bool isSlice, + bool isAssignment) +{ + A_Indices *ai; + Node *subexpr; + + /* We support only single-subscript, non-slice cases */ + if (isSlice || list_length(indirection) != 1) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("hstore allows only one subscript"), + parser_errposition(pstate, + exprLocation((Node *) indirection)))); + + /* Transform the subscript expression to type text */ + ai = linitial_node(A_Indices, indirection); + Assert(ai->uidx != NULL && ai->lidx == NULL && !ai->is_slice); + + subexpr = transformExpr(pstate, ai->uidx, pstate->p_expr_kind); + /* If it's not text already, try to coerce */ + subexpr = coerce_to_target_type(pstate, + subexpr, exprType(subexpr), + TEXTOID, -1, + COERCION_ASSIGNMENT, + COERCE_IMPLICIT_CAST, + -1); + if (subexpr == NULL) + ereport(ERROR, + (errcode(ERRCODE_DATATYPE_MISMATCH), + errmsg("hstore subscript must have type text"), + parser_errposition(pstate, exprLocation(ai->uidx)))); + + /* ... 
and store the transformed subscript into the SubscriptRef node */ + sbsref->refupperindexpr = list_make1(subexpr); + sbsref->reflowerindexpr = NIL; + + /* Determine the result type of the subscripting operation; always text */ + sbsref->refrestype = TEXTOID; + sbsref->reftypmod = -1; +} + +/* + * Evaluate SubscriptingRef fetch for hstore. + * + * Source container is in step's result variable (it's known not NULL, since + * we set fetch_strict to true), and the subscript expression is in the + * upperindex[] array. + */ +static void +hstore_subscript_fetch(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + HStore *hs; + text *key; + HEntry *entries; + int idx; + text *out; + + /* Should not get here if source hstore is null */ + Assert(!(*op->resnull)); + + /* Check for null subscript */ + if (sbsrefstate->upperindexnull[0]) + { + *op->resnull = true; + return; + } + + /* OK, fetch/detoast the hstore and subscript */ + hs = DatumGetHStoreP(*op->resvalue); + key = DatumGetTextPP(sbsrefstate->upperindex[0]); + + /* The rest is basically the same as hstore_fetchval() */ + entries = ARRPTR(hs); + idx = hstoreFindKey(hs, NULL, + VARDATA_ANY(key), VARSIZE_ANY_EXHDR(key)); + + if (idx < 0 || HSTORE_VALISNULL(entries, idx)) + { + *op->resnull = true; + return; + } + + out = cstring_to_text_with_len(HSTORE_VAL(entries, STRPTR(hs), idx), + HSTORE_VALLEN(entries, idx)); + + *op->resvalue = PointerGetDatum(out); +} + +/* + * Evaluate SubscriptingRef assignment for hstore. + * + * Input container (possibly null) is in result area, replacement value is in + * SubscriptingRefState's replacevalue/replacenull. 
+ */ +static void +hstore_subscript_assign(ExprState *state, + ExprEvalStep *op, + ExprContext *econtext) +{ + SubscriptingRefState *sbsrefstate = op->d.sbsref.state; + text *key; + Pairs p; + HStore *out; + + /* Check for null subscript */ + if (sbsrefstate->upperindexnull[0]) + ereport(ERROR, + (errcode(ERRCODE_NULL_VALUE_NOT_ALLOWED), + errmsg("hstore subscript in assignment must not be null"))); + + /* OK, fetch/detoast the subscript */ + key = DatumGetTextPP(sbsrefstate->upperindex[0]); + + /* Create a Pairs entry for subscript + replacement value */ + p.needfree = false; + p.key = VARDATA_ANY(key); + p.keylen = hstoreCheckKeyLen(VARSIZE_ANY_EXHDR(key)); + + if (sbsrefstate->replacenull) + { + p.vallen = 0; + p.isnull = true; + } + else + { + text *val = DatumGetTextPP(sbsrefstate->replacevalue); + + p.val = VARDATA_ANY(val); + p.vallen = hstoreCheckValLen(VARSIZE_ANY_EXHDR(val)); + p.isnull = false; + } + + if (*op->resnull) + { + /* Just build a one-element hstore (cf. hstore_from_text) */ + out = hstorePairs(&p, 1, p.keylen + p.vallen); + } + else + { + /* + * Otherwise, merge the new key into the hstore. Based on + * hstore_concat. 
+ */ + HStore *hs = DatumGetHStoreP(*op->resvalue); + int s1count = HS_COUNT(hs); + int outcount = 0; + int vsize; + char *ps1, + *bufd, + *pd; + HEntry *es1, + *ed; + int s1idx; + int s2idx; + + /* Allocate result without considering possibility of duplicate */ + vsize = CALCDATASIZE(s1count + 1, VARSIZE(hs) + p.keylen + p.vallen); + out = palloc(vsize); + SET_VARSIZE(out, vsize); + HS_SETCOUNT(out, s1count + 1); + + ps1 = STRPTR(hs); + bufd = pd = STRPTR(out); + es1 = ARRPTR(hs); + ed = ARRPTR(out); + + for (s1idx = s2idx = 0; s1idx < s1count || s2idx < 1; ++outcount) + { + int difference; + + if (s1idx >= s1count) + difference = 1; + else if (s2idx >= 1) + difference = -1; + else + { + int s1keylen = HSTORE_KEYLEN(es1, s1idx); + int s2keylen = p.keylen; + + if (s1keylen == s2keylen) + difference = memcmp(HSTORE_KEY(es1, ps1, s1idx), + p.key, + s1keylen); + else + difference = (s1keylen > s2keylen) ? 1 : -1; + } + + if (difference >= 0) + { + HS_ADDITEM(ed, bufd, pd, p); + ++s2idx; + if (difference == 0) + ++s1idx; + } + else + { + HS_COPYITEM(ed, bufd, pd, + HSTORE_KEY(es1, ps1, s1idx), + HSTORE_KEYLEN(es1, s1idx), + HSTORE_VALLEN(es1, s1idx), + HSTORE_VALISNULL(es1, s1idx)); + ++s1idx; + } + } + + HS_FINALIZE(out, outcount, bufd, pd); + } + + *op->resvalue = PointerGetDatum(out); + *op->resnull = false; +} + +/* + * Set up execution state for an hstore subscript operation. 
+ */ +static void +hstore_exec_setup(const SubscriptingRef *sbsref, + SubscriptingRefState *sbsrefstate, + SubscriptExecSteps *methods) +{ + /* Assert we are dealing with one subscript */ + Assert(sbsrefstate->numlower == 0); + Assert(sbsrefstate->numupper == 1); + /* We can't check upperprovided[0] here, but it must be true */ + + /* Pass back pointers to appropriate step execution functions */ + methods->sbs_check_subscripts = NULL; + methods->sbs_fetch = hstore_subscript_fetch; + methods->sbs_assign = hstore_subscript_assign; + methods->sbs_fetch_old = NULL; +} + +/* + * hstore_subscript_handler + * Subscripting handler for hstore. + */ +PG_FUNCTION_INFO_V1(hstore_subscript_handler); +Datum +hstore_subscript_handler(PG_FUNCTION_ARGS) +{ + static const SubscriptRoutines sbsroutines = { + .transform = hstore_subscript_transform, + .exec_setup = hstore_exec_setup, + .fetch_strict = true, /* fetch returns NULL for NULL inputs */ + .fetch_leakproof = true, /* fetch returns NULL for bad subscript */ + .store_leakproof = false /* ... but assignment throws error */ + }; + + PG_RETURN_POINTER(&sbsroutines); +} diff --git a/contrib/hstore/sql/hstore.sql b/contrib/hstore/sql/hstore.sql index a6c2f3a0ce..8d96e30403 100644 --- a/contrib/hstore/sql/hstore.sql +++ b/contrib/hstore/sql/hstore.sql @@ -364,6 +364,14 @@ insert into test_json_agg values ('rec1','"a key" =>1, b => t, c => null, d=> 12 select json_agg(q) from test_json_agg q; select json_agg(q) from (select f1, hstore_to_json_loose(f2) as f2 from test_json_agg) q; +-- Test subscripting +insert into test_json_agg default values; +select f2['d'], f2['x'] is null as x_isnull from test_json_agg; +select f2['d']['e'] from test_json_agg; -- error +select f2['d':'e'] from test_json_agg; -- error +update test_json_agg set f2['d'] = f2['e'], f2['x'] = 'xyzzy'; +select f2 from test_json_agg; + -- Check the hstore_hash() and hstore_hash_extended() function explicitly. 
SELECT v as value, hstore_hash(v)::bit(32) as standard, hstore_hash_extended(v, 0)::bit(32) as extended0,
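The merge loop in hstore_subscript_assign above is built on the same idea as hstore_concat: walk the existing key-sorted pairs and splice the one new pair in, replacing any existing entry with the same key. A rough Python sketch of just that logic (the function name and list-of-tuples representation are illustrative; real hstore works on its binary format and orders keys by length first, then bytewise, which is simplified here to plain tuple ordering):

```python
def hstore_assign(pairs, key, value):
    """Insert or replace (key, value) in a list of (key, value) pairs
    kept sorted by key -- a sketch of the one-key merge performed by
    hstore_subscript_assign."""
    out = []
    inserted = False
    for k, v in pairs:
        if not inserted and key <= k:
            out.append((key, value))   # new pair sorts here
            inserted = True
            if key == k:
                continue               # same key: drop old entry (replace)
        out.append((k, v))
    if not inserted:
        out.append((key, value))       # new key sorts after all others
    return out
```

Note that, like the C code, this never needs a second pass: because the input is sorted, a single merge decides for each position whether to copy, insert, or replace.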
st 9. 12. 2020 v 22:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Here's a couple of little finger exercises to move this along a bit.
0001 adds the ability to attach a subscript handler to an existing
data type with ALTER TYPE. This is clearly going to be necessary
if we want extension types to be able to use this facility. The
only thing that I think might be controversial here is that I did
not add the ability to set pg_type.typelem. While that'd be easy
enough so far as ALTER TYPE is concerned, I'm not sure that we want
to encourage people to change it. The dependency rules mean that
the semantics of typelem aren't something you really want to change
after-the-fact on an existing type. Also, if we did allow it, any
existing SubscriptingRef.refelemtype values in stored views would
fail to be updated.
0002 makes use of that to support subscripting of hstore. I'm not
sure how much we care about that from a functionality standpoint,
but it seems like it might be good to have a contrib module testing
that extensions can use this. Also, I thought possibly an example
showing what's basically the minimum possible amount of complexity
would be good to have. If people like this, I'll finish it up (it
lacks docs) and add it.
+1 using subscripts for hstore is nice idea
Pavel
regards, tom lane
On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote:
> I've pushed the core patch now. The jsonb parts now have to be
> rebased onto this design, which I'm assuming Dmitry will tackle
> (I do not intend to). It's not quite clear to me whether we have
> a meeting of the minds on what the jsonb functionality should be,
> anyway. Alexander seemed to be thinking about offering an option
> to let the subscript be a jsonpath, but how would we distinguish
> that from a plain-text field name?
>
> BTW, while reviewing the thread to write the commit message,
> I was reminded of my concerns around the "is it a container"
> business. As things stand, if type A has a typelem link to
> type B, then the system supposes that A contains B physically;
> this has implications for what's allowed in DDL, for example
> (cf find_composite_type_dependencies() and other places).
> We now have a feature whereby subscripting can yield a type
> that is not contained in the source type in that sense.
> I'd be happier if the "container" terminology were reserved for
> that sort of physical containment, which means that I think a lot
> of the commentary around SubscriptingRef is misleading. But I do
> not have a better word to suggest offhand. Thoughts?

Would this be something more along the lines of a "dependent type,"
or is that adding too much baggage?

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
> On Wed, Dec 09, 2020 at 04:59:34PM -0500, Tom Lane wrote:
>
> 0001 adds the ability to attach a subscript handler to an existing
> data type with ALTER TYPE. This is clearly going to be necessary
> if we want extension types to be able to use this facility. The
> only thing that I think might be controversial here is that I did
> not add the ability to set pg_type.typelem. While that'd be easy
> enough so far as ALTER TYPE is concerned, I'm not sure that we want
> to encourage people to change it. The dependency rules mean that
> the semantics of typelem aren't something you really want to change
> after-the-fact on an existing type. Also, if we did allow it, any
> existing SubscriptingRef.refelemtype values in stored views would
> fail to be updated.

I'm curious what could be the use case for setting pg_type.typelem for
subscripting? I don't see this that much controversial, but maybe I'm
missing something.

> On Thu, Dec 10, 2020 at 05:37:20AM +0100, Pavel Stehule wrote:
> st 9. 12. 2020 v 22:59 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>
> > 0002 makes use of that to support subscripting of hstore. I'm not
> > sure how much we care about that from a functionality standpoint,
> > but it seems like it might be good to have a contrib module testing
> > that extensions can use this. Also, I thought possibly an example
> > showing what's basically the minimum possible amount of complexity
> > would be good to have. If people like this, I'll finish it up (it
> > lacks docs) and add it.
> >
>
> +1 using subscripts for hstore is nice idea

Yeah, I also find it's a good suggestion, the implementation seems fine
as well. As a side note, I'm surprised hstore doesn't have any
functionality to update values, except hstore_concat.
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On Wed, Dec 09, 2020 at 04:59:34PM -0500, Tom Lane wrote:
>> 0001 adds the ability to attach a subscript handler to an existing
>> data type with ALTER TYPE. This is clearly going to be necessary
>> if we want extension types to be able to use this facility. The
>> only thing that I think might be controversial here is that I did
>> not add the ability to set pg_type.typelem.

> I'm curious what could be the use case for setting pg_type.typelem for
> subscripting? I don't see this that much controversial, but maybe I'm
> missing something.

If you want the result of subscripting to be "text" or some other built-in
type, then clearly there's no need to use typelem for that, you can just
refer to the standard OID macros. The potential use-case that I thought
of for setting typelem is where an extension defines types A and B and
would like subscripting of B to yield A. Installing A's OID as B.typelem
would save a catalog lookup during subscript parsing, and remove a bunch
of edge failure cases such as what happens if A gets renamed. However,
given the dependency behavior, this would also have the effect of "you
can't drop A without dropping B, and you can't modify A in any interesting
way either". That would be annoyingly restrictive if there weren't any
actual physical containment relationship. But on the other hand, maybe
it's acceptable and we just need to document it.

The other issue is what about existing stored SubscriptingRef structs.
If our backs were to the wall I'd think about removing the refelemtype
field so there's no stored image of typelem that needs to be updated.
But that would incur an extra catalog lookup in array_exec_setup, so
I don't much like it. If we do add the ability to set typelem, I'd
prefer to just warn people to not change it once they've installed a
subscript handler.

Anyway, between those two issues I'm about -0.1 on adding a way to alter
typelem. I won't fight hard if somebody wants it, but I'm inclined to
leave it out.

>> +1 using subscripts for hstore is nice idea

> Yeah, I also find it's a good suggestion, the implementation seems fine
> as well. As a side note, I'm surprised hstore doesn't have any
> functionality to update values, except hstore_concat.

Yeah. I cribbed the subscript-fetch implementation from hstore_fetchval,
but was surprised to find that there wasn't any direct equivalent function
for subscript-store. I guess people have gotten by with concat, but
it's not exactly an obvious way to do things.

			regards, tom lane
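The concat workaround mentioned above — updating a single key by concatenating a one-pair hstore, as in `UPDATE t SET h = h || hstore('e', 'new')` — can be sketched in Python, with dicts standing in for hstores (this ignores hstore's text-only values and key ordering; it only illustrates the "right side wins" semantics of `||`):

```python
def hstore_concat(a, b):
    """Emulate hstore's || operator: all pairs from both operands,
    with the right-hand operand winning on duplicate keys."""
    out = dict(a)
    out.update(b)
    return out

# The pre-subscripting idiom for "update one value":
# concatenate a one-pair hstore onto the existing value.
h = {"d": "12345", "e": "old"}
h = hstore_concat(h, {"e": "new"})
```

This is why concat "works" as an update mechanism even without a dedicated store function — but, as noted, it is not an obvious spelling of assignment.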
> On Fri, Dec 11, 2020 at 10:38:07AM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> >> On Wed, Dec 09, 2020 at 04:59:34PM -0500, Tom Lane wrote:
> >> 0001 adds the ability to attach a subscript handler to an existing
> >> data type with ALTER TYPE. This is clearly going to be necessary
> >> if we want extension types to be able to use this facility. The
> >> only thing that I think might be controversial here is that I did
> >> not add the ability to set pg_type.typelem.
>
> > I'm curious what could be the use case for setting pg_type.typelem for
> > subscripting? I don't see this that much controversial, but maybe I'm
> > missing something.
>
> If you want the result of subscripting to be "text" or some other built-in
> type, then clearly there's no need to use typelem for that, you can just
> refer to the standard OID macros. The potential use-case that I thought
> of for setting typelem is where an extension defines types A and B and
> would like subscripting of B to yield A. Installing A's OID as B.typelem
> would save a catalog lookup during subscript parsing, and remove a bunch
> of edge failure cases such as what happens if A gets renamed. However,
> given the dependency behavior, this would also have the effect of "you
> can't drop A without dropping B, and you can't modify A in any interesting
> way either". That would be annoyingly restrictive if there weren't any
> actual physical containment relationship. But on the other hand, maybe
> it's acceptable and we just need to document it.
>
> The other issue is what about existing stored SubscriptingRef structs.
> If our backs were to the wall I'd think about removing the refelemtype
> field so there's no stored image of typelem that needs to be updated.
> But that would incur an extra catalog lookup in array_exec_setup, so
> I don't much like it. If we do add the ability to set typelem, I'd
> prefer to just warn people to not change it once they've installed a
> subscript handler.
>
> Anyway, between those two issues I'm about -0.1 on adding a way to alter
> typelem. I won't fight hard if somebody wants it, but I'm inclined
> to leave it out.

Yes, makes sense. Thanks for the clarification.

> On Wed, Dec 09, 2020 at 07:37:04PM +0100, Dmitry Dolgov wrote:
> > On Wed, Dec 09, 2020 at 12:49:48PM -0500, Tom Lane wrote:
> >
> > The jsonb parts now have to be
> > rebased onto this design, which I'm assuming Dmitry will tackle
>
> Yes, I'm already on it, just couldn't keep up with the changes in this
> thread.

While rebasing the jsonb patch I found out that the current subscripting
assignment implementation in transformAssignmentIndirection always
coerce the value to be assigned to the type which subscripting result
suppose to have (refrestype). For arrays it's fine, since those two
indeed must be the same, but for jsonb (and for hstore I guess too) the
result of subscripting is always jsonb (well, text type) and the
assigned value could be of some other type. This leads to assigning
everything converted to text.

Originally this coercion was done in the type specific code, so I hoped
to put it into "transform" routine. Unfortunately "transform" is called
before that (and could not be called later, because type information
from sbsref is required) and all the other hooks are apparently too
late. Probably the most straightforward solution here would be to add a
new argument to transformAssignmentIndirection to signal if coercion
needs to happen or not, and allow the type specific code to specify it
via SubscriptingRef. Are there any better ideas?
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> While rebasing the jsonb patch I found out that the current subscripting
> assignment implementation in transformAssignmentIndirection always
> coerce the value to be assigned to the type which subscripting result
> suppose to have (refrestype). For arrays it's fine, since those two
> indeed must be the same, but for jsonb (and for hstore I guess too) the
> result of subscripting is always jsonb (well, text type) and the
> assigned value could be of some other type. This leads to assigning
> everything converted to text.

So ... what's the problem with that? Seems like what you should put
in and what you should get out should be the same type.

We can certainly reconsider the API for the parsing hook if there's
really a good reason for these to be different types, but it seems
like that would just be encouraging poor design.

			regards, tom lane
čt 17. 12. 2020 v 19:49 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> While rebasing the jsonb patch I found out that the current subscripting
> assignment implementation in transformAssignmentIndirection always
> coerce the value to be assigned to the type which subscripting result
> suppose to have (refrestype). For arrays it's fine, since those two
> indeed must be the same, but for jsonb (and for hstore I guess too) the
> result of subscripting is always jsonb (well, text type) and the
> assigned value could be of some other type. This leads to assigning
> everything converted to text.
So ... what's the problem with that? Seems like what you should put
in and what you should get out should be the same type.
I don't think so. For XML or JSON the target can be different, and it can save one CAST
DECLARE
n int;
v varchar;
js jsonb default '{"n": 100, "v" : "Hello"}';
BEGIN
n := js['n'];
v := js['v'];
It would be nice to do this with a minimum number of transformations.
Regards
Pavel
We can certainly reconsider the API for the parsing hook if there's
really a good reason for these to be different types, but it seems
like that would just be encouraging poor design.
regards, tom lane
Pavel Stehule <pavel.stehule@gmail.com> writes:
> On Thu, Dec 17, 2020 at 19:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So ... what's the problem with that? Seems like what you should put
>> in and what you should get out should be the same type.

> I don't think so. For XML or JSON the target can be different, and it can
> save one CAST

> DECLARE
> n int;
> v varchar;
> js jsonb default '{"n": 100, "v": "Hello"}';
> BEGIN
> n := js['n'];
> v := js['v'];

If you're imagining that js['n'] and js['v'] would emit different
datatypes, forget it. That would require knowing at parse time
what the structure of the json object will be at run time.

But in any case, the discussion here is about the source datatype
for an assignment, which this example doesn't even contain.

			regards, tom lane
> On Thu, Dec 17, 2020 at 01:49:17PM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> > While rebasing the jsonb patch I found out that the current subscripting
> > assignment implementation in transformAssignmentIndirection always
> > coerces the value to be assigned to the type the subscripting result is
> > supposed to have (refrestype). For arrays that's fine, since those two
> > indeed must be the same, but for jsonb (and I guess for hstore too) the
> > result of subscripting is always jsonb (well, the text type), while the
> > assigned value could be of some other type. This leads to everything
> > being converted to text on assignment.
>
> So ... what's the problem with that? Seems like what you should put
> in and what you should get out should be the same type.
>
> We can certainly reconsider the API for the parsing hook if there's
> really a good reason for these to be different types, but it seems
> like that would just be encouraging poor design.

To be more specific, this is the current behaviour (an example from the
tests) and it doesn't seem right:

    =# update test_jsonb_subscript
       set test_json['a'] = 3 where id = 1;
    UPDATE 1

    =# select jsonb_typeof(test_json->'a')
       from test_jsonb_subscript where id = 1;
     jsonb_typeof
    --------------
     string

    =# update test_jsonb_subscript
       set test_json = jsonb_set(test_json, '{a}', '3') where id = 1;
    UPDATE 1

    =# select jsonb_typeof(test_json->'a')
       from test_jsonb_subscript where id = 1;
     jsonb_typeof
    --------------
     number
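The difference shown above - the subscripting assignment stores the value as a JSON string, while jsonb_set preserves it as a number - can be modelled outside of Postgres. The following is a hypothetical Python sketch of the two behaviours (the function names and the simplified jsonb_typeof analogue are made up for illustration; this is not the actual C implementation):

```python
import json

def assign_coerced_to_text(doc: dict, key: str, value) -> dict:
    # Models the behaviour under discussion: the assigned value is
    # coerced to text before being stored, so 3 becomes "3".
    doc[key] = str(value)
    return doc

def assign_preserving_type(doc: dict, key: str, value) -> dict:
    # Models jsonb_set: the assigned value keeps its JSON type.
    doc[key] = value
    return doc

def jsonb_typeof(v) -> str:
    # Rough analogue of jsonb_typeof() for scalar values.
    if isinstance(v, bool):
        return "boolean"
    if isinstance(v, (int, float)):
        return "number"
    if isinstance(v, str):
        return "string"
    return "null" if v is None else "object"

doc = json.loads('{"a": 1}')
print(jsonb_typeof(assign_coerced_to_text(dict(doc), "a", 3)["a"]))  # string
print(jsonb_typeof(assign_preserving_type(dict(doc), "a", 3)["a"]))  # number
```

The point is only that the detour through text silently loses the JSON type of the assigned value.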
On 12/17/20 14:28, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
>> n int;
>> v varchar;
>> js jsonb default '{"n": 100, "v": "Hello"}';
>> BEGIN
>> n := js['n'];
>> v := js['v'];
>
> If you're imagining that js['n'] and js['v'] would emit different
> datatypes, forget it. That would require knowing at parse time
> what the structure of the json object will be at run time.

Would it be feasible to analyze that as something like an implicit
'treat as' with the type of the assignment target?

'treat as' is an operator in XML Query that's distinct from 'cast as';
'cast as foo' has ordinary cast semantics and can coerce non-foo to foo;
'treat as foo' is just a promise from the programmer: "go ahead and
statically rely on this being a foo, and give me a runtime exception if
it isn't". It would offer a nice economy of expression.

Following that idea further, if there were such a thing as a 'treat as'
node, would the implicit generation of such a node, according to an
assignment target data type, be the kind of thing that could be
accomplished by a user function's planner-support function?

Regards,
-Chap
On Thu, Dec 17, 2020 at 20:28, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> On Thu, Dec 17, 2020 at 19:49, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So ... what's the problem with that? Seems like what you should put
>> in and what you should get out should be the same type.
> I don't think so. For XML or JSON the target can be different, and it can
> save one CAST
> DECLARE
> n int;
> v varchar;
> js jsonb default '{"n": 100, "v": "Hello"}';
> BEGIN
> n := js['n'];
> v := js['v'];
If you're imagining that js['n'] and js['v'] would emit different
datatypes, forget it. That would require knowing at parse time
what the structure of the json object will be at run time.
My idea was a little bit different. When we know the target type (in this
example int or varchar), then we can *theoretically* push this information
down to the subscripting function. This optimization is used in the
XMLTABLE function.

Now the subscripting function returns jsonb, although internally the
source value stores integer and varchar values. So the returned value has
to be converted to jsonb first, and immediately afterwards it is cast to
the target type outside. My idea was to join the subscripting function and
the outer cast into one operation, which would allow skipping the cast
when it is not necessary (that would be known at run time).

Sure. But because the outer cast and the subscripting function are
separate things, it is not possible to skip the outer cast.
Pavel
But in any case, the discussion here is about the source datatype
for an assignment, which this example doesn't even contain.
regards, tom lane
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> On Thu, Dec 17, 2020 at 01:49:17PM -0500, Tom Lane wrote:
>> We can certainly reconsider the API for the parsing hook if there's
>> really a good reason for these to be different types, but it seems
>> like that would just be encouraging poor design.

> To be more specific, this is the current behaviour (an example from the
> tests) and it doesn't seem right:

> =# update test_jsonb_subscript
>    set test_json['a'] = 3 where id = 1;
> UPDATE 1
> =# select jsonb_typeof(test_json->'a')
>    from test_jsonb_subscript where id = 1;
>  jsonb_typeof
> --------------
>  string

I'm kind of unmoved by that example, because making it better would
require more guessing about what the user wanted than I care for.

You could imagine, perhaps, that the subscript parsing hook gives back a
list of potential assignment source types, or that we make it responsible
for transforming the source expression as well as the subscripts and then
let it do something like that internally. But that just opens the door to
confusion and ambiguity. We already had this discussion a few months ago,
as I recall, when you wanted to try assignment transforms to both text
and integer but I pointed out that both ways would succeed in some cases.
The assignment coercion rules are only intended to be used when there is
*exactly one* possible result type. I'd only be willing to accept
multiple possible coercion target types if we backed off the coercion
level to "implicit" to make multiple matches less likely (which is
exactly what we do when resolving input types for functions). But I'm
afraid that doing so would break more cases than it improves. It would
certainly break existing queries for array assignment.

I'm rather inclined to think that the result of subscripting a jsonb
(and therefore also the required source type for assignment) should be
jsonb, not just text. In that case, something like

	update ... set jsoncol['a'] = 3

would fail, because there's no cast from integer to jsonb. You'd have to
write one of

	update ... set jsoncol['a'] = '3'
	update ... set jsoncol['a'] = '"3"'

to clarify how you wanted the input to be interpreted. But that seems
like a good thing, just as it is for jsonb_in.

The background for my being so down on this is that it reminds me way too
much of the implicit-casts-to-text mess that we cleaned up (with great
pain and squawking) back around 8.3. It looks to me like you're basically
trying to introduce multiple implicit casts to jsonb, and I'm afraid
that's just as bad an idea. At the very least, if we do do it I don't see
why it should only happen in the context of subscripted assignment.

			regards, tom lane
Chapman Flack <chap@anastigmatix.net> writes:
> On 12/17/20 14:28, Tom Lane wrote:
>> If you're imagining that js['n'] and js['v'] would emit different
>> datatypes, forget it. That would require knowing at parse time
>> what the structure of the json object will be at run time.

> Would it be feasible to analyze that as something like an implicit
> 'treat as' with the type of the assignment target?

TBH, I think that expending any great amount of effort in that direction
would be a big waste of effort. We already have strongly-typed composite
types. The use-case for json is where you *don't* have ironclad
guarantees about what the structure of the data is.

As for doing it implicitly, that is still going to fall foul of the
fundamental problem, which is that we don't have the info at parse time.
Examples with constant values for the json input are not what to look at,
because they'll just mislead you as to what's possible.

			regards, tom lane
On 12/17/20 15:50, Tom Lane wrote:
> Chapman Flack <chap@anastigmatix.net> writes:
>> Would it be feasible to analyze that as something like an implicit
>> 'treat as' with the type of the assignment target?
>
> TBH, I think that expending any great amount of effort in that direction
> would be a big waste of effort. We already have strongly-typed
> composite types. The use-case for json is where you *don't* have
> ironclad guarantees about what the structure of the data is.
>
> As for doing it implicitly, that is still going to fall foul of the
> fundamental problem, which is that we don't have the info at parse
> time. Examples with constant values for the json input are not what
> to look at, because they'll just mislead you as to what's possible.

Respectfully, I think that fundamental problem is exactly what led to
XQuery having the 'treat as' construct [1]. XML is in the same boat as
JSON as far as not having ironclad guarantees about what the structure
will be. But there are situations where the programmer knows full well
that the only inputs of interest will have js['n'] an integer and js['v']
a string, and any input not conforming to that expectation will be
erroneous and should produce an error at runtime.

That's likely to be what a programmer intends when writing

  (variable explicitly typed integer) := js['n']   and
  (variable explicitly typed varchar) := js['v']

so it might be nice to be able to write it without a lot of extra
ceremony.

What I had in mind was not to try too hard to analyze the JSON subscript
expression, but only to know that its result can only ever be: more JSON,
a string, a number, a boolean, or an array of one of those, and if the
assignment target has one of those types, assume that a 'treat as' is
intended.

Naturally there's a trade-off: that provides economy of expression at the
cost of not giving an immediate parse-time error if the programmer really
made a thinko rather than intending a 'treat as'.

I haven't closely followed what's proposed as the subscript in js[...] -
can it be any arbitrary jsonpath? And does jsonpath have an operator
analogous to XQuery's 'treat as'? If so, something like (but with
jsonpath rather than XQuery spelling)

  n := js['n treat as number'];
  v := js['v treat as string'];

might be a happy medium: perhaps parsing the expression enough to see
that its outer node is a 'treat as' is not asking too much, and then the
programmer has to explicitly add that to avoid a parse-time error, but
it's a reasonably painless notation to add.

Regards,
-Chap

[1] https://www.w3.org/TR/2003/WD-xquery-20030822/#id-treat
Chapman Flack <chap@anastigmatix.net> writes:
> That's likely to be what a programmer intends when writing
> (variable explicitly typed integer) := js['n']   and
> (variable explicitly typed varchar) := js['v']

I think that what we want, if we're to support that sort of thing, is
that the js[] constructs produce jsonb by definition, and then an
assignment-level cast is applied to get from jsonb to integer or text.
I see we already have most of the necessary casts, but they're currently
marked explicit-only. Downgrading them to assignment level might be okay
though. If we don't want to do that, it means we have to write

	integervar := js['n']::integer

which is a bit more wordy but also unmistakable as to intent. (I think
the "intent" angle might be the reason we insisted on these things being
explicit to start with.)

It's somewhat interesting to speculate about whether we could optimize
the combination of the subscripting function and the cast function. But
(a) that's an optimization, not something that should be part of the
user-visible semantics, and (b) it should not be part of the initial
feature. I think a large part of the reason this patch is still not done
after four years is that it's been biting off more than it could chew
all along. Let's try to get it to completion and then optimize later.

As far as "treat as" is concerned, we already have a spelling for that,
it's called a cast.

			regards, tom lane
On 12/17/20 16:47, Tom Lane wrote:
> As far as "treat as" is concerned, we already have a spelling for
> that, it's called a cast.

I find them different; XQuery was the first language I had encountered
that provides both (a cast in XQuery is spelled 'cast as', just as you'd
expect), and the idea of an explicit operation that means "I am only
asserting statically what type this will have at run time; do not ever
perform any conversion or coercion, just give me an error if I'm wrong"
seems to be a distinct and useful one.

Even if there is no available SQL spelling for that, it might still one
day be a useful expression node that could be generated in certain chosen
cases.

Regards,
-Chap
On 12/17/20 16:56, Chapman Flack wrote:
> I find them different; XQuery was the first language I had encountered
> that provides both (a cast in XQuery is spelled 'cast as', just as you'd
> expect), and the idea of an explicit operation that means "I am only
> asserting statically what type this will have at run time; do not ever
> perform any conversion or coercion, just give me an error if I'm wrong"
> seems to be a distinct and useful one.

I should have added: it may be an idea that never seemed important in
languages that mean to statically type everything, but it seems to arise
quite naturally for a language like XQuery (and arguably jsonpath) that
tries to do a useful amount of static typing while applied to data
structures like XML or JSON that don't come with ironclad guarantees.

Regards,
-Chap
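The 'cast as' vs 'treat as' distinction being discussed can be modelled in a few lines. This is a hypothetical Python sketch of the two semantics (the function names are invented for illustration; nothing like this exists in Postgres, and XQuery's actual rules are richer):

```python
def cast_as_int(value):
    # 'cast as': an ordinary cast, allowed to convert non-int to int.
    return int(value)

def treat_as_int(value):
    # 'treat as': a static promise only; never converts, just raises a
    # runtime error when the promise turns out to be wrong.
    # (bool is excluded because it is a subclass of int in Python.)
    if not isinstance(value, int) or isinstance(value, bool):
        raise TypeError(f"value is {type(value).__name__}, not int")
    return value

assert cast_as_int("100") == 100   # conversion happens
assert treat_as_int(100) == 100    # promise holds, value untouched
try:
    treat_as_int("100")            # promise broken: error, no coercion
except TypeError as e:
    print(e)
```

The difference is exactly the one Chapman describes: the cast is permitted to change the value's representation, while 'treat as' only checks it.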
On Thu, Dec 17, 2020 at 22:47, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Chapman Flack <chap@anastigmatix.net> writes:
> That's likely to be what a programmer intends when writing
> (variable explicitly typed integer) := js['n'] and
> (variable explicitly typed varchar) := js['v']
I think that what we want, if we're to support that sort of thing,
is that the js[] constructs produce jsonb by definition, and then an
assignment-level cast is applied to get from jsonb to integer or text.
I see we already have most of the necessary casts, but they're currently
marked explicit-only. Downgrading them to assignment level might be
okay though. If we don't want to do that, it means we have to write
integervar := js['n']::integer
which is a bit more wordy but also unmistakable as to intent. (I think
the "intent" angle might be the reason we insisted on these things
being explicit to start with.)
It's somewhat interesting to speculate about whether we could optimize
the combination of the subscripting function and the cast function.
But (a) that's an optimization, not something that should be part of
the user-visible semantics, and (b) it should not be part of the initial
feature. I think a large part of the reason this patch is still not
done after four years is that it's been biting off more than it could
chew all along. Let's try to get it to completion and then optimize
later.
sure
Pavel
As far as "treat as" is concerned, we already have a spelling for
that, it's called a cast.
regards, tom lane
> On Thu, Dec 17, 2020 at 03:29:35PM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> > On Thu, Dec 17, 2020 at 01:49:17PM -0500, Tom Lane wrote:
> >> We can certainly reconsider the API for the parsing hook if there's
> >> really a good reason for these to be different types, but it seems
> >> like that would just be encouraging poor design.
>
> > To be more specific, this is the current behaviour (an example from the
> > tests) and it doesn't seem right:
>
> > =# update test_jsonb_subscript
> >    set test_json['a'] = 3 where id = 1;
> > UPDATE 1
> > =# select jsonb_typeof(test_json->'a')
> >    from test_jsonb_subscript where id = 1;
> >  jsonb_typeof
> > --------------
> >  string
>
> I'm rather inclined to think that the result of subscripting a
> jsonb (and therefore also the required source type for assignment)
> should be jsonb, not just text. In that case, something like
>     update ... set jsoncol['a'] = 3
> would fail, because there's no cast from integer to jsonb. You'd
> have to write one of
>     update ... set jsoncol['a'] = '3'
>     update ... set jsoncol['a'] = '"3"'
> to clarify how you wanted the input to be interpreted.
> But that seems like a good thing, just as it is for jsonb_in.

Yep, that makes sense, will go with this idea.
> On Fri, Dec 18, 2020 at 08:59:25PM +0100, Dmitry Dolgov wrote:
> > On Thu, Dec 17, 2020 at 03:29:35PM -0500, Tom Lane wrote:
> > > Dmitry Dolgov <9erthalion6@gmail.com> writes:
> > > > To be more specific, this is the current behaviour (an example from
> > > > the tests) and it doesn't seem right:
> > >
> > > > =# update test_jsonb_subscript
> > > >    set test_json['a'] = 3 where id = 1;
> > > > UPDATE 1
> > > > =# select jsonb_typeof(test_json->'a')
> > > >    from test_jsonb_subscript where id = 1;
> > > >  jsonb_typeof
> > > > --------------
> > > >  string
> > >
> > > I'm rather inclined to think that the result of subscripting a
> > > jsonb (and therefore also the required source type for assignment)
> > > should be jsonb, not just text. In that case, something like
> > >     update ... set jsoncol['a'] = 3
> > > would fail, because there's no cast from integer to jsonb. You'd
> > > have to write one of
> > >     update ... set jsoncol['a'] = '3'
> > >     update ... set jsoncol['a'] = '"3"'
> > > to clarify how you wanted the input to be interpreted.
> > > But that seems like a good thing, just as it is for jsonb_in.
> >
> > Yep, that makes sense, will go with this idea.

Here is the new version of jsonb subscripting rebased on the committed
infrastructure patch. I hope it will not introduce any confusion with the
previously posted patches in this thread (about alter type subscript and
hstore) as they are independent.

There are a few differences from the previous version:

* No limit on the number of subscripts for jsonb (as there is no
  intrinsic limitation of this kind for jsonb).

* In case of assignment via subscript, it now expects the replace value
  to be of jsonb type.

* Similar to the implementation for arrays, if the source jsonb is NULL,
  it will be replaced by an empty jsonb and the new value will be
  assigned to it. This means:

    =# select * from test_jsonb_subscript where id = 3;
     id | test_json
    ----+-----------
      3 | NULL

    =# update test_jsonb_subscript set test_json['a'] = '1' where id = 3;
    UPDATE 1

    =# select * from test_jsonb_subscript where id = 3;
     id | test_json
    ----+-----------
      3 | {"a": 1}

  and similarly:

    =# select * from test_jsonb_subscript where id = 3;
     id | test_json
    ----+-----------
      3 | NULL

    =# update test_jsonb_subscript set test_json[1] = '1' where id = 3;
    UPDATE 1

    =# select * from test_jsonb_subscript where id = 3;
     id | test_json
    ----+-----------
      3 | {"1": 1}

  The latter probably looks a bit strange, but if there are any concerns
  about this part of the implementation (and in general about assignment
  to a jsonb which is NULL), it could easily be changed.

* There is nothing to address the question about distinguishing a regular
  text subscript and jsonpath in the patch yet. I guess the idea would be
  to save the original subscript value type before coercing it into text
  and allow the type-specific code to convert it back. But I'll probably
  do that as a separate patch when we finish with this one.
Attachment
On Tue, Dec 22, 2020 at 11:24, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Fri, Dec 18, 2020 at 08:59:25PM +0100, Dmitry Dolgov wrote:
> > On Thu, Dec 17, 2020 at 03:29:35PM -0500, Tom Lane wrote:
> > Dmitry Dolgov <9erthalion6@gmail.com> writes:
> > > On Thu, Dec 17, 2020 at 01:49:17PM -0500, Tom Lane wrote:
> > >> We can certainly reconsider the API for the parsing hook if there's
> > >> really a good reason for these to be different types, but it seems
> > >> like that would just be encouraging poor design.
> >
> > > To be more specific, this is the current behaviour (an example from the
> > > tests) and it doesn't seem right:
> >
> > > =# update test_jsonb_subscript
> > > set test_json['a'] = 3 where id = 1;
> > > UPDATE 1
> > > =# select jsonb_typeof(test_json->'a')
> > > from test_jsonb_subscript where id = 1;
> > > jsonb_typeof
> > > --------------
> > > string
> >
> >
> > I'm rather inclined to think that the result of subscripting a
> > jsonb (and therefore also the required source type for assignment)
> > should be jsonb, not just text. In that case, something like
> > update ... set jsoncol['a'] = 3
> > would fail, because there's no cast from integer to jsonb. You'd
> > have to write one of
> > update ... set jsoncol['a'] = '3'
> > update ... set jsoncol['a'] = '"3"'
> > to clarify how you wanted the input to be interpreted.
> > But that seems like a good thing, just as it is for jsonb_in.
>
> Yep, that makes sense, will go with this idea.
Here is the new version of jsonb subscripting rebased on the committed
infrastructure patch. I hope it will not introduce any confusion with
the previously posted patched in this thread (about alter type subscript
and hstore) as they are independent.
There are a few differences from the previous version:
* No limit on number of subscripts for jsonb (as there is no intrinsic
limitation of this kind for jsonb).
* In case of assignment via subscript now it expects the replace value
to be of jsonb type.
* Similar to the implementation for arrays, if the source jsonb is NULL,
it will be replaced by an empty jsonb and the new value will be
assigned to it. This means:
=# select * from test_jsonb_subscript where id = 3;
id | test_json
----+-----------
3 | NULL
=# update test_jsonb_subscript set test_json['a'] = '1' where id = 3;
UPDATE 1
=# select * from test_jsonb_subscript where id = 3;
id | test_json
----+-----------
3 | {"a": 1}
and similar:
=# select * from test_jsonb_subscript where id = 3;
id | test_json
----+-----------
3 | NULL
=# update test_jsonb_subscript set test_json[1] = '1' where id = 3;
UPDATE 1
=# select * from test_jsonb_subscript where id = 3;
id | test_json
----+-----------
3 | {"1": 1}
The latter is probably a bit strange looking, but if there are any concerns
about this part (and in general about an assignment to jsonb which is NULL)
of the implementation it could be easily changed.
I would expect behaviour like
update x set test[1] = 10; --> "[10]";
update x set test['1'] = 10; --> "{"1": 10}"
Regards
Pavel
* There is nothing to address question about distinguishing a regular text
subscript and jsonpath in the patch yet. I guess the idea would be to save
the original subscript value type before coercing it into text and allow a
type specific code to convert it back. But I'll probably do it as a separate
patch when we finish with this one.
> On Tue, Dec 22, 2020 at 12:19:26PM +0100, Pavel Stehule wrote:
> > Here is the new version of jsonb subscripting rebased on the committed
> > infrastructure patch. I hope it will not introduce any confusion with
> > the previously posted patches in this thread (about alter type
> > subscript and hstore) as they are independent.
> >
> > There are a few differences from the previous version:
> >
> > * No limit on the number of subscripts for jsonb (as there is no
> >   intrinsic limitation of this kind for jsonb).
> >
> > * In case of assignment via subscript, it now expects the replace value
> >   to be of jsonb type.
> >
> > * Similar to the implementation for arrays, if the source jsonb is
> >   NULL, it will be replaced by an empty jsonb and the new value will be
> >   assigned to it. This means:
> >
> >     =# select * from test_jsonb_subscript where id = 3;
> >      id | test_json
> >     ----+-----------
> >       3 | NULL
> >
> >     =# update test_jsonb_subscript set test_json['a'] = '1' where id = 3;
> >     UPDATE 1
> >
> >     =# select * from test_jsonb_subscript where id = 3;
> >      id | test_json
> >     ----+-----------
> >       3 | {"a": 1}
> >
> >   and similarly:
> >
> >     =# select * from test_jsonb_subscript where id = 3;
> >      id | test_json
> >     ----+-----------
> >       3 | NULL
> >
> >     =# update test_jsonb_subscript set test_json[1] = '1' where id = 3;
> >     UPDATE 1
> >
> >     =# select * from test_jsonb_subscript where id = 3;
> >      id | test_json
> >     ----+-----------
> >       3 | {"1": 1}
> >
> >   The latter probably looks a bit strange, but if there are any
> >   concerns about this part of the implementation (and in general about
> >   assignment to a jsonb which is NULL), it could easily be changed.
>
> What is the possibility to make an array instead of a record?
>
> I expect behave like
>
> update x set test[1] = 10; --> "[10]";
> update x set test['1'] = 10; --> "{"1": 10}"

Yes, I was also thinking about this because such behaviour is more
natural. To implement it we need to provide a way for the type-specific
code to remember the original subscript expression type (something like
in the attached version), which could also be useful for the future work
on jsonpath. I'm just not sure whether some important bits are again
missing in this idea, so if someone can take a look I would appreciate
it. In case there are any issues, I would just suggest keeping it simple
and returning NULL.
Attachment
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> On Tue, Dec 22, 2020 at 12:19:26PM +0100, Pavel Stehule wrote:
>> I expect behave like
>>
>> update x set test[1] = 10; --> "[10]";
>> update x set test['1'] = 10; --> "{"1": 10}"

> Yes, I was also thinking about this because such behaviour is more
> natural.

I continue to feel that this is a fundamentally bad idea that will lead
to much more pain than benefit. People are going to want to know why
"test[1.0]" doesn't act like "test[1]". They are going to complain
because "test[$1]" acts so much differently depending on whether they
assigned a type to the $1 parameter or not. And they are going to bitch
because dumping and reloading a rule causes it to do something different
than it did before --- or at least we'd be at horrid risk of that; only
if we hide the injected cast-to-text does the dumped rule look the way
it needs to. Even then, the whole thing is critically dependent on the
fact that integer-type constants are written and displayed differently
from other constants, so it won't scale to any other type that someone
might want to treat specially. So you're just leading datatype designers
down a garden path that will be a dead end for many of them.

IMO this isn't actually any saner than your previous iterations on the
idea.

			regards, tom lane
On Tue, Dec 22, 2020 at 17:57, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Dmitry Dolgov <9erthalion6@gmail.com> writes:
> On Tue, Dec 22, 2020 at 12:19:26PM +0100, Pavel Stehule wrote:
>> I expect behave like
>>
>> update x set test[1] = 10; --> "[10]";
>> update x set test['1'] = 10; --> "{"1": 10}"
> Yes, I also was thinking about this because such behaviour is more
> natural.
I continue to feel that this is a fundamentally bad idea that will
lead to much more pain than benefit. People are going to want to
know why "test[1.0]" doesn't act like "test[1]". They are going
to complain because "test[$1]" acts so much differently depending
on whether they assigned a type to the $1 parameter or not. And
they are going to bitch because dumping and reloading a rule causes
it to do something different than it did before --- or at least we'd
be at horrid risk of that; only if we hide the injected cast-to-text
> does the dumped rule look the way it needs to. Even then, the whole
thing is critically dependent on the fact that integer-type constants
are written and displayed differently from other constants, so it
won't scale to any other type that someone might want to treat specially.
So you're just leading datatype designers down a garden path that will be
a dead end for many of them.
IMO this isn't actually any saner than your previous iterations
on the idea.
I think there can be some logic to it. But json has two very different
kinds of objects - objects and arrays - and we should support both.
Could a good solution be based on the initial source value?
DECLARE v jsonb;
BEGIN
v := '[]';
v[1] := 10; v[2] := 20; -- v = "[10,20]"
v['a'] := 30; --> should to raise an error
v := '{}';
v[1] := 10; v[2] := 20; -- v = "{"1": 10, "2":20}"
v['a'] := 30; -- v = "{"1": 10, "2":20, "a": 30}"
When the source variable is null, the default behaviour can be the same as for json objects. But that doesn't handle numeric indexes well,
because
v := '[]'
v[1.5] := 100; -- it is nonsense
but
v := '{}'
v[1.5] := 100; -- v = "{"1.5": 100}" -- and this can be genuinely useful, although note that "1" and "1.0" are different keys.
But maybe we are trying to design something that has already been designed. Is there any info about index specification in SQL/JSON?
Regards
Pavel
regards, tom lane
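Pavel's proposal above - let the existing container decide how a subscript assignment behaves - could be sketched as follows. This is a hypothetical Python model of the semantics under discussion, not the patch itself (note that Python lists are zero-based, unlike SQL arrays, and the helper name is invented):

```python
def subscript_assign(container, subscript, value):
    # If the container is an array, only integer subscripts make sense.
    if isinstance(container, list):
        if not isinstance(subscript, int):
            raise TypeError("array subscript must be an integer")
        # Extend the array as needed so the index exists.
        while len(container) <= subscript:
            container.append(None)
        container[subscript] = value
    elif isinstance(container, dict):
        # For objects every subscript becomes a key; under these rules
        # 1 and "1.0" would be different keys.
        container[str(subscript)] = value
    else:
        raise TypeError("not a container")
    return container

assert subscript_assign([], 1, 10) == [None, 10]   # array: index
assert subscript_assign({}, 1, 10) == {"1": 10}    # object: key "1"
assert subscript_assign({}, "a", 30) == {"a": 30}
```

Under this model an 'a' subscript into an array raises an error, as Pavel suggests it should.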
> On Tue, Dec 22, 2020 at 11:57:13AM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> > On Tue, Dec 22, 2020 at 12:19:26PM +0100, Pavel Stehule wrote:
> >> I expect behave like
> >>
> >> update x set test[1] = 10; --> "[10]";
> >> update x set test['1'] = 10; --> "{"1": 10}"
>
> > Yes, I was also thinking about this because such behaviour is more
> > natural.
>
> I continue to feel that this is a fundamentally bad idea that will
> lead to much more pain than benefit. People are going to want to
> know why "test[1.0]" doesn't act like "test[1]". They are going
> to complain because "test[$1]" acts so much differently depending
> on whether they assigned a type to the $1 parameter or not. And
> they are going to bitch because dumping and reloading a rule causes
> it to do something different than it did before --- or at least we'd
> be at horrid risk of that; only if we hide the injected cast-to-text
> does the dumped rule look the way it needs to. Even then, the whole
> thing is critically dependent on the fact that integer-type constants
> are written and displayed differently from other constants, so it
> won't scale to any other type that someone might want to treat
> specially. So you're just leading datatype designers down a garden
> path that will be a dead end for many of them.
>
> IMO this isn't actually any saner than your previous iterations
> on the idea.

Ok. While I don't have any preferences here, we can disregard the last
posted patch (extended-with-subscript-type) and consider only the
v38-0001-Subscripting-for-jsonb version.
On Tue, Dec 22, 2020 at 18:35, Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Tue, Dec 22, 2020 at 11:57:13AM -0500, Tom Lane wrote:
> Dmitry Dolgov <9erthalion6@gmail.com> writes:
> > On Tue, Dec 22, 2020 at 12:19:26PM +0100, Pavel Stehule wrote:
> >> I expect behave like
> >>
> >> update x set test[1] = 10; --> "[10]";
> >> update x set test['1'] = 10; --> "{"1": 10}"
>
> > Yes, I also was thinking about this because such behaviour is more
> > natural.
>
> I continue to feel that this is a fundamentally bad idea that will
> lead to much more pain than benefit. People are going to want to
> know why "test[1.0]" doesn't act like "test[1]". They are going
> to complain because "test[$1]" acts so much differently depending
> on whether they assigned a type to the $1 parameter or not. And
> they are going to bitch because dumping and reloading a rule causes
> it to do something different than it did before --- or at least we'd
> be at horrid risk of that; only if we hide the injected cast-to-text
> does the dumped rule look the way it needs to. Even then, the whole
> thing is critically dependent on the fact that integer-type constants
> are written and displayed differently from other constants, so it
> won't scale to any other type that someone might want to treat specially.
> So you're just leading datatype designers down a garden path that will be
> a dead end for many of them.
>
> IMO this isn't actually any saner than your previous iterations
> on the idea.
Ok. While I don't have any preferences here, we can disregard the last
posted patch (extended-with-subscript-type) and consider only
v38-0001-Subscripting-for-jsonb version.
There are two parts - fetching and setting.
Probably there can be an agreement on the fetching part: if the index text is a JSONPath expression, use jsonb_path_query, else use jsonb_extract_path.
Setting should work the same way in the inverse direction.
I like the behavior of jsonb_extract_path - it is intuitive and we should use it.
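For reference, the jsonb_extract_path semantics Pavel is pointing at can be modeled in a few lines of Python (an illustrative sketch, not the server code): walk the path, and yield NULL/None as soon as any step is missing or the current value is not a container.

```python
import json

def extract_path(doc: str, *path):
    """Toy model of jsonb_extract_path: a missing step yields None (SQL NULL)."""
    cur = json.loads(doc)
    for step in path:
        if isinstance(cur, dict) and step in cur:
            cur = cur[step]
        elif isinstance(cur, list) and isinstance(step, int) and -len(cur) <= step < len(cur):
            cur = cur[step]
        else:
            # jsonb_extract_path returns NULL here rather than raising
            return None
    return cur
```

This mirrors the "return NULL on a structural miss" behavior that is later questioned in the thread for the assignment case.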
Pavel Stehule <pavel.stehule@gmail.com> writes:
> But maybe we try to design some that are designed already. Is there some
> info about index specification in SQL/JSON?

We do have precedent for this, it's the rules about resolving argument
types for overloaded functions. But the conclusion that that precedent
leads to is that we should check whether the subscript expression can
be *implicitly* coerced to either integer or text, and fail if neither
coercion or both coercions succeed. I'd be okay with that from a system
design standpoint, but it's hard to say without trying it whether it
will work out nicely from a usability standpoint. In a quick trial
it seems it might be okay:

regression=# create function mysub(int) returns text language sql
regression-# as $$select 'int'$$;
CREATE FUNCTION
regression=# create function mysub(text) returns text language sql
as $$select 'text'$$;
CREATE FUNCTION
regression=# select mysub(42);
 mysub
-------
 int
(1 row)

regression=# select mysub('foo');
 mysub
-------
 text
(1 row)

But there are definitely cases that will fail when an assignment coercion
would have succeeded, eg

regression=# select mysub(42::bigint);
ERROR: function mysub(bigint) does not exist

Maybe that's okay. (As I said earlier, we can't use assignment coercion
when there's two possible coercion targets, because it'd be too likely
that they both succeed.)

regards, tom lane
> On Tue, Dec 22, 2020 at 02:21:22PM -0500, Tom Lane wrote:
> Pavel Stehule <pavel.stehule@gmail.com> writes:
> > But maybe we try to design some that are designed already. Is there some
> > info about index specification in SQL/JSON?
>
> We do have precedent for this, it's the rules about resolving argument
> types for overloaded functions. But the conclusion that that precedent
> leads to is that we should check whether the subscript expression can
> be *implicitly* coerced to either integer or text, and fail if neither
> coercion or both coercions succeed. I'd be okay with that from a system
> design standpoint, but it's hard to say without trying it whether it
> will work out nicely from a usability standpoint. In a quick trial
> it seems it might be okay:
>
> regression=# create function mysub(int) returns text language sql
> regression-# as $$select 'int'$$;
> CREATE FUNCTION
> regression=# create function mysub(text) returns text language sql
> as $$select 'text'$$;
> CREATE FUNCTION
> regression=# select mysub(42);
>  mysub
> -------
>  int
> (1 row)
>
> regression=# select mysub('foo');
>  mysub
> -------
>  text
> (1 row)
>
> regression=# select mysub(42::bigint);
> ERROR: function mysub(bigint) does not exist

I'm not sure I completely follow and can't immediately find the relevant
code for overloaded functions, so I need to do a perception check.
Following this design, in jsonb_subscripting_transform we try to coerce
the subscript expression to both integer and text (and maybe even to
jsonpath), and based on which coercion has succeeded choose different
logic to handle it, right?

And just for me to understand: in the above example of the overloaded
function, the integer can be coerced only to text (since the original
type of the expression is integer), while the bigint could be coerced
both to integer and text, and that's why it fails, isn't it?
Dmitry Dolgov <9erthalion6@gmail.com> writes:
>> On Tue, Dec 22, 2020 at 02:21:22PM -0500, Tom Lane wrote:
>> We do have precedent for this, it's the rules about resolving argument
>> types for overloaded functions. But the conclusion that that precedent
>> leads to is that we should check whether the subscript expression can
>> be *implicitly* coerced to either integer or text, and fail if neither
>> coercion or both coercions succeed.

> I'm not sure I completely follow and can't immediately find the relevant
> code for overloaded functions, so I need to do a perception check.
> Following this design in jsonb_subscripting_transform we try to coerce
> the subscription expression to both integer and text (and maybe even to
> jsonpath), and based on the result of which coercion has succeeded choose
> different logic to handle it, right?

Right, with the important proviso that the coercion strength is
COERCION_IMPLICIT not COERCION_ASSIGNMENT.

> And just for me to understand. In the above example of the overloaded
> function, with the integer we can coerce it only to text (since original
> type of the expression is integer), and with the bigint it could be
> coerced both to integer and text, that's why failure, isn't?

No, there's no such IMPLICIT-level cast. Coercing bigint down to int
is only allowed at ASSIGNMENT or higher coercion strength.

In a case like jsonpath['...'], the initially UNKNOWN-type literal could
in theory be coerced to any of these types, so you'd have to resolve that
case manually. The overloaded-function code has an internal preference
that makes it choose TEXT if it has a choice of TEXT or some other target
type for an UNKNOWN input (cf parse_func.c starting about line 1150), but
if you ask can_coerce_type() it's going to say TRUE for all three cases.

Roughly speaking, then, I think what you want to do is

1. If input type is UNKNOWNOID, choose result type TEXT.

2. Otherwise, apply can_coerce_type() to see if the input type can be
coerced to int4, text, or jsonpath. If it succeeds for none or more
than one of these, throw error. Otherwise choose the single successful
type.

3. Apply coerce_type() to coerce to the chosen result type.

4. At runtime, examine exprType() of the input to figure out what to do.

regards, tom lane
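Tom's four-step resolution procedure can be sketched outside the server as follows (illustrative Python only; the type names and the coercion table are invented stand-ins for what can_coerce_type() would look up in pg_cast, not PostgreSQL source):

```python
UNKNOWNOID = "unknown"
CANDIDATES = ["int4", "text", "jsonpath"]

# Hypothetical stand-in for can_coerce_type() at COERCION_IMPLICIT strength:
# which target types each input type can implicitly become.
IMPLICIT_COERCIONS = {
    "unknown": {"int4", "text", "jsonpath"},  # untyped literals fit anything
    "int4": {"int4"},
    "int2": {"int4"},        # smallint promotes implicitly
    "bigint": set(),         # int8 -> int4 is only ASSIGNMENT-level
    "text": {"text"},
    "varchar": {"text"},
}

def resolve_subscript_type(input_type: str) -> str:
    # Step 1: an UNKNOWN (untyped) literal resolves to text, mirroring the
    # preference the overloaded-function code has for TEXT.
    if input_type == UNKNOWNOID:
        return "text"
    # Step 2: otherwise exactly one implicit coercion must succeed.
    ok = [t for t in CANDIDATES if t in IMPLICIT_COERCIONS.get(input_type, set())]
    if len(ok) != 1:
        raise TypeError(f"cannot resolve subscript type for {input_type}")
    return ok[0]
```

Steps 3 and 4 (actually coercing the node and branching on exprType() at runtime) have no counterpart in this toy model; the point is only the "none or more than one succeeds is an error" rule.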
> On Sat, Dec 26, 2020 at 01:24:04PM -0500, Tom Lane wrote:
>
> In a case like jsonpath['...'], the initially UNKNOWN-type literal could
> in theory be coerced to any of these types, so you'd have to resolve that
> case manually. The overloaded-function code has an internal preference
> that makes it choose TEXT if it has a choice of TEXT or some other target
> type for an UNKNOWN input (cf parse_func.c starting about line 1150), but
> if you ask can_coerce_type() it's going to say TRUE for all three cases.
>
> Roughly speaking, then, I think what you want to do is
>
> 1. If input type is UNKNOWNOID, choose result type TEXT.
>
> 2. Otherwise, apply can_coerce_type() to see if the input type can be
> coerced to int4, text, or jsonpath. If it succeeds for none or more
> than one of these, throw error. Otherwise choose the single successful
> type.
>
> 3. Apply coerce_type() to coerce to the chosen result type.
>
> 4. At runtime, examine exprType() of the input to figure out what to do.

Thanks, that was super useful. Following this suggestion I've made
necessary adjustments for the patch. There is no jsonpath support, but
this could be easily added on top.
> On Wed, Dec 30, 2020 at 02:45:12PM +0100, Dmitry Dolgov wrote:
> > On Sat, Dec 26, 2020 at 01:24:04PM -0500, Tom Lane wrote:
> >
> > In a case like jsonpath['...'], the initially UNKNOWN-type literal could
> > in theory be coerced to any of these types, so you'd have to resolve that
> > case manually. The overloaded-function code has an internal preference
> > that makes it choose TEXT if it has a choice of TEXT or some other target
> > type for an UNKNOWN input (cf parse_func.c starting about line 1150), but
> > if you ask can_coerce_type() it's going to say TRUE for all three cases.
> >
> > Roughly speaking, then, I think what you want to do is
> >
> > 1. If input type is UNKNOWNOID, choose result type TEXT.
> >
> > 2. Otherwise, apply can_coerce_type() to see if the input type can be
> > coerced to int4, text, or jsonpath. If it succeeds for none or more
> > than one of these, throw error. Otherwise choose the single successful
> > type.
> >
> > 3. Apply coerce_type() to coerce to the chosen result type.
> >
> > 4. At runtime, examine exprType() of the input to figure out what to do.
>
> Thanks, that was super useful. Following this suggestion I've made
> necessary adjustments for the patch. There is no jsonpath support, but
> this could be easily added on top.

And the forgotten patch itself.
Attachment
st 30. 12. 2020 v 14:46 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Wed, Dec 30, 2020 at 02:45:12PM +0100, Dmitry Dolgov wrote:
> > On Sat, Dec 26, 2020 at 01:24:04PM -0500, Tom Lane wrote:
> >
> > In a case like jsonpath['...'], the initially UNKNOWN-type literal could
> > in theory be coerced to any of these types, so you'd have to resolve that
> > case manually. The overloaded-function code has an internal preference
> > that makes it choose TEXT if it has a choice of TEXT or some other target
> > type for an UNKNOWN input (cf parse_func.c starting about line 1150), but
> > if you ask can_coerce_type() it's going to say TRUE for all three cases.
> >
> > Roughly speaking, then, I think what you want to do is
> >
> > 1. If input type is UNKNOWNOID, choose result type TEXT.
> >
> > 2. Otherwise, apply can_coerce_type() to see if the input type can be
> > coerced to int4, text, or jsonpath. If it succeeds for none or more
> > than one of these, throw error. Otherwise choose the single successful
> > type.
> >
> > 3. Apply coerce_type() to coerce to the chosen result type.
> >
> > 4. At runtime, examine exprType() of the input to figure out what to do.
>
> Thanks, that was super useful. Following this suggestion I've made
> necessary adjustments for the patch. There is no jsonpath support, but
> this could be easily added on top.
And the forgotten patch itself.
make check fails
But there are two issues I dislike:
1. quietly ignored update
postgres=# update foo set a['a'][10] = '20';
UPDATE 1
postgres=# select * from foo;
┌────┐
│ a │
╞════╡
│ {} │
└────┘
(1 row)
The value should be modified or there should be an error (but I prefer implicitly creating nested empty objects when necessary).
update foo set a['a'] = '[]';
2. The index position was ignored.
postgres=# update foo set a['a'][10] = '20';
UPDATE 1
postgres=# select * from foo;
┌─────────────┐
│ a │
╞═════════════╡
│ {"a": [20]} │
└─────────────┘
(1 row)
Notes:
1. It is very nice that casts are supported. I wrote an int2jsonb cast and it worked. Maybe we can create built-in casts from int, bigint, numeric, boolean, date, and timestamp to jsonb.
Regards
Pavel
> On Wed, Dec 30, 2020 at 07:48:57PM +0100, Pavel Stehule wrote:
> st 30. 12. 2020 v 14:46 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
>
> > > On Wed, Dec 30, 2020 at 02:45:12PM +0100, Dmitry Dolgov wrote:
> > > > On Sat, Dec 26, 2020 at 01:24:04PM -0500, Tom Lane wrote:
> > > >
> > > > In a case like jsonpath['...'], the initially UNKNOWN-type literal could
> > > > in theory be coerced to any of these types, so you'd have to resolve that
> > > > case manually. The overloaded-function code has an internal preference
> > > > that makes it choose TEXT if it has a choice of TEXT or some other target
> > > > type for an UNKNOWN input (cf parse_func.c starting about line 1150), but
> > > > if you ask can_coerce_type() it's going to say TRUE for all three cases.
> > > >
> > > > Roughly speaking, then, I think what you want to do is
> > > >
> > > > 1. If input type is UNKNOWNOID, choose result type TEXT.
> > > >
> > > > 2. Otherwise, apply can_coerce_type() to see if the input type can be
> > > > coerced to int4, text, or jsonpath. If it succeeds for none or more
> > > > than one of these, throw error. Otherwise choose the single successful
> > > > type.
> > > >
> > > > 3. Apply coerce_type() to coerce to the chosen result type.
> > > >
> > > > 4. At runtime, examine exprType() of the input to figure out what to do.
> > >
> > > Thanks, that was super useful. Following this suggestion I've made
> > > necessary adjustments for the patch. There is no jsonpath support, but
> > > this could be easily added on top.
> >
> > And the forgotten patch itself.
>
> make check fails

Yeah, apparently I forgot to enable asserts back after the last
benchmarking discussion, and missed some of those. Will fix.

> 2. The index position was ignored.
>
> postgres=# update foo set a['a'][10] = '20';
> UPDATE 1
> postgres=# select * from foo;
> ┌─────────────┐
> │ a │
> ╞═════════════╡
> │ {"a": [20]} │
> └─────────────┘
> (1 row)

I just realized I haven't included "filling the gaps" part, that's why
it works as before. Can add this too.

> 1. quietly ignored update
>
> postgres=# update foo set a['a'][10] = '20';
> UPDATE 1
> postgres=# select * from foo;
> ┌────┐
> │ a │
> ╞════╡
> │ {} │
> └────┘
> (1 row)

This belongs to the original jsonb_set implementation. Although if we
started to change it anyway with "filling the gaps", maybe it's fine to
add one more flag to tune its behaviour in this case as well. I can
check how complicated that could be.
> On Wed, Dec 30, 2020 at 09:01:37PM +0100, Dmitry Dolgov wrote:
> > make check fails
>
> Yeah, apparently I forgot to enable asserts back after the last
> benchmarking discussion, and missed some of those. Will fix.
>
> > 2. The index position was ignored.
> >
> > postgres=# update foo set a['a'][10] = '20';
> > UPDATE 1
> > postgres=# select * from foo;
> > ┌─────────────┐
> > │ a │
> > ╞═════════════╡
> > │ {"a": [20]} │
> > └─────────────┘
> > (1 row)
>
> I just realized I haven't included "filling the gaps" part, that's why
> it works as before. Can add this too.
>
> > 1. quietly ignored update
> >
> > postgres=# update foo set a['a'][10] = '20';
> > UPDATE 1
> > postgres=# select * from foo;
> > ┌────┐
> > │ a │
> > ╞════╡
> > │ {} │
> > └────┘
> > (1 row)
>
> This belongs to the original jsonb_set implementation. Although if we
> started to change it anyway with "filling the gaps", maybe it's fine to
> add one more flag to tune its behaviour in this case as well. I can
> check how complicated that could be.

Here is what I had in mind. Assert issue in main patch is fixed (nothing
serious, it was just the rawscalar check for an empty jsonb created
during assignment), and the second patch contains all the bits with
"filling the gaps" including your suggestion about creating the whole
path if it's not present. The latter (creating the chain of empty
objects) I haven't tested that much, but if there are any issues or
concerns I guess it will not prevent the main patch from going forward.
Attachment
čt 31. 12. 2020 v 15:27 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Wed, Dec 30, 2020 at 09:01:37PM +0100, Dmitry Dolgov wrote:
> > make check fails
>
> Yeah, apparently I forgot to enable asserts back after the last
> benchmarking discussion, and missed some of those. Will fix.
>
> > 2. The index position was ignored.
> >
> > postgres=# update foo set a['a'][10] = '20';
> > UPDATE 1
> > postgres=# select * from foo;
> > ┌─────────────┐
> > │ a │
> > ╞═════════════╡
> > │ {"a": [20]} │
> > └─────────────┘
> > (1 row)
>
> I just realized I haven't included "filling the gaps" part, that's why
> it works as before. Can add this too.
>
> > 1. quietly ignored update
> >
> > postgres=# update foo set a['a'][10] = '20';
> > UPDATE 1
> > postgres=# select * from foo;
> > ┌────┐
> > │ a │
> > ╞════╡
> > │ {} │
> > └────┘
> > (1 row)
>
> This belongs to the original jsonb_set implementation. Although if we
> started to change it anyway with "filling the gaps", maybe it's fine to
> add one more flag to tune its behaviour in this case as well. I can
> check how complicated that could be.
Here is what I had in mind. Assert issue in main patch is fixed (nothing
serious, it was just the rawscalar check for an empty jsonb created
during assignment), and the second patch contains all the bits with
"filling the gaps" including your suggestion about creating the whole
path if it's not present. The latter (creating the chain of empty
objects) I haven't tested that much, but if there are any issues or
concerns I guess it will not prevent the main patch from going forward.
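The two behaviors under discussion — "filling the gaps" with nulls and creating the chain of missing containers along the path — can be modeled together in a short Python sketch (illustrative only, not the patch's C code; note it replaces a non-container intermediate value with a fresh container, which is just one of the variants debated in this thread):

```python
import json

def set_path(doc: str, path, value) -> str:
    """Toy model of subscripted assignment with gap-filling and
    implicit creation of missing intermediate objects/arrays."""
    def assign(node, steps):
        key, rest = steps[0], steps[1:]
        if isinstance(key, int):                 # array step (negative indexes not modeled)
            if not isinstance(node, list):
                node = []                        # create (or replace with) an array level
            if key >= len(node):
                node.extend([None] * (key + 1 - len(node)))  # pad gaps with nulls
            node[key] = assign(node[key], rest) if rest else value
        else:                                    # object step
            if not isinstance(node, dict):
                node = {}                        # create (or replace with) an object level
            node[key] = assign(node.get(key), rest) if rest else value
        return node
    return json.dumps(assign(json.loads(doc), list(path)))
```

For example, starting from `{}`, assigning at path `['k', 1]` yields `{"k": [null, 20]}` — the same shape Pavel's repro expects from `update foo set a['k'][1] = '20'`.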
the tests passed and filling gaps works well
but creating empty objects doesn't work
create table foo(a jsonb);
insert into foo values('{}');
postgres=# update foo set a['k'][1] = '20';
UPDATE 1
postgres=# select * from foo;
┌───────────────────┐
│ a │
╞═══════════════════╡
│ {"k": [null, 20]} │
└───────────────────┘
(1 row)
it is ok
postgres=# update foo set a['k3'][10] = '20';
UPDATE 1
postgres=# select * from foo;
┌───────────────────┐
│ a │
╞═══════════════════╡
│ {"k": [null, 20]} │
└───────────────────┘
(1 row)
the second update was not successful
> On Thu, Dec 31, 2020 at 08:21:55PM +0100, Pavel Stehule wrote:
> čt 31. 12. 2020 v 15:27 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
>
> the tests passed and filling gaps works well
>
> but creating empty objects doesn't work
>
> create table foo(a jsonb);
>
> insert into foo values('{}');
>
> postgres=# update foo set a['k'][1] = '20';
> UPDATE 1
> postgres=# select * from foo;
> ┌───────────────────┐
> │ a │
> ╞═══════════════════╡
> │ {"k": [null, 20]} │
> └───────────────────┘
> (1 row)
>
> it is ok
>
> postgres=# update foo set a['k3'][10] = '20';
> UPDATE 1
> postgres=# select * from foo;
> ┌───────────────────┐
> │ a │
> ╞═══════════════════╡
> │ {"k": [null, 20]} │
> └───────────────────┘
> (1 row)
>
> the second update was not successful

Right, it was working only if the source level is empty, thanks for
checking. I've found a bit more time and prepared a more decent version
which covers all the cases I could come up with following the same
implementation logic. The first patch is the same though.
Attachment
Hi
probably something is still wrong
create table foo(a jsonb);
update foo set a['a'] = '10';
update foo set a['b']['c'][1] = '10';
update foo set a['b']['c'][10] = '10'
WARNING: problem in alloc set ExprContext: req size > alloc size for chunk 0x256dd88 in block 0x256d160
WARNING: problem in alloc set ExprContext: bogus aset link in block 0x256d160, chunk 0x256dd88
WARNING: problem in alloc set ExprContext: req size > alloc size for chunk 0x256dfa0 in block 0x256d160
WARNING: problem in alloc set ExprContext: bogus aset link in block 0x256d160, chunk 0x256dfa0
WARNING: problem in alloc set ExprContext: req size > alloc size for chunk 0x256dfc4 in block 0x256d160
WARNING: problem in alloc set ExprContext: bad single-chunk 0x256dfc4 in block 0x256d160
WARNING: problem in alloc set ExprContext: bogus aset link in block 0x256d160, chunk 0x256dfc4
WARNING: problem in alloc set ExprContext: found inconsistent memory block 0x256d160
UPDATE 1
and the result is wrong: the value of a['b']['c'][1] was overwritten with NULL
Regards
Pavel
> On Sun, Jan 03, 2021 at 08:41:17PM +0100, Pavel Stehule wrote:
>
> probably some is wrong still
>
> create table foo(a jsonb);
> update foo set a['a'] = '10';
> update foo set a['b']['c'][1] = '10';
> update foo set a['b']['c'][10] = '10'

Thanks for noticing. Indeed, there was a subtle change of meaning for
the 'done' flag in setPath, which I haven't covered. Could you try this
version?
Attachment
po 4. 1. 2021 v 14:58 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Sun, Jan 03, 2021 at 08:41:17PM +0100, Pavel Stehule wrote:
>
> probably some is wrong still
>
> create table foo(a jsonb);
> update foo set a['a'] = '10';
> update foo set a['b']['c'][1] = '10';
> update foo set a['b']['c'][10] = '10'
Thanks for noticing. Indeed, there was a subtle change of meaning for
'done' flag in setPath, which I haven't covered. Could you try this
version?
sure
postgres=# insert into foo values('{}');
INSERT 0 1
postgres=# update foo set a['c']['c'][10] = '10';
UPDATE 1
postgres=# select * from foo;
┌────────────────────────────────────────────────────────────────────────────────┐
│ a │
╞════════════════════════════════════════════════════════════════════════════════╡
│ {"c": {"c": [null, null, null, null, null, null, null, null, null, null, 10]}} │
└────────────────────────────────────────────────────────────────────────────────┘
(1 row)
postgres=# update foo set a['c'][10][10] = '10';
WARNING: problem in alloc set ExprContext: req size > alloc size for chunk 0x151b688 in block 0x151aa90
WARNING: problem in alloc set ExprContext: bogus aset link in block 0x151aa90, chunk 0x151b688
WARNING: problem in alloc set ExprContext: bad size 0 for chunk 0x151b8a0 in block 0x151aa90
WARNING: problem in alloc set ExprContext: bad single-chunk 0x151b8b8 in block 0x151aa90
WARNING: problem in alloc set ExprContext: bogus aset link in block 0x151aa90, chunk 0x151b8b8
WARNING: problem in alloc set ExprContext: found inconsistent memory block 0x151aa90
UPDATE 1
> On Mon, Jan 04, 2021 at 06:56:17PM +0100, Pavel Stehule wrote:
> po 4. 1. 2021 v 14:58 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
>
> postgres=# update foo set a['c']['c'][10] = '10';
> postgres=# update foo set a['c'][10][10] = '10';

Yeah, there was one clumsy memory allocation. On the way I've found and
fixed another issue with jsonb generation, right now I don't see any
other problems. But as my imagination, despite all the sci-fi I've read
this year, is apparently not so versatile, I'll rely on yours, could you
please check this version again?
Attachment
Hi
út 5. 1. 2021 v 20:32 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Mon, Jan 04, 2021 at 06:56:17PM +0100, Pavel Stehule wrote:
> po 4. 1. 2021 v 14:58 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
> postgres=# update foo set a['c']['c'][10] = '10';
> postgres=# update foo set a['c'][10][10] = '10';
Yeah, there was one clumsy memory allocation. On the way I've found and
fixed another issue with jsonb generation, right now I don't see any
other problems. But as my imagination, despite all the sci-fi I've read
this year, is apparently not so versatile, I'll rely on yours, could you
please check this version again?
this case should raise an exception - the value should be changed or an error should be raised
postgres=# insert into foo values('{}');
INSERT 0 1
postgres=# update foo set a['a'] = '100';
UPDATE 1
postgres=# select * from foo;
┌────────────┐
│ a │
╞════════════╡
│ {"a": 100} │
└────────────┘
(1 row)
postgres=# update foo set a['a'][1] = '-1';
UPDATE 1
postgres=# select * from foo;
┌────────────┐
│ a │
╞════════════╡
│ {"a": 100} │
└────────────┘
(1 row)
Regards
Pavel
> On Wed, Jan 06, 2021 at 09:22:53PM +0100, Pavel Stehule wrote:
>
> this case should to raise exception - the value should be changed or error
> should be raised
>
> postgres=# insert into foo values('{}');
> postgres=# update foo set a['a'] = '100';
> postgres=# update foo set a['a'][1] = '-1';
> postgres=# select * from foo;
> ┌────────────┐
> │ a │
> ╞════════════╡
> │ {"a": 100} │
> └────────────┘

I was expecting this question, as I've left this like that intentionally
because of two reasons:

* Opposite to other changes, to implement this one we need to introduce
a condition more interfering with normal processing, which raises
performance issues for already existing functionality in jsonb_set.

* I vaguely recall there was a similar discussion about jsonb_set with
the similar solution.

For the references what I mean I've attached the third patch, which does
this. My opinion would be to not consider it, but I'm fine leaving this
decision to committer.
Attachment
čt 7. 1. 2021 v 9:15 odesílatel Dmitry Dolgov <9erthalion6@gmail.com> napsal:
> On Wed, Jan 06, 2021 at 09:22:53PM +0100, Pavel Stehule wrote:
>
> this case should to raise exception - the value should be changed or error
> should be raised
>
> postgres=# insert into foo values('{}');
> postgres=# update foo set a['a'] = '100';
> postgres=# update foo set a['a'][1] = '-1';
> postgres=# select * from foo;
> ┌────────────┐
> │ a │
> ╞════════════╡
> │ {"a": 100} │
> └────────────┘
I was expecting this question, as I've left this like that intentionally
because of two reasons:
* Opposite to other changes, to implement this one we need to introduce
a condition more interfering with normal processing, which raises
performance issues for already existing functionality in jsonb_set.
* I vaguely recall there was a similar discussion about jsonb_set with
the similar solution.
ok.
In this case I have a strong opinion that the current behavior is wrong. It can mask errors. There are two correct possibilities:
1. raising an error - because the update tries to apply an index to a scalar value
2. replacing with an array - a = {NULL, -1}
But silently ignoring the assignment is not an acceptable option.
What do people think about it?
For the references what I mean I've attached the third patch, which does
this. My opinion would be to not consider it, but I'm fine leaving this
decision to committer.
On Thu Jan 7, 2021 at 3:24 AM EST, Pavel Stehule wrote:
> čt 7. 1. 2021 v 9:15 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
>
> > > On Wed, Jan 06, 2021 at 09:22:53PM +0100, Pavel Stehule wrote:
> > >
> > > this case should to raise exception - the value should be changed or
> > > error should be raised
> > >
> > > postgres=# insert into foo values('{}');
> > > postgres=# update foo set a['a'] = '100';
> > > postgres=# update foo set a['a'][1] = '-1';
> > > postgres=# select * from foo;
> > > ┌────────────┐
> > > │ a │
> > > ╞════════════╡
> > > │ {"a": 100} │
> > > └────────────┘
> >
> > I was expecting this question, as I've left this like that intentionally
> > because of two reasons:
> >
> > * Opposite to other changes, to implement this one we need to introduce
> > a condition more interfering with normal processing, which raises
> > performance issues for already existing functionality in jsonb_set.
> >
> > * I vaguely recall there was a similar discussion about jsonb_set with
> > the similar solution.
>
> ok.
>
> In this case I have a strong opinion so current behavior is wrong. It can
> mask errors. There are two correct possibilities
>
> 1. raising error - because the update try to apply index on scalar value
>
> 2. replace by array - a = {NULL, -1}
>
> But isn't possible ignore assignment
>
> What do people think about it?

I've been following this thread looking forward to the feature and was
all set to come in on the side of raising an exception here, but then I
tried it in a JS REPL:

; a = {}
{}
; a['a'] = '100'
'100'
; a['a'][1] = -1
-1
; a
{ a: '100' }
; b = {}
{}
; b['b'] = 100
100
; b['b'][1] = -1
-1
; b
{ b: 100 }

Even when the value shouldn't be subscriptable _at all_, the invalid
assignment is ignored silently. But since the patch follows some of
JavaScript's more idiosyncratic leads in other respects (e.g. padding
out arrays with nulls when something is inserted at a higher subscript),
the current behavior makes at least as much sense as JavaScript's
canonical behavior.

There's also the bulk update case to think about. An error makes sense
when there's only one tuple being affected at a time, but with 1000
tuples, should a few no-ops where the JSON turns out to be a structural
mismatch stop the rest from changing correctly? What's the alternative?
The only answer I've got is double-checking the structure in the WHERE
clause, which seems like a lot of effort to go to for something that's
supposed to make working with JSON easier.

Changing the surrounding structure (e.g. turning a['a'] into an array)
seems much more surprising than the no-op, and more likely to have
unforeseen consequences in client code working with the JSON. Ignoring
invalid assignments -- like JavaScript itself -- seems like the best
solution to me.
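The "silently ignore a structurally invalid assignment" semantics being argued for here can be modeled in a short Python sketch (illustrative only, not the patch's code): if any step along the path hits a scalar or a missing element, the document is returned unchanged.

```python
import json

def set_path_ignore(doc: str, path, value) -> str:
    """Toy model: assignment through a structural mismatch is a no-op."""
    root = json.loads(doc)
    node, steps = root, list(path)
    for key in steps[:-1]:
        if isinstance(node, dict) and key in node:
            node = node[key]
        elif isinstance(node, list) and isinstance(key, int) and 0 <= key < len(node):
            node = node[key]
        else:
            return doc            # structural mismatch: silently ignore
    last = steps[-1]
    if isinstance(node, dict) and isinstance(last, str):
        node[last] = value
    elif isinstance(node, list) and isinstance(last, int) and 0 <= last < len(node):
        node[last] = value
    else:
        return doc                # e.g. assigning a['a'][1] when a['a'] is 100
    return json.dumps(root)
```

Under this model, `set_path_ignore('{"a": 100}', ['a', 1], -1)` returns `{"a": 100}` untouched, matching the no-op UPDATE in Pavel's repro; the alternative behaviors (raise, or replace the scalar with an array) would change only the two `return doc` branches.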
so 9. 1. 2021 v 21:06 odesílatel Dian M Fay <dian.m.fay@gmail.com> napsal:
On Thu Jan 7, 2021 at 3:24 AM EST, Pavel Stehule wrote:
> čt 7. 1. 2021 v 9:15 odesílatel Dmitry Dolgov <9erthalion6@gmail.com>
> napsal:
>
> > > On Wed, Jan 06, 2021 at 09:22:53PM +0100, Pavel Stehule wrote:
> > >
> > > this case should to raise exception - the value should be changed or
> > error
> > > should be raised
> > >
> > > postgres=# insert into foo values('{}');
> > > postgres=# update foo set a['a'] = '100';
> > > postgres=# update foo set a['a'][1] = '-1';
> > > postgres=# select * from foo;
> > > ┌────────────┐
> > > │ a │
> > > ╞════════════╡
> > > │ {"a": 100} │
> > > └────────────┘
> >
> > I was expecting this question, as I've left this like that intentionally
> > because of two reasons:
> >
> > * Opposite to other changes, to implement this one we need to introduce
> > a condition more interfering with normal processing, which raises
> > performance issues for already existing functionality in jsonb_set.
> >
> > * I vaguely recall there was a similar discussion about jsonb_set with
> > the similar solution.
> >
>
> ok.
>
> In this case I have a strong opinion so current behavior is wrong. It
> can
> mask errors. There are two correct possibilities
>
> 1. raising error - because the update try to apply index on scalar value
>
> 2. replace by array - a = {NULL, -1}
>
> But isn't possible ignore assignment
>
> What do people think about it?
I've been following this thread looking forward to the feature and was
all set to come in on the side of raising an exception here, but then I
tried it in a JS REPL:
; a = {}
{}
; a['a'] = '100'
'100'
; a['a'][1] = -1
-1
; a
{ a: '100' }
; b = {}
{}
; b['b'] = 100
100
; b['b'][1] = -1
-1
; b
{ b: 100 }
Even when the value shouldn't be subscriptable _at all_, the invalid
assignment is ignored silently. But since the patch follows some of
JavaScript's more idiosyncratic leads in other respects (e.g. padding
out arrays with nulls when something is inserted at a higher subscript),
the current behavior makes at least as much sense as JavaScript's
canonical behavior.
There's also the bulk update case to think about. An error makes sense
when there's only one tuple being affected at a time, but with 1000
tuples, should a few no-ops where the JSON turns out to be a structural
mismatch stop the rest from changing correctly? What's the alternative?
The only answer I've got is double-checking the structure in the WHERE
clause, which seems like a lot of effort to go to for something that's
supposed to make working with JSON easier.
Changing the surrounding structure (e.g. turning a['a'] into an array)
seems much more surprising than the no-op, and more likely to have
unforeseen consequences in client code working with the JSON. Ignoring
invalid assignments -- like JavaScript itself -- seems like the best
solution to me.
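The two candidate behaviours weighed above can be modelled outside the database. Below is a minimal Python sketch for contrast only; the helper name `set_subscript` and its `on_mismatch` switch are hypothetical and not part of the patch, and it handles object keys only for brevity:

```python
def set_subscript(doc, path, value, on_mismatch="ignore"):
    """Assign value at path inside a nested dict structure.

    on_mismatch="ignore": leave doc untouched when the path cannot be
    followed (the JavaScript-like behaviour); "error": raise instead.
    """
    target = doc
    for key in path[:-1]:
        if isinstance(target, dict) and key in target:
            target = target[key]
        else:
            if on_mismatch == "error":
                raise ValueError(f"cannot follow {key!r} inside {target!r}")
            return doc  # silent no-op, like the JS REPL transcript above
    if isinstance(target, dict):
        target[path[-1]] = value
    elif on_mismatch == "error":
        raise ValueError(f"cannot subscript scalar {target!r}")
    return doc
```

With `on_mismatch="ignore"`, assigning through a scalar leaves the document unchanged; with `"error"`, the same call raises, which mirrors the two options being debated.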
We don't need 100% compatibility with possibly buggy behaviour.
I very much dislike a situation where the system reports ok but the operation was ignored. It is pretty hard to identify bugs in such a system.
What can be the benefit or use case for this behavior? JavaScript is designed for use in web browsers, and a lot of technologies there are fault tolerant. But this is a database. I would like to know about all the errors there.
Hi
I'm thinking of the update path as a kind of implicit schema. JSON is
intentionally not bound to any schema on creation, so I don't see a
failure to enforce another schema at runtime (and outside the WHERE
clause, at that) as an error exactly.
This concept is not consistent with other implemented behaviour.
1. The schema is dynamically enhanced: although the path doesn't exist, it is created and the data are changed
postgres=# create table foo(a jsonb);
CREATE TABLE
postgres=# insert into foo values('{}');
INSERT 0 1
postgres=# update foo set a['a']['a'][10] = '0';
UPDATE 1
postgres=# select * from foo;
┌───────────────────────────────────────────────────────────────────────────────┐
│ a │
╞═══════════════════════════════════════════════════════════════════════════════╡
│ {"a": {"a": [null, null, null, null, null, null, null, null, null, null, 0]}} │
└───────────────────────────────────────────────────────────────────────────────┘
(1 row)
So although the path [a,a,10] did not exist, it was created.
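The padding behaviour shown above can be modelled in a few lines of Python (a sketch only; `pad_assign` is a hypothetical name, not the patch's C implementation):

```python
def pad_assign(arr, index, value):
    """Assign at index, padding any gap with None (rendered as JSON
    null), like the [null, ..., null, 0] result above."""
    arr = list(arr)                              # work on a copy
    arr.extend([None] * (index + 1 - len(arr)))  # fill the gap with nulls
    arr[index] = value
    return arr
```

Starting from an empty array, assigning at index 10 produces ten nulls followed by the value, matching the jsonb output shown above.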
2. This update fails (and that is correct):
postgres=# update foo set a['a']['a']['b'] = '0';
ERROR: path element at position 3 is not an integer: "b"
Although the path [a,a,b] doesn't exist, it is not ignored.
This implementation doesn't do only UPDATE (so the analogy with the WHERE clause isn't fully adequate); it does MERGE. This is necessary, because without it the behaviour would be pretty unfriendly - there is no external schema. I think this is important, and it can be a little bit messy. I am not sure if I am using the correct technical terms: we try a LAX update in the first step, and if it is not successful, then we try a LAX insert. This may be correct JSON semantics, but for a developer it is unfriendly, because he has no possibility to detect whether the insert was successful. In the special JSON functions I can control the behaviour and specify LAX or STRICT as necessary. But in this interface (subscripting) this possibility is missing.
I think there should be a final (semantic) check whether the value was updated and changed. If not, then an error should be raised. It should be very similar to an RLS update. I know and understand that there is more than one possible implementation, but only one is safe: after a successful update I want to see the new value inside, and when that is not possible, I expect an exception. I think it is more practical too. I can control filtering with the WHERE clause, but I cannot control the MERGE process. A manual recheck after every update can be terribly slow.
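The final-check idea can be sketched as a wrapper: run the lax merge first, then verify the value actually landed at the path. This is a Python model with hypothetical helper names (`lax_set`, `get_path`, `strict_set`), not code from the patch:

```python
def lax_set(doc, path, value):
    # Minimal lax setter: silently ignores structural mismatches
    # (object keys only, for brevity).
    target = doc
    for key in path[:-1]:
        if isinstance(target, dict) and key in target:
            target = target[key]
        else:
            return doc  # no-op
    if isinstance(target, dict):
        target[path[-1]] = value
    return doc

def get_path(doc, path):
    # Walk the path, returning None when it cannot be followed.
    for key in path:
        if isinstance(doc, dict) and key in doc:
            doc = doc[key]
        else:
            return None
    return doc

def strict_set(doc, path, value):
    # The proposed safety net: do the lax update, then a final
    # semantic check - if the value did not land at the path, raise
    # instead of reporting success.
    result = lax_set(doc, path, value)
    if get_path(result, path) != value:
        raise ValueError(f"jsonb path {path!r} was not updated")
    return result
```

The cost is one extra extraction per assignment, which is much cheaper than the manual recheck-after-every-update mentioned above.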
Regards
Pavel
But I looked into the bulk case a little further, and "outside the
WHERE clause" cuts both ways. The server reports an update whether or
not the JSON could have been modified, which suggests triggers will
fire for no-op updates. That's more clearly a problem.
insert into j (val) values
('{"a": 100}'),
('{"a": "200"}'),
('{"b": "300"}'),
('{"c": {"d": 400}}'),
('{"a": {"z": 500}}');
INSERT 0 5
update j set val['a']['z'] = '600' returning *;
val
────────────────────────────────────
{"a": 100}
{"a": "200"}
{"a": {"z": 600}, "b": "300"}
{"a": {"z": 600}, "c": {"d": 400}}
{"a": {"z": 600}}
(5 rows)
*UPDATE 5*
On Sun, Jan 10, 2021 at 19:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
I tested the behaviour and I didn't find anything other than the mentioned issue.
Now I can check this feature from plpgsql, and it is working. Because there is no special support in the plpgsql runtime, updating jsonb is significantly slower than updating arrays, and it looks like a jsonb update has O(N²) cost. I don't think that is important at this moment; more important is the fact that I didn't find any memory problems.
postgres=# do $$
declare v int[] = array_fill(0, ARRAY[10,10,10,20]);
begin
for i1 in 1..10 loop
for i2 in 1..10 loop
for i3 in 1..10 loop
for i4 in 1..20 loop
v[i1][i2][i3][i4] = 10;
raise notice '% % % %', i1, i2, i3, i4;
end loop;
end loop;
end loop;
end loop;
end;
$$;
postgres=# do $$
declare v jsonb;
begin
for i1 in 1..10 loop
for i2 in 1..10 loop
for i3 in 1..10 loop
for i4 in 1..20 loop
v[i1][i2][i3][i4] = '10'::jsonb;
raise notice '% % % %', i1, i2, i3, i4;
end loop;
end loop;
end loop;
end loop;
end;
$$;
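The quadratic cost observed above is what one would expect if every subscript assignment deserializes and rewrites the whole jsonb value, with no in-place fast path like the one plpgsql has for arrays. A toy Python model of that assumption (hypothetical helper, for illustration only):

```python
def copy_on_assign_work(n):
    """Model n successive element assignments where each assignment
    copies the full value: total elements touched is 1 + 2 + ... + n,
    i.e. O(n^2)."""
    doc, work = [], 0
    for i in range(n):
        doc = doc + [i]   # full copy on every assignment
        work += len(doc)  # elements touched in this round
    return work
```

Under this model, 2000 assignments touch roughly two million elements, versus 2000 for a true in-place update - the same growth pattern as in the plpgsql test above.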
There are some unwanted white space changes:
+ Jsonb *jsonbSource = DatumGetJsonbP(*op->resvalue);
+ sbsrefstate->prevvalue = jsonb_get_element(jsonbSource,
+ sbsrefstate->upperindex,
+ sbsrefstate->numupper,
+ &sbsrefstate->prevnull,
+ false);
+ workspace = palloc0(MAXALIGN(sizeof(JsonbSubWorkspace)) +
+ nupper * (sizeof(Datum) + sizeof(Oid)));
+ workspace->expectArray = false;
+ ptr = ((char *) workspace) + MAXALIGN(sizeof(JsonbSubWorkspace));
Regards
Pavel
On Thu, Jan 14, 2021 at 18:09, Dian M Fay <dian.m.fay@gmail.com> wrote:
On Thu Jan 14, 2021 at 10:04 AM EST, Dmitry Dolgov wrote:
> > On Tue, Jan 12, 2021 at 08:02:59PM +0100, Pavel Stehule wrote:
> > On Sun, Jan 10, 2021 at 19:52, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >
> > I tested behaviour and I didn't find anything other than the mentioned
> > issue.
> >
> > Now I can check this feature from plpgsql, and it is working. Because there
> > is no special support in plpgsql runtime, the update of jsonb is
> > significantly slower than in update of arrays, and looks so update of jsonb
> > has O(N2) cost. I don't think it is important at this moment - more
> > important is fact, so I didn't find any memory problems.
>
> Thanks for testing. Regarding updates when the structure doesn't match
> provided path as I've mentioned I don't have strong preferences, but on
> the second though probably more inclined for returning an error in this
> case. Since there are pros and cons for both suggestions, it could be
> decided by vote majority between no update (Dian) or an error (Pavel,
> me) options. Any +1 to one of the options from others?
>
> Other than that, since I've already posted the patch for returning an
> error option, it seems that the only thing left is to decide with which
> version to go.
The trigger issue (which I did verify) makes the "no update" option
unworkable imo, JavaScript's behavior notwithstanding. But it should be
called out very clearly in the documentation, since it does depart from
what people more familiar with that behavior may expect. Here's a quick
draft, based on your v44 patch:
<para>
The <type>jsonb</type> data type supports array-style subscripting expressions
to extract or update particular elements. It's possible to use multiple
subscripting expressions to extract nested values. In this case, a chain of
subscripting expressions follows the same rules as the
<literal>path</literal> argument in the <literal>jsonb_set</literal> function;
e.g., in the case of arrays it is a 0-based operation, and negative integers
that appear in <literal>path</literal> count from the end of JSON arrays.
The result of subscripting expressions is always of the <type>jsonb</type> data type.
</para>
<para>
<command>UPDATE</command> statements may use subscripting in the
<literal>SET</literal> clause to modify <type>jsonb</type> values. Every
affected value must conform to the path defined by the subscript(s). If the
path cannot be followed to its end for any individual value (e.g.
<literal>val['a']['b']['c']</literal> where <literal>val['a']</literal> or
<literal>val['b']</literal> is null, a string, or a number), an error is
raised even if other values do conform.
</para>
<para>
An example of subscripting syntax:
+1
Pavel
> On Thu, Jan 14, 2021 at 12:02:42PM -0500, Dian M Fay wrote:
> The trigger issue (which I did verify) makes the "no update" option
> unworkable imo, JavaScript's behavior notwithstanding. But it should be
> called out very clearly in the documentation, since it does depart from
> what people more familiar with that behavior may expect. Here's a quick
> draft, based on your v44 patch:
> [...]

Yes, makes sense. I've incorporated your suggestion into the last patch,
thanks.
Attachment
Hi
I found minor issues.
Doc - missing tag
and three whitespace issues
see attached patch
The following sentence is hard to read due to the long nested example:
If the
+ path contradicts structure of modified <type>jsonb</type> for any individual
+ value (e.g. path <literal>val['a']['b']['c']</literal> assumes keys
+ <literal>'a'</literal> and <literal>'b'</literal> have object values
+ assigned to them, but if <literal>val['a']</literal> or
+ <literal>val['b']</literal> is null, a string, or a number, then the path
+ contradicts with the existing structure), an error is raised even if other
+ values do conform.
It could be divided into two sentences: the predicate, and the example.
Regards
Pavel
Attachment
> On Tue Jan 19, 2021 at 1:42 PM EST, Pavel Stehule wrote:
> > I found minor issues.
> >
> > Doc - missing tag
> >
> > and three whitespace issues
> >
> > see attached patch

Thanks, I need to remember not to skip doc building in the testing
process even for such small changes. Hope now I didn't forget anything.

> On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
> Here's a full editing pass on the documentation, with v45 and Pavel's
> doc-whitespaces-fix.patch applied. I also corrected a typo in one of the
> added hints.

Great! I've applied almost all of it, except:

+ A <type>jsonb</type> value will accept assignments to nonexistent subscript
+ paths as long as the nonexistent elements being traversed are all arrays.

Maybe I've misunderstood the intention, but there is no requirement
about arrays for creating such an empty path. I've formulated it as:

+ A <type>jsonb</type> value will accept assignments to nonexistent subscript
+ paths as long as the last existing path key is an object or an array.
Attachment
On Wed Jan 20, 2021 at 11:22 AM EST, Dmitry Dolgov wrote:
> Great! I've applied almost all of it, except:
>
> + A <type>jsonb</type> value will accept assignments to nonexistent subscript
> + paths as long as the nonexistent elements being traversed are all arrays.
>
> Maybe I've misunderstood the intention, but there is no requirement
> about arrays for creating such an empty path. I've formulated it as:
>
> + A <type>jsonb</type> value will accept assignments to nonexistent subscript
> + paths as long as the last existing path key is an object or an array.

My intention there was to highlight the difference between:

* SET obj['a']['b']['c'] = '"newvalue"'
* SET arr[0][0][3] = '"newvalue"'

obj has to conform to {"a": {"b": {...}}} in order to receive the
assignment of the nested c. If it doesn't, that's the error case we
discussed earlier. But arr can be null, [], and so on, and any missing
structure [[[null, null, null, "newvalue"]]] will be created.

Take 2:

A <type>jsonb</type> value will accept assignments to nonexistent subscript
paths as long as object key subscripts can be traversed as described above.
The final subscript is not traversed and, if it describes a missing object
key, will be created. Nested arrays will always be created and
<literal>NULL</literal>-padded according to the path until the value can be
placed appropriately.
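Combining the error-on-mismatch decision reached earlier in the thread with the creation rules described here, the assignment semantics can be sketched as follows. This is a Python model of the discussion with a hypothetical helper name (`assign`), not the patch source:

```python
def assign(doc, path, value):
    """Create missing structure along the path, but raise when an
    existing value contradicts the path (the error case discussed
    earlier in the thread)."""
    key, rest = path[0], path[1:]
    if isinstance(key, int):                 # array subscript
        if doc is None:
            doc = []                         # missing arrays are created...
        if not isinstance(doc, list):
            raise ValueError(f"existing value {doc!r} is not an array")
        doc.extend([None] * (key + 1 - len(doc)))  # ...and NULL-padded
        doc[key] = assign(doc[key], rest, value) if rest else value
        return doc
    if doc is None:                          # object subscript
        doc = {}                             # missing object keys are created
    if not isinstance(doc, dict):
        raise ValueError(f"existing value {doc!r} is not an object")
    doc[key] = assign(doc.get(key), rest, value) if rest else value
    return doc
```

Under this model, `a['a']['a'][10] = 0` against `{}` builds the null-padded structure from Pavel's example, while `a['a'][1] = -1` against `{"a": 100}` raises, because the existing scalar contradicts the path.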
> On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote:
> My intention there was to highlight the difference between:
>
> * SET obj['a']['b']['c'] = '"newvalue"'
> * SET arr[0][0][3] = '"newvalue"'
>
> obj has to conform to {"a": {"b": {...}}} in order to receive the
> assignment of the nested c. If it doesn't, that's the error case we
> discussed earlier. But arr can be null, [], and so on, and any missing
> structure [[[null, null, null, "newvalue"]]] will be created.

If arr is 'null', or any other scalar value, such subscripting will work
only one level deep, because scalars are represented internally as an
array of one element. If arr is '[]' the path will comply by definition.
So it's essentially the same as for objects, with no particular
difference. If such a quirk about scalars being treated like arrays is
bothering, we could also bend it in this case as well (see the attached
version).
Attachment
On Wed Jan 20, 2021 at 2:08 PM EST, Dmitry Dolgov wrote:
> > On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote:
> > > Thanks, I need to remember not to skip doc building in the testing
> > > process even for such small changes. Hope now I didn't forget anything.
> > >
> > > > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
> > > >
> > > > Here's a full editing pass on the documentation, with v45 and Pavel's
> > > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of
> > > > the added hints.
> > >
> > > Great! I've applied almost all of it, except:
> > >
> > > + A <type>jsonb</type> value will accept assignments to nonexistent subscript
> > > + paths as long as the nonexistent elements being traversed are all arrays.
> > >
> > > Maybe I've misunderstood the intention, but there is no requirement
> > > about arrays for creating such an empty path. I've formulated it as:
> > >
> > > + A <type>jsonb</type> value will accept assignments to nonexistent subscript
> > > + paths as long as the last existing path key is an object or an array.
> >
> > My intention there was to highlight the difference between:
> >
> > * SET obj['a']['b']['c'] = '"newvalue"'
> > * SET arr[0][0][3] = '"newvalue"'
> >
> > obj has to conform to {"a": {"b": {...}}} in order to receive the
> > assignment of the nested c. If it doesn't, that's the error case we
> > discussed earlier. But arr can be null, [], and so on, and any missing
> > structure [[[null, null, null, "newvalue"]]] will be created.
>
> If arr is 'null', or any other scalar value, such subscripting will work
> only one level deep, because scalars are represented internally as arrays
> of one element. If arr is '[]' the path will comply by definition. So it's
> essentially the same as for objects, with no particular difference. If
> such a quirk about scalars being treated like arrays is bothersome, we
> could also bend the rule in this case as well (see the attached version).
I missed that distinction in the original UPDATE paragraph too. Here's another revision based on v48.
Attachment
> On Wed, Jan 20, 2021 at 11:37:32PM -0500, Dian M Fay wrote:
> On Wed Jan 20, 2021 at 2:08 PM EST, Dmitry Dolgov wrote:
> > > On Wed, Jan 20, 2021 at 11:34:16AM -0500, Dian M Fay wrote:
> > > > Thanks, I need to remember not to skip doc building in the testing
> > > > process even for such small changes. Hope now I didn't forget anything.
> > > >
> > > > > On Wed, Jan 20, 2021 at 09:58:43AM -0500, Dian M Fay wrote:
> > > > >
> > > > > Here's a full editing pass on the documentation, with v45 and Pavel's
> > > > > doc-whitespaces-fix.patch applied. I also corrected a typo in one of
> > > > > the added hints.
> > > >
> > > > Great! I've applied almost all of it, except:
> > > >
> > > > + A <type>jsonb</type> value will accept assignments to nonexistent subscript
> > > > + paths as long as the nonexistent elements being traversed are all arrays.
> > > >
> > > > Maybe I've misunderstood the intention, but there is no requirement
> > > > about arrays for creating such an empty path. I've formulated it as:
> > > >
> > > > + A <type>jsonb</type> value will accept assignments to nonexistent subscript
> > > > + paths as long as the last existing path key is an object or an array.
> > >
> > > My intention there was to highlight the difference between:
> > >
> > > * SET obj['a']['b']['c'] = '"newvalue"'
> > > * SET arr[0][0][3] = '"newvalue"'
> > >
> > > obj has to conform to {"a": {"b": {...}}} in order to receive the
> > > assignment of the nested c. If it doesn't, that's the error case we
> > > discussed earlier. But arr can be null, [], and so on, and any missing
> > > structure [[[null, null, null, "newvalue"]]] will be created.
> >
> > If arr is 'null', or any other scalar value, such subscripting will work
> > only one level deep, because scalars are represented internally as arrays
> > of one element. If arr is '[]' the path will comply by definition. So it's
> > essentially the same as for objects, with no particular difference. If
> > such a quirk about scalars being treated like arrays is bothersome, we
> > could also bend the rule in this case as well (see the attached version).
>
> I missed that distinction in the original UPDATE paragraph too. Here's
> another revision based on v48.

Looks good, I've applied it, thanks.
Attachment
Hi
Looks good, I've applied it, thanks.
I tested the last set of patches
1. There is no problem with patching and compilation
2. make check-world passed
3. the docs build without problems
4. I have no objections to the implemented functionality, implementation, or tests
I'll mark this patch as ready for committers
Thank you for your work. It will be a nice feature
Regards
Pavel
On Thu, Jan 21, 2021 at 11:14 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> Looks good, I've applied it, thanks.
>
> I tested the last set of patches
>
> 1. There is no problem with patching and compilation
> 2. make check-world passed
> 3. the docs build without problems
> 4. I have no objections to the implemented functionality, implementation, or tests
>
> I'll mark this patch as ready for committers
>
> Thank you for your work. It will be a nice feature

I've skimmed through the thread; it seems that consensus over the
functionality has been reached. The patchset itself looks good to me.
I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov
On Fri, Jan 29, 2021 at 7:01 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Thu, Jan 21, 2021 at 11:14 PM Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> Looks good, I've applied it, thanks.
> >
> > I tested the last set of patches
> >
> > 1. There is no problem with patching and compilation
> > 2. make check-world passed
> > 3. the docs build without problems
> > 4. I have no objections to the implemented functionality, implementation, or tests
> >
> > I'll mark this patch as ready for committers
> >
> > Thank you for your work. It will be a nice feature
>
> I've skimmed through the thread; it seems that consensus over the
> functionality has been reached. The patchset itself looks good to me.
> I'm going to push this if there are no objections.

Pushed with minor cleanup.

------
Regards,
Alexander Korotkov
Alexander Korotkov <aekorotkov@gmail.com> writes:
> Pushed with minor cleanup.

thorntail seems unhappy:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thorntail&dt=2021-01-31%2020%3A58%3A12

======-=-====== stack trace: pgsql.build/src/test/regress/tmp_check/data/core ======-=-======
[New LWP 2266507]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/sparc64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: nm regression [local] SELECT '.
Program terminated with signal SIGILL, Illegal instruction.
#0  0x000001000075c410 in jsonb_subscript_check_subscripts (state=<optimized out>, op=0x10000d852b0, econtext=<optimized out>) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonbsubs.c:198
198             for (int i = 0; i < sbsrefstate->numupper; i++)
#0  0x000001000075c410 in jsonb_subscript_check_subscripts (state=<optimized out>, op=0x10000d852b0, econtext=<optimized out>) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/utils/adt/jsonbsubs.c:198
#1  0x00000100003e55c0 in ExecInterpExpr (state=0x10000d85068, econtext=0x10000d85660, isnull=0x7feffa2fbbc) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/executor/execExprInterp.c:1402
#2  0x00000100003de4bc in ExecInterpExprStillValid (state=0x10000d85068, econtext=0x10000d85660, isNull=0x7feffa2fbbc) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/executor/execExprInterp.c:1765
#3  0x000001000054fbd4 in ExecEvalExprSwitchContext (isNull=0x7feffa2fbbc, econtext=<optimized out>, state=0x10000d85068) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/include/executor/executor.h:315
#4  evaluate_expr (expr=<optimized out>, result_type=<optimized out>, result_typmod=<optimized out>, result_collation=<optimized out>) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:4533
#5  0x00000100005513b8 in eval_const_expressions_mutator (node=0x10000dce218, context=0x7feffa30108) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2883
#6  0x00000100004b4968 in expression_tree_mutator (node=0x10000cc10e8, mutator=0x1000054fca4 <eval_const_expressions_mutator>, context=0x7feffa30108) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:2762
#7  0x000001000054fd0c in eval_const_expressions_mutator (node=0x10000cc10e8, context=0x7feffa30108) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3312
#8  0x00000100004b52d0 in expression_tree_mutator (node=0x10000cc1140, mutator=0x1000054fca4 <eval_const_expressions_mutator>, context=0x7feffa30108) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/nodes/nodeFuncs.c:3050
#9  0x000001000054fd0c in eval_const_expressions_mutator (node=0x10000cc1140, context=0x7feffa30108) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:3312
#10 0x000001000055284c in eval_const_expressions (root=0x10000dcdca0, node=0x10000cc1140) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/util/clauses.c:2034
#11 0x0000010000523134 in preprocess_expression (root=0x10000dcdca0, expr=0x10000cc1140, kind=<optimized out>) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:1088
#12 0x000001000052ed3c in subquery_planner (glob=<optimized out>, parse=0x10000cc0350, parent_root=<optimized out>, hasRecursion=<optimized out>, tuple_fraction=0) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:765
#13 0x0000010000531afc in standard_planner (parse=0x10000cc0350, query_string=<optimized out>, cursorOptions=<optimized out>, boundParams=0x0) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/optimizer/plan/planner.c:402
#14 0x0000010000696d6c in pg_plan_query (querytree=0x10000cc0350, query_string=0x10000cbf340 "select ('123'::jsonb)['a'];", cursorOptions=<optimized out>, boundParams=0x0) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:876
#15 0x0000010000696f14 in pg_plan_queries (querytrees=0x10000dcdbb0, query_string=0x10000cbf340 "select ('123'::jsonb)['a'];", cursorOptions=<optimized out>, boundParams=0x0) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:967
#16 0x00000100006976e4 in exec_simple_query (query_string=0x10000cbf340 "select ('123'::jsonb)['a'];") at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:1159
#17 0x000001000069a0e0 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/tcop/postgres.c:4394
#18 0x00000100005a94ec in BackendRun (port=0x10000ce4000) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4484
#19 BackendStartup (port=0x10000ce4000) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4206
#20 ServerLoop () at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1730
#21 0x00000100005aaa0c in PostmasterMain (argc=<optimized out>, argv=0x10000cb9ff0) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:1402
#22 0x00000100000db054 in main (argc=<optimized out>, argv=0x10000cb9ff0) at /home/nm/farm/sparc64_deb10_gcc_64_ubsan/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:209
$1 = {si_signo = 4, si_errno = 0, si_code = 4, _sifields = {_pad = {256, 7717904, 5, 0 <repeats 25 times>}, _kill = {si_pid = 256, si_uid = 7717904}, _timer = {si_tid = 256, si_overrun = 7717904, si_sigval = {sival_int = 5, sival_ptr = 0x500000000}}, _rt = {si_pid = 256, si_uid = 7717904, si_sigval = {sival_int = 5, sival_ptr = 0x500000000}}, _sigchld = {si_pid = 256, si_uid = 7717904, si_status = 5, si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x1000075c410 <jsonb_subscript_check_subscripts+636>}, _sigpoll = {si_band = 1099519345680, si_fd = 5}}}

regards, tom lane
> On Fri, Jan 29, 2021 at 7:01 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Pushed with minor cleanup.

Thanks a lot!

> On Sun, Jan 31, 2021 at 05:23:25PM -0500, Tom Lane wrote:
>
> thorntail seems unhappy:
>
> [From 7c5d57c...]
> Fix portability issue in new jsonbsubs code.
>
> On machines where sizeof(Datum) > sizeof(Oid) (that is, any 64-bit
> platform), the previous coding would compute a misaligned
> workspace->index pointer if nupper is odd. Architectures where
> misaligned access is a hard no-no would then fail. This appears
> to explain why thorntail is unhappy but other buildfarm members
> are not.

Yeah, that was an unexpected issue, thanks! I assume the few other
failing buildfarm members are in the same situation, as they show
similar symptoms (e.g. mussurana or ibisbill).
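The alignment arithmetic behind the fix can be illustrated without any server code. A hedged sketch in Python (the 4- and 8-byte sizes match Oid and Datum on 64-bit platforms, and the rounding mirrors the MAXALIGN-style macro; the workspace layout here is illustrative, not the actual jsonbsubs.c structure):

```python
# Illustration of the bug class: a workspace packs an array of 4-byte
# Oids followed by an array of 8-byte Datums. With an odd count of
# Oids, the Datum array's naive byte offset is not 8-byte aligned, and
# platforms with strict alignment (like sparc64) trap on access.
SIZEOF_OID, SIZEOF_DATUM = 4, 8

def maxalign(offset, align=SIZEOF_DATUM):
    """Round offset up to the next multiple of align (MAXALIGN-style)."""
    return (offset + align - 1) & ~(align - 1)

nupper = 3                               # an odd subscript count, e.g. x[0][0][3]
naive_offset = nupper * SIZEOF_OID       # 12: not a multiple of 8
fixed_offset = maxalign(naive_offset)    # 16: safe for a Datum pointer

print(naive_offset % SIZEOF_DATUM)       # → 4 (misaligned)
print(fixed_offset % SIZEOF_DATUM)       # → 0 (aligned)
```

With an even subscript count the naive offset happens to be aligned, which is why the crash only appeared for odd nupper and only on strict-alignment 64-bit hardware.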