Thread: Re: [GENERAL] Empty arrays with ARRAY[]

Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
On Nov 26, 2007 3:58 AM, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Mon, Nov 26, 2007 at 03:51:37AM +1100, Brendan Jurd wrote:
> > I noticed in the 8.3 release notes that ARRAY(SELECT ...) now returns
> > an empty array if there are no rows returned by the subquery.
>
> This has come up before, Tom had an idea about how to fix it:
>
> http://groups.google.com/group/pgsql.general/browse_thread/thread/911791e145a17daa/6b035035aeaac399
> http://www.mail-archive.com/pgsql-general@postgresql.org/msg90681.html

[moving thread to -hackers]

Thanks for the link Martijn.  I'd be interested in taking a swing at
this if nobody else has laid claim.  Since that thread died back in
January, I'm guessing it's wide open.

Regards,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
Quoting Tom, from the previous thread linked by Martijn:

> It could be pretty ugly, because type assignment normally proceeds
> bottom-up :-(.  What you might have to do is make the raw grammar
> representation of ARRAY[] work like A_Const does, ie, there's a
> slot to plug in a typecast.  That's pretty much vestigial now for
> A_Const, if memory serves, but it'd be needful if ARRAY[] has to
> be able to "see" the typecast that would otherwise be above it in
> the parse tree.

This approach is making sense to me, but I've run into a bit of a
dependency issue.  A_Const does indeed have a slot for typecasts by
way of a TypeName member.  A_Const and TypeName are both defined in
parsenodes.h, whereas ArrayExpr is defined in primnodes.h.  So
unfortunately I can't just add a TypeName member to ArrayExpr.

I'm new to this area of the codebase (and parsers generally), so I'm
treading carefully.  What would be the best way to resolve this?
Would moving TypeName into primnodes.h be acceptable?

Thanks for your time,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
"Brendan Jurd" <direvus@gmail.com> writes:
> This approach is making sense to me, but I've run into a bit of a
> dependency issue.  A_Const does indeed have a slot for typecasts by
> way of a TypeName member.  A_Const and TypeName are both defined in
> parsenodes.h, whereas ArrayExpr is defined in primnodes.h.  So
> unfortunately I can't just add a TypeName member to ArrayExpr.

That would be quite the wrong thing to do anyway, since ArrayExpr is
a run-time representation and shouldn't have any such thing attached
to it.  What you probably need is a separate parse-time representation
of ARRAY[], a la the difference between A_Const and Const.

Another possibility is to just hack up a private communication path
between transformExpr and transformArrayExpr, ie when you see TypeCast
check to see if its argument is ArrayExpr and do something different.
This would be a mite klugy but it'd be a much smaller patch that way.
        regards, tom lane


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
On Nov 27, 2007 8:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Brendan Jurd" <direvus@gmail.com> writes:
> > ... So
> > unfortunately I can't just add a TypeName member to ArrayExpr.
>
> That would be quite the wrong thing to do anyway, since ArrayExpr is
> a run-time representation and shouldn't have any such thing attached
> to it.  What you probably need is a separate parse-time representation
> of ARRAY[], a la the difference between A_Const and Const.
>

Ah.  I wasn't aware of the distinction; I started by looking in gram.y
and saw that the ARRAY parse path creates an ArrayExpr node, whilst
the constant parse paths create A_Const nodes.  I didn't realise that
ArrayExpr was "skipping ahead" and creating the same kind of object
that the transform produces.

Glad I stopped and asked for directions then. =)

I'm not 100% clear on what the A_ prefix signifies ... is A_ArrayExpr
a good name for the parse-time structure?

Thanks for your time,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
"Brendan Jurd" <direvus@gmail.com> writes:
> I'm not 100% clear on what the A_ prefix signifies ... is A_ArrayExpr
> a good name for the parse-time structure?

Yeah, might as well use that for consistency.  The A_ doesn't seem
very meaningful to me either, but I don't want to rename the existing
examples ...
        regards, tom lane


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
So far I've only considered the '::' cast syntax suggested in the
original proposal, e.g.:

ARRAY[]::text[]

I wonder whether we are also interested in catching CAST(), e.g.:

CAST(ARRAY[] AS text[])

I'm personally okay with leaving it at support for '::', but
admittedly I am heavily biased towards this syntax (I find CAST very
ugly).  I suppose supporting CAST as well would be the more
predictable behaviour; I think people might be surprised if we
supported one form of casting but not the other.

Comments?

Regards,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
"Brendan Jurd" <direvus@gmail.com> writes:
> So far I've only considered the '::' cast syntax suggested in the
> original proposal, e.g.:

> ARRAY[]::text[]

> I wonder whether we are also interested in catching CAST(), e.g.:

> CAST(ARRAY[] AS text[])

I think you'll find that it's just about impossible to not handle both,
because they look the same after the grammar gets done.
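To illustrate Tom's point, here is a toy Python model in which both the `::` production and the CAST(... AS ...) production call the same cast-building helper, so the transform stage receives identical trees. Every name below is a hypothetical stand-in, not PostgreSQL's gram.y or its C API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeName:
    name: str          # e.g. "text"
    array_bounds: int  # 1 for text[], 0 for plain text


@dataclass(frozen=True)
class ArrayCtor:
    elements: tuple    # raw, untransformed element nodes


@dataclass(frozen=True)
class TypeCast:
    arg: object
    type_name: TypeName


def make_type_cast(arg, type_name):
    """Hypothetical stand-in for the helper both productions share."""
    return TypeCast(arg, type_name)


def parse_double_colon(arg, type_name):
    # Models the action for:  a_expr '::' Typename
    return make_type_cast(arg, type_name)


def parse_cast_func(arg, type_name):
    # Models the action for:  CAST '(' a_expr AS Typename ')'
    return make_type_cast(arg, type_name)


text_array = TypeName("text", 1)
empty = ArrayCtor(elements=())
print(parse_double_colon(empty, text_array) == parse_cast_func(empty, text_array))  # True
```

Once both spellings collapse into one node type, any special case keyed on that node covers both automatically.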
        regards, tom lane


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
On Nov 28, 2007 2:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I wonder whether we are also interested in catching CAST(), e.g.:
>
> > CAST(ARRAY[] AS text[])
>
> I think you'll find that it's just about impossible to not handle both,
> because they look the same after the grammar gets done.

Thanks Tom ... your comment makes me suspect I've been barking up the
wrong tree.

My original intent was to modify the grammar rules to catch an array
expression followed by a typecast, and put the target typename of the
cast directly into the A_ArrayExpr struct.  That notion came from
looking at the way that TypeName gets put into A_Const --
makeStringConst() takes an optional TypeName argument.

Looking at the code in the context of your comment, that was probably
a bad approach.  I may've taken the A_Const analogy too far.

Now I'm thinking I leave the grammar rules alone (apart from making it
legal to specify an empty list of elements), and instead push the
typename down into the child node from makeTypeCast(), if the child is
an A_ArrayExpr.  Does that work better?

Regards,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
"Brendan Jurd" <direvus@gmail.com> writes:
> Now I'm thinking I leave the grammar rules alone (apart from making it
> legal to specify an empty list of elements), and instead push the
> typename down into the child node from makeTypeCast(), if the child is
> an A_ArrayExpr.  Does that work better?

Actually, if you do that you might as well forgo the separate node type
(which requires a nontrivial amount of infrastructure).  I think it
would work just about as well to have transformExpr check whether the
argument of a TypeCast is an ArrayExpr, and if so call
transformArrayExpr directly from there, passing the TypeName as an
additional argument.  Kinda ugly, but not really any worse than the way
A_Const is handled in that same routine.  (In fact, we could use the
same technique to get rid of the typename field in A_Const ... might
be worth doing?)
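Tom's suggestion can be sketched as a toy dispatcher in Python. The node classes and function names are hypothetical stand-ins for the C routines named above, types are plain strings, and treating a trailing "[]" as the mark of an array type is an assumption of the sketch:

```python
from dataclasses import dataclass


@dataclass
class ArrayCtor:            # raw ARRAY[...] as produced by the grammar
    elements: list


@dataclass
class TypeCast:             # raw "expr::type" or CAST(expr AS type)
    arg: object
    type_name: str          # e.g. "int[]"


def transform_array_expr(node, target_type=None):
    # Hypothetical stand-in; a real transform would also coerce elements.
    if not node.elements and target_type is None:
        raise ValueError("no target type for empty array")
    return ("ArrayExpr", target_type, list(node.elements))


def transform_cast(arg, type_name):
    return ("Cast", arg, type_name)


def transform_expr(node):
    if isinstance(node, TypeCast):
        if isinstance(node.arg, ArrayCtor) and node.type_name.endswith("[]"):
            # The private path: hand the target type straight to the array
            # transform and drop the TypeCast node itself.
            return transform_array_expr(node.arg, node.type_name)
        return transform_cast(transform_expr(node.arg), node.type_name)
    if isinstance(node, ArrayCtor):
        return transform_array_expr(node)
    return node  # constants etc. pass through unchanged in this toy model
```

With this shape, `transform_expr(TypeCast(ArrayCtor([]), "int[]"))` yields a typed empty array, while a bare `ArrayCtor([])` still fails because no type ever reaches it.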
        regards, tom lane


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
On Nov 28, 2007 4:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Brendan Jurd" <direvus@gmail.com> writes:
> > Now I'm thinking I leave the grammar rules alone (apart from making it
> > legal to specify an empty list of elements), and instead push the
> > typename down into the child node from makeTypeCast(), if the child is
> > an A_ArrayExpr.  Does that work better?
>
> Actually, if you do that you might as well forgo the separate node type
> (which requires a nontrivial amount of infrastructure).  I think it
> would work just about as well to have transformExpr check whether the
> argument of a TypeCast is an ArrayExpr, and if so call
> transformArrayExpr directly from there, passing the TypeName as an
> additional argument.

I actually thought that A_ArrayExpr would be a good addition even if
you ignore the matter of typecasting.  It always seemed weird to me
that the parser generates an ArrayExpr directly.  ArrayExpr has a
bunch of members that are only set by the transform; all the parser
does is set the 'elements' member.  And then the transform creates a
brand new ArrayExpr and populates it based on what's in the 'elements'
member of the otherwise-empty ArrayExpr passed to it.
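The split described here can be modelled in a few lines of Python. The ArrayExpr field names loosely follow primnodes.h (array_typeid, element_typeid, elements, multidims), but type OIDs are reduced to strings and transform_array_expr() is a hypothetical simplification:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class A_ArrayExpr:
    """Parse-time node: the grammar only knows the raw element list."""
    elements: List[object]


@dataclass
class ArrayExpr:
    """Run-time node: every field is the transform's responsibility."""
    array_typeid: str
    element_typeid: str
    elements: List[object] = field(default_factory=list)
    multidims: bool = False


def transform_array_expr(raw: A_ArrayExpr, element_type: str) -> ArrayExpr:
    # Build a brand-new run-time node from the parse-time one; the parser
    # never has to emit a half-filled ArrayExpr.
    return ArrayExpr(
        array_typeid=element_type + "[]",
        element_typeid=element_type,
        elements=list(raw.elements),
    )
```

The parse-time class carries nothing the parser cannot actually know, which is the mismatch being pointed out.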

So my feeling is that an A_ArrayExpr is a better fit for the parser
output than ArrayExpr, and more in keeping with how the rest of the
code does things.

Mind you I'm also okay with your suggestion to let transformExpr take
care of it.  But I'm not averse to putting in the legwork to set up
the infrastructure for A_ArrayExpr, if it's a nice outcome.

> Kinda ugly, but not really any worse than the way
> A_Const is handled in that same routine.  (In fact, we could use the
> same technique to get rid of the typename field in A_Const ... might
> be worth doing?)

I had a bit of a dig into this.  A_Const->typename gets set directly
by the parse paths for "INTERVAL [(int)] string [interval range]".  In
fact, as far as I can tell that's the _only_ place A_Const->typename
gets used at all.  And all the transform does with that piece of
information is treat the node like a typecast.

I'm not seeing a huge amount of value in this special treatment.  Why
not just have the parser build this as an A_Const inside a TypeCast
and then let the transform deal with it in the usual way?  I found the
following comment at parsenodes.h:244

* NOTE: for mostly historical reasons, A_Const parsenodes contain
* room for a TypeName; we only generate a separate TypeCast node if the
* argument to be casted is not a constant.  In theory either representation
* would work, but the combined representation saves a bit of code in many
* productions in gram.y.

However, this is no longer the case.  makeTypeCast() doesn't care
about whether its argument is a constant anymore:

* Earlier we would determine whether an A_Const would
* be acceptable, however Domains require coerce_type()
* to process them -- applying constraints as required.

And in "many productions in gram.y", "many" == 2.  Currently the
combined representation requires more code than it saves.

So, I get the impression the use-case for A_Const->typename has become
extinct.  I think it could be removed with a minimum of fuss, and I'd
be happy to include same with my patch (or, submit it as a separate
patch; let me know your preference).

Regards,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
"Brendan Jurd" <direvus@gmail.com> writes:
> I actually thought that A_ArrayExpr would be a good addition even if
> you ignore the matter of typecasting.  It always seemed weird to me
> that the parser generates an ArrayExpr directly.  ArrayExpr has a
> bunch of members that are only set by the transform; all the parser
> does is set the 'elements' member.

Well, that's a reasonable argument.  And now that I think about it,
a parser-only node type doesn't have nearly the support overhead that
a full-fledged executable node does.  So no objection to A_ArrayExpr
if you want to do that.

> I had a bit of a dig into this.  A_Const->typename gets set directly
> by the parse paths for "INTERVAL [(int)] string [interval range]".  In
> fact, as far as I can tell that's the _only_ place A_Const->typename
> gets used at all.

Uh, you missed quite a lot of others ... see CURRENT_DATE and a lot of
other productions.
        regards, tom lane


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
On Nov 28, 2007 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I had a bit of a dig into this.  A_Const->typename gets set directly
> > by the parse paths for "INTERVAL [(int)] string [interval range]".  In
> > fact, as far as I can tell that's the _only_ place A_Const->typename
> > gets used at all.
>
> Uh, you missed quite a lot of others ... see CURRENT_DATE and a lot of
> other productions.
>

Thanks again.  I missed those because they don't use
makeStringConst().  Looking again, it turns out "many productions" is
more like 15.

That's a bigger number, certainly, but it's still manageable.  It
wouldn't be hard to convert them to generate a const-in-a-cast.  In
fact with the addition of a makeCastStringConst(), I think the code
saving from A_Const->typename would be cancelled out.
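A rough Python model of the two representations, assuming the transform only needs to find the cast target. A_Const, TypeCast and TypeName are reduced to toy dataclasses, and make_cast_string_const() is a hypothetical rendering of the proposed makeCastStringConst(), not an existing function:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TypeName:
    name: str


@dataclass
class A_Const:
    value: str
    typename: Optional[TypeName] = None  # the field under discussion


@dataclass
class TypeCast:
    arg: object
    typename: TypeName


def make_string_const_with_type(value: str, typename: TypeName) -> A_Const:
    # Combined representation: the cast target rides inside the constant.
    return A_Const(value, typename)


def make_cast_string_const(value: str, typename: TypeName) -> TypeCast:
    # Const-in-a-cast representation: a plain constant wrapped in an
    # ordinary TypeCast, which the transform already handles.
    return TypeCast(A_Const(value), typename)


def cast_target(node) -> Optional[str]:
    """What a transform has to inspect to find the cast target."""
    if isinstance(node, TypeCast):
        return node.typename.name
    if isinstance(node, A_Const) and node.typename is not None:
        return node.typename.name  # the extra special case the field requires
    return None
```

Both builders yield the same cast target for, say, `'1 day'` as an interval; the difference is that only the combined form needs the second branch in `cast_target()`.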

If the only reason for keeping A_Const->typename around is the alleged
code saving (as indicated by the code comments), my offer to do away
with it is still on the table.

Regards,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: Alvaro Herrera
Brendan Jurd escribió:

> If the only reason for keeping A_Const->typename around is the alleged
> code saving (as indicated by the code comments), my offer to do away
> with it is still on the table.

Code cleanup is always welcome.

-- 
Alvaro Herrera                          Developer, http://www.PostgreSQL.org/
"The eagle never lost so much time, as
when he submitted to learn of the crow." (William Blake)


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
Hi folks,

The patch is coming along nicely now.  I do have a couple of questions
about the implementation in transformArrayExpr though.

----
1) How should we determine whether the array is multidimensional if we
know the type in advance?

Currently, transformArrayExpr uses the results of its search for a
common element type to figure out whether the array is
multidimensional.  If we know the type in advance, we don't need to do
the common type search (a nice side-effect), so we need some other way
of figuring out how to set ArrayExpr->multidims on the new node.

I could just check the nodeTag of the elements as they are
transformed, but I'm concerned that the existing code might be relying
on select_common_type to catch stupid input, like a mixture of scalar
and array elements.  If that's the case it might be unwise to bypass
select_common_type or, at least, I'd need to come up with something
else to provide the same level of sanity assurance in both code paths.
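One way to answer question 1 without the common-type search is to classify each transformed element by its resolved type rather than by its node tag, so that a non-constant member such as an array-typed column still counts as a sub-array. A minimal sketch, assuming a toy TypedExpr wrapper whose type is a string and where a trailing "[]" marks an array type (both hypothetical simplifications):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TypedExpr:
    """Toy transformed expression: some node plus its resolved type name."""
    node: object
    type_name: str  # assumption: array types are spelled with a trailing "[]"


def is_array_type(type_name: str) -> bool:
    return type_name.endswith("[]")


def decide_multidims(elements: List[TypedExpr]) -> bool:
    """Return True if every element is an array, False if every one is scalar.

    A mixture is rejected outright, so this path keeps the sanity check the
    common-type search would otherwise have provided.
    """
    flags = {is_array_type(e.type_name) for e in elements}
    if len(flags) > 1:
        raise ValueError("ARRAY[] elements mix scalars and sub-arrays")
    return flags == {True}
```

An empty element list yields an empty set of flags and is treated as one-dimensional.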

----
2) Should the typecast propagate downwards into nested array elements?

If we have a nested array written as, say, ARRAY[ARRAY[1, 2], ARRAY[3,
4], ARRAY[5, 6]]::float[], should we treat the inner arrays the same
way as the outer array (with the advance knowledge that the array type
should be float[])?

If I'm reading the code correctly, the end result should be much the
same, because the inner arrays will end up being coerced to float[]
anyway.  But shortcutting the coercion could save some cycles.

Comments?

Regards,
BJ


Re: [GENERAL] Empty arrays with ARRAY[]

From: Martijn van Oosterhout
On Fri, Nov 30, 2007 at 06:13:20AM +1100, Brendan Jurd wrote:
> Hi folks,
>
> The patch is coming along nicely now.  I do have a couple of questions
> about the implementation in transformArrayExpr though.

Awesome.

> 1) How should we determine whether the array is multidimensional if we
> know the type in advance?

Well, given the array should be regular you should be able to just look
at the first element; if it's an array, look at its first element, etc.
to determine the dimensions. This'll be fairly quick.
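The first-element walk suggested here can be sketched in Python over nested lists standing in for a literal ARRAY[...] constructor. This is a hypothetical helper, not PostgreSQL code, and it assumes every sub-array is written out literally:

```python
from typing import List


def dims_by_first_element(value) -> List[int]:
    """Walk first elements of a nested literal to guess its dimensions.

    Assumes the literal is regular (rectangular) and spelled out in full;
    it cannot see inside a member whose contents are only known at run time.
    """
    dims = []
    while isinstance(value, list):
        dims.append(len(value))
        if not value:
            break
        value = value[0]
    return dims
```

For `[[1, 2], [3, 4], [5, 6]]` this reports `[3, 2]`. Because it only follows first elements, it does not validate regularity: `[[1, 2], [3]]` still reports `[2, 2]`, so catching that case needs a full pass over every element.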

> 2) Should the typecast propagate downwards into nested array elements?

IMHO yes, you have the info, so you may as well use it.

> If we have a nested array written as, say, ARRAY[ARRAY[1, 2], ARRAY[3,
> 4], ARRAY[5, 6]]::float[], should we treat the inner arrays the same
> way as the outer array (with the advance knowledge that the array type
> should be float[])?

TBH, I think you're going to have to go through the whole array to
coerce them and check, so you may as well determine the dimensions at
the same time. In general I think it's better to mark the type up
front.

I don't know if you should actually do the conversion straight away,
but at least you don't need to guess the type anymore.

Hope this helps,

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
>  -- John F Kennedy

Re: [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
Martijn van Oosterhout <kleptog@svana.org> writes:
>> 1) How should we determine whether the array is multidimensional if we
>> know the type in advance?

> Well, given the array should be regular you should be able to just look
> at the first element; if it's an array, look at its first element, etc.
> to determine the dimensions. This'll be fairly quick.

How does that work with non-constant array constructor members?
        regards, tom lane


Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
As discussed on -hackers, this patch allows the construction of an
empty array if an explicit cast to an array type is given (as in,
ARRAY[]::int[]).

postgres=# select array[]::int[];
 array
-------
 {}

postgres=# select array[];
ERROR:  no target type for empty array
HINT:  Empty arrays must be explicitly cast to the desired array type, e.g. ARRAY[]::int[]

A few notes on the implementation:

 * The syntax now allows an ARRAY constructor with an empty expression
list (array_expr_list may be empty).

 * I've added a new parsenode for arrays, A_ArrayExpr (previously the
parser would create ArrayExpr primnodes).

 * transformArrayExpr() now takes two extra arguments, a type oid and
a typmod.  When transforming a typecast which casts an A_ArrayExpr to
an array type, transformExpr passes these type details down to
transformArrayExpr, and skips the typecast.

 * transformArrayExpr() behaves slightly differently when passed type
information.  The overall type of the array is set to the given type,
and all elements are explicitly coerced to the equivalent element type.
 If it was not passed a type, then the behaviour is as previous; the
function looks for a common type among the elements, and coerces them
to that type.  The overall type of the array is derived from the
common element type.
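The two modes described in the last two bullets can be sketched in Python. Everything here is a simplification: types are strings, elements arrive as (value, type) pairs, and the three-type ranking used for the common-type search is an assumption standing in for PostgreSQL's real select_common_type() rules:

```python
from typing import List, Optional, Tuple

# Toy ranking for the common-type search (an assumption of this sketch).
_RANK = {"int": 0, "numeric": 1, "float": 2}


def select_common_type(types: List[str]) -> str:
    return max(types, key=_RANK.__getitem__)


def coerce(value, type_name: str):
    # Python's round() rounds ties to even, so this toy coercion differs
    # from PostgreSQL's numeric-to-int cast on .5 values.
    if type_name == "int":
        return int(round(value))
    return float(value)


def transform_array_expr(
    values: List[Tuple[object, str]], target_elem: Optional[str] = None
) -> Tuple[str, list]:
    """Return (array type, coerced elements) for a flat ARRAY[...]."""
    if target_elem is not None:
        # Type supplied by an enclosing cast: coerce every element to it.
        elem_type = target_elem
    elif values:
        # No cast: fall back to searching for a common element type.
        elem_type = select_common_type([t for _, t in values])
    else:
        raise ValueError("no target type for empty array")
    return elem_type + "[]", [coerce(v, elem_type) for v, _ in values]
```

Note that with a target type the common-type search is skipped entirely, which is the side benefit mentioned earlier in the thread.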

The patch is very invasive (at least compared to any of my previous
patches), but so far I haven't managed to find any broken behaviour.
All regression tests pass, and the regression tests for arrays seem to
be quite comprehensive.  I did add a couple of new tests for the empty
array behaviours, but the rest I've left alone.

I look forward to your comments -- although given the length of the
8.4 patch review queue, that will probably be an exercise in extreme
patience!

Major thanks go out to Tom for all his guidance on -hackers while I
developed the patch.

Regards,
BJ

Attachment

Re: [GENERAL] Empty arrays with ARRAY[]

From: Gregory Stark
"Brendan Jurd" <direvus@gmail.com> writes:

> The patch is very invasive (at least compared to any of my previous
> patches), but so far I haven't managed to find any broken behaviour.

I'm sorry to suggest anything at this point, but... would it be less invasive
if instead of requiring the immediate cast you created a special case in the
array code to allow a placeholder object for "empty array of unknown type".
The only operation which would be allowed on it would be to cast it to some
specific array type.

That way things like

UPDATE foo SET col = array[];
INSERT INTO foo (col) VALUES (array[]);

could be allowed if they could be contrived to introduce an assignment cast.
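The proposal can be modelled as a placeholder value whose only permitted operation is a cast to an array type, with assignment contexts supplying that cast implicitly. Everything below is hypothetical; it shows the shape of the restriction, not how PostgreSQL's type system or its UNKNOWN pseudo-type actually behaves:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnknownEmptyArray:
    """Placeholder for ARRAY[] whose element type is not yet known."""


@dataclass(frozen=True)
class TypedArray:
    type_name: str
    elements: tuple = ()


def is_array_type(type_name: str) -> bool:
    return type_name.endswith("[]")  # assumption: toy spelling of array types


def cast(value, target: str):
    """The only operation permitted on the placeholder is a cast to an array type."""
    if isinstance(value, UnknownEmptyArray):
        if not is_array_type(target):
            raise TypeError(f"cannot cast empty array of unknown type to {target}")
        return TypedArray(target)
    raise NotImplementedError("ordinary casts are outside this sketch")


def assign(column_type: str, value):
    """Model an assignment context (UPDATE/INSERT) that inserts an implicit cast."""
    return cast(value, column_type)
```

In this model `assign("text[]", UnknownEmptyArray())` succeeds, while any attempt to cast the placeholder to a non-array type is refused.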

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
On Nov 30, 2007 9:09 PM, Gregory Stark <stark@enterprisedb.com> wrote:
> I'm sorry to suggest anything at this point, but... would it be less invasive
> if instead of requiring the immediate cast you created a special case in the
> array code to allow a placeholder object for "empty array of unknown type".
> The only operation which would be allowed on it would be to cast it to some
> specific array type.
>
> That way things like
>
> UPDATE foo SET col = array[];
> INSERT INTO foo (col) VALUES (array[]);
>
> could be allowed if they could be contrived to introduce an assignment cast.

Hi Gregory.

Not sure it would be less invasive, but I do like the outcome of being
able to create an empty array pending assignment.  In addition to your
examples, it might also make it possible to do things like this in
plpgsql

DECLARE
 a text[] := array[];

Whereas my patch requires you to write

 a text[] := array[]::text[];

... which seems pretty stupid.

So, I like your idea a lot from a usability point of view.  But I
really, really hate it from a "just spent half a week on this patch"
point of view =/

Any suggestions about how you would enforce the "only allow casts to
array types" restriction on the empty array?

Cheers
BJ

Re: [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
A quick recap: I submitted a patch for empty ARRAY[] syntax back in
November, and as far as I can see it never made it to the patches
list.  Gregory suggested a different way of approaching the problem
(quoted below), but nobody commented further about how it might be
made to work.

I'd like to RFC again on Gregory's idea, and if that doesn't bear any
fruit I'd like to submit the patch as-is for review.

Regards,
BJ

On 01/12/2007, Brendan Jurd <direvus@gmail.com> wrote:
> On Nov 30, 2007 9:09 PM, Gregory Stark <stark@enterprisedb.com> wrote:
>  > I'm sorry to suggest anything at this point, but... would it be less invasive
>  > if instead of requiring the immediate cast you created a special case in the
>  > array code to allow a placeholder object for "empty array of unknown type".
>  > The only operation which would be allowed on it would be to cast it to some
>  > specific array type.
>  >
>  > That way things like
>  >
>  > UPDATE foo SET col = array[];
>  > INSERT INTO foo (col) VALUES (array[]);
>  >
>  > could be allowed if they could be contrived to introduce an assignment cast.
>
>  Not sure it would be less invasive, but I do like the outcome of being
>  able to create an empty array pending assignment.  In addition to your
>  examples, it might also make it possible to do things like this in
>  plpgsql
>
>  DECLARE
>   a text[] := array[];
>
>  Whereas my patch requires you to write
>
>   a text[] := array[]::text[];
>
>  ... which seems pretty stupid.
>
...
>  Any suggestions about how you would enforce the "only allow casts to
>  array types" restriction on the empty array?
>

Re: [PATCHES] [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
"Brendan Jurd" <direvus@gmail.com> writes:
> A quick recap: I submitted a patch for empty ARRAY[] syntax back in
> November, and as far as I can see it never made it to the patches
> list.  Gregory suggested a different way of approaching the problem
> (quoted below), but nobody commented further about how it might be
> made to work.

> I'd like to RFC again on Gregory's idea, and if that doesn't bear any
> fruit I'd like to submit the patch as-is for review.

Greg's idea is basically to invent array-of-UNKNOWN as a genuine
datatype, which as I stated way back when seems fairly dangerous
to me.  UNKNOWN is already a pretty slippery animal, and I don't
know what cast paths we might open up by doing that.  I think
the require-a-cast solution is a lot less likely to result in
unforeseen side-effects.

>> Whereas my patch requires you to write
>> a text[] := array[]::text[];
>> ... which seems pretty stupid.

In practice you'd write

    DECLARE
      a text[] := '{}';

which is even shorter, so I don't find this convincing.

            regards, tom lane

Re: [PATCHES] [GENERAL] Empty arrays with ARRAY[]

From: Tom Lane
"Brendan Jurd" <direvus@gmail.com> writes:
> As discussed on -hackers, this patch allows the construction of an
> empty array if an explicit cast to an array type is given (as in,
> ARRAY[]::int[]).

Applied with minor fixes; mostly, ensuring that the cast action would
propagate down to sub-arrays, as in

regression=# select array[[1],[2.2]]::int[];
   array
-----------
 {{1},{2}}
(1 row)

I was interested to realize that this fix validates the decision to
pass down the type information on-the-fly during transformExpr recursion.
It would have been a lot more painful to do it if we'd taken the A_Const
approach.
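The propagation described here, where the element type from the cast is handed down at every level of the recursion, can be sketched in Python over nested lists. The function is hypothetical and only understands an "int" target:

```python
from typing import Optional


def transform_nested(value, target_elem: Optional[str]):
    """Coerce a nested literal, pushing the cast's element type downwards.

    Sub-arrays are transformed with the same target element type as the
    outer array, so int[] applies at every level rather than only at the
    top. Plain Python lists and numbers stand in for parse nodes.
    """
    if isinstance(value, list):
        if not value and target_elem is None:
            raise ValueError("no target type for empty array")
        return [transform_nested(v, target_elem) for v in value]
    if target_elem == "int":
        return int(round(value))
    return value  # no target type known: leave the element alone in this toy
```

Applied to `[[1], [2.2]]` with target `"int"`, it produces `[[1], [2]]`, mirroring the `{{1},{2}}` result above.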

I didn't do anything about removing A_Const's typename field, but I'm
thinking that would be a good cleanup patch.

            regards, tom lane

Re: [PATCHES] [GENERAL] Empty arrays with ARRAY[]

From: "Brendan Jurd"
On 21/03/2008, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Brendan Jurd" <direvus@gmail.com> writes:
>
> > As discussed on -hackers, this patch allows the construction of an
>  > empty array if an explicit cast to an array type is given (as in,
>  > ARRAY[]::int[]).
>
>
> Applied with minor fixes; mostly, ensuring that the cast action would
>  propagate down to sub-arrays, as in

Great, thanks Tom.

>  I was interested to realize that this fix validates the decision to
>  pass down the type information on-the-fly during transformExpr recursion.
>  It would have been a lot more painful to do it if we'd taken the A_Const
>  approach.
>

Indeed.

>  I didn't do anything about removing A_Const's typename field, but I'm
>  thinking that would be a good cleanup patch.
>

I'd be happy to take this on.  My day job is pretty busy at the moment
but I should be able to submit something in a week or so.

Cheers,
BJ