Thread: User Defined Types in Java
Hi, I'd like to enable UDT's written in Java and made some initial trial and error. I don't get very far. Here's what I do: I take the 'complex' type example described in '31.11 User-Defined Types' and change it to use Java functions (see below). But I get: ERROR: type "complex" does not exist as soon as I execute the first CREATE statement. If I change the language from java to C and try again, the message is different (and more according to the docs): NOTICE: type "complex" is not yet defined DETAIL: Creating a shell type definition. The documentation says "Creating a new base type requires implementing functions to operate on the type in a low-level language, usually C". I read that as it would be possible to use other languages. Apparently java is not one of them. What can I do to change that? Kind regards, Thomas Hallgren CREATE FUNCTION complex_in(cstring) RETURNS complex AS 'org.postgresql.pljava.example.Complex.in' LANGUAGE java IMMUTABLESTRICT; CREATE FUNCTION complex_out(complex) RETURNS cstring AS 'org.postgresql.pljava.example.Complex.out' LANGUAGE java IMMUTABLESTRICT; CREATE FUNCTION complex_recv(internal) RETURNS complex AS 'org.postgresql.pljava.example.Complext.recv' LANGUAGE javaIMMUTABLE STRICT; CREATE FUNCTION complex_send(complex) RETURNS bytea AS 'org.postgresql.pljava.example.Complext.send' LANGUAGE javaIMMUTABLE STRICT; CREATE TYPE complex ( internallength = 16, input = complex_in, output = complex_out, receive = complex_recv, send =complex_send, alignment = double );
On Thu, Feb 09, 2006 at 01:08:01PM +0100, Thomas Hallgren wrote: > Hi, > I'd like to enable UDT's written in Java and made some initial trial and > error. I don't get very far. Here's what I do: > > I take the 'complex' type example described in '31.11 User-Defined > Types' and change it to use Java functions (see below). But I get: > > ERROR: type "complex" does not exist If you look at the code it says in a comment: /* * Only C-coded functions can be I/O functions. We enforce this * restrictionhere mainly to prevent littering the catalogs with * shell types due to simple typos in user-definedfunction * definitions. */ However, you could probably work around this like so: CREATE FUNCTION dummy(cstring) RETURNS complex AS [random existing function] LANGUAGE INTERNAL; This will create the shell type. You then create your other functions and finally the type, at which point you can delete the dummy function again. Roundabout, but it should work (I hope). Note, if you have a validator on your java code that tries to lookup the return type, you might get some interesting issues. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout wrote: > On Thu, Feb 09, 2006 at 01:08:01PM +0100, Thomas Hallgren wrote: >> Hi, >> I'd like to enable UDT's written in Java and made some initial trial and >> error. I don't get very far. Here's what I do: >> >> I take the 'complex' type example described in '31.11 User-Defined >> Types' and change it to use Java functions (see below). But I get: >> >> ERROR: type "complex" does not exist > > If you look at the code it says in a comment: > > /* > * Only C-coded functions can be I/O functions. We enforce this > * restriction here mainly to prevent littering the catalogs with > * shell types due to simple typos in user-defined function > * definitions. > */ > Ouch. Any chance of getting this changed? If we can agree on a good design I'd be happy to implement and submit it. > However, you could probably work around this like so: > > CREATE FUNCTION dummy(cstring) RETURNS complex AS [random existing > function] LANGUAGE INTERNAL; > > This will create the shell type. You then create your other functions > and finally the type, at which point you can delete the dummy function > again. > Great. Thanks. Then at least I can test if what I have in mind is feasible. > Roundabout, but it should work (I hope). Note, if you have a validator > on your java code that tries to lookup the return type, you might get > some interesting issues. > I don't yet. But I'll keep it in mind to watch out for shell types once I do. Regards, Thomas Hallgren
On Thu, Feb 09, 2006 at 01:53:13PM +0100, Thomas Hallgren wrote: > >If you look at the code it says in a comment: > > > > /* > > * Only C-coded functions can be I/O functions. We > > enforce this > > * restriction here mainly to prevent littering the > > catalogs with > > * shell types due to simple typos in user-defined function > > * definitions. > > */ > > > > Ouch. Any chance of getting this changed? If we can agree on a good design > I'd be happy to implement and submit it. Actually, I'm think this whole automatic creation of a shell-type a bit silly anyway. Why not simply solve the problem directly like so: CREATE TYPE complex AS SHELL; or DECLARE TYPE complex; Don't beat around the bush, say what you mean. Thoughts? -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout wrote: > On Thu, Feb 09, 2006 at 01:53:13PM +0100, Thomas Hallgren wrote: > >>> If you look at the code it says in a comment: >>> >>> /* >>> * Only C-coded functions can be I/O functions. We >>> enforce this >>> * restriction here mainly to prevent littering the >>> catalogs with >>> * shell types due to simple typos in user-defined function >>> * definitions. >>> */ >>> >>> >> Ouch. Any chance of getting this changed? If we can agree on a good design >> I'd be happy to implement and submit it. >> > > Actually, I'm think this whole automatic creation of a shell-type a bit > silly anyway. Why not simply solve the problem directly like so: > > CREATE TYPE complex AS SHELL; > > or > > DECLARE TYPE complex; > > Don't beat around the bush, say what you mean. > > Thoughts? > I'd perhaps take it even further. Why not just: CREATE TYPE complex; similar to an anonymous struct in C. My favorite alternative is to do something like this: CREATE TYPE complex ( internallength = 16, input = complex_in, output = complex_out, ... AS 'filename' LANGUAGEC ); A construct like that would remove a lot of clutter (and source of errors). The IMMUTABLE STRICT along with all return and parameter types are pre-defined anyway and the likelihood of the functions living in different files (or as in my case, different classes) or using different languages for one specific type is second to none. Regards, Thomas Hallgren
Thomas Hallgren <thomas@tada.se> writes: > I'd like to enable UDT's written in Java Does Java really give you enough control over the bit-level representation of an object for this goal to be considered sane? In particular, it seems unsafe to use a Java class as a PG UDT, because the method pointers wouldn't remain the same across backend runs. regards, tom lane
Tom Lane wrote: > Thomas Hallgren <thomas@tada.se> writes: > >> I'd like to enable UDT's written in Java >> > > Does Java really give you enough control over the bit-level > representation of an object for this goal to be considered sane? > > Most definitely yes! > In particular, it seems unsafe to use a Java class as a PG UDT, > because the method pointers wouldn't remain the same across > backend runs. > > I'm not sure I understand what you mean. I assume that all calls would come in through the java_call_handler. The java_call_handler will make sure that the correct class and method is called. How is that different from any other declared function? Or are you thinking of the lifecycle of the binary data versus the lifecycle of the methods that manipulate it? They might be different and that might cause problems. But that's true for a UDT defined in C as well. Regards, Thomas Hallgren
Martijn van Oosterhout <kleptog@svana.org> writes: > Actually, I'm think this whole automatic creation of a shell-type a bit > silly anyway. Why not simply solve the problem directly like so: > CREATE TYPE complex AS SHELL; One of the unwritten consequences of the way that it works now is that only superusers can "clutter the catalogs" with shell types. Not sure how important that is, but I suspect that the system is not all that robust against use of shell types where a completed type is expected. You'd have to go over a lot of code with a fine-tooth comb before putting this ability into the hands of ordinary users, else you'd be creating loopholes for DOS attacks (or worse). Having said that, I agree that this seems conceptually cleaner, though I'm not sure we could ever get rid of the old way because of backward compatibility issues. regards, tom lane
Tom Lane wrote: >Martijn van Oosterhout <kleptog@svana.org> writes: > > >>Actually, I'm think this whole automatic creation of a shell-type a bit >>silly anyway. Why not simply solve the problem directly like so: >> >> > > > >>CREATE TYPE complex AS SHELL; >> >> > >One of the unwritten consequences of the way that it works now is that >only superusers can "clutter the catalogs" with shell types. > > I suppose we could restrict this variant to superusers, at least initially. [snip] >Having said that, I agree that this seems conceptually cleaner, though >I'm not sure we could ever get rid of the old way because of backward >compatibility issues. > > > They are not mutually exclusive, are they? I too like Martijn's suggestion. cheers andrew
Thomas Hallgren <thomas@tada.se> writes: > Tom Lane wrote: >> In particular, it seems unsafe to use a Java class as a PG UDT, >> because the method pointers wouldn't remain the same across >> backend runs. >> > I'm not sure I understand what you mean. Doesn't a Java object contain a method-table pointer that is used at runtime to dispatch method calls on the object? If the object is dumped bitwise into a PG table, and then reloaded into another backend session (which maybe has loaded Java at different addresses), that pointer will be invalid. regards, tom lane
Tom Lane wrote: > Thomas Hallgren <thomas@tada.se> writes: > >> Tom Lane wrote: >> >>> In particular, it seems unsafe to use a Java class as a PG UDT, >>> because the method pointers wouldn't remain the same across >>> backend runs. >>> >>> >> I'm not sure I understand what you mean. >> > > Doesn't a Java object contain a method-table pointer that is used at > runtime to dispatch method calls on the object? If the object is dumped > bitwise into a PG table, and then reloaded into another backend session > (which maybe has loaded Java at different addresses), that pointer will > be invalid. > Ah, now I understand. I'm not planning on using the binary image of the Java object as such. There's no API that would allow me to do that (JNI uses handles, not pointers). Java has a number of standards for how objects can be serialized/deserialized in a safe, JVM independent way. If the class of the object is known at all times (as it will be for all UDT's), I can use a DataInputStream/DataOutputStream pair to read/write data. It's very similar to how the complex example type uses pg_sendxxx and pg_getmsgxxx functions. The Java implementation of that example will also use 16 bytes for storage. More complex types can use the standard Java serialization mechanism. It will waste some more space (must be declared variable in length) but it maintains data integrity over time through serialVersionUUID's (a 64 bit number generated based on the constitution of the class). JDBC defines a standard that involves usage of three different interfaces, SQLData, SQLInput, and SQLOutput. I plan to support that also. Kind Regards, Thomas Hallgren
On Thu, Feb 09, 2006 at 09:33:35AM -0500, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Actually, I'm think this whole automatic creation of a shell-type a bit > > silly anyway. Why not simply solve the problem directly like so: > > > CREATE TYPE complex AS SHELL; > > One of the unwritten consequences of the way that it works now is that > only superusers can "clutter the catalogs" with shell types. Not sure > how important that is, but I suspect that the system is not all that > robust against use of shell types where a completed type is expected. > You'd have to go over a lot of code with a fine-tooth comb before > putting this ability into the hands of ordinary users, else you'd be > creating loopholes for DOS attacks (or worse). Would it be reasonable to set this up so you can only create a shell type within an explicit transaction and that you are required to define the type completely before commit. That would minimize the exposure to incomplete types. I don't know if the structure exists to support this (dynamic on-commit trigger). > Having said that, I agree that this seems conceptually cleaner, though > I'm not sure we could ever get rid of the old way because of backward > compatibility issues. True. But this way allows us to remove the restriction on only allow C functions for type input/output. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Thu, Feb 09, 2006 at 09:33:35AM -0500, Tom Lane wrote: >> You'd have to go over a lot of code with a fine-tooth comb before >> putting this ability into the hands of ordinary users, else you'd be >> creating loopholes for DOS attacks (or worse). > Would it be reasonable to set this up so you can only create a shell > type within an explicit transaction and that you are required to define > the type completely before commit. I don't see any very easy way to enforce that, and it would break existing datatype-definition scripts if we did. However, since posting that it's occurred to me that we could do a lot to make the shell-type situation more robust. The problem with shell types is that the pg_type row is mostly left as zeroes, which means that any code that inspects the type without checking typisdefined is going to get a bogus value that may make it behave strangely. But we've since come up with a much better approach: see pseudotypes. We should set things up so that a shell type has valid but dummy entries in its pg_type row, including references to I/O functions that will just report an error if invoked. Also a shell type should be properly owned by its creator, which would let the creator drop it if it had been a mistake (right now, I think you can't get rid of it except by "DELETE FROM pg_type" :-(). With an arrangement like that, I'd feel much less worried about shell-type-related bugs. regards, tom lane
Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: >> On Thu, Feb 09, 2006 at 09:33:35AM -0500, Tom Lane wrote: >>> You'd have to go over a lot of code with a fine-tooth comb before >>> putting this ability into the hands of ordinary users, else you'd be >>> creating loopholes for DOS attacks (or worse). > >> Would it be reasonable to set this up so you can only create a shell >> type within an explicit transaction and that you are required to define >> the type completely before commit. > > I don't see any very easy way to enforce that, and it would break > existing datatype-definition scripts if we did. > What do you think of my earlier suggestion. Skip all the 'create function' statements and just add the "AS 'filename' LANGUAGE C" to the CREATE TYPE. It could be implemented while maintaining backward compatibility I think? Regards, Thomas Hallgren
Thomas Hallgren <thomas@tada.se> writes: > What do you think of my earlier suggestion. Skip all the 'create function' statements and > just add the "AS 'filename' LANGUAGE C" to the CREATE TYPE. Very little, as it makes unjustifiable assumptions about all the datatype's support functions being predictably propertied. (There's more than one possible signature, let alone any secondary properties such as volatility or other stuff we might think of in future.) I think it'd be unworkable from pg_dump's point of view, as well. regards, tom lane
Tom Lane wrote: > Thomas Hallgren <thomas@tada.se> writes: > >> What do you think of my earlier suggestion. Skip all the 'create function' statements and >> just add the "AS 'filename' LANGUAGE C" to the CREATE TYPE. >> > > Very little, as it makes unjustifiable assumptions about all the > datatype's support functions being predictably propertied. (There's > more than one possible signature, let alone any secondary properties > such as volatility or other stuff we might think of in future.) > I think it'd be unworkable from pg_dump's point of view, as well. > > I wasn't aware that there was more then one possible signature. What other signatures are possible (I have a working draft in PL/Java now and I don't want to miss anything)? Just a thought, but future properties like volatility should perhaps be annotations on the type rather than on its functions? I guess the pg_dump problem that you're thinking of is that there's no way to associate the functions with the type that they would belong to. Perhaps this could be done by adding a 'protype oid' column to the pg_proc table? Introducing that would probably help introducing SQL 2003 semantics further on (I'm thinking of methods that belongs to types. Not very different from a function taking the type as it's first argument). In any case; at present I use a dummy function to circumvent the Java function shell type problem. What was the outcome of the shell type discussion? Will a 'CREATE TYPE xxx AS SHELL' or similar be considered for 8.2? Kind Regards, Thomas Hallgren
On Sun, Feb 12, 2006 at 07:33:30PM +0100, Thomas Hallgren wrote: > >Very little, as it makes unjustifiable assumptions about all the > >datatype's support functions being predictably propertied. (There's > >more than one possible signature, let alone any secondary properties > >such as volatility or other stuff we might think of in future.) > >I think it'd be unworkable from pg_dump's point of view, as well. > I wasn't aware that there was more then one possible signature. What > other signatures are possible (I have a working draft in PL/Java now and > I don't want to miss anything)? The docs are your friend, see[1] in particular the input_function and the receive_function. [1] http://www.postgresql.org/docs/8.1/interactive/sql-createtype.html > I guess the pg_dump problem that you're thinking of is that there's no > way to associate the functions with the type that they would belong to. > Perhaps this could be done by adding a 'protype oid' column to the > pg_proc table? Introducing that would probably help introducing SQL 2003 > semantics further on (I'm thinking of methods that belongs to types. Not > very different from a function taking the type as it's first argument). I think the pg_dump is the fact that pg_dump needs to produce output that can be parsed to recreate the type and your suggestion only covers a very small set of possible type definitions (all in same lib, external func name = postgres func name, etc). > In any case; at present I use a dummy function to circumvent the Java > function shell type problem. What was the outcome of the shell type > discussion? Will a 'CREATE TYPE xxx AS SHELL' or similar be considered > for 8.2? Hopefully something will be considered, but the first person who produces a patch will probably get priority :) Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Martijn van Oosterhout wrote: > The docs are your friend, see[1] in particular the input_function and > the receive_function. > > [1] http://www.postgresql.org/docs/8.1/interactive/sql-createtype.html > > Ok, so there are two 'optional' arguments. Following my suggestion, the input and receive function would always take 3 arguments. Then, it's up to the function as such if it makes use of them or not. Do you see any problem with that? Is it bad from a performance perspective to always produce the values for the additional arguments? If so, an additional clause 'WITH EXTENDED PARAMETERS' (pending better suggestions :-) ) could be added to the CREATE TYPE. >> I guess the pg_dump problem that you're thinking of is that there's no >> way to associate the functions with the type that they would belong to. >> Perhaps this could be done by adding a 'protype oid' column to the >> pg_proc table? Introducing that would probably help introducing SQL 2003 >> semantics further on (I'm thinking of methods that belongs to types. Not >> very different from a function taking the type as it's first argument). >> > > I think the pg_dump is the fact that pg_dump needs to produce output > that can be parsed to recreate the type and your suggestion only covers > a very small set of possible type definitions (all in same lib, > external func name = postgres func name, etc). > > That's very intentional. It will keep the functions of a type declaration in one place. I seriously doubt that there's an advantage to splitting functions for a single type between different libraries. I just can't see why or when that would be a good thing. Can you? I also find it hard to come up with reasons why the PostgreSQL function name should be different from the name of the C function. I find no reasons at all that would motivate all the clutter equipped with the current construct. > Hopefully something will be considered, but the first person who > produces a patch will probably get priority :) > > I kind of guessed that would be the answer. I took a quick glance at the yacc grammar. Seems any of the suggestions would be possible to implement. So which is it? CREATE TYPE complex; CREATE TYPE complex AS SHELL; DECLARE TYPE complex; The first one has my vote. Regards, Thomas Hallgren
Thomas Hallgren <thomas@tada.se> writes: > Ok, so there are two 'optional' arguments. Following my suggestion, the > input and receive function would always take 3 arguments. Then, it's up > to the function as such if it makes use of them or not. Do you see any > problem with that? (1) backwards compatibility (2) inability to ever add a fourth optional argument without creating a flag day for everyone I'm all for cleaning up the handling of shell types (and in fact have had that on my personal TODO list for ages). But I see zero if not negative usefulness in these ideas about changing CREATE TYPE. The certain outcome of that is to import all the complications of CREATE FUNCTION into CREATE TYPE, and for what gain? > So which is it? > CREATE TYPE complex; > CREATE TYPE complex AS SHELL; > DECLARE TYPE complex; I'd go with the first. regards, tom lane
I'm not suggesting that we remove the current way of doing things. I understand that if we did that, it would cause problem for everyone that has created scalar types up to this day. What I'm proposing is an alternative way of doing this, not a replacement. And as things stand today, I'd be happy if this alternative way was the only way to create a type that didn't use C functions (hence, no need for a special construct to create a shell type). That wouldn't break anything. What I'm proposing should be an addition that also can be seen as the beginning of a path to migrate the CREATE TYPE construct to conform with the SQL 2003 standard. Regards, Thomas Hallgren Tom Lane wrote: > Thomas Hallgren <thomas@tada.se> writes: > >> Ok, so there are two 'optional' arguments. Following my suggestion, the >> input and receive function would always take 3 arguments. Then, it's up >> to the function as such if it makes use of them or not. Do you see any >> problem with that? >> > > (1) backwards compatibility > (2) inability to ever add a fourth optional argument without creating > a flag day for everyone > > I'm all for cleaning up the handling of shell types (and in fact have > had that on my personal TODO list for ages). But I see zero if not > negative usefulness in these ideas about changing CREATE TYPE. The > certain outcome of that is to import all the complications of CREATE > FUNCTION into CREATE TYPE, and for what gain? > >
Thomas Hallgren <thomas@tada.se> writes: > What I'm proposing should be an addition that also can be seen as the > beginning of a path to migrate the CREATE TYPE construct to conform with > the SQL 2003 standard. I'd be interested to see where in the SQL2003 spec the syntax you are proposing can be found. regards, tom lane
Tom Lane wrote: > Thomas Hallgren <thomas@tada.se> writes: > >> What I'm proposing should be an addition that also can be seen as the >> beginning of a path to migrate the CREATE TYPE construct to conform with >> the SQL 2003 standard. >> > > I'd be interested to see where in the SQL2003 spec the syntax you are > proposing can be found. > The PostgreSQL specific semantics with input/output/receive/send is are not in the SQL 2003 standard. But it does define how methods can be declared on a type and to me, the input/output/receive/send are as close to constructors and instance methods as you can get. I'm *not* suggesting a full implementation of this. I am merely stating that moving functions into the type and adding a column to the pg_proc table that allows a function to be tied to a type is a step in that direction: Here's an example using SQL 2003 syntax (from JRT-2003). The BNF is below: CREATE TYPE addr EXTERNAL NAME 'address_classes_jar:Address' LANGUAGE JAVA AS ( street_attr CHARACTER VARYING(50)EXTERNAL NAME 'street', zip_attr CHARACTER(10) EXTERNAL NAME 'zip' ) CONSTRUCTOR METHOD addr () RETURNS addr SELF AS RESULT EXTERNAL NAME 'Address', CONSTRUCTOR METHOD addr (s_parm CHARACTER VARYING(50),z_parm CHARACTER(10)) RETURNS addr SELF AS RESULT EXTERNAL NAME 'Address', METHOD to_string () RETURNS CHARACTERVARYING(255) EXTERNAL NAME 'toString', STATIC METHOD contiguous (A1 addr, A2 addr) RETURNS CHARACTER(3) EXTERNAL NAME 'contiguous'; <user-defined type definition> ::= CREATE TYPE <user-defined type body> <user-defined type body> ::= <schema-resolved user-defined type name> [ <subtype clause> ] [ AS <representation> ] [ <user-defined type option list> ] [ <method specification list> ] ... <method specification list> ::= <method specification> [ { <comma> <method specification> }... ] <method specification> ::= <original method specification>| <overriding method specification> <original method specification> ::= <partial method specification> [ SELF AS RESULT ] [ SELF AS LOCATOR ] [ <method characteristics>] <overriding method specification> ::= OVERRIDING <partial method specification> <partial method specification> ::= [ INSTANCE | STATIC | CONSTRUCTOR ] METHOD <method name> <SQL parameter declarationlist> <returns clause> [ SPECIFIC <specific method name> ] <specific method name> ::= [ <schema name> <period> ]<qualified identifier> <method characteristics> ::= <method characteristic>... <method characteristic> ::= <language clause> | <parameter style clause> | <deterministic characteristic> | <SQL-data accessindication> | <null-call clause> Regards, Thomas Hallgren