Thread: User-defined operator function: what parameter type to use for uncast character string?

(Cross-posted to StackOverflow:
http://stackoverflow.com/questions/25041100/postgresql-user-defined-operator-function-what-parameter-type-to-use-for-uncast
)

I'm defining my own domain and a equality operator.  I cannot cause
PostgreSQL to use my operator function in a query without explicitly
casting a character-string argument to the type I used in defining the
operator.  My question is how to cause my custom operator to be used
without the cast.

As an example: first I define a domain and a table column of that
type.  For this example, the type requires its values to be three
uppercase letters A, B or C, and the equality operator will cause
queries to match regardless of case.

    CREATE domain my_domain as char(3) check(VALUE similar to '[A-C]{3}');
    CREATE TABLE my_table (val my_domain);
    INSERT INTO my_table VALUES ('ABC');

The type of the column is my_domain:

    sandbox=> \d my_table
        Table "public.my_table"
     Column |   Type    | Modifiers
    --------+-----------+-----------
     val    | my_domain |

Before defining the custom equality operator, case-sensitive queries
work as I expect.  The row in the table is capital letters, so the
query must contain capital letters to match the row

    sandbox=> SELECT * FROM my_table WHERE val='abc';
     val
    -----
    (0 rows)

    sandbox=> SELECT * FROM my_table WHERE val='ABC';
     val
    -----
     ABC
    (1 row)

Next I create an equality operator to do case-insensitive matching:

    CREATE FUNCTION my_equals(this my_domain, that text) RETURNS boolean AS
    'SELECT CAST (this AS text) = upper(that)' LANGUAGE SQL;
    CREATE OPERATOR = (procedure=my_equals, leftarg=my_domain, rightarg = text);

The new operator is invoked causing a query containing lowercase
letters to match the uppercase column value, but only if I cast the
type of the WHERE clause:

    sandbox=> SELECT * FROM my_table WHERE val=CAST ('abc' AS text);
     val
    -----
     ABC
    (1 row)

    sandbox=> SELECT * FROM my_table WHERE val='abc';
     val
    -----
    (0 rows)

Question: What can I do so my custom equality operator is used without
the cast?  In other words, how to cause the last query above return
the table row (without changing the query)?  I have tried defining
my_equals() so its second parameter type is either varchar and
char(3), but those still require a cast in the WHERE-clause of the
query.  I've also tried anyelement, but that does not work even with a
cast.

Thank you,
--
Adam Mackler


Adam Mackler-5 wrote
> (Cross-posted to StackOverflow:
>
http://stackoverflow.com/questions/25041100/postgresql-user-defined-operator-function-what-parameter-type-to-use-for-uncast
> )
>
> I'm defining my own domain and a equality operator.
> Next I create an equality operator to do case-insensitive matching:
>
>     CREATE FUNCTION my_equals(this my_domain, that text) RETURNS boolean
> AS
>     'SELECT CAST (this AS text) = upper(that)' LANGUAGE SQL;
>     CREATE OPERATOR = (procedure=my_equals, leftarg=my_domain, rightarg =
> text);
>
> The new operator is invoked causing a query containing lowercase
> letters to match the uppercase column value, but only if I cast the
> type of the WHERE clause:
>
>     sandbox=> SELECT * FROM my_table WHERE val=CAST ('abc' AS text);
>      val
>     -----
>      ABC
>     (1 row)
>
>     sandbox=> SELECT * FROM my_table WHERE val='abc';
>      val
>     -----
>     (0 rows)
>
> Question: What can I do so my custom equality operator is used without
> the cast?

ISTM that if this was supported you would be doing it correctly.  The main
problem is you are abusing DOMAIN - which is strictly the base type with
constraints - and trying to add operators specific to the DOMAIN (i.e., ones
that would not work with the base type).  And so now you have "domain =
unknown" and the system is trying to figure out what unknown should be and
also which operator to pick and it decides that since =(text,text) covers
the domain and the unknown that is what it will pick.

Maybe you should consider using an "enum"

http://www.postgresql.org/docs/9.2/interactive/datatype-enum.html

This doesn't solve the case insensitivity concern directly but you might be
able to get the custom operator to work correctly on an enum where you
apparently cannot with a DOMAIN.

David J.








--
View this message in context:
http://postgresql.1045698.n5.nabble.com/User-defined-operator-function-what-parameter-type-to-use-for-uncast-character-string-tp5813386p5813389.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


On Wed, Jul 30, 2014 at 10:59:28PM -0700, David G Johnston wrote:
> ISTM that if this was supported you would be doing it correctly.

Thank you for the quick response.  I'm not understanding you.  Could you elaborate?

> The main problem is you are abusing DOMAIN - which is strictly the
> base type with constraints - and trying to add operators specific to
> the DOMAIN (i.e., ones that would not work with the base type).

Can you explain what the abuse is?  Also why the "=" operator does not
work even without the domain?  If I do everything the same, but make
the column type "char(3)" rather than "my_domain", then still I need
to cast the literal in the query to "text".  I'm not understanding (1)
why postgres doesn't use the type of the column--either char(3) or
my_domain--and then choose the operator function that has that type
for its first paramater even where the second is "unknown", and (2)
why using the cast "WHERE val='abc'::text" makes it work.  It seems as
if the determining factor is the type of the literal in the
WHERE_clause.

> And so now you have "domain = unknown" and the system is trying to
> figure out what unknown should be and also which operator to pick
> and it decides that since =(text,text) covers the domain and the
> unknown that is what it will pick.

If =(text,text) is chosen where the second argument is "unknown"
rather than "text", then postgres has no problem deciding that a
parameter defined for "text" will handle "unknown", right?  So if I
define =(my_domain,text) or =(char(3),text) then wouldn't those be
preferable to =(text,text) where the first argument type can be known
from the type of the column (as defined in the table) used as the
first argument to the operator function?

If I give my operator the unique name "~~~~" then my operator function
is chosen without the cast.  That makes it seem like a matter of
priority, where the built-in "=" operator takes priority over my
user-defined one.  If postgres will accept my operator as being
appropriate, isn't there some way to give it priority over the built-in
=(text,text) operator?

In other words, "\d my_table" shows the type of the column as
"my_domain" or "char(3)" depending on how I define it.  So why isn't
that taken into acount when choosing the operator function when the
second argument is "unknown"?


> Maybe you should consider using an "enum"

I don't see how I can use an enum.  Wouldn't that require defining all
possible cominations, which even in this limited example would be 26^3
values?  I just used three uppercase letters (and case-insensitive
matching) as an example.  In my actual application I have twelve
characters and the operator function is doing more advanced
regular-expression matching.


Thanks again,
--
Adam Mackler


On Thu, Jul 31, 2014 at 12:19 AM, Adam Mackler-5 [via PostgreSQL] <[hidden email]> wrote:
On Wed, Jul 30, 2014 at 10:59:28PM -0700, David G Johnston wrote:
> ISTM that if this was supported you would be doing it correctly.  

Thank you for the quick response.  I'm not understanding you.  Could you elaborate?

​Basically I would expect what you tried to either work or conclude that it is unsupported.  There is no simple alternative I can see trying​ that would work.


> The main problem is you are abusing DOMAIN - which is strictly the
> base type with constraints - and trying to add operators specific to
> the DOMAIN (i.e., ones that would not work with the base type).

Can you explain what the abuse is?  Also why the "=" operator does not
work even without the domain?  If I do everything the same, but make
the column type "char(3)" rather than "my_domain", then still I need
to cast the literal in the query to "text".  I'm not understanding (1)
why postgres doesn't use the type of the column--either char(3) or
my_domain--and then choose the operator function that has that type
for its first paramater even where the second is "unknown", and (2)
why using the cast "WHERE val='abc'::text" makes it work.  It seems as
if the determining factor is the type of the literal in the
WHERE_clause.

When you write, explicitly, "domain = text" the operator is directly ​found.
When you write, implicitly, "domain = unknown" there is no operator and so the system has to go looking.  It decides that the operator least likely to cause a problem is the "text = text" (varchar, char(3), text - these are all implemented in the same manner and are effectively synonyms).
 

> And so now you have "domain = unknown" and the system is trying to
> figure out what unknown should be and also which operator to pick
> and it decides that since =(text,text) covers the domain and the
> unknown that is what it will pick.

If =(text,text) is chosen where the second argument is "unknown"
rather than "text", then postgres has no problem deciding that a
parameter defined for "text" will handle "unknown", right?  So if I
define =(my_domain,text) or =(char(3),text) then wouldn't those be
preferable to =(text,text) where the first argument type can be known
from the type of the column (as defined in the table) used as the
first argument to the operator function?

​Again, I am extrapolating from observed behavior but by casting "unknown" to text the system thinks it is safer to use the base type of the domain to perform the comparison since the system believes that anything that would apply to the domain itself should also apply to the base type - the domain IS the base type but with a constraint on its value.

This is the abuse part - you want "text = text" to be different than "domain = text" but the system doesn't help you out here since now your domain exhibits behavior that does not apply for its base type.​


If I give my operator the unique name "~~~~" then my operator function
is chosen without the cast.  That makes it seem like a matter of
priority, where the built-in "=" operator takes priority over my
user-defined one.  If postgres will accept my operator as being
appropriate, isn't there some way to give it priority over the built-in
=(text,text) operator?

In other words, "\d my_table" shows the type of the column as
"my_domain" or "char(3)" depending on how I define it.  So why isn't
that taken into acount when choosing the operator function when the
second argument is "unknown"?


​I doubt it.​
 
 
> Maybe you should consider using an "enum"

I don't see how I can use an enum.  Wouldn't that require defining all
possible cominations, which even in this limited example would be 26^3
values?  I just used three uppercase letters (and case-insensitive
matching) as an example.  In my actual application I have twelve
characters and the operator function is doing more advanced
regular-expression matching.


​Then, yeah, the enum option is out.​


Given the desire for custom behavior you need to develop a full-fledged type.

CREATE TYPE name (   INPUT = input_function,   OUTPUT = output_function   [ , RECEIVE = receive_function ]   [ , SEND = send_function ]   [ , TYPMOD_IN = type_modifier_input_function ]   [ , TYPMOD_OUT = type_modifier_output_function ]   [ , ANALYZE = analyze_function ]   [ , INTERNALLENGTH = { internallength | VARIABLE } ]   [ , PASSEDBYVALUE ]   [ , ALIGNMENT = alignment ]   [ , STORAGE = storage ]   [ , LIKE = like_type ]   [ , CATEGORY = category ]   [ , PREFERRED = preferred ]   [ , DEFAULT = default ]   [ , ELEMENT = element ]   [ , DELIMITER = delimiter ]   [ , COLLATABLE = collatable ]
)


Note the comment:  Generally these functions have to be coded in C or another low-level language ​

I suppose this means you can code these with a higher-level language but I'm guessing you are going to take a significant performance hit in doing so...

Others will likely chime in with further clarification on "domain" usage and options that you have though I suspect my conclusion is pretty much on-the-mark - as supported by your empirical evidence.

David J.




View this message in context: Re: User-defined operator function: what parameter type to use for uncast character string?
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Adam Mackler <pgsql-general@mackler.org> wrote:

>     CREATE domain my_domain as char(3) check(VALUE similar to '[A-C]{3}');
>     CREATE TABLE my_table (val my_domain);
>     INSERT INTO my_table VALUES ('ABC');

>     sandbox=> SELECT * FROM my_table WHERE val='abc';
>     val
>     -----
>     (0 rows)
>
> Question: What can I do so my custom equality operator is used without
> the cast?

If you are trying to get this particular case to work, you are
going about it the hard way, since case-insensitive comparisons are
supported by the citext extension.  This particular case could be
solved by:

CREATE EXTENSION citext;

CREATE domain my_domain as citext check(VALUE ~ '^[A-C]{3}$');
CREATE TABLE my_table (val my_domain);
INSERT INTO my_table VALUES ('ABC');

SELECT * FROM my_table WHERE val = 'abc';

If this is just a simplified example, you might want to look at the
CREATE CAST statement.

http://www.postgresql.org/docs/current/interactive/sql-createcast.html

Or look through the source code for the citext extension to see
what other sorts of issues you might need to cover.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


David G Johnston <david.g.johnston@gmail.com> writes:
> On Thu, Jul 31, 2014 at 12:19 AM, Adam Mackler-5 [via PostgreSQL] <
> ml-node+s1045698n5813399h93@n5.nabble.com> wrote:
>> Can you explain what the abuse is?  Also why the "=" operator does not
>> work even without the domain?

> When you write, explicitly, "domain = text" the operator is directly ​found.
> When you write, implicitly, "domain = unknown" there is no operator and so
> the system has to go looking.  It decides that the operator least likely to
> cause a problem is the "text = text" (varchar, char(3), text - these are
> all implemented in the same manner and are effectively synonyms).

There are a couple of reasons why this operator doesn't get selected:

1. Unless there's an exact match out-of-the-box, the operator resolution
code's second step is to smash domains to base types.  So
"my_domain = unknown" is reduced to "char = unknown" and from there
you're going to end up with the char = char operator.

2. text is the preferred type among the string class, so any case where
you have text on one side and some other string type on the other is
going to get resolved as text vs text.

There's documentation about this at
http://www.postgresql.org/docs/9.3/static/typeconv-oper.html
although I see that it omits to mention the domain-flattening step.

Because of #1, domain-specific functions and operators tend to be pretty
useless; you find yourself always having to cast the other side to get
an exact match.  There's been discussion about changing that, but it's
not clear how to do it without breaking the fact that usually you *do*
want the base type's operators to apply to the domain.

If you're intent on having this behavior, the way to go at it is to make
your own actual datatype (not a domain) and create all your own comparison
operators for it.  You can add an implicit cast to text for cases where
you don't want to define your own operator.  See contrib/citext for an
example.

(BTW, I wonder whether you could solve your problem without a pile of
C programming by defining a domain over citext.)

            regards, tom lane


On Thu, Jul 31, 2014 at 10:03:00AM -0400, Tom Lane wrote:
> 2. text is the preferred type among the string class, so any case where
> you have text on one side and some other string type on the other is
> going to get resolved as text vs text.
> Because of #1, domain-specific functions and operators tend to be pretty
> useless; you find yourself always having to cast the other side to get
>
> If you're intent on having this behavior, the way to go at it is to make
> your own actual datatype (not a domain) and create all your own comparison
> operators for it.  You can add an implicit cast to text for cases where

Thanks to everyone who responded to my question.  Yes, the
case-insensitivity was an example contrivance.  My actual operator
function is using regular expressions to do text replacements, so
citext is not going to solve all my problems.  Looks like 'CREATE
TYPE' is the only way to get what I'm after here, and since there are
regular expressions involved, a C-language solution is not seeming
like it's going to be very convenient.  On top of that, the content of
my regular-expression replacement strings are constructed from data
in my database, so really my SQL-language operator function seemed
ideal except--of course--for the limitations you all have explained to
me.

One final question: the 'CREATE CAST' command got my interest.  I'm
assuming that when the docs say it 'performs a conversion between two
data types,' that the meaning of "data type" includes only those
created using 'CREATE TYPE' and excludes domains.  If I am mistaken on
that point I would be grateful to learn of that mistake.

Thanks again,
--
Adam Mackler


On Thursday, July 31, 2014, Adam Mackler-5 [via PostgreSQL] <<a href="javascript:_e(%7B%7D,&#39;cvml&#39;,&#39;ml-node%2Bs1045698n5813494h38@n5.nabble.com&#39;);" target="_blank">ml-node+s1045698n5813494h38@...> wrote:
On Thu, Jul 31, 2014 at 10:03:00AM -0400, Tom Lane wrote:
> 2. text is the preferred type among the string class, so any case where
> you have text on one side and some other string type on the other is
> going to get resolved as text vs text.
> Because of #1, domain-specific functions and operators tend to be pretty
> useless; you find yourself always having to cast the other side to get
>
> If you're intent on having this behavior, the way to go at it is to make
> your own actual datatype (not a domain) and create all your own comparison
> operators for it.  You can add an implicit cast to text for cases where

Thanks to everyone who responded to my question.  Yes, the
case-insensitivity was an example contrivance.  My actual operator
function is using regular expressions to do text replacements, so
citext is not going to solve all my problems.  
Looks like 'CREATE TYPE' is the only way to get what I'm after here, and since there are
regular expressions involved, a C-language solution is not seeming
like it's going to be very convenient.  On top of that, the content of
my regular-expression replacement strings are constructed from data
in my database, so really my SQL-language operator function seemed
ideal except--of course--for the limitations you all have explained to
me. 

I'm obviously under informed but you may wish to step back and consider whether you are being too clever/novel...

One area you need to evaluate is how what you do interplays with indexing - if necessary - and performance generally.


One final question: the 'CREATE CAST' command got my interest.  I'm
assuming that when the docs say it 'performs a conversion between two
data types,' that the meaning of "data type" includes only those
created using 'CREATE TYPE' and excludes domains.  If I am mistaken on
that point I would be grateful to learn of that mistake.


 Seems easy enough to test...though since 'value'::domain is valid syntax I would have to assume your conclusion is wrong.

David J.





View this message in context: Re: User-defined operator function: what parameter type to use for uncast character string?
Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Adam Mackler <pgsql-general@mackler.org> writes:
> One final question: the 'CREATE CAST' command got my interest.  I'm
> assuming that when the docs say it 'performs a conversion between two
> data types,' that the meaning of "data type" includes only those
> created using 'CREATE TYPE' and excludes domains.  If I am mistaken on
> that point I would be grateful to learn of that mistake.

I wouldn't recommend it.  The expected behavior of up-casting to a domain
from its base type is that the value doesn't change but the domain's
current check constraints are applied.  I think that if you defined
a cast via CREATE CAST it would override that behavior (I've not verified
this though), and it would then be entirely on your head whether the
resulting value was actually a legal member of the domain.

            regards, tom lane