Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate
Date
Msg-id CA02028E-58B6-4764-95C1-1551891995AD@phlo.org
Whole thread Raw
In response to Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: Adding a distinct "pattern" type to resolve the "~" commutator stalemate
List pgsql-hackers
On Jun20, 2011, at 00:56 , Andrew Dunstan wrote:
> On 06/19/2011 05:02 PM, Florian Pflug wrote:
>> The only argument against that I can see is that it poses
>> a compatibility problem if "~" remains the pattern matching
>> operator. I do believe, however, that the chance of
>>   unknown ~ unknown
>> appearing in actual applications is rather small - that'd only
>> happen if people used postgresql's regexp engine together with
>> purely external data.
> 
> People can store regular expressions in text fields now, and do,
> let me assure you. So the chances you'll encounter text ~ unknown
> or unknown ~ text  or text ~ text are 100%

Hm, it seems we either all have different idea about how such
a pattern type would be be defined, or have grown so accustomed to
pg's type system that we've forgotten how powerful it really
is ;-) (For me, the latter is surely true...).

I've now created a primitive prototype that uses a composite
type for "pattern". That changes the input syntax for patterns
(you need to enclose them in brackets), but should model all
the implicit and explicit casting rules and operator selection
correctly. It also uses "~~~" in place of "~", for obvious
reasons and again without changing the casting and overloading
rules.

The prototype defines text ~~~ text text ~~~ pattern pattern ~~~ text
and can be found at end of this mail.

With that prototype, *all* the cases (even unknown ~~~ unknown)
work as today, i.e. all of the statements below return true

postgres=# select 'abc' ~~~ '^ab+c$';
postgres=# select 'abc'::text ~~~ '^ab+c$';
postgres=# select 'abc' ~~~ '^ab+c$'::text;
postgres=# select 'abc' ~~~ '(^ab+c$)'::pattern;
postgres=# select '(^ab+c$)'::pattern ~~~ 'abc';

(The same happens with and without setting pattern's typcategory
to 'S'. Not really sure if the category has any effect here
at all).

That's not exactly what I had in mind - I'd have preferred unknown ~~~ unknown
to return an error but text ~~~ unknown
and unknown ~~~ text
to work, but it looks that that's not easily done.

Still, I believe the behaviour of the prototype is acceptable.

BTW, The reason that 'unknown ~~~ unknown' works is, I believe
the following comment func_select_candidate, together with the
fact that 'text' is the preferred type in the string category.
 If any candidate has an input datatype of STRING category, use STRING category (this bias towards STRING is
appropriatesince unknown-type literals look like strings).
 

best regards,
Florian Pflug

create type pattern as (p text);

create function match_right(l text, r text) returns boolean as $$ select $1 ~ $2
$$ language sql strict immutable;

create operator ~~~ ( procedure = match_right, leftarg = text, rightarg = text
);

create function match_right(l text, r pattern) returns boolean as $$ select $1 ~ $2.p
$$ language sql strict immutable;

create operator ~~~ ( procedure = match_right, commutator = '~~~', leftarg = text, rightarg = pattern
);

create function match_left(l pattern, r text) returns boolean as $$ select $2 ~ $1.p
$$ language sql strict immutable;

create operator ~~~ ( procedure = match_left, commutator = '~~~', leftarg = pattern, rightarg = text
);

update pg_type set typcategory = 'S' where oid = 'pattern'::regtype;




pgsql-hackers by date:

Previous
From: "Kevin Grittner"
Date:
Subject: Re: Small SSI issues
Next
From: Florian Pflug
Date:
Subject: Re: the big picture for index-only scans