Thread: Problem with a Pettern Matching Check

Problem with a Pettern Matching Check

From
Sebastian Siewior
Date:
Hello hopefully correct List,

I was trying to do something that is not working as it supposed to.
First I created a table:

create table t ( col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' ) 
);

This check avoids non-numbers like '1a1' and allows '123'. For some
reason, I'm unable to find out why, it also avoids things like '1' and
'12'. Could someone please give me hint? :)

I was trying this one on PostgreSQL 8.0.3

-- 
Regards
Sebastian Siewior


Re: Problem with a Pettern Matching Check

From
"Dmitri Bichko"
Date:
I'm guessing it's because char gets padded with spaces to the specified
length.

Any reason you are using char(3) instead of varchar(3)?  And why are you
storing numbers as a string, anyway?  If you defined the column as a
numeric type, postgres will tell you if you try to insert something
non-numeric.

Dmitri

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Sebastian Siewior
Sent: Monday, August 15, 2005 6:20 PM
To: pgsql-sql@postgresql.org
Subject: [SQL] Problem with a Pettern Matching Check


Hello hopefully correct List,

I was trying to do something that is not working as it supposed to.
First I created a table:

create table t ( col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' )
);

This check avoids non-numbers like '1a1' and allows '123'. For some
reason, I'm unable to find out why, it also avoids things like '1' and
'12'. Could someone please give me hint? :)

I was trying this one on PostgreSQL 8.0.3

--
Regards
Sebastian Siewior

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to      choose an index scan if your joining column's
datatypesdo not      match 
The information transmitted is intended only for the person or entity to which it is addressed and may contain
confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any
actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you
receivedthis in error, please contact the sender and delete the material from any computer 


Re: Problem with a Pettern Matching Check

From
Andreas Seltenreich
Date:
Sebastian Siewior schrob:

> Hello hopefully correct List,

perfectly.

> I was trying to do something that is not working as it supposed to.
> First I created a table:
>
> create table t (
>   col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' ) 
> );
>
> This check avoids non-numbers like '1a1' and allows '123'. For some
> reason, I'm unable to find out why, it also avoids things like '1' and
> '12'. Could someone please give me hint? :)

Char is padded with spaces, and that is also why your regexp is not
matching in these situations. You could either adjust your regexp to
match the trailing spaces or use varchar(3) instead:

--8<---------------cut here---------------start------------->8---
scratch=# select '1'::char(3) ~ '^\\d+$';?column? 
----------f
(1 row)

scratch=# select '1'::char(3) ~ '^\\d+\\s*$';?column? 
----------t
(1 row)

scratch=# select '1'::varchar(3) ~ '^\\d+$';?column? 
----------t
(1 row)
--8<---------------cut here---------------end--------------->8---

regards
Andreas
-- 


Re: Problem with a Pettern Matching Check

From
Michael Fuhr
Date:
On Tue, Aug 16, 2005 at 12:19:50AM +0200, Sebastian Siewior wrote:
>
> create table t (
>   col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' ) 
> );
> 
> This check avoids non-numbers like '1a1' and allows '123'. For some
> reason, I'm unable to find out why, it also avoids things like '1' and
> '12'. Could someone please give me hint? :)

The CHAR(3) specification causes the value to be space-padded, so
'1' becomes '1  ' (the digit "one" followed by two spaces).  See
"Character Types" in the documentation:

http://www.postgresql.org/docs/8.0/static/datatype-character.html

Do you have a reason for using a character type instead of a numeric
type like integer?

-- 
Michael Fuhr


Re: Problem with a Pettern Matching Check

From
Sebastian Siewior
Date:
On Mon, 15 Aug 2005 18:37:52 -0400
"Dmitri Bichko" <dbichko@aveopharma.com> wrote:

> I'm guessing it's because char gets padded with spaces to the
> specified length.

argh. Thank you.

> Any reason you are using char(3) instead of varchar(3)?  

The numbers will have 2 or 3 digits so I tried to save some space :)

> And why are you storing numbers as a string, anyway?  If you defined
> the column as a numeric type, postgres will tell you if you try to
> insert something non-numeric.

Correct. I will not let the values to be used together with
sum/avg/+/...

> 
> Dmitri
> 

-- 
Regards
Sebastian Siewior


Re: Problem with a Pettern Matching Check

From
"Dmitri Bichko"
Date:
>> Any reason you are using char(3) instead of varchar(3)?

>The numbers will have 2 or 3 digits so I tried to save some space :)

Well, smallint is only 2 bytes, so it would be more compact than either
char(3) or varchar(3).

Dmitri
The information transmitted is intended only for the person or entity to which it is addressed and may contain
confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any
actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you
receivedthis in error, please contact the sender and delete the material from any computer 


Re: Problem with a Pettern Matching Check

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> The CHAR(3) specification causes the value to be space-padded, so
> '1' becomes '1  ' (the digit "one" followed by two spaces).

Actually, we seem to be going out of our way to make this case fail.
Given that we consider trailing spaces in char(n) to be semantically
insignificant, would it make sense to strip them before doing the
regex pattern match?  That would happen automatically if we allowed
the char(n) value to promote to text --- and the only reason it's
not doing so is that there's an extra ~ operator definition that
specifically prevents that (bpcharregexeq).

I have a feeling that we added that operator definition at some point
for backwards compatibility, but it seems a bit odd now.
        regards, tom lane


Re: Problem with a Pettern Matching Check

From
Michael Fuhr
Date:
On Mon, Aug 15, 2005 at 08:21:23PM -0400, Tom Lane wrote:
> Michael Fuhr <mike@fuhr.org> writes:
> > The CHAR(3) specification causes the value to be space-padded, so
> > '1' becomes '1  ' (the digit "one" followed by two spaces).
> 
> Actually, we seem to be going out of our way to make this case fail.
> Given that we consider trailing spaces in char(n) to be semantically
> insignificant, would it make sense to strip them before doing the
> regex pattern match?

How standards-compliant would that be?  Does the standard specify
what should happen when using SIMILAR TO with a char(n) value?
I notice that equality and SQL regular expressions treat trailing
spaces differently -- is that intentional or accidental?

CREATE TABLE foo (test char(3));
INSERT INTO foo VALUES ('1');

SELECT test, test = '1', test SIMILAR TO '1' FROM foo;test | ?column? | ?column? 
------+----------+----------1    | t        | f
(1 row)

SELECT test, test = '1  ', test SIMILAR TO '1  ' FROM foo;test | ?column? | ?column? 
------+----------+----------1    | t        | t
(1 row)

> That would happen automatically if we allowed
> the char(n) value to promote to text --- and the only reason it's
> not doing so is that there's an extra ~ operator definition that
> specifically prevents that (bpcharregexeq).

"bpcharregexeq" -- didn't Philip Glass write the score to that?

-- 
Michael Fuhr


Re: Problem with a Pettern Matching Check

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> On Mon, Aug 15, 2005 at 08:21:23PM -0400, Tom Lane wrote:
>> Given that we consider trailing spaces in char(n) to be semantically
>> insignificant, would it make sense to strip them before doing the
>> regex pattern match?

> How standards-compliant would that be?  Does the standard specify
> what should happen when using SIMILAR TO with a char(n) value?

Hmm ... suddenly I'm getting a strong sense of deja vu ... think we've
been around this merry-go-round before.  SQL99 says
            ii) The <predicate>
                  MC LIKE PC
                is true if there exists a partitioning of MCV into                substrings such that:
                1) A substring of MCV is a sequence of 0 (zero) or more                  contiguous <character
representation>sof MCV and each                  <character representation> of MCV is part of exactly one
  substring.
 
                2) If the i-th substring specifier of PCV is an arbitrary                  character specifier, the
i-thsubstring of MCV is any                  single <character representation>.
 
                3) If the i-th substring specifier of PCV is an arbitrary                  string specifier, then the
i-thsubstring of MCV                  is any sequence of 0 (zero) or more <character
representation>s.
                4) If the i-th substring specifier of PCV is neither an                  arbitrary character specifier
noran arbitrary string                  specifier, then the i-th substring of MCV is equal to                  that
substringspecifier according to the collating                  sequence of the <like predicate>, without the appending
               of <space> characters to MCV, and has the same length as                  that substring specifier.
 
                5) The number of substrings of MCV is equal to the number                  of substring specifiers of
PCV.

Rule ii.4 says that you use the collating sequence associated with the
data values, which is where the SQL spec keeps its space sensitivity
information --- but the restrictions about not adding space characters
and having the same length seem to be intended to prevent use of
pad-space-insensitivity to create a match.

I think we read this text before, came to the same conclusion, and
put in the special operator to make it behave that way.  So ...
never mind.
        regards, tom lane