Thread: Problem with a Pettern Matching Check
Hello hopefully correct List, I was trying to do something that is not working as it supposed to. First I created a table: create table t ( col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' ) ); This check avoids non-numbers like '1a1' and allows '123'. For some reason, I'm unable to find out why, it also avoids things like '1' and '12'. Could someone please give me hint? :) I was trying this one on PostgreSQL 8.0.3 -- Regards Sebastian Siewior
I'm guessing it's because char gets padded with spaces to the specified length. Any reason you are using char(3) instead of varchar(3)? And why are you storing numbers as a string, anyway? If you defined the column as a numeric type, postgres will tell you if you try to insert something non-numeric. Dmitri -----Original Message----- From: pgsql-sql-owner@postgresql.org [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Sebastian Siewior Sent: Monday, August 15, 2005 6:20 PM To: pgsql-sql@postgresql.org Subject: [SQL] Problem with a Pettern Matching Check Hello hopefully correct List, I was trying to do something that is not working as it supposed to. First I created a table: create table t ( col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' ) ); This check avoids non-numbers like '1a1' and allows '123'. For some reason, I'm unable to find out why, it also avoids things like '1' and '12'. Could someone please give me hint? :) I was trying this one on PostgreSQL 8.0.3 -- Regards Sebastian Siewior ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypesdo not match The information transmitted is intended only for the person or entity to which it is addressed and may contain confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you receivedthis in error, please contact the sender and delete the material from any computer
Sebastian Siewior schrob: > Hello hopefully correct List, perfectly. > I was trying to do something that is not working as it supposed to. > First I created a table: > > create table t ( > col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' ) > ); > > This check avoids non-numbers like '1a1' and allows '123'. For some > reason, I'm unable to find out why, it also avoids things like '1' and > '12'. Could someone please give me hint? :) Char is padded with spaces, and that is also why your regexp is not matching in these situations. You could either adjust your regexp to match the trailing spaces or use varchar(3) instead: --8<---------------cut here---------------start------------->8--- scratch=# select '1'::char(3) ~ '^\\d+$';?column? ----------f (1 row) scratch=# select '1'::char(3) ~ '^\\d+\\s*$';?column? ----------t (1 row) scratch=# select '1'::varchar(3) ~ '^\\d+$';?column? ----------t (1 row) --8<---------------cut here---------------end--------------->8--- regards Andreas --
On Tue, Aug 16, 2005 at 12:19:50AM +0200, Sebastian Siewior wrote: > > create table t ( > col CHAR (3) CONSTRAINT numonly_col CHECK ( col ~ '^\\d+$' ) > ); > > This check avoids non-numbers like '1a1' and allows '123'. For some > reason, I'm unable to find out why, it also avoids things like '1' and > '12'. Could someone please give me hint? :) The CHAR(3) specification causes the value to be space-padded, so '1' becomes '1 ' (the digit "one" followed by two spaces). See "Character Types" in the documentation: http://www.postgresql.org/docs/8.0/static/datatype-character.html Do you have a reason for using a character type instead of a numeric type like integer? -- Michael Fuhr
On Mon, 15 Aug 2005 18:37:52 -0400 "Dmitri Bichko" <dbichko@aveopharma.com> wrote: > I'm guessing it's because char gets padded with spaces to the > specified length. argh. Thank you. > Any reason you are using char(3) instead of varchar(3)? The numbers will have 2 or 3 digits so I tried to save some space :) > And why are you storing numbers as a string, anyway? If you defined > the column as a numeric type, postgres will tell you if you try to > insert something non-numeric. Correct. I will not let the values to be used together with sum/avg/+/... > > Dmitri > -- Regards Sebastian Siewior
>> Any reason you are using char(3) instead of varchar(3)? >The numbers will have 2 or 3 digits so I tried to save some space :) Well, smallint is only 2 bytes, so it would be more compact than either char(3) or varchar(3). Dmitri The information transmitted is intended only for the person or entity to which it is addressed and may contain confidentialand/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any actionin reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you receivedthis in error, please contact the sender and delete the material from any computer
Michael Fuhr <mike@fuhr.org> writes: > The CHAR(3) specification causes the value to be space-padded, so > '1' becomes '1 ' (the digit "one" followed by two spaces). Actually, we seem to be going out of our way to make this case fail. Given that we consider trailing spaces in char(n) to be semantically insignificant, would it make sense to strip them before doing the regex pattern match? That would happen automatically if we allowed the char(n) value to promote to text --- and the only reason it's not doing so is that there's an extra ~ operator definition that specifically prevents that (bpcharregexeq). I have a feeling that we added that operator definition at some point for backwards compatibility, but it seems a bit odd now. regards, tom lane
On Mon, Aug 15, 2005 at 08:21:23PM -0400, Tom Lane wrote: > Michael Fuhr <mike@fuhr.org> writes: > > The CHAR(3) specification causes the value to be space-padded, so > > '1' becomes '1 ' (the digit "one" followed by two spaces). > > Actually, we seem to be going out of our way to make this case fail. > Given that we consider trailing spaces in char(n) to be semantically > insignificant, would it make sense to strip them before doing the > regex pattern match? How standards-compliant would that be? Does the standard specify what should happen when using SIMILAR TO with a char(n) value? I notice that equality and SQL regular expressions treat trailing spaces differently -- is that intentional or accidental? CREATE TABLE foo (test char(3)); INSERT INTO foo VALUES ('1'); SELECT test, test = '1', test SIMILAR TO '1' FROM foo;test | ?column? | ?column? ------+----------+----------1 | t | f (1 row) SELECT test, test = '1 ', test SIMILAR TO '1 ' FROM foo;test | ?column? | ?column? ------+----------+----------1 | t | t (1 row) > That would happen automatically if we allowed > the char(n) value to promote to text --- and the only reason it's > not doing so is that there's an extra ~ operator definition that > specifically prevents that (bpcharregexeq). "bpcharregexeq" -- didn't Philip Glass write the score to that? -- Michael Fuhr
Michael Fuhr <mike@fuhr.org> writes: > On Mon, Aug 15, 2005 at 08:21:23PM -0400, Tom Lane wrote: >> Given that we consider trailing spaces in char(n) to be semantically >> insignificant, would it make sense to strip them before doing the >> regex pattern match? > How standards-compliant would that be? Does the standard specify > what should happen when using SIMILAR TO with a char(n) value? Hmm ... suddenly I'm getting a strong sense of deja vu ... think we've been around this merry-go-round before. SQL99 says ii) The <predicate> MC LIKE PC is true if there exists a partitioning of MCV into substrings such that: 1) A substring of MCV is a sequence of 0 (zero) or more contiguous <character representation>sof MCV and each <character representation> of MCV is part of exactly one substring. 2) If the i-th substring specifier of PCV is an arbitrary character specifier, the i-thsubstring of MCV is any single <character representation>. 3) If the i-th substring specifier of PCV is an arbitrary string specifier, then the i-thsubstring of MCV is any sequence of 0 (zero) or more <character representation>s. 4) If the i-th substring specifier of PCV is neither an arbitrary character specifier noran arbitrary string specifier, then the i-th substring of MCV is equal to that substringspecifier according to the collating sequence of the <like predicate>, without the appending of <space> characters to MCV, and has the same length as that substring specifier. 5) The number of substrings of MCV is equal to the number of substring specifiers of PCV. Rule ii.4 says that you use the collating sequence associated with the data values, which is where the SQL spec keeps its space sensitivity information --- but the restrictions about not adding space characters and having the same length seem to be intended to prevent use of pad-space-insensitivity to create a match. I think we read this text before, came to the same conclusion, and put in the special operator to make it behave that way. So ... never mind. regards, tom lane