Thread: BUG #11523: Regular expressions work differently on different platforms
From: dmigowski@ikoffice.de
The following bug has been logged on the website:

Bug reference:      11523
Logged by:          Daniel Migowski
Email address:      dmigowski@ikoffice.de
PostgreSQL version: 9.1.2
Operating system:   Debian Linux 6.0.6 + Windows 7
Description:

I recently found that regular expressions, specifically the \s shorthand escape (equivalent to [[:space:]]), work differently on Windows and Linux. On Linux the non-breaking space (U+00A0) is not included in the class; on Windows it is. The following statement is therefore true on Windows and false on Linux:

select convert_from(E'\\xA0'::bytea,'ISO8859-1') ~ '\s'

This breaks email validation here, and it also breaks restoring a backup created on Linux onto my Windows machine. Is it possible to fix that? Is there a reason UTF-8 on Linux behaves differently from UTF-8 on Windows?
dmigowski@ikoffice.de writes:
> I recently found that regular expressions, or specifically the [:space:]
> shorthand escape work differently on Windows and Linux. On Linux the
> non-breaking space is not included in the shorthand escape, on Windows it
> is.

That would depend on what locale you're using for LC_CTYPE. We can't do much about the fact that locale definitions vary across platforms. In principle you could use C locale, which *is* standardized, but that cure may be worse than the disease for your purposes.

You could always spell it out with whatever set of characters you consider whitespace: [ \t\r\n] or something like that. For purposes like email address validation, the set of whitespace characters allowed by the relevant RFCs is probably smaller than most locales' [:space:] anyway.

regards, tom lane
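A minimal sketch of the two points in the reply, assuming a UTF8-encoded database on PostgreSQL 9.1 or later (where standard_conforming_strings is on by default, so '\s' reaches the regex engine unchanged). It first shows which LC_CTYPE the database uses, then compares the locale-dependent \s with an explicit character set on the same U+00A0 character from the report. LATIN1 is PostgreSQL's canonical name for ISO 8859-1; the column aliases are illustrative only.

```sql
-- \s and [[:space:]] are classified according to this database's LC_CTYPE.
SHOW lc_ctype;

-- U+00A0 (no-break space), built the same way as in the original report.
SELECT
  convert_from(E'\\xA0'::bytea, 'LATIN1') ~ '\s'        AS locale_dependent, -- varies by platform/locale
  convert_from(E'\\xA0'::bytea, 'LATIN1') ~ '[ \t\r\n]' AS explicit_set;     -- false under any locale
```

Only the explicit set gives a portable answer, because it names the characters instead of asking the locale which characters count as whitespace. Switching to the standardized C locale instead would mean creating a new database with LC_CTYPE 'C' (for example via CREATE DATABASE ... LC_CTYPE 'C' TEMPLATE template0), which changes character classification for everything else in that database too.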