Thread: BUG #11523: Regular expressions work differently on different platforms

BUG #11523: Regular expressions work differently on different platforms

From
dmigowski@ikoffice.de
Date:
The following bug has been logged on the website:

Bug reference:      11523
Logged by:          Daniel Migowski
Email address:      dmigowski@ikoffice.de
PostgreSQL version: 9.1.2
Operating system:   Debian Linux 6.0.6 + Windows 7
Description:

I recently found that regular expressions, or specifically the [:space:]
shorthand escape, work differently on Windows and Linux. On Linux the
non-breaking space is not included in the shorthand escape; on Windows it
is. The following statement is therefore true on Windows and false on
Linux:

    select convert_from(E'\\xA0'::bytea,'ISO8859-1') ~ '\s'

This breaks email validation here, and also the restore of a Linux-created
backup onto my Windows machine. Is it possible to fix that? Is there a reason
that UTF-8 on Linux differs from UTF-8 on Windows?

Re: BUG #11523: Regular expressions work differently on different platforms

From
Tom Lane
Date:
dmigowski@ikoffice.de writes:
> I recently found that regular expressions, or specifically the [:space:]
> shorthand escape, work differently on Windows and Linux. On Linux the
> non-breaking space is not included in the shorthand escape; on Windows it
> is.

That would depend on what locale you're using for LC_CTYPE.  We can't do
much about the fact that locale definitions vary across platforms.  In
principle you could use C locale, which *is* standardized, but that cure
may be worse than the disease for your purposes.
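
As a rough sketch of how one might check this (not verified on either of the
reported systems): the first two statements show which LC_CTYPE the databases
use, and the third forces the C locale for a single match through the
COLLATE clause, assuming a release with collation support (9.1 or later).
Under "C" only ASCII whitespace should be classified as space, so the
non-breaking space is not expected to match on either platform.

```sql
-- LC_CTYPE in effect for the current database.
SHOW lc_ctype;

-- LC_COLLATE / LC_CTYPE recorded for every database in the cluster.
SELECT datname, datcollate, datctype FROM pg_database;

-- Same test as in the report, but with the pattern match forced to the
-- C locale (assumption: the regex engine honors the expression's collation).
SELECT convert_from(E'\\xA0'::bytea, 'ISO8859-1') ~ ('\s' COLLATE "C");
```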

You could always spell it out with whatever set of characters you consider
whitespace: [ \t\r\n] or something like that.  For purposes like email
address validation, the set of whitespace characters allowed by the
relevant RFCs is probably smaller than most locales' [:space:] anyway.
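
A minimal sketch of the explicit spelling, reusing the test value from the
report. The character set here (space, tab, carriage return, newline) is only
an illustration; the set appropriate for a given application is an assumption
to be adjusted. Because the bracket expression lists its members literally,
the result should not depend on LC_CTYPE.

```sql
-- Non-breaking space against an explicit whitespace set: NBSP is not in
-- the list, so this should be false regardless of platform or locale.
SELECT convert_from(E'\\xA0'::bytea, 'ISO8859-1') ~ '[ \t\r\n]';

-- An ordinary space is in the list, so this should be true.
SELECT 'a b' ~ '[ \t\r\n]';
```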

            regards, tom lane