Thread: BUG #16512: Character considered as a number by regex but can not convert to numeric
BUG #16512: Character considered as a number by regex but can not convert to numeric
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16512 Logged by: tester375 Email address: oytf@drivetagdev.com PostgreSQL version: 12.3 Operating system: Windows 10 Description: When installed Postgresql 12.3 for testing purposes we have discovered following possible bug: Character "8" (different from standard number 8) is treated as a number by regular expression ~'\d+' but can not be converted to numeric by ::numeric How to reproduce the bug: Command: SELECT '8' ~ '\d+' returns "false" for PostgreSQL 10.4 on both Windows 10 and CentOS Linux 7 returns "false" for PostgreSQL 11.5 on CentOS Linux 7 returns "true" for PostgreSQL 12.3 on Windows 10 - but in this case, character still can not be converted to numeric. This behavior causes obvious problems when using data type conversion like: CASE WHEN column1 ~ '\d+' THEN column1::numeric AS column2 Error: '8' can not be cast to numeric
Re: BUG #16512: Character considered as a number by regex but can not convert to numeric
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > When installed Postgresql 12.3 for testing purposes we have discovered > following possible bug: > Character "8" (different from standard number 8) is treated as a number by > regular expression > ~'\d+' > but can not be converted to numeric by > ::numeric I see no reason to consider this a bug. The set of characters matched by \d is, as documented, dependent on your collation choice or locale settings (LC_CTYPE in particular). We are not going to try to guess at what random Unicode glyphs might be included in a particular locale, much less make numeric_in take them all. If you want to know whether a string is acceptable to numeric_in I'd suggest writing something based around [0-9] not \d. regards, tom lane
Re: BUG #16512: Character considered as a number by regex but can not convert to numeric
From
tester357
Date:
Behavior has been tested on both Postgresql 10 and Postgresql 12 with same LC_CTYPE set. Actualy it was the same Windows computer. The problem might be fixed by using [0-9\.]+ but it does not explain the different behaviour among Postgres versions. -- Sent from: https://www.postgresql-archive.org/PostgreSQL-bugs-f2117394.html
Re: BUG #16512: Character considered as a number by regex but can not convert to numeric
From
Tom Lane
Date:
tester357 <tester357.postgres@email.cz> writes: > Behavior has been tested on both Postgresql 10 and Postgresql 12 with same > LC_CTYPE set. Actualy it was the same Windows computer. > The problem might be fixed by using [0-9\.]+ but it does not explain the > different behaviour among Postgres versions. True. If the older version were pre-v10 the explanation would be that it lacked support for regex character-class checks on characters above U+7FF. But AFAICS, we should act the same in v10 and later versions: we just ask the platform's iswdigit() whether the character is a digit. I have to conclude there's something inconsistent in iswdigit(). I notice that POSIX seems to mandate that the [:digit:] class be exactly '0' through '9': the "Locale" chapter says digit Define the characters to be classified as numeric digits. In the POSIX locale, only: 0 1 2 3 4 5 6 7 8 9 shall be included. In a locale definition file, only the digits <zero>, <one>, <two>, <three>, <four>, <five>, <six>, <seven>, <eight>, and <nine> shall be specified, and in contiguous ascending sequence by numerical value. The digits <zero> to <nine> of the portable character set are automatically included in this class. The Linux man page for iswdigit() says the same, so it's no surprise that the test case returns "false" on Linux. Sadly, it's also little surprise if Microsoft thinks they're not bound by the POSIX spec. In general, I'm not finding anything particularly investigation-worthy in this report. If somebody with an appropriate system wanted to chase down exactly why the v10 vs v12 difference exists, that might be marginally interesting, but I doubt we'd consider it a bug. Platform-dependent results are expected here. regards, tom lane