Thread: BUG #18870: weird behavior with regexp_replace
The following bug has been logged on the website:

Bug reference:      18870
Logged by:          reinko
Email address:      devops@key2asset.com
PostgreSQL version: 17.4
Operating system:   Ubuntu 11.4.0-1ubuntu1~22.04
Description:

select regexp_replace(LOWER('Örebro'), '\W', '_', 'g')

In PostgreSQL 15 the result is örebro. That is correct, because ö should match \w in a regex.

select regexp_replace(LOWER('Örebro'), '\W', '_', 'g')

Since PostgreSQL 17 the result is _rebro. That is incorrect, because \w should also include characters like ö, ä and ë. The LOWER() call is not really relevant to this issue; the same thing happens with a plain string literal as well. Many other accented Latin letters are affected too; é and è, for example, show the same issue.

Kind regards,
Reinko Brink
PG Bug reporting form <noreply@postgresql.org> writes:
> select regexp_replace(LOWER('Örebro'), '\W', '_', 'g') in postgres 15 the
> result is örebro which is correct since ö should fit in the \w for a
> regex.
> select regexp_replace(LOWER('Örebro'), '\W', '_', 'g') --> since postgres 17
> the result is _rebro which is incorrect since \w should also contain
> characters like ö, ä, ë.

This most likely indicates that you've got a different database collation selected in the v17 installation. Postgres defers to the LC_CTYPE setting (or, in some configurations, the ICU collation) to decide what is a letter. See

https://www.postgresql.org/docs/current/charset.html

psql's "\l" command will give a quick overview of what collations you have selected.

			regards, tom lane
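The point that "what is a letter" depends on the character-classification setting, not on the regex itself, can be illustrated outside PostgreSQL. The sketch below is an analogy only. It uses Python's `re` module, not PostgreSQL's regex engine. As an assumption, it treats Python's `re.ASCII` flag as a stand-in for a C/POSIX-style LC_CTYPE, in which only ASCII letters, digits and `_` count as word characters, and Python's default Unicode mode as a stand-in for a UTF-8 locale or ICU collation. The helper name `replace_non_word` is hypothetical.

```python
import re


def replace_non_word(text: str, ascii_only: bool) -> str:
    """Replace every non-word character (\\W) with '_'.

    Hypothetical helper for illustration only. With ascii_only=True, \\w
    matches only [a-zA-Z0-9_] (analogous to a "C"-like ctype). Otherwise
    Python's Unicode-aware \\w also matches letters such as ö, é and è.
    """
    flags = re.ASCII if ascii_only else 0
    return re.sub(r"\W", "_", text, flags=flags)


# "\u00d6rebro" is "Örebro" written with a precomposed Ö, so lower() yields "örebro".
city = "\u00d6rebro".lower()

print(replace_non_word(city, ascii_only=False))  # örebro  (ö counts as a word character)
print(replace_non_word(city, ascii_only=True))   # _rebro  (ö is not in [a-zA-Z0-9_])
```

The pattern is identical in both calls; only the classification rule changes. That matches the two results in the report, which is why comparing the collation and ctype of the v15 and v17 databases (for example with psql's `\l`) is the first thing to check.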