Thread: Regular expression for lower case to upper case.
Hi all,
I want a regex to change the case of a field from UPPER to lower.
I know about the UPPER() and LOWER() functions and they are not what I want.
I would have thought this should be very simple, but I've searched a lot and can't seem to get an answer.
Here's a fiddle with a couple of things that I've tried - I'm obviously missing something that should be starting me in the face.
Any input appreciated.
E.
On 2022-12-10 09:40:22 +0000, Eagna wrote: > I want a regex to change the case of a field from UPPER to lower. First a note about terminology: A regular expression matches a string, it doesn't replace anything. The regexp_replace function uses a regular expression to match parts of a string and then uses to replacement string to replace them - but the replacement string is not itself a regular expresssion. Tha said, the replacement string in some editors (like Vim) and some programming languages (like Perl) provides syntax for changing case (both vi(m) and Perl use \u and \U...\E for uppercasing). https://www.postgresql.org/docs/current/functions-matching.html#FUNCTIONS-POSIX-REGEXP doesn't mention any special escapes except \1 through \9 and \&. So it is extremely likely that no such escapes exist. > I know about the UPPER() and LOWER() functions and they are not what I want. Can you elaborate why you can't use those? hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On Saturday, December 10, 2022, Eagna <eagna@protonmail.com> wrote:
Hi all,I want a regex to change the case of a field from UPPER to lower.I know about the UPPER() and LOWER() functions and they are not what I want.I would have thought this should be very simple, but I've searched a lot and can't seem to get an answer.Here's a fiddle with a couple of things that I've tried - I'm obviously missing something that should be starting me in the face.Any input appreciated.E.
RegExp by itself cannot do this. You have to match all parts of the input into different capturing groups, then use lower() combined with format() to build a new string. Putting the capturing groups into an array is the most useful option.
David J.
Hi, and thanks for your input, > Tha said, the replacement string in some editors (like Vim) and some > programming languages (like Perl) provides syntax for changing case > (both vi(m) and Perl use \u and \U...\E for uppercasing). This is probably why I was so frustrated - I thought that there should be an equivalent in PostgreSQL. I was going mad trying all of these from my searches. > > I know about the UPPER() and LOWER() functions and they are not what I want. > Can you elaborate why you can't use those? I want to index on a REGEXP_REPLACE() - I thought using lower -> upper would be a good test. I could always have used another REGEXP_REPLACE() for my testing, but I then became "obsessed" with the idea of using REGEXP_REPLACE()as a substitute for UPPER() - kind of an obfuscated code competition with myself! :-) Again, thanks for your input. E. > hp
Hi, and thanks for your input. > RegExp by itself cannot do this. You have to match all parts of the input into different capturing groups, then use lower()combined with format() to build a new string. Putting the capturing groups into an array is the most useful option. OK - I *_kind_* of see what you're saying. There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to give an outline of the solution that you propose. Thanks again, E. > David J.
On 2022-12-10 11:00:48 +0000, Eagna wrote: > > RegExp by itself cannot do this. You have to match all parts of the > > input into different capturing groups, then use lower() combined > > with format() to build a new string. Putting the capturing groups > > into an array is the most useful option. > > OK - I *_kind_* of see what you're saying. > > There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd > care to give an outline of the solution that you propose. For example like this: INSERT INTO test VALUES ('abc_def_ghi'); Let's say I want to uppercase the part between the two underscores. First use regexp_replace to split the string into three parts: One before the match, the match and one after the match: SELECT regexp_replace(x, '(.*_)(.*)(_.*)', '\1'), regexp_replace(x, '(.*_)(.*)(_.*)', '\2'), regexp_replace(x, '(.*_)(.*)(_.*)', '\3') FROM test; ╔════════════════╤════════════════╤════════════════╗ ║ regexp_replace │ regexp_replace │ regexp_replace ║ ╟────────────────┼────────────────┼────────────────╢ ║ abc_ │ def │ _ghi ║ ╚════════════════╧════════════════╧════════════════╝ (1 row) Once that works, uppercase the part you want and concatenate everything together again: SELECT regexp_replace(x, '(.*_)(.*)(_.*)', '\1') || upper(regexp_replace(x, '(.*_)(.*)(_.*)', '\2')) || regexp_replace(x, '(.*_)(.*)(_.*)', '\3') FROM test; ╔═════════════╗ ║ ?column? ║ ╟─────────────╢ ║ abc_DEF_ghi ║ ╚═════════════╝ (1 row) hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!"
Attachment
On Sat, Dec 10, 2022 at 6:32 AM Peter J. Holzer <hjp-pgsql@hjp.at> wrote:
On 2022-12-10 11:00:48 +0000, Eagna wrote:
> > RegExp by itself cannot do this. You have to match all parts of the
> > input into different capturing groups, then use lower() combined
> > with format() to build a new string. Putting the capturing groups
> > into an array is the most useful option.
>
> OK - I *_kind_* of see what you're saying.
>
> There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd
> care to give an outline of the solution that you propose.
For example like this:
INSERT INTO test VALUES
('abc_def_ghi');
Let's say I want to uppercase the part between the two underscores.
First use regexp_replace to split the string into three parts: One
before the match, the match and one after the match:
SELECT
regexp_replace(x, '(.*_)(.*)(_.*)', '\1'),
regexp_replace(x, '(.*_)(.*)(_.*)', '\2'),
regexp_replace(x, '(.*_)(.*)(_.*)', '\3')
FROM test;
A bit too inefficient for my taste.
I was describing the following:
with parts as materialized (
select regexp_match(
'abc_def_ghi',
'^([^_]*_)([^_]*_)([^_]*)$') as part_array
)
select format(
'%s%s%s',
part_array[1],
upper(part_array[2]),
part_array[3])
from parts;
David J.
> On 10 Dec 2022, at 12:00, Eagna <eagna@protonmail.com> wrote: > > > Hi, and thanks for your input. > > >> RegExp by itself cannot do this. You have to match all parts of the input into different capturing groups, then use lower()combined with format() to build a new string. Putting the capturing groups into an array is the most useful option. > > > OK - I *_kind_* of see what you're saying. > > There's a small fiddle here (https://dbfiddle.uk/rhw1AdBY) if you'd care to give an outline of the solution that you propose. If you put all the regexes and their replacements into a table[1], you could use an aggregate over them to combine all thereplacements into the final string. It would need some aggregate like regex_replace_agg, which would probably be a customaggregate. [1]: If you stick to ASCII, you could just calculate them and even omit storing them in a physical table. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.