Thread: Regular expression and Group By

Regular expression and Group By

From

Clodoaldo

Date:

16 December 2006, 18:00:45

Regular expression and Group By

There is a varchar column which I need to group by an "uppered"
substring inside '[]' like in 'xxx[substring]yyy'. All the other lines
should not be changed.

I can do it using a union. I would like to reduce it to a single
query, but after much thought I can't. So I'm asking the regex experts
a hand.

This is how I do it:

-- drop table test_table;
create table test_table (tname varchar, value integer);
insert into test_table values ('[ab]x', 1);
insert into test_table values ('[ab]y', 2);
insert into test_table values ('[Ab]z', 3);
insert into test_table values ('w[aB]', 8);
insert into test_table values ('[abx', 4);
insert into test_table values ('ab]x', 5);
insert into test_table values ('xyz', 6);
insert into test_table values ('Xyz', 7);

select
   count(*) as total,
   tname,
   sum(value) as value_total
from (

   select
      substring(upper(tname) from E'\\[.*\\]') as tname,
      value
   from test_table
   where tname ~ E'\\[.*\\]'

   union all

   select tname, value
   from test_table
   where tname !~ E'\\[.*\\]'

) as a
group by tname
order by tname;

The result which is correct:

 total | tname | value_total
-------+-------+-------------
     4 | [AB]  |          14
     1 | [abx  |           4
     1 | ab]x  |           5
     1 | xyz   |           6
     1 | Xyz   |           7
(5 rows)

Regards,
--
Clodoaldo Pinto Neto

Re: Regular expression and Group By

From

Jeff Davis

Date:

18 December 2006, 14:40:53

On Fri, 2006-12-15 at 17:21 -0200, Clodoaldo wrote:
> -- drop table test_table;
> create table test_table (tname varchar, value integer);
> insert into test_table values ('[ab]x', 1);
> insert into test_table values ('[ab]y', 2);
> insert into test_table values ('[Ab]z', 3);
> insert into test_table values ('w[aB]', 8);
> insert into test_table values ('[abx', 4);
> insert into test_table values ('ab]x', 5);
> insert into test_table values ('xyz', 6);
> insert into test_table values ('Xyz', 7);
>
...
>  total | tname | value_total
> -------+-------+-------------
>      4 | [AB]  |          14
>      1 | [abx  |           4
>      1 | ab]x  |           5
>      1 | xyz   |           6
>      1 | Xyz   |           7
> (5 rows)
>

It looks like what you need is a function that returns the upper() of
the substring enclosed by "[]" if that substring exists, or else the
original string.

In other words, make a user-defined function that turns the string into
"tname" and GROUP BY the result of that function.

Hope this helps,
    Jeff Davis

Re: Regular expression and Group By

From

Clodoaldo

Date:

19 December 2006, 11:51:58

2006/12/18, Jeff Davis <pgsql@j-davis.com>:
> On Fri, 2006-12-15 at 17:21 -0200, Clodoaldo wrote:
> > -- drop table test_table;
> > create table test_table (tname varchar, value integer);
> > insert into test_table values ('[ab]x', 1);
> > insert into test_table values ('[ab]y', 2);
> > insert into test_table values ('[Ab]z', 3);
> > insert into test_table values ('w[aB]', 8);
> > insert into test_table values ('[abx', 4);
> > insert into test_table values ('ab]x', 5);
> > insert into test_table values ('xyz', 6);
> > insert into test_table values ('Xyz', 7);
> >
> ...
> >  total | tname | value_total
> > -------+-------+-------------
> >      4 | [AB]  |          14
> >      1 | [abx  |           4
> >      1 | ab]x  |           5
> >      1 | xyz   |           6
> >      1 | Xyz   |           7
> > (5 rows)
> >
>
> It looks like what you need is a function that returns the upper() of
> the substring enclosed by "[]" if that substring exists, or else the
> original string.
>
> In other words, make a user-defined function that turns the string into
> "tname" and GROUP BY the result of that function.
>

That worked. I was fixed in solving it with a regular expression and
didn't see the obvious.

Thanks.
--
Clodoaldo Pinto Neto