Re: split string by special characters - Mailing list pgsql-general

From Andreas Wenk
Subject Re: split string by special characters
Date
Msg-id 4A6A262F.3050506@netzmeister-st-pauli.de
Whole thread Raw
In response to split string by special characters  (Jan-Erik <jan-erik.larka@os2world.com>)
List pgsql-general
Jan-Erik wrote:
> I wonder if you could please help me out to extract a character string
> to an array or better yet, a table.
>
> I'd like to split strings of text up into words and delimiters (but
> not delete the delimiters). The delimiters are defined as comma,
> space, dot, singe/double quotation mark, question mark etc.¹ in a
> separate table (delimiters) depending on what rules apply for the
> input.
>
> regexp_split_to_array/table seem quite suitable but I have
> difficulties to form the right expression with it, apart from that it
> remove the delimiters as well.
>
> Example:
> This is just a text that contain special characters such as , (comma),
> "(", ")" (left and right parenthesis) as well as "?" question mark.
> How do I split it up with PostgreSQL?
>
> Expected result:
> {This, " ", is, " ", just, " ", a, ..., PostgreSQL, "?" }
> __________________
> ¹)  Also later on tags such as <html> and at other times something
> else depending on the circumstances.
>
> //Jan-Erik
>
Hi,

I was thinking about that and in my opinion the approach to let the
database do that is the wrong direction. Sure you can do a lot with
regexp_split_to_table or regexp_split_to_array but they are kind of
limited compared to a programming language using regular expressions. If
I had to try to get your jobdone, I would try regexp_matches() like:

SELECT regexp_matches('This is just a text, that contain special
characters such as, (comma),"(", ")" (left and right parenthesis) as
well as "?" question, mark.How do I split it up with PostgreSQL?',
E'(\\w*.)\\s+','g');

regexp_matches
----------------
  {This}
  {is}
  {just}
  {a}
  {"text,"}
  {that}
  {contain}
  {special}
  {characters}
  {such}
  {"as,"}
  {","}
  {"\""}
  {left}
  {and}
  {right}
  {parenthesis)}
  {as}
  {well}
  {as}
  {"\""}
  {"question,"}
  {How}
  {do}
  {I}
  {split}
  {it}
  {up}
  {with}
(29 rows)

So, you have the ability to catch the seperators like ','. But for now,
teh example just catches the comma. But you want to catch a lot of other
seperators as well. I suggest you do that within the logic of your
coding language because I don't think this will be an easy way to walk
;-). This is no database job in my opinion.

Cheers

Andy




pgsql-general by date:

Previous
From: Peter Hunsberger
Date:
Subject: Find difference between two Text fields
Next
From: "Brian A. Seklecki"
Date:
Subject: Disable databse listing for non-superuser (\l) ?