Re: split string by special characters - Mailing list pgsql-general

From Jan-Erik
Subject Re: split string by special characters
Date
Msg-id cfe54060-28a8-475b-80b1-281390713277@r2g2000yqm.googlegroups.com
Whole thread Raw
In response to split string by special characters  (Jan-Erik <jan-erik.larka@os2world.com>)
Responses Re: split string by special characters
List pgsql-general
On 24 Juli, 23:22, a.w...@netzmeister-st-pauli.de (Andreas Wenk)
wrote:
>
> Hi,
>
> I was thinking about that and in my opinion the approach to let the
> database do that is the wrong direction. Sure you can do a lot with
> regexp_split_to_table or regexp_split_to_array but they are kind of

Yes, I see. You're quite right, the split was intended to do give me
everything in processed chunks it in some easy way as the last part of
the interpretation of the text.

> limited compared to a programming language using regular expressions. If
> I had to try to get your jobdone, I would try regexp_matches() like:
>
> SELECT regexp_matches('This is just a text, that contain special
> characters such as, (comma),"(", ")" (left and right parenthesis) as
> well as "?" question, mark.How do I split it up with PostgreSQL?',
> E'(\\w*.)\\s+','g');
>
> regexp_matches
> ----------------
>   {This}
>   {is}
>   {just}
>   {a}
>   {"text,"}
>   {that}
>   {contain}
>   {special}
>   {characters}
>   {such}
>   {"as,"}
>   {","}
>   {"\""}
>   {left}
>   {and}
>   {right}
>   {parenthesis)}
>   {as}
>   {well}
>   {as}
>   {"\""}
>   {"question,"}
>   {How}
>   {do}
>   {I}
>   {split}
>   {it}
>   {up}
>   {with}
> (29 rows)
>
> So, you have the ability to catch the seperators like ','. But for now,
> teh example just catches the comma. But you want to catch a lot of other

Yes, but then I ran into the problems with separators that regexp
consider as part of the expression and how to dynamically build the
right expression in some unified way for each language.

> seperators as well. I suggest you do that within the logic of your
> coding language because I don't think this will be an easy way to walk

Guess you're right, because I didn't know how to handle it with the
regexp-approach.
I sat down yesterday and wrote a function that does the job for me in
PL/pgSQL, I'm not quite finished, but can see the light at the end of
the tunnel.
The basic approach I'm working with now is to let it find the position
of each delimiter combination within the text, then sort the resulting
array to get it ordered and extract each part.
It won't be fast as lightning, but sufficient for now and as it seem,
allow me to parse text from various files written in different
languages (e.g. programming) just by specifying the delimiters.

> ;-). This is no database job in my opinion.

I didn't intend to try it either before I spotted some of those
functions... :-)
Then figured it would be nice to do it within the db-engine as all the
data is present there.
I wrote code outside the db-engine some time ago, but then other
aspects made it less desirable to use.
>
> Cheers
>
> Andy
>

Thank you Andy for the code example and your advice.
I really appreciate that you took your time to show me how and explain
why.

//Jan-Erik

pgsql-general by date:

Previous
From: Aleksander Kmetec
Date:
Subject: Re: Find difference between two Text fields
Next
From: Craig Ringer
Date:
Subject: Re: Copying only incremental records to another DB..