Re: split string by special characters - Mailing list pgsql-general

From Andreas Wenk
Subject Re: split string by special characters
Date
Msg-id 4A6B1039.4040205@netzmeister-st-pauli.de
Whole thread Raw
In response to Re: split string by special characters  (Jan-Erik <jan-erik.larka@os2world.com>)
List pgsql-general
Jan-Erik schrieb:
> On 24 Juli, 23:22, a.w...@netzmeister-st-pauli.de (Andreas Wenk)
> wrote:
>> Hi,
>>
>> I was thinking about that and in my opinion the approach to let the
>> database do that is the wrong direction. Sure you can do a lot with
>> regexp_split_to_table or regexp_split_to_array but they are kind of
>
> Yes, I see. You're quite right, the split was intended to do give me
> everything in processed chunks it in some easy way as the last part of
> the interpretation of the text.
>
>> limited compared to a programming language using regular expressions. If
>> I had to try to get your jobdone, I would try regexp_matches() like:
>>
>> SELECT regexp_matches('This is just a text, that contain special
>> characters such as, (comma),"(", ")" (left and right parenthesis) as
>> well as "?" question, mark.How do I split it up with PostgreSQL?',
>> E'(\\w*.)\\s+','g');
>>
>> regexp_matches
>> ----------------
>>   {This}
>>   {is}
>>   {just}
>>   {a}
>>   {"text,"}
>>   {that}
>>   {contain}
>>   {special}
>>   {characters}
>>   {such}
>>   {"as,"}
>>   {","}
>>   {"\""}
>>   {left}
>>   {and}
>>   {right}
>>   {parenthesis)}
>>   {as}
>>   {well}
>>   {as}
>>   {"\""}
>>   {"question,"}
>>   {How}
>>   {do}
>>   {I}
>>   {split}
>>   {it}
>>   {up}
>>   {with}
>> (29 rows)
>>
>> So, you have the ability to catch the seperators like ','. But for now,
>> teh example just catches the comma. But you want to catch a lot of other
>
> Yes, but then I ran into the problems with separators that regexp
> consider as part of the expression and how to dynamically build the
> right expression in some unified way for each language.
>
>> seperators as well. I suggest you do that within the logic of your
>> coding language because I don't think this will be an easy way to walk
>
> Guess you're right, because I didn't know how to handle it with the
> regexp-approach.
> I sat down yesterday and wrote a function that does the job for me in
> PL/pgSQL, I'm not quite finished, but can see the light at the end of
> the tunnel.

just in case you are running into a black performance hole - you could
try to write it in C as a user defined function. Actually for me it
would be a real big challenge ;-)

> The basic approach I'm working with now is to let it find the position
> of each delimiter combination within the text, then sort the resulting
> array to get it ordered and extract each part.
> It won't be fast as lightning, but sufficient for now and as it seem,
> allow me to parse text from various files written in different
> languages (e.g. programming) just by specifying the delimiters.
>
>> ;-). This is no database job in my opinion.
>
> I didn't intend to try it either before I spotted some of those
> functions... :-)
> Then figured it would be nice to do it within the db-engine as all the
> data is present there.
> I wrote code outside the db-engine some time ago, but then other
> aspects made it less desirable to use.
>> Cheers
>>
>> Andy
>>
>
> Thank you Andy for the code example and your advice.
> I really appreciate that you took your time to show me how and explain
> why.
 > //Jan-Erik

hey you're welcome. When you're done it would be really great to see the
resulting function ;-)

Cheers

Andy



pgsql-general by date:

Previous
From: Andreas Wenk
Date:
Subject: Re: Disable databse listing for non-superuser (\l) ?
Next
From: Merlin Moncure
Date:
Subject: Re: Very slow joins