Re: SELECT substring with regex - Mailing list pgsql-sql
From | T E Schmitz |
---|---|
Subject | Re: SELECT substring with regex |
Date | |
Msg-id | 44AEC8C5.80103@numerixtechnology.de Whole thread Raw |
In response to | Re: SELECT substring with regex (Erik Jones <erik@myemma.com>) |
Responses |
Re: SELECT substring with regex
Re: SELECT substring with regex Re: SELECT substring with regex Re: SELECT substring with regex |
List | pgsql-sql |
Erik Jones wrote: > T E Schmitz wrote: > >> Gary Stainburn wrote: >> >>> On Friday 07 July 2006 14:51, T E Schmitz wrote: >>> >>>> I would like to split the contents of a column using substring with a >>>> regular expression: >>>> >>>> The column contains something like >>>> "150mm LD AD Asp XR Macro" >>>> I want to split this into >>>> "150mm", "LD AD Asp XR Macro" >>>> >>> >>> >>> select substring('150mm LD AD Asp XR Macro','^\\d+mm') as BASE_NAME, >>> substring('150mm LD AD Asp XR Macro','^\\d+mm (.*)$') as SUFFIX; >>> base_name | suffix >>> -----------+-------------------- >>> 150mm | LD AD Asp XR Macro >>> (1 row) >>> >>> The brackets surround the required match >> But that takes me to the next problem: >> >> For the sake of the example I simplified the regular pattern. >> In reality, BASE_NAME might be: >> >> 28mm >> 28-70mm >> >> So the reg. expr. requires brackets: >> >> substring (NAME, '^(\\d+(-\\d+)?mm)' ) as BASE_NAME >> > Will the mm always be the end of the base name? I had thought it best to simplify the problem for the purposes of the mailing list but maybe I should supply the complete problem and describe the purpose of the exercise: I am trying to come up with a semi-automatic solution to tidy up some data. If it's got to be done manually via the GUI it would mean a lot of dummy work [for the customer]. First of all I did a 5 table join to select those NAMEs which don't follow the required pattern: the pattern describes a photographic lens (focal length followed by lens speed (aperture)) and nothing else. Unfortuantely, there are a few hundred occurences where a few attributes have been appended which should have been stored elsewhere. valid entries would be: "28mm F2.8" (prime lens) "30-70mm F4" (zoom lens) "30-70mm F2.8" (zoom lens) "30-100mm F4.5-5.6" (zoom lens with variable speed) In the WHERE clause I have specified all those NAMEs, which follow that pattern but have some gubbins appended: WHERE NAME ~ '^((\\d+(-\\d+)?)mm F((\\d+(\.\\d+)?)+(\.(\\d+(\.\\d+)?)+)?))\\D+$' which gives me a listing of those candidates that need to be amended - manually or otherwise. Next, I wanted to produce a result which splits NAME into what it should be (BASE) and attributes (SUFFIX). Maybe I can generate some SQL from that to tidy up the data. -- Regards, Tarlika Elisabeth Schmitz