Thread: SUBSTRING for a regular expression

SUBSTRING for a regular expression

From
Date:
I can't seem to get right the regular expression for parsing data like
these four sample rows (names and addresses changed to ficticious values)
from a text-type column:

Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
Sunday with breach of peace and interfering with a police officer.

Allen K. George, 30, of 88 Beverly Court was charged Saturday with
possession of marijuana, third-degree criminal mischief, breach of peace,
evading responsibility, interfering with a police officer, driving with a
suspended license, driving under the influence of drugs or alcohol,
failure to drive right and failure to have proper insurance.

Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
possession of marijuana, possession of alcohol by a minor and failure to
wear a seat belt.

Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
Sunday with driving under the influence of drugs or alcohol, evading
responsibility and following too closely.

Into separate columns for: name, age, address, charge. For example the
first record would have

name='Yolanda Harris'
age=38
address='40 South Main St., Newtown City'
charge='was charged Sunday with breach of peace and interfering with a
police officer.'

To get the name, for instance, I tried

SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
police_log;

or the age value

SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
police_log;

But return values are all NULL. Can anyone give me some RE help, please?

--Berend Tober




Re: SUBSTRING for a regular expression

From
mike g
Date:
What language are you using for this plperl?

That command listed in the documentation certainly works in psql or
plpgsql but I don't know about plperl.

Are you assigning the result to a variable and then printing the result
of that variable before the function ends or printing based on what the
function returns?

I think you will probably need to show us the rest of the code in your
function.

On Mon, 2004-07-05 at 19:40, btober@computer.org wrote:
> I can't seem to get right the regular expression for parsing data like
> these four sample rows (names and addresses changed to ficticious values)
> from a text-type column:
>
> Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
> Sunday with breach of peace and interfering with a police officer.
>
> Allen K. George, 30, of 88 Beverly Court was charged Saturday with
> possession of marijuana, third-degree criminal mischief, breach of peace,
> evading responsibility, interfering with a police officer, driving with a
> suspended license, driving under the influence of drugs or alcohol,
> failure to drive right and failure to have proper insurance.
>
> Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
> possession of marijuana, possession of alcohol by a minor and failure to
> wear a seat belt.
>
> Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
> Sunday with driving under the influence of drugs or alcohol, evading
> responsibility and following too closely.
>
> Into separate columns for: name, age, address, charge. For example the
> first record would have
>
> name='Yolanda Harris'
> age=38
> address='40 South Main St., Newtown City'
> charge='was charged Sunday with breach of peace and interfering with a
> police officer.'
>
> To get the name, for instance, I tried
>
> SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
> police_log;
>
> or the age value
>
> SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
> police_log;
>
> But return values are all NULL. Can anyone give me some RE help, please?
>
> --Berend Tober
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match

Re: SUBSTRING for a regular expression

From
Harald Fuchs
Date:
In article <64617.206.53.65.243.1089074434.squirrel@$HOSTNAME>,
<btober@computer.org> writes:

> I can't seem to get right the regular expression for parsing data like
> these four sample rows (names and addresses changed to ficticious values)
> from a text-type column:

> Yolanda Harris, 38, of 40 South Main St., Newtown City, was charged
> Sunday with breach of peace and interfering with a police officer.

> Allen K. George, 30, of 88 Beverly Court was charged Saturday with
> possession of marijuana, third-degree criminal mischief, breach of peace,
> evading responsibility, interfering with a police officer, driving with a
> suspended license, driving under the influence of drugs or alcohol,
> failure to drive right and failure to have proper insurance.

> Brain T. Grafton, 18, of 97 Bristol Ave. was charged Sunday with
> possession of marijuana, possession of alcohol by a minor and failure to
> wear a seat belt.

> Brian D. Sptizer Jr., 18, of 20 Walling Ave., Northford, was charged
> Sunday with driving under the influence of drugs or alcohol, evading
> responsibility and following too closely.

> Into separate columns for: name, age, address, charge. For example the
> first record would have

> name='Yolanda Harris'
> age=38
> address='40 South Main St., Newtown City'
> charge='was charged Sunday with breach of peace and interfering with a
> police officer.'

> To get the name, for instance, I tried

> SELECT SUBSTRING(description FROM '^([:alnum:]*), \d{2}, .*$') FROM
> police_log;

> or the age value

> SELECT SUBSTRING(description FROM '^[:alnum:]*, (\d{2}), .*$') FROM
> police_log;

> But return values are all NULL. Can anyone give me some RE help, please?

Could you use Perl?  A Perl regexp for that would be

/^(.+), (\d+), of (.+?),? (was charged.+)$/

Re: SUBSTRING for a regular expression

From
Date:
>
> SELECT
>   SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
>   SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
>   SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged') AS
> address, SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
>   SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
> FROM police_log;

Aha! The old double-slash escape. Thank you very much.

--Berend Tober




Re: SUBSTRING for a regular expression

From
"Greg Sabino Mullane"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


SELECT
  SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
  SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
  SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged')
    AS address,
  SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
  SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200407062103
-----BEGIN PGP SIGNATURE-----

iD8DBQFA60wAvJuQZxSWSsgRAuKPAJ0QAeG0hdoJ/Ofqq/lXVtwMjyzQjACgoer3
kwPy0xvRiZxwr3cgPq6Rjwc=
=mF/C
-----END PGP SIGNATURE-----



Re: SUBSTRING for a regular expression

From
Date:
> SELECT
>   SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
>   SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
>   SUBSTRING(description FROM '\\d{1,3}, of (.*?),? was charged')
>     AS address,
>   SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
>   SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
> FROM police_log;
>
> - --
> Greg Sabino Mullane greg@turnstep.com
> PGP Key: 0x14964AC8 200407062103

Thanks Greg,

I was hopeful that this would work, since I had missed the need to double
the back-slash escape character in my original work, but something still
isn't right.

First I got an error message that psql didn't like the "?" characters in
the RE, so I eliminated them and wrote

SELECT
SUBSTRING(description FROM '(.*), \\d{1,3},') AS vname,
SUBSTRING(description FROM '.*, (\\d{1,3}),') AS age,
SUBSTRING(description FROM '\\d{1,3}, of (.*), was charged') AS address,
SUBSTRING(description FROM ' was charged ([^ ]+)') AS dow,
SUBSTRING(description FROM ' was charged [^ ]+ with (.+)') AS charge
FROM police_log;


This modified version of your suggestion gets the dow, and charge columns
right, but I'm still not seeing the vname, age, and address columns --
they return null. You've gotten me part way there, and I appreciate that.
Any further ideas?

--Berend Tober




Re: SUBSTRING for a regular expression

From
Date:
>
>> I was hopeful that this would work, since I had missed the need to
>> double the back-slash escape character in my original work, but
>> something still isn't right.
>>
>> First I got an error message that psql didn't like the "?" characters
>> in the RE, so I eliminated them and wrote
>
>
> Sounds like you may be using an older version of Pg than me. What
> version are you running? Only that can tell us how limiting your regex
> capabilitites are....

Not that old (I think)!

crime=# select version();
version
--------------------------------------------------------------------------------
-----------------------------------------------
PostgreSQL 7.3.4 on i386-redhat-linux-gnu, compiled by GCC
i386-redhat-linux-gc c (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
(1 row)


--Berend Tober



Re: SUBSTRING for a regular expression

From
Tom Lane
Date:
<btober@computer.org> writes:
>> Sounds like you may be using an older version of Pg than me. What
>> version are you running? Only that can tell us how limiting your regex
>> capabilitites are....

> Not that old (I think)!

> PostgreSQL 7.3.4 on i386-redhat-linux-gnu, compiled by GCC
> i386-redhat-linux-gc c (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)

Nonetheless too old --- 7.4 contains an entirely new regex package.
The one in 7.3 is quite brain-dead compared to any modern regexes...

            regards, tom lane