Thread: back references using regex

back references using regex

From
Matthew Peter
Date:
Hi.

I'm trying to do a slice directly from a table so I
can get a brief preview of the articles content by
counting \s (spaces), not new paragraphs.

Anyone know how it could be done using regular
expressions natively? I read the doc but it didn't
help me much.

Many thanks.
MP




______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

Re: back references using regex

From
Michael Fuhr
Date:
On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew Peter wrote:
> I'm trying to do a slice directly from a table so I
> can get a brief preview of the articles content by
> counting \s (spaces), not new paragraphs.

Are you trying to extract the first N words from a string?  If
that's not what you mean then please clarify.

> Anyone know how it could be done using regular
> expressions natively? I read the doc but it didn't
> help me much.

Regular expressions aren't the only way to solve the problem, but
maybe the following example will give you some ideas:

CREATE TABLE article (id integer, content text);

INSERT INTO article VALUES (1, 'one');
INSERT INTO article VALUES (2, 'one two');
INSERT INTO article VALUES (3, 'one two three');
INSERT INTO article VALUES (4, 'one two three four');
INSERT INTO article VALUES (5, 'one two three four five');
INSERT INTO article VALUES (6, 'one two three four five six');

SELECT id, substring(content FROM '(([^[:space:]]+[[:space:]]*){1,3})')
FROM article;
 id |   substring
----+----------------
  1 | one
  2 | one two
  3 | one two three
  4 | one two three
  5 | one two three
  6 | one two three
(6 rows)

In PostgreSQL 7.4 and later you could shorten the regular expression:

SELECT id, substring(content FROM '((\\S+\\s*){1,3})')
FROM article;

If this example isn't what you're looking for then please explain
what you're trying to do.

--
Michael Fuhr

Re: back references using regex

From
Matthew Peter
Date:
Thanks. I'll check it out asap. I didn't realize the
regex expressions needed to be escaped for it to be a
valid expression. I thought it was the other way
around. Would it be possible to choose what paragraph
to use in a summary? I'll try to clarify... Let's say
I have 14 paragraphs of lorem lipsum text. Let's say I
want to show the 3rd paragraph... Could I use a regex
to search the content and return that 3rd paragraph to
use as a summary of that article?

--- Michael Fuhr <mike@fuhr.org> wrote:

> On Tue, Sep 06, 2005 at 11:40:18PM -0700, Matthew
> Peter wrote:
> > I'm trying to do a slice directly from a table so
> I
> > can get a brief preview of the articles content by
> > counting \s (spaces), not new paragraphs.
>
> Are you trying to extract the first N words from a
> string?  If
> that's not what you mean then please clarify.
>
> > Anyone know how it could be done using regular
> > expressions natively? I read the doc but it didn't
> > help me much.
>
> Regular expressions aren't the only way to solve the
> problem, but
> maybe the following example will give you some
> ideas:
>
> CREATE TABLE article (id integer, content text);
>
> INSERT INTO article VALUES (1, 'one');
> INSERT INTO article VALUES (2, 'one two');
> INSERT INTO article VALUES (3, 'one two three');
> INSERT INTO article VALUES (4, 'one two three
> four');
> INSERT INTO article VALUES (5, 'one two three four
> five');
> INSERT INTO article VALUES (6, 'one two three four
> five six');
>
> SELECT id, substring(content FROM
> '(([^[:space:]]+[[:space:]]*){1,3})')
> FROM article;
>  id |   substring
> ----+----------------
>   1 | one
>   2 | one two
>   3 | one two three
>   4 | one two three
>   5 | one two three
>   6 | one two three
> (6 rows)
>
> In PostgreSQL 7.4 and later you could shorten the
> regular expression:
>
> SELECT id, substring(content FROM
> '((\\S+\\s*){1,3})')
> FROM article;
>
> If this example isn't what you're looking for then
> please explain
> what you're trying to do.
>
> --
> Michael Fuhr
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: back references using regex

From
Michael Fuhr
Date:
On Wed, Sep 07, 2005 at 05:26:07PM -0700, Matthew Peter wrote:
> Thanks. I'll check it out asap. I didn't realize the
> regex expressions needed to be escaped for it to be a
> valid expression.

If you use ordinary quotes (') around the regular expression then
you have to escape the backslashes because there's an extra level
of string parsing that you're probably unaccustomed to.  If you use
dollar quotes (available since 8.0) then you don't need the extra
escapes:

SELECT id, substring(content FROM $$((\S+\s*){1,3})$$) FROM article;

> Would it be possible to choose what paragraph to use in a summary?

You might be able to use split_part().  For example, if paragraphs
are separated by pairs of newline (\n) characters, then the following
should return each article's third paragraph:

SELECT id, split_part(content, '\n\n', 3) FROM article;

See "String Functions and Operators" and "Pattern Matching" in the
documentation for more information.  If you need to get fancy then
consider writing a function in a language like PL/Perl.

--
Michael Fuhr

Re: back references using regex

From
Michael Fuhr
Date:
[Please copy the mailing list on replies so others can participate
in and learn from the discussion.]

On Wed, Sep 07, 2005 at 10:40:22PM -0700, Matthew Peter wrote:
> I did read the docs ;)  I always do. The question I
> really wanted answered is how to reference the back
> references in my regular expressions parentheses. Like
> the 2nd position or 4th from a group. Like \2 or $2.
> Can I do this in postgres in the query?

Are you looking for something like this?

SELECT substring('abc.foo.foo.xyz' FROM '(([[:alpha:]]+)\\.\\2)');
 substring
-----------
 foo.foo
(1 row)

That is, one or more alphabetic characters followed by a dot followed
by the same set of characters (this is a simplistic example: it would
also match 'foo.oog' and return 'oo.oo').

Note that the back reference is \2 because it refers to the inner
set of parentheses (i.e., the subexpression with the second opening
parenthesis); the outer set is used here for capturing.  And again,
note the escaped backslashes because we're using ordinary quotes.
With dollar quotes the above query would be:

SELECT substring('abc.foo.foo.xyz' FROM $$(([[:alpha:]]+)\.\2)$$);

--
Michael Fuhr

Re: back references using regex

From
Matthew Peter
Date:
That doesn't seem to work with digits

SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
 or
SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);

but works with strings

SELECT substring('abc.foo.foo.xyz' FROM
$$((\w+)\.\2)$$);

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list.

Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do. I know if I can
figure how to get that working I can figure out how to
deal with extending it/handling misc. characters/etc
in pgsql. I only need a working vanilla example.
Thanks for all you help.


--- Michael Fuhr <mike@fuhr.org> wrote:
> SELECT substring('abc.foo.foo.xyz' FROM
> '(([[:alpha:]]+)\\.\\2)');
>  substring
> -----------
>  foo.foo
> (1 row)
>
> That is, one or more alphabetic characters followed
> by a dot followed
> by the same set of characters (this is a simplistic
> example: it would
> also match 'foo.oog' and return 'oo.oo').
>
> Note that the back reference is \2 because it refers
> to the inner
> set of parentheses (i.e., the subexpression with the
> second opening
> parenthesis); the outer set is used here for
> capturing.  And again,
> note the escaped backslashes because we're using
> ordinary quotes.
> With dollar quotes the above query would be:
>
> SELECT substring('abc.foo.foo.xyz' FROM
> $$(([[:alpha:]]+)\.\2)$$);
>
> --
> Michael Fuhr
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: back references using regex

From
Peter Fein
Date:
Matthew Peter wrote:
> That doesn't seem to work with digits
>
> SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
>  or
> SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);
>
> but works with strings
>
> SELECT substring('abc.foo.foo.xyz' FROM
> $$((\w+)\.\2)$$);
>
> What I basically want to do is have a slice function
> like Python, where I can slice out items from a \s, \.
> or \n\n separated list.

You could always just write it in pl/python...

--
Peter Fein                 pfein@pobox.com                 773-575-0694

Basically, if you're not a utopianist, you're a schmuck. -J. Feldman

Re: back references using regex

From
Matthew Peter
Date:
Ya, but I'd have to recompile to get python in. Plus,
I don't want to use Python. I want to use and learn
more pgsql. Keep things clean and lean if possible...
I just got a postgres book yesterday for additional
reading which it only had 2 pages on regex's in the
index :(

--- Peter Fein <pfein@pobox.com> wrote:

> Matthew Peter wrote:
> > That doesn't seem to work with digits
> >
> > SELECT substring('12.00.00.34' FROM
> $$((\d+)\.\2)$$);
> >  or
> > SELECT substring('12.00.00.34' FROM
> $$((\w+)\.\2)$$);
> >
> > but works with strings
> >
> > SELECT substring('abc.foo.foo.xyz' FROM
> > $$((\w+)\.\2)$$);
> >
> > What I basically want to do is have a slice
> function
> > like Python, where I can slice out items from a
> \s, \.
> > or \n\n separated list.
>
> You could always just write it in pl/python...
>
> --
> Peter Fein                 pfein@pobox.com
>       773-575-0694
>
> Basically, if you're not a utopianist, you're a
> schmuck. -J. Feldman
>





______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

Re: back references using regex

From
Michael Fuhr
Date:
On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew Peter wrote:
> Ya, but I'd have to recompile to get python in.

Recompiling to add support for another procedural language is a
one-time operation and it's easy to do, so that's not a good argument.

> Plus, I don't want to use Python. I want to use and learn
> more pgsql.  Keep things clean and lean if possible...

"Clean and lean" suggests using the right tool for the job.  Languages
like Perl and Python are better at string manipulation than PL/pgSQL,
and one of PostgreSQL's strengths is that it allows you to write
server-side functions in those languages.  Exploit such strengths
when it makes sense.

> I just got a postgres book yesterday for additional
> reading which it only had 2 pages on regex's in the
> index :(

Regular expressions aren't specific to PostgreSQL; there's ample
material covering them elsewhere.  See for example _Mastering
Regular Expressions_ by Jeffrey Friedl.  I'm sure a search engine
would yield many free tutorials on the subject.

--
Michael Fuhr

Re: back references using regex

From
Matthew Peter
Date:
I knew I should never have said Python. I know regular
expressions, just not how postgresql handles them. The
fact of the matter is I don't want to use Python, it
was an example of the functionality I'm interested in
accomplishing with pgsql. Plus, I would like to use
other regex's once I figure out how they are used.

I only need a regular expression in the substring of a
where cluase. Not entire language support for a single
function.

It's not a complex regex as I have wrote one that does
what I want, yet not at the database level. The docs
didn't help clarify anything. I'm still not clear on
how it determines where the back reference comes from
in the previous example you gave. And why digits
wouldn't work.

I would like a basic example that accomplishes what
I'm trying to do if at all possible?


My original message/problem...

What I basically want to do is have a slice function
like Python, where I can slice out items from a \s, \.
or \n\n separated list. Where I'll just change the
delimiter for the query that it applies.

Where I could start it at a certain point and end it
at another. Like slicing out paragraph 3-6 (delimiter
\n\n) or the 2nd-6th sentence in a article (delimiter
\.). That is what I am trying to do. I know if I can
figure how to get that working I can figure out how to
deal with extending it/handling misc. characters/etc
in pgsql. I only need a working vanilla example.
Thanks for all your help.

--- Michael Fuhr <mike@fuhr.org> wrote:

> On Thu, Sep 08, 2005 at 12:45:40PM -0700, Matthew
> Peter wrote:
> > Ya, but I'd have to recompile to get python in.
>
> Recompiling to add support for another procedural
> language is a
> one-time operation and it's easy to do, so that's
> not a good argument.
>
> > Plus, I don't want to use Python. I want to use
> and learn
> > more pgsql.  Keep things clean and lean if
> possible...
>
> "Clean and lean" suggests using the right tool for
> the job.  Languages
> like Perl and Python are better at string
> manipulation than PL/pgSQL,
> and one of PostgreSQL's strengths is that it allows
> you to write
> server-side functions in those languages.  Exploit
> such strengths
> when it makes sense.
>
> > I just got a postgres book yesterday for
> additional
> > reading which it only had 2 pages on regex's in
> the
> > index :(
>
> Regular expressions aren't specific to PostgreSQL;
> there's ample
> material covering them elsewhere.  See for example
> _Mastering
> Regular Expressions_ by Jeffrey Friedl.  I'm sure a
> search engine
> would yield many free tutorials on the subject.
>
> --
> Michael Fuhr
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: back references using regex

From
Alvaro Herrera
Date:
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:

> What I basically want to do is have a slice function
> like Python, where I can slice out items from a \s, \.
> or \n\n separated list. Where I'll just change the
> delimiter for the query that it applies.

There is a function for some sort of text slicing, though I'm not sure
if it does what you want.  It's called split_part().  Have a look at the
documentation.

--
Alvaro Herrera -- Valdivia, Chile         Architect, www.EnterpriseDB.com
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)

Re: back references using regex

From
Michael Fuhr
Date:
On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew Peter wrote:
> It's not a complex regex as I have wrote one that does
> what I want, yet not at the database level. The docs
> didn't help clarify anything. I'm still not clear on
> how it determines where the back reference comes from
> in the previous example you gave. And why digits
> wouldn't work.

Back references work as they usually do in regular expressions:
they refer to the matched value of a previous parenthesized
subexpression.  If you have multiple open parentheses then you
need to refer to the correct subexpression to get what you want.
Example:

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('abc.foo.foo.xyz');
INSERT INTO foo VALUES ('12.00.00.34');
INSERT INTO foo VALUES ('abc.def.ghi');

SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
        t
-----------------
 abc.foo.foo.xyz
 12.00.00.34
(2 rows)

In the above query the regular expression has only one set of
parentheses, so the back reference refers to \1.  The result set
contains the two rows that have one or more word characters followed
by a dot followed by the same set of characters.

In the following query, note the difference between the regular
expression in the select list and the one in the where clause:

SELECT substring(t FROM $$((\w+)\.\2)$$)
FROM foo
WHERE t ~ $$(\w+)\.\1$$;
 substring
-----------
 foo.foo
 00.00
(2 rows)

In the regular expression in the select list, we use the outermost
set of parentheses for grouping, so the back reference needs to
refer to the subexpression that begins with the second open
parenthesis (i.e., we must use \2).  In the regular expression
in the where clause, we have only one set of parentheses so the
back reference is \1.

Regarding digits, you didn't post any output in your example, so
we don't know if it really doesn't work or if it just doesn't do
what you were expecting.  Here's what I get from your examples:

SELECT substring('12.00.00.34' FROM $$((\d+)\.\2)$$);
 substring
-----------
 00.00
(1 row)

SELECT substring('12.00.00.34' FROM $$((\w+)\.\2)$$);
 substring
-----------
 00.00
(1 row)

SELECT substring('abc.foo.foo.xyz' FROM $$((\w+)\.\2)$$);
 substring
-----------
 foo.foo
(1 row)

Do you get different results, or do these results surprise you?
They all appear to be correct.

> What I basically want to do is have a slice function
> like Python, where I can slice out items from a \s, \.
> or \n\n separated list. Where I'll just change the
> delimiter for the query that it applies.
>
> Where I could start it at a certain point and end it
> at another. Like slicing out paragraph 3-6 (delimiter
> \n\n) or the 2nd-6th sentence in a article (delimiter
> \.). That is what I am trying to do.

You can use split_part() to get a single item or string_to_array()
to build an array from which you can extract multiple items.

CREATE TABLE foo (t text);
INSERT INTO foo VALUES ('one.two.three.four.five.six.');

SELECT (string_to_array(t, '.'))[3:5] FROM foo;
  string_to_array
-------------------
 {three,four,five}
(1 row)

SELECT array_to_string((string_to_array(t, '.'))[3:5], '.') FROM foo;
 array_to_string
-----------------
 three.four.five
(1 row)

Is that what you're looking for?

--
Michael Fuhr

Re: back references using regex

From
Matthew Peter
Date:
Thank you for your patience and such a complete
answer. I'm not on the pgbox right now but those
examples did help clarify how to reference the back
references, which was my problem.

I wasn't aware the 1st parenthesis must be counted as
part of the regex, I assumed it was a wrapper. Thanks
for helping me out and putting up with me. :)

matt

--- Michael Fuhr <mike@fuhr.org> wrote:

> On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
> Peter wrote:
> > It's not a complex regex as I have wrote one that
> does
> > what I want, yet not at the database level. The
> docs
> > didn't help clarify anything. I'm still not clear
> on
> > how it determines where the back reference comes
> from
> > in the previous example you gave. And why digits
> > wouldn't work.
>
> Back references work as they usually do in regular
> expressions:
> they refer to the matched value of a previous
> parenthesized
> subexpression.  If you have multiple open
> parentheses then you
> need to refer to the correct subexpression to get
> what you want.
> Example:
>
> CREATE TABLE foo (t text);
> INSERT INTO foo VALUES ('abc.foo.foo.xyz');
> INSERT INTO foo VALUES ('12.00.00.34');
> INSERT INTO foo VALUES ('abc.def.ghi');
>
> SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
>         t
> -----------------
>  abc.foo.foo.xyz
>  12.00.00.34
> (2 rows)
>
> In the above query the regular expression has only
> one set of
> parentheses, so the back reference refers to \1.
> The result set
> contains the two rows that have one or more word
> characters followed
> by a dot followed by the same set of characters.
>
> In the following query, note the difference between
> the regular
> expression in the select list and the one in the
> where clause:
>
> SELECT substring(t FROM $$((\w+)\.\2)$$)
> FROM foo
> WHERE t ~ $$(\w+)\.\1$$;
>  substring
> -----------
>  foo.foo
>  00.00
> (2 rows)
>
> In the regular expression in the select list, we use
> the outermost
> set of parentheses for grouping, so the back
> reference needs to
> refer to the subexpression that begins with the
> second open
> parenthesis (i.e., we must use \2).  In the regular
> expression
> in the where clause, we have only one set of
> parentheses so the
> back reference is \1.
>
> Regarding digits, you didn't post any output in your
> example, so
> we don't know if it really doesn't work or if it
> just doesn't do
> what you were expecting.  Here's what I get from
> your examples:
>
> SELECT substring('12.00.00.34' FROM
> $$((\d+)\.\2)$$);
>  substring
> -----------
>  00.00
> (1 row)
>
> SELECT substring('12.00.00.34' FROM
> $$((\w+)\.\2)$$);
>  substring
> -----------
>  00.00
> (1 row)
>
> SELECT substring('abc.foo.foo.xyz' FROM
> $$((\w+)\.\2)$$);
>  substring
> -----------
>  foo.foo
> (1 row)
>
> Do you get different results, or do these results
> surprise you?
> They all appear to be correct.
>
> > What I basically want to do is have a slice
> function
> > like Python, where I can slice out items from a
> \s, \.
> > or \n\n separated list. Where I'll just change the
> > delimiter for the query that it applies.
> >
> > Where I could start it at a certain point and end
> it
> > at another. Like slicing out paragraph 3-6
> (delimiter
> > \n\n) or the 2nd-6th sentence in a article
> (delimiter
> > \.). That is what I am trying to do.
>
> You can use split_part() to get a single item or
> string_to_array()
> to build an array from which you can extract
> multiple items.
>
> CREATE TABLE foo (t text);
> INSERT INTO foo VALUES
> ('one.two.three.four.five.six.');
>
> SELECT (string_to_array(t, '.'))[3:5] FROM foo;
>   string_to_array
> -------------------
>  {three,four,five}
> (1 row)
>
> SELECT array_to_string((string_to_array(t,
> '.'))[3:5], '.') FROM foo;
>  array_to_string
> -----------------
>  three.four.five
> (1 row)
>
> Is that what you're looking for?
>
> --
> Michael Fuhr
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Re: back references using regex

From
Matthew Peter
Date:
One other thing, when I wrote back I actually used
34.31.29.20 (random), not 12.00.00.34 like i showed in
the example, which is why i said it didn't work on
digits.

SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
  substring
 -----------

 (1 row)

little did i know writing it with 12.00.00.34 would
return 00.00... so yes, that did suprise me.
Apparently only using the identical values returns a
value. so it's saying x+ one more of the same value
separated by a period... where shouldn't it be any
"letter, number or underscore" followed by any
"letter, number or underscore"?



--- Michael Fuhr <mike@fuhr.org> wrote:

> On Thu, Sep 08, 2005 at 01:52:35PM -0700, Matthew
> Peter wrote:
> > It's not a complex regex as I have wrote one that
> does
> > what I want, yet not at the database level. The
> docs
> > didn't help clarify anything. I'm still not clear
> on
> > how it determines where the back reference comes
> from
> > in the previous example you gave. And why digits
> > wouldn't work.
>
> Back references work as they usually do in regular
> expressions:
> they refer to the matched value of a previous
> parenthesized
> subexpression.  If you have multiple open
> parentheses then you
> need to refer to the correct subexpression to get
> what you want.
> Example:
>
> CREATE TABLE foo (t text);
> INSERT INTO foo VALUES ('abc.foo.foo.xyz');
> INSERT INTO foo VALUES ('12.00.00.34');
> INSERT INTO foo VALUES ('abc.def.ghi');
>
> SELECT t FROM foo WHERE t ~ $$(\w+)\.\1$$;
>         t
> -----------------
>  abc.foo.foo.xyz
>  12.00.00.34
> (2 rows)
>
> In the above query the regular expression has only
> one set of
> parentheses, so the back reference refers to \1.
> The result set
> contains the two rows that have one or more word
> characters followed
> by a dot followed by the same set of characters.
>
> In the following query, note the difference between
> the regular
> expression in the select list and the one in the
> where clause:
>
> SELECT substring(t FROM $$((\w+)\.\2)$$)
> FROM foo
> WHERE t ~ $$(\w+)\.\1$$;
>  substring
> -----------
>  foo.foo
>  00.00
> (2 rows)
>
> In the regular expression in the select list, we use
> the outermost
> set of parentheses for grouping, so the back
> reference needs to
> refer to the subexpression that begins with the
> second open
> parenthesis (i.e., we must use \2).  In the regular
> expression
> in the where clause, we have only one set of
> parentheses so the
> back reference is \1.
>
> Regarding digits, you didn't post any output in your
> example, so
> we don't know if it really doesn't work or if it
> just doesn't do
> what you were expecting.  Here's what I get from
> your examples:
>
> SELECT substring('12.00.00.34' FROM
> $$((\d+)\.\2)$$);
>  substring
> -----------
>  00.00
> (1 row)
>
> SELECT substring('12.00.00.34' FROM
> $$((\w+)\.\2)$$);
>  substring
> -----------
>  00.00
> (1 row)
>
> SELECT substring('abc.foo.foo.xyz' FROM
> $$((\w+)\.\2)$$);
>  substring
> -----------
>  foo.foo
> (1 row)
>
> Do you get different results, or do these results
> surprise you?
> They all appear to be correct.
>
> > What I basically want to do is have a slice
> function
> > like Python, where I can slice out items from a
> \s, \.
> > or \n\n separated list. Where I'll just change the
> > delimiter for the query that it applies.
> >
> > Where I could start it at a certain point and end
> it
> > at another. Like slicing out paragraph 3-6
> (delimiter
> > \n\n) or the 2nd-6th sentence in a article
> (delimiter
> > \.). That is what I am trying to do.
>
> You can use split_part() to get a single item or
> string_to_array()
> to build an array from which you can extract
> multiple items.
>
> CREATE TABLE foo (t text);
> INSERT INTO foo VALUES
> ('one.two.three.four.five.six.');
>
> SELECT (string_to_array(t, '.'))[3:5] FROM foo;
>   string_to_array
> -------------------
>  {three,four,five}
> (1 row)
>
> SELECT array_to_string((string_to_array(t,
> '.'))[3:5], '.') FROM foo;
>  array_to_string
> -----------------
>  three.four.five
> (1 row)
>
> Is that what you're looking for?
>
> --
> Michael Fuhr
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>





______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

Re: back references using regex

From
Douglas McNaught
Date:
Matthew Peter <survivedsushi@yahoo.com> writes:

> One other thing, when I wrote back I actually used
> 34.31.29.20 (random), not 12.00.00.34 like i showed in
> the example, which is why i said it didn't work on
> digits.
>
> SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
>   substring
>  -----------
>
>  (1 row)
>
> little did i know writing it with 12.00.00.34 would
> return 00.00... so yes, that did suprise me.
> Apparently only using the identical values returns a
> value. so it's saying x+ one more of the same value
> separated by a period... where shouldn't it be any
> "letter, number or underscore" followed by any
> "letter, number or underscore"?

Backreferences match the exact string matched by the corresponding set
of parentheses.  It's not the equivalent of substituting in the
parenthesized part of the regex and testing that for a match.  The
behavior above is as expected.  If you want it as "any followed by
any" you shold write the regex as '((\w+)\.(\w+))' -- then the two
parts can differ.

-Doug

Re: back references using regex

From
Michael Fuhr
Date:
On Fri, Sep 09, 2005 at 10:11:46PM -0700, Matthew Peter wrote:
> One other thing, when I wrote back I actually used
> 34.31.29.20 (random), not 12.00.00.34 like i showed in
> the example, which is why i said it didn't work on
> digits.

When posting examples, please post something you actually tried so
other people can see exactly what you're doing and attempt to
duplicate your results.  Little differences sometimes matter, as
this case illustrates.

> SELECT substring('34.31.29.20' FROM $$((\w+)\.\2)$$);
>   substring
>  -----------
>
>  (1 row)
>
> little did i know writing it with 12.00.00.34 would
> return 00.00... so yes, that did suprise me.
> Apparently only using the identical values returns a
> value.

That's what a back reference is: it means "match the same value you
matched before," not just anything that matches the same regular
expression.

Usenet junkies might find it amusing to use back references to
search their new server's list of newsgroups:

egrep '([^.]+)\.\1\.\1' newsgroups

--
Michael Fuhr

Re: back references using regex

From
Matthew Peter
Date:
ahhhhhhhhh I swear I never came across any of these
gems of information in the docs. It was these subtle
differences that were throwing me.

I didn't originally catch that regex's were based on
grep/sed/awk syntax which I haven't studied throughly
yet. I've only used some basic operations in bash
scripts. I'll read up more on those. Thanks.

MP


--- Douglas McNaught <doug@mcnaught.org> wrote:

> Matthew Peter <survivedsushi@yahoo.com> writes:
>
> > One other thing, when I wrote back I actually used
> > 34.31.29.20 (random), not 12.00.00.34 like i
> showed in
> > the example, which is why i said it didn't work on
> > digits.
> >
> > SELECT substring('34.31.29.20' FROM
> $$((\w+)\.\2)$$);
> >   substring
> >  -----------
> >
> >  (1 row)
> >
> > little did i know writing it with 12.00.00.34
> would
> > return 00.00... so yes, that did suprise me.
> > Apparently only using the identical values returns
> a
> > value. so it's saying x+ one more of the same
> value
> > separated by a period... where shouldn't it be any
> > "letter, number or underscore" followed by any
> > "letter, number or underscore"?
>
> Backreferences match the exact string matched by the
> corresponding set
> of parentheses.  It's not the equivalent of
> substituting in the
> parenthesized part of the regex and testing that for
> a match.  The
> behavior above is as expected.  If you want it as
> "any followed by
> any" you shold write the regex as '((\w+)\.(\w+))'
> -- then the two
> parts can differ.
>
> -Doug
>





______________________________________________________
Click here to donate to the Hurricane Katrina relief effort.
http://store.yahoo.com/redcross-donate3/

Re: back references using regex

From
Michael Fuhr
Date:
On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew Peter wrote:
> ahhhhhhhhh I swear I never came across any of these
> gems of information in the docs. It was these subtle
> differences that were throwing me.

From "Regular Expression Escapes" in the "Pattern Matching" section
of the manual:

  A back reference (\n) matches the same string matched by the
  previous parenthesized subexpression specified by the number n
  (see Table 9-18).  For example, ([bc])\1 matches bb or cc but not
  bc or cb.  The subexpression must entirely precede the back
  reference in the RE.  Subexpressions are numbered in the order
  of their leading parentheses.  Non-capturing parentheses do not
  define subexpressions.

http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES

--
Michael Fuhr

Re: back references using regex

From
Matthew Peter
Date:
How about this then, I didn't retain that information
from the doc. ;) I sometimes glaze over important gems
every now and then. It happens. I'm not a robot, yet.
At least I know the answer to my question is now
retained. You were a big help too. Thank you very
much. I appreciate it.

Speaking of data manipulation in a table... I was
thinking about storing and manipulating a list in a
column...

Is it possible to append and delete (unknown location)
items in a list? Or is another way more efficient? I'm
relatively new so sorry if I'm asking too many
questions. If possible, I will read about it if
there's docs for it so I don't trouble anyone with my
questions. It would be sweet to do that at the
database level.

Thanks again,
MP
Posgresql convert

--- Michael Fuhr <mike@fuhr.org> wrote:

> On Sat, Sep 10, 2005 at 10:27:13AM -0700, Matthew
> Peter wrote:
> > ahhhhhhhhh I swear I never came across any of
> these
> > gems of information in the docs. It was these
> subtle
> > differences that were throwing me.
>
> From "Regular Expression Escapes" in the "Pattern
> Matching" section
> of the manual:
>
>   A back reference (\n) matches the same string
> matched by the
>   previous parenthesized subexpression specified by
> the number n
>   (see Table 9-18).  For example, ([bc])\1 matches
> bb or cc but not
>   bc or cb.  The subexpression must entirely precede
> the back
>   reference in the RE.  Subexpressions are numbered
> in the order
>   of their leading parentheses.  Non-capturing
> parentheses do not
>   define subexpressions.
>
>
http://www.postgresql.org/docs/8.0/static/functions-matching.html#POSIX-ESCAPE-SEQUENCES
>
> --
> Michael Fuhr
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please
> send an appropriate
>        subscribe-nomail command to
> majordomo@postgresql.org so that your
>        message can get through to the mailing list
> cleanly
>





______________________________________________________
Yahoo! for Good
Watch the Hurricane Katrina Shelter From The Storm concert
http://advision.webevents.yahoo.com/shelter


Re: back references using regex

From
Michael Fuhr
Date:
On Sun, Sep 11, 2005 at 12:30:59AM -0700, Matthew Peter wrote:
> Speaking of data manipulation in a table... I was
> thinking about storing and manipulating a list in a
> column...

Please ask new questions in a new thread with a Subject header
related to the new topic.  People who might be interested in following
the discussion, either to learn from it or to provide answers, might
miss it if the new topic hijacks an old thread.

--
Michael Fuhr

Building postgres on Suze

From
Christian Goetze
Date:
I'm trying to build postgres on Suze with --with-pam, and it tells me:

/usr/lib/gcc-lib/i586-suse-linux/3.3.5/../../../../i586-suse-linux/bin/ld:
cannot find -lpam

I know it is actually installed, and disecting the configure script and
hand-compiling the test program works if I say "/lib/libpam.so.0", but
fails if I say "-lpam". Very bizarre - anyone encountered this?

I realize this is probably a suze issue and not a postgres issue, but I
hope someone here has seen this problem

Thanks in advance!
--
cg


Re: Building postgres on Suze

From
Thomas Pundt
Date:
Hi,

first off - what annoyes me in the PostgreSQL mailing lists is that
posters regularly break threads. Why can't you create a new mail for
a new topic without replying to another mail with a totally different
topic?

On Monday 12 September 2005 23:51, Christian Goetze wrote:
| I'm trying to build postgres on Suze with --with-pam, and it tells me:

I don't want to sound pedantic, but it's called SuSE :-)

| /usr/lib/gcc-lib/i586-suse-linux/3.3.5/../../../../i586-suse-linux/bin/ld:
| cannot find -lpam
|
| I know it is actually installed, and disecting the configure script and
| hand-compiling the test program works if I say "/lib/libpam.so.0", but
| fails if I say "-lpam". Very bizarre - anyone encountered this?

Nothing bizarre here - you most probably have to install the pam-devel
package.

Ciao,
Thomas

--
Thomas Pundt <thomas.pundt@rp-online.de> ---- http://rp-online.de/ ----