Thread: Psql regex is NFA or DFA?

Psql regex is NFA or DFA?

From
Josh Jore
Date:
So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
and while I don't need regex in PostgreSQL I know I'll do it for something
- eventually. The book makes a distinction between DFA, POSIX NFA and
Traditional NFA and then ascribes some properties and behaviours to each.
So what sort does PostgreSQL have?

Joshua b. Jore -{ weird geeky madness }-> http://www.greentechnologist.org


Re: Psql regex is NFA or DFA?

From
Karel Zak
Date:
On Tue, Sep 10, 2002 at 02:16:51AM -0500, Josh Jore wrote:
> So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
> and while I don't need regex in PostgreSQL I know I'll do it for something
> - eventually. The book makes a distinction between DFA, POSIX NFA and
> Traditional NFA and then ascribes some properties and behaviours to each.
> So what sort does PostgreSQL have?

 Regex in PostgreSQL code:

    Copyright (c) 1992, 1993, 1994 Henry Spencer.
    Copyright (c) 1992, 1993, 1994
    The Regents of the University of California.  All rights reserved.

 I think it's:

    POSIX 1003.2, section 2.8 (Regular Expression Notation)


--
 Karel Zak  <zakkr@zf.jcu.cz>
 http://home.zf.jcu.cz/~zakkr/

 C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz

Re: Psql regex is NFA or DFA?

From
Tom Lane
Date:
Josh Jore <josh@greentechnologist.org> writes:
> So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
> and while I don't need regex in PostgreSQL I know I'll do it for something
> - eventually. The book makes a distinction between DFA, POSIX NFA and
> Traditional NFA and then ascribes some properties and behaviours to each.
> So what sort does PostgreSQL have?

Well, you could read the code (src/backend/regex), or you could apply
the tests that Friedl suggests to distinguish the type of an unknown
engine ...

My guess is that it's an NFA, but I dunno if Spencer did the POSIX
semantics or not.

            regards, tom lane

Re: Psql regex is NFA or DFA?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Josh Jore <josh@greentechnologist.org> writes:
> > So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
> > and while I don't need regex in PostgreSQL I know I'll do it for something
> > - eventually. The book makes a distinction between DFA, POSIX NFA and
> > Traditional NFA and then ascribes some properties and behaviours to each.
> > So what sort does PostgreSQL have?
>
> Well, you could read the code (src/backend/regex), or you could apply
> the tests that Friedl suggests to distinguish the type of an unknown
> engine ...
>
> My guess is that it's an NFA, but I dunno if Spencer did the POSIX
> semantics or not.

I am continuing to talk to Henry about getting a newer version of his
regex library. He is working on it but is not yet finished.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Psql regex is NFA or DFA?

From
Alvaro Herrera
Date:
On Tue, 10 Sep 2002, Bruce Momjian wrote:

> Tom Lane wrote:
> > Josh Jore <josh@greentechnologist.org> writes:
> > > So I've finished reading Jeffery Friedl's _Mastering Regular Expressions_
> > > and while I don't need regex in PostgreSQL I know I'll do it for something
> > > - eventually. The book makes a distinction between DFA, POSIX NFA and
> > > Traditional NFA and then ascribes some properties and behaviours to each.
> > > So what sort does PostgreSQL have?
> >
> > Well, you could read the code (src/backend/regex), or you could apply
> > the tests that Friedl suggests to distinguish the type of an unknown
> > engine ...
> >
> > My guess is that it's an NFA, but I dunno if Spencer did the POSIX
> > semantics or not.
>
> I am continuing to talk to Henry about getting a newer version of his
> regex library. He is working on it but is not yet finished.

Do you have some details about the implementation itself?  I would like
to work on implementing a regex engine using the Shift-And algorithm
families, and that can probably give a performance improvement over the
current engine.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)


Re: Psql regex is NFA or DFA?

From
Bruce Momjian
Date:
Alvaro Herrera wrote:
> > I am continuing to talk to Henry about getting a newer version of his
> > regex library. He is working on it but is not yet finished.
>
> Do you have some details about the implementation itself?  I would like
> to work on implementing a regex engine using the Shift-And algorithm
> families, and that can probably give a performance improvement over the
> current engine.

[ CC'ing Henry Spencer. ]

No, I don't know any details except that his regex library is in the new
tcl code and has to repackaged as a separate library.  Henry, could we
help you finish up your regex stuff?

Henry's regex work is the same code that is in *BSD regex (at least
BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
certain complex cases, so much so that I have moved to super-sed (ssed)
for use on my local machine where I have seen 100x speed improvements
with ssed.

I am not sure how slow our regex really is, but for the sake of
PostgreSQL and of the BSD regex library, I would love to see a newer
version.

In fact, I just had an email exchange with Henry about two weeks ago.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Psql regex is NFA or DFA?

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Henry's regex work is the same code that is in *BSD regex (at least
> BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
> certain complex cases,

You're speaking of his *old* package (the one we currently use), no?

Friedl seems to think that the current Tcl regex engine (Henry's new
code) is the most advanced thing on the planet.

            regards, tom lane

Re: Psql regex is NFA or DFA?

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Henry's regex work is the same code that is in *BSD regex (at least
> > BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
> > certain complex cases,
>
> You're speaking of his *old* package (the one we currently use), no?

Yes, that is the old stuff shipped with BSD 4.4 and included in all the
*BSD releases I have checked.

> Friedl seems to think that the current Tcl regex engine (Henry's new
> code) is the most advanced thing on the planet.

I have not heard that, but it is good to hear.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: Psql regex is NFA or DFA?

From
Alvaro Herrera
Date:
Tom Lane dijo:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Henry's regex work is the same code that is in *BSD regex (at least
> > BSD/OS, FreeBSD, NetBSD), which I have found to be pretty slow in
> > certain complex cases,
>
> You're speaking of his *old* package (the one we currently use), no?
>
> Friedl seems to think that the current Tcl regex engine (Henry's new
> code) is the most advanced thing on the planet.

Oh, so the TODO item "replace with newer code" is not just vaporware?
Well, I won't try to compete with Spencer's code in that case.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
"Linux transformó mi computadora, de una `máquina para hacer cosas',
en un aparato realmente entretenido, sobre el cual cada día aprendo
algo nuevo" (Jaime Salinas)


Re: Psql regex is NFA or DFA?

From
Tom Lane
Date:
Alvaro Herrera <alvherre@atentus.com> writes:
> Tom Lane dijo:
>> Friedl seems to think that the current Tcl regex engine (Henry's new
>> code) is the most advanced thing on the planet.

> Oh, so the TODO item "replace with newer code" is not just vaporware?
> Well, I won't try to compete with Spencer's code in that case.

The code is certainly not vaporware: Tcl's been using it for awhile.
But the code in a Tcl-independent package that we could easily use
is a different story.  Perhaps you could offer Henry some help in
making it into a clean library package ...

            regards, tom lane