Thread: WITH clause
Kind people, I'm looking to the SQL WITH clause as a way to get better regex support in PostgreSQL. I've been chatting a little bit about this, and here's an idea for a behavior. Implementation details TBD. WITH res = match (x.foo, '([0-9]+)x([0-9]+)') SELECT * FROM x WHERE y = res[2] OR y = res[3]; Here res[1] would be the whole pattern match, res[2] & res[3] would be the set of digits before x and the set of digits after x, respectively. Ideally, there could also be (optionally) res[pre] and res[post] which would refer to the stuff in foo that preceds the match and stuff in foo that follows the match, respectively. Perl weenies may think of $&, $1, ... $n, $` and $'. Comments, ideas, brickbats, and "you're on crack" comments welcome. :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 cell: +1 415 235 3778
David Fetter wrote: > I'm looking to the SQL WITH clause as a way to get better regex > support in PostgreSQL. I've been chatting a little bit about this, > and here's an idea for a behavior. Implementation details TBD. > > WITH res = match (x.foo, '([0-9]+)x([0-9]+)') > SELECT * > FROM x > WHERE y = res[2] > OR y = res[3]; Two questions: 1. What does this have to do with the SQL WITH clause? 2. How does the generalize to arbitrary user-defined operators?
David Fetter <david@fetter.org> writes: > I'm looking to the SQL WITH clause as a way to get better regex > support in PostgreSQL. I've been chatting a little bit about this, > and here's an idea for a behavior. Implementation details TBD. > WITH res = match (x.foo, '([0-9]+)x([0-9]+)') > SELECT * > FROM x > WHERE y = res[2] > OR y = res[3]; This doesn't seem to be even remotely compatible with SQL99's idea of WITH. regards, tom lane
On Fri, Dec 12, 2003 at 07:47:26PM +0100, Peter Eisentraut wrote: > David Fetter wrote: > > I'm looking to the SQL WITH clause as a way to get better regex > > support in PostgreSQL. I've been chatting a little bit about > > this, and here's an idea for a behavior. Implementation details > > TBD. > > WITH res = match (x.foo, '([0-9]+)x([0-9]+)') > > SELECT * > > FROM x > > WHERE y = res[2] > > OR y = res[3]; > Two questions: > > 1. What does this have to do with the SQL WITH clause? It seems that the SQL WITH clause is a way to get what I'd think of as "statement-level environment variables" available to the SQL statement. > 2. How does the generalize to arbitrary user-defined operators? I guess that would depend on whether they return arrays or singletons. Even better, they'd be able to take as arguments and return hashes aka associative arrays (cf. the TODO for plpgsql named parameters). I'm not sure what the SQL spec has to say about that last, tho. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 cell: +1 415 235 3778
On Fri, Dec 12, 2003 at 10:13:56AM -0800, David Fetter wrote: > I'm looking to the SQL WITH clause as a way to get better regex > support in PostgreSQL. I've been chatting a little bit about this, > and here's an idea for a behavior. Implementation details TBD. I think you could be rather looking at keeping the regmatch_t * from RE_compile_and_execute somewhere in the TopTransactionContext, and create accessor functions to it. You could do BEGIN; SELECT ... x ~ '([0-9]+)x([0-9]+)'; SELECT regex_parens(1), regex_parens(2); COMMIT; Etc, you get the idea. Of course you could only access the captured expressions from the last regex, and only within the same transaction block. -- Alvaro Herrera (<alvherre[@]dcc.uchile.cl>) Maybe there's lots of data loss but the records of data loss are also lost. (Lincoln Yeoh)
In article <20031212190754.GA11548@dcc.uchile.cl> you wrote: > On Fri, Dec 12, 2003 at 10:13:56AM -0800, David Fetter wrote: > >> I'm looking to the SQL WITH clause as a way to get better regex >> support in PostgreSQL. I've been chatting a little bit about this, >> and here's an idea for a behavior. Implementation details TBD. > I think you could be rather looking at keeping the regmatch_t * from > RE_compile_and_execute somewhere in the TopTransactionContext, and > create accessor functions to it. You could do > BEGIN; > SELECT ... x ~ '([0-9]+)x([0-9]+)'; > SELECT regex_parens(1), regex_parens(2); > COMMIT; > Etc, you get the idea. Of course you could only access the captured > expressions from the last regex, and only within the same > transaction block. A couple of little issues with this: 1. It's assuming there will be exactly one regular expression with associated atoms. This is probably fixable by making the function take two arguments, and it would be really, really nice if these were nameable. F'rex, regex_matches(regex => 1, match => 2) or regex_matches(regex => 'the_one_i_wanted', match => 'post') 2. It doesn't allow for the regex atoms to be used in the SQL statement. BTW, I should have said "row-level environment variables" rather than "statement-level environment variables," although some of the latter would be nice, too :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 cell: +1 415 235 3778 Sophomoric (adj.) The itch to be unique.
david@fetter.org (David Fetter) wrote: > I'm looking to the SQL WITH clause as a way to get better regex > support in PostgreSQL. I've been chatting a little bit about this, > and here's an idea for a behavior. Implementation details TBD. > > WITH res = match (x.foo, '([0-9]+)x([0-9]+)') > SELECT * > FROM x > WHERE y = res[2] > OR y = res[3]; So you're proposing what amounts to the LET function in Lisp: (let* ((homepath (user-homedir-pathname)) (maildir (make-pathname :directory (append (pathname-directory homepath) '("Mail")))) (mailhome (merge-pathnames maildir homepath))) (do-something-with-mail mailhome) (do-something-with-homehomepath)) Or the ML let structure... let a = 1 and b = 2 in a + b;; - : int = 3 I think this could be a "simply smashing" idea, allowing values to get declared once, up front, offering the further potential for them to be reused. Unfortunately, this doesn't seem to fit with the way WITH is defined in SQL. And while the LISP presentation may have "way too many parentheses," the ML syntax, being pretty much orthogonal to what SQL currently offers, strikes me as being preferable. -- select 'cbbrowne' || '@' || 'ntlug.org'; http://www.ntlug.org/~cbbrowne/unix.html :FATAL ERROR -- ILLEGAL ERROR
On Sat, 13 Dec 2003, Christopher Browne wrote: > Unfortunately, this doesn't seem to fit with the way WITH is defined > in SQL. How is the WITH construct defined in SQL? -- /Dennis
David Fetter kirjutas R, 12.12.2003 kell 20:13: > Kind people, > > I'm looking to the SQL WITH clause as a way to get better regex > support in PostgreSQL. I've been chatting a little bit about this, > and here's an idea for a behavior. Implementation details TBD. > > WITH res = match (x.foo, '([0-9]+)x([0-9]+)') > SELECT * > FROM x > WHERE y = res[2] > OR y = res[3]; why not use current standard syntax SELECT x.* FROM x, (select match (x.foo, '([0-9]+)x([0-9]+)') from x innerx where innerx.pk = x.pk )as res HAVING y = get_match_group(res, 2) OR y = get_match_group(res, 3) ; with functions match(str,regex) which returns a match object and get_match_group which extracts matched groups from it. --------------- Hannu
On Sat, Dec 13, 2003 at 10:58:59PM +0200, Hannu Krosing wrote: > David Fetter kirjutas R, 12.12.2003 kell 20:13: > > Kind people, > > I'm looking to the SQL WITH clause as a way to get better regex > > support in PostgreSQL. I've been chatting a little bit about > > this, and here's an idea for a behavior. Implementation details > > TBD. > > WITH res = match (x.foo, '([0-9]+)x([0-9]+)') > > SELECT * > > FROM x > > WHERE y = res[2] > > OR y = res[3]; > > why not use current standard syntax > > SELECT x.* > FROM x, > (select match (x.foo, '([0-9]+)x([0-9]+)') > from x innerx > where innerx.pk = x.pk > ) as res > HAVING y = get_match_group(res, 2) > OR y = get_match_group(res, 3) > ; > > with functions match(str,regex) which returns a match object and > get_match_group which extracts matched groups from it. Hannu, excellent idea! This is just the kind of discussion I was trying to start :) Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 cell: +1 415 235 3778
David Fetter kirjutas L, 13.12.2003 kell 23:17: > On Sat, Dec 13, 2003 at 10:58:59PM +0200, Hannu Krosing wrote: > > David Fetter kirjutas R, 12.12.2003 kell 20:13: > > > Kind people, > > > > I'm looking to the SQL WITH clause as a way to get better regex > > > support in PostgreSQL. I've been chatting a little bit about > > > this, and here's an idea for a behavior. Implementation details > > > TBD. > > > > WITH res = match (x.foo, '([0-9]+)x([0-9]+)') > > > SELECT * > > > FROM x > > > WHERE y = res[2] > > > OR y = res[3]; > > > > why not use current standard syntax > > > > SELECT x.* > > FROM x, > > (select match (x.foo, '([0-9]+)x([0-9]+)') > > from x innerx > > where innerx.pk = x.pk > > ) as res > > HAVING y = get_match_group(res, 2) > > OR y = get_match_group(res, 3) > > ; > > > > with functions match(str,regex) which returns a match object and > > get_match_group which extracts matched groups from it. > > Hannu, excellent idea! As I understand it the SQL Standard WITH can be rewritten to SELECT in FROM if the query is not recursive and does not reference other subqueries. That's why I proposed the above syntax (we don't yet support WITH). > This is just the kind of discussion I was > trying to start :) Thanks ;) Actually I notice that WHERE could be used there instead of HAVING. Should be faster too. ------------ Hannu
Hannu Krosing <hannu@tm.ee> writes: > SELECT x.* > FROM x, > (select match (x.foo, '([0-9]+)x([0-9]+)') > from x innerx > where innerx.pk = x.pk > ) as res > HAVING y = get_match_group(res, 2) > OR y = get_match_group(res, 3) > ; Well you don't need to go fetch from the table an extra time. Presumably the data will be cached but it's still a lot of extra work to process the data twice. You could just do select * from ( select x.*, (select match(foo, '([0-9]+)x([0-9]+)') as res )where y = res[2] or y = res[3] But what Hannu's saying is that the SQL Standard WITH is precisely syntactic sugar for subqueries used like above. It sounds like WITH is to subqueries as let is to lambda.... -- greg