Thread: Proposing WITH ITERATIVE
Hey, everyone.
It's been quite a while since I last contributed a patch but, as this is a new feature, I wanted to gather feedback before doing so. I've found this functionality, already in use at Universität Tübingen, to be both highly beneficial in many of my queries as well as a small change to Postgres - a good trade-off IMO.
As you know, Postgres currently supports SQL:1999 recursive common table expressions, constructed using WITH RECURSIVE, which permit the computation of growing relations (e.g., transitive closures.) Although it is possible to use this recursion for general-purpose iterations, doing so is a deviation from its intended use case. Accordingly, an iterative-only use of WITH RECURSIVE often results in sizable overhead and poor optimizer decisions. In this email, I'd like to propose a similar but iterative-optimized form of CTE - WITH ITERATIVE.
In that it can reference a relation within its definition, this iterative variant has similar capabilities as recursive CTEs. In contrast to recursive CTEs, however, rather than appending new tuples, this variant performs a replacement of the intermediate relation. As such, the final result consists of a relation containing tuples computed during the last iteration only. Why would we want this?
This type of computation pattern is often found in data and graph mining algorithms. In PageRank, for example, the initial ranks are updated in each iteration. In clustering algorithms, assignments of data tuples to clusters are updated in every iteration. Something these types of algorithms have in common is that they operate on fixed-size datasets, where only specific values (ranks, assigned clusters, etc.) are updated.
Rather than stopping when no new tuples are generated, WITH ITERATIVE stops when a user-defined predicate evaluates to true. By providing a non-appending iteration concept with a while-loop-style stop criterion, we can massively speed up query execution due to better optimizer decisions. Although it is possible to implement the types of algorithms mentioned above using recursive CTEs, this feature has two significant advantages:
First, iterative CTEs consume significantly less memory. Consider a CTE of N tuples and I iterations. With recursive CTEs, such a relation would grow to (N * I) tuples. With iterative CTEs, however, all prior iterations are discarded early. As such, the size of the relation would remain N, requiring only a maximum of (2 * N) tuples for comparisons of the current and the prior iteration. Furthermore, in queries where the number of executed iterations represents the stop criterion, recursive CTEs require even more additional per-tuple overhead - to carry along the iteration counter.
Second, iterative CTEs have lower query response times. Because of its smaller size, the time to scan and process the intermediate relation, to re-compute ranks, clusters, etc., is significantly reduced.
In short, iterative CTEs retain the flexibility of recursive CTEs, but offer a significant advantage for algorithms which don't require results computed throughout all iterations. As this feature deviates only slightly from WITH RECURSIVE, there is very little code required to support it (~250 loc.)
If there's any interest, I'll clean-up their patch and submit. Thoughts?
It's been quite a while since I last contributed a patch but, as this is a new feature, I wanted to gather feedback before doing so. I've found this functionality, already in use at Universität Tübingen, to be both highly beneficial in many of my queries as well as a small change to Postgres - a good trade-off IMO.
As you know, Postgres currently supports SQL:1999 recursive common table expressions, constructed using WITH RECURSIVE, which permit the computation of growing relations (e.g., transitive closures.) Although it is possible to use this recursion for general-purpose iterations, doing so is a deviation from its intended use case. Accordingly, an iterative-only use of WITH RECURSIVE often results in sizable overhead and poor optimizer decisions. In this email, I'd like to propose a similar but iterative-optimized form of CTE - WITH ITERATIVE.
In that it can reference a relation within its definition, this iterative variant has similar capabilities as recursive CTEs. In contrast to recursive CTEs, however, rather than appending new tuples, this variant performs a replacement of the intermediate relation. As such, the final result consists of a relation containing tuples computed during the last iteration only. Why would we want this?
This type of computation pattern is often found in data and graph mining algorithms. In PageRank, for example, the initial ranks are updated in each iteration. In clustering algorithms, assignments of data tuples to clusters are updated in every iteration. Something these types of algorithms have in common is that they operate on fixed-size datasets, where only specific values (ranks, assigned clusters, etc.) are updated.
Rather than stopping when no new tuples are generated, WITH ITERATIVE stops when a user-defined predicate evaluates to true. By providing a non-appending iteration concept with a while-loop-style stop criterion, we can massively speed up query execution due to better optimizer decisions. Although it is possible to implement the types of algorithms mentioned above using recursive CTEs, this feature has two significant advantages:
First, iterative CTEs consume significantly less memory. Consider a CTE of N tuples and I iterations. With recursive CTEs, such a relation would grow to (N * I) tuples. With iterative CTEs, however, all prior iterations are discarded early. As such, the size of the relation would remain N, requiring only a maximum of (2 * N) tuples for comparisons of the current and the prior iteration. Furthermore, in queries where the number of executed iterations represents the stop criterion, recursive CTEs require even more additional per-tuple overhead - to carry along the iteration counter.
Second, iterative CTEs have lower query response times. Because of its smaller size, the time to scan and process the intermediate relation, to re-compute ranks, clusters, etc., is significantly reduced.
In short, iterative CTEs retain the flexibility of recursive CTEs, but offer a significant advantage for algorithms which don't require results computed throughout all iterations. As this feature deviates only slightly from WITH RECURSIVE, there is very little code required to support it (~250 loc.)
If there's any interest, I'll clean-up their patch and submit. Thoughts?
Jonah H. Harris
Hi, You might get better feedback in a month or so; right now we just got into feature freeze. On Mon, 2020-04-27 at 12:52 -0400, Jonah H. Harris wrote: > In that it can reference a relation within its definition, this > iterative variant has similar capabilities as recursive CTEs. In > contrast to recursive CTEs, however, rather than appending new > tuples, this variant performs a replacement of the intermediate > relation. As such, the final result consists of a relation containing > tuples computed during the last iteration only. Why would we want > this? Can you illustrate with some examples? I get that one is appending and the other is modifying in-place, but how does this end up looking in the query language? > This type of computation pattern is often found in data and graph > mining algorithms. It certainly sounds useful. > Rather than stopping when no new tuples are generated, WITH ITERATIVE > stops when a user-defined predicate evaluates to true. Why stop when it evaluates to true, and not false? > First, iterative CTEs consume significantly less memory. Consider a > CTE of N tuples and I iterations. With recursive CTEs, such a > relation would grow to (N * I) tuples. With iterative CTEs, however, > all prior iterations are discarded early. As such, the size of the > relation would remain N, requiring only a maximum of (2 * N) tuples > for comparisons of the current and the prior iteration. Furthermore, > in queries where the number of executed iterations represents the > stop criterion, recursive CTEs require even more additional per-tuple > overhead - to carry along the iteration counter. It seems like the benefit comes from carrying information along within tuples (by adding to scores or counters) rather than appending tuples. Is it possible to achieve this in other ways? The recursive CTE implementation is a very direct implementation of the standard, perhaps there are smarter approaches? Regards, Jeff Davis
On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote: > Hey, everyone. > If there's any interest, I'll clean-up their patch and submit. Thoughts? Where's the current patch? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Apr 27, 2020 at 10:32 PM David Fetter <david@fetter.org> wrote:
On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote:
> Hey, everyone.
> If there's any interest, I'll clean-up their patch and submit. Thoughts?
Where's the current patch?
It’s private. Hence, why I’m inquiring as to interest.
Jonah H. Harris
On Mon, Apr 27, 2020 at 10:44:04PM -0400, Jonah H. Harris wrote: > On Mon, Apr 27, 2020 at 10:32 PM David Fetter <david@fetter.org> wrote: > > On Mon, Apr 27, 2020 at 12:52:48PM -0400, Jonah H. Harris wrote: > > > Hey, everyone. > > > > > If there's any interest, I'll clean-up their patch and submit. Thoughts? > > > > Where's the current patch? > > > It’s private. Hence, why I’m inquiring as to interest. Have the authors agreed to make it available to the project under a compatible license? Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Mon, Apr 27, 2020 at 11:33 PM David Fetter <david@fetter.org> wrote:
Have the authors agreed to make it available to the project under a
compatible license?
If there’s interest, obviously. Otherwise I wouldn’t be asking.
I said from the start why I wasn’t attaching a patch and that I was seeing feedback. Honestly, David, stop wasting my, and list time, asking pointless off-topic questions.
Jonah H. Harris
On Tue, Apr 28, 2020 at 5:49 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
On Mon, Apr 27, 2020 at 11:33 PM David Fetter <david@fetter.org> wrote:
Have the authors agreed to make it available to the project under a
compatible license?If there’s interest, obviously. Otherwise I wouldn’t be asking.I said from the start why I wasn’t attaching a patch and that I was seeing feedback. Honestly, David, stop wasting my, and list time, asking pointless off-topic questions.
Jonah,
I see it the other way round—it could end up as a waste of everyone's time discussing the details, if the authors don't agree to publish their code in the first place. Of course, it could also be written from scratch, in which case I guess someone else from the community (who haven't seen that private code) would have to take a stab at it, but I believe it helps to know this in advance.
I also don't see how it "obviously" follows from your two claims: "I've found this functionality" and "I'll clean-up their patch and submit", that you have even asked (or, for that matter—found) the authors of that code.
Finally, I'd like to suggest you adopt a more constructive tone and become familiar with the project's Code of Conduct[1], if you haven't yet. I am certain that constructive, respectful communication from your side will help the community to focus on the actual details of your proposal.
Kind regards,
-- On 4/27/20 6:52 PM, Jonah H. Harris wrote:> If there's any interest, I'll clean-up their patch and submit. Thoughts? Hi, Do you have any examples of queries where it would help? It is pretty hard to say how much value some new syntax adds without seeing how it improves an intended use case. Andreas
On Tue, Apr 28, 2020 at 6:16 AM Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
will help the community to focus on the actual details of your proposal.
I'd like it if the community would focus on the details of the proposal as well :)
Jonah H. Harris
On Tue, Apr 28, 2020 at 6:19 AM Andreas Karlsson <andreas@proxel.se> wrote:
Do you have any examples of queries where it would help? It is pretty
hard to say how much value some new syntax adds without seeing how it
improves an intended use case.
Hey, Andreas.
Thanks for the response. I'm currently working on a few examples per Jeff's response along with some benchmark information including improvements in response time and memory usage of the current implementation. In the meantime, as this functionality has been added to a couple of other databases and there's academic research on it, if you're interested, here's a few papers with examples:
Jonah H. Harris
Greetings Jonah! * Jonah H. Harris (jonah.harris@gmail.com) wrote: > On Tue, Apr 28, 2020 at 6:19 AM Andreas Karlsson <andreas@proxel.se> wrote: > > > Do you have any examples of queries where it would help? It is pretty > > hard to say how much value some new syntax adds without seeing how it > > improves an intended use case. > > Thanks for the response. I'm currently working on a few examples per Jeff's > response along with some benchmark information including improvements in > response time and memory usage of the current implementation. In the > meantime, as this functionality has been added to a couple of other > databases and there's academic research on it, if you're interested, here's > a few papers with examples: > > http://faculty.neu.edu.cn/cc/zhangyf/papers/2018-ICDCS2018-sqloop.pdf > http://db.in.tum.de/~passing/publications/dm_in_hyper.pdf Nice! One of the first question that we need to ask though, imv anyway, is- do the other databases use the WITH ITERATIVE syntax? How many of them? Are there other approaches? Has this been brought up to the SQL committee? In general, we really prefer to avoid extending SQL beyond the standard, especially in ways that the standard is likely to be expanded. In other words, we'd really prefer to avoid the risk that the SQL committee declares WITH ITERATIVE to mean something else in the future, causing us to have to have a breaking change to become compliant. Now, if all the other major DB vendors have WITH ITERATIVE and they all work the same way, then we can have at least some confidence that the SQL committee will end up defining it in that way and there won't be any issue. We do have someone on the committee who watches these lists, hopefully they'll have a chance to comment on this. Perhaps it's already in discussion in the committee. Thanks! Stephen
Attachment
On Mon, Apr 27, 2020 at 8:50 PM Jeff Davis <pgsql@j-davis.com> wrote:
Hi,
Hey, Jeff. Long time no talk. Good to see you're still on here.
You might get better feedback in a month or so; right now we just got
into feature freeze.
Yep. No hurry. I've just been playing with this and wanted to start getting feedback. It's a side-project for me anyway, so time is limited.
Can you illustrate with some examples? I get that one is appending and
the other is modifying in-place, but how does this end up looking in
the query language?
I'm putting together a few concrete real-world examples.
> Rather than stopping when no new tuples are generated, WITH ITERATIVE
> stops when a user-defined predicate evaluates to true.
Why stop when it evaluates to true, and not false?
It's how they implemented it. A few other databases have implemented similar functionality but, as it's not standard, it's kinda just up to each implementor. I'm not married to that idea, but it has worked well for me so far.
It seems like the benefit comes from carrying information along within
tuples (by adding to scores or counters) rather than appending tuples.
Is it possible to achieve this in other ways? The recursive CTE
implementation is a very direct implementation of the standard, perhaps
there are smarter approaches?
Yeah, in that specific case, one of the other implementations seems to carry the counters along in the executor itself. But, as not all uses of this functionality are iteration-count-based, I think that's a little limiting. Using a terminator expression (of some kind) seems most adaptable, I think. I'll give some examples of both types of cases.
Jonah H. Harris
On Tue, Apr 28, 2020 at 11:51 AM Stephen Frost <sfrost@snowman.net> wrote:
Greetings Jonah!
Hey, Stephen. Hope you're doing well, man!
One of the first question that we need to ask though, imv anyway, is- do
the other databases use the WITH ITERATIVE syntax? How many of them?
Are there other approaches? Has this been brought up to the SQL
committee?
There are four that I'm aware of, but I'll put together a full list. I don't believe it's been proposed to the standards committee, but I'll see if I can find transcripts/proposals. I imagine those are all still public.
In general, we really prefer to avoid extending SQL beyond the standard,
especially in ways that the standard is likely to be expanded. In
other words, we'd really prefer to avoid the risk that the SQL committee
declares WITH ITERATIVE to mean something else in the future, causing us
to have to have a breaking change to become compliant.
Agreed. That's my main concern as well.
Now, if all the other major DB vendors have WITH ITERATIVE and they all work the same
way, then we can have at least some confidence that the SQL committee
will end up defining it in that way and there won't be any issue.
This is where it sucks, as each vendor has done it differently. One uses WITH ITERATE, one uses WITH ITERATIVE, others use, what I consider to be, a nasty operator-esque style... I think ITERATIVE makes the most sense, but it does create a future contention as that definitely seems like the syntax the committee would use as well.
Oracle ran into this issue as well, which is why they added an additional clause to WITH that permits selection of depth/breadth-first search et al. While it's kinda nasty, we could always similarly amend WITH RECURSIVE to add an additional ITERATE or something to the tail of the with_clause rule. But, that's why I'm looking for feedback :)
We do have someone on the committee who watches these lists, hopefully
they'll have a chance to comment on this. Perhaps it's already in
discussion in the committee.
That would be awesome.
--
Jonah H. Harris
On Mon, Apr 27, 2020 at 8:50 PM Jeff Davis <pgsql@j-davis.com> wrote:
Can you illustrate with some examples? I get that one is appending and
the other is modifying in-place, but how does this end up looking in
the query language?
To ensure credit is given to the original authors, the initial patch I'm working with (against Postgres 11 and 12) came from Denis Hirn, Torsten Grust, and Christian Duta. Attached is a quick-and-dirty merge of that patch that applies cleanly against 13-devel. It is not solid, at all, but demonstrates the functionality. At present, my updates can be found at https://github.com/jonahharris/postgres/tree/with-iterative
As the patch makes use of additional booleans for iteration, if there's interest in incorporating this functionality, I'd like to discuss with Tom, Robert, et al regarding the current use of booleans for CTE recursion differentiation in parsing and planning. In terms of making it production-ready, I think the cleanest method would be to use a bitfield (rather than booleans) to differentiate recursive and iterative CTEs. Though, as that would touch more code, it's obviously up for discussion.
I'm working on a few good standalone examples of PageRank, shortest path, etc. But, the simplest Fibonacci example can be found here:
EXPLAIN ANALYZE
WITH RECURSIVE fib_sum (iteration, previous_number, new_number)
AS (SELECT 1, 0::numeric, 1::numeric
UNION ALL
SELECT (iteration + 1), new_number, (previous_number + new_number)
FROM fib_sum
WHERE iteration <= 10000)
SELECT r.iteration, r.new_number
FROM fib_sum r
ORDER BY 1 DESC
LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.81..3.81 rows=1 width=36) (actual time=44.418..44.418 rows=1 loops=1)
CTE fib_sum
-> Recursive Union (cost=0.00..3.03 rows=31 width=68) (actual time=0.005..14.002 rows=10001 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=68) (actual time=0.004..0.004 rows=1 loops=1)
-> WorkTable Scan on fib_sum (cost=0.00..0.24 rows=3 width=68) (actual time=0.001..0.001 rows=1 loops=10001)
Filter: (iteration <= 10000)
Rows Removed by Filter: 0
-> Sort (cost=0.78..0.85 rows=31 width=36) (actual time=44.417..44.417 rows=1 loops=1)
Sort Key: r.iteration DESC
Sort Method: top-N heapsort Memory: 27kB
-> CTE Scan on fib_sum r (cost=0.00..0.62 rows=31 width=36) (actual time=0.009..42.660 rows=10001 loops=1)
Planning Time: 0.331 ms
Execution Time: 45.949 ms
(13 rows)
WITH RECURSIVE fib_sum (iteration, previous_number, new_number)
AS (SELECT 1, 0::numeric, 1::numeric
UNION ALL
SELECT (iteration + 1), new_number, (previous_number + new_number)
FROM fib_sum
WHERE iteration <= 10000)
SELECT r.iteration, r.new_number
FROM fib_sum r
ORDER BY 1 DESC
LIMIT 1;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=3.81..3.81 rows=1 width=36) (actual time=44.418..44.418 rows=1 loops=1)
CTE fib_sum
-> Recursive Union (cost=0.00..3.03 rows=31 width=68) (actual time=0.005..14.002 rows=10001 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=68) (actual time=0.004..0.004 rows=1 loops=1)
-> WorkTable Scan on fib_sum (cost=0.00..0.24 rows=3 width=68) (actual time=0.001..0.001 rows=1 loops=10001)
Filter: (iteration <= 10000)
Rows Removed by Filter: 0
-> Sort (cost=0.78..0.85 rows=31 width=36) (actual time=44.417..44.417 rows=1 loops=1)
Sort Key: r.iteration DESC
Sort Method: top-N heapsort Memory: 27kB
-> CTE Scan on fib_sum r (cost=0.00..0.62 rows=31 width=36) (actual time=0.009..42.660 rows=10001 loops=1)
Planning Time: 0.331 ms
Execution Time: 45.949 ms
(13 rows)
-- No ORDER/LIMIT is required with ITERATIVE as only a single tuple is present
EXPLAIN ANALYZE
WITH ITERATIVE fib_sum (iteration, previous_number, new_number)
AS (SELECT 1, 0::numeric, 1::numeric
UNION ALL
SELECT (iteration + 1), new_number, (previous_number + new_number)
FROM fib_sum
WHERE iteration <= 10000)
SELECT r.iteration, r.new_number
FROM fib_sum r;
WITH ITERATIVE fib_sum (iteration, previous_number, new_number)
AS (SELECT 1, 0::numeric, 1::numeric
UNION ALL
SELECT (iteration + 1), new_number, (previous_number + new_number)
FROM fib_sum
WHERE iteration <= 10000)
SELECT r.iteration, r.new_number
FROM fib_sum r;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
CTE Scan on fib_sum r (cost=3.03..3.65 rows=31 width=36) (actual time=11.626..11.627 rows=1 loops=1)
CTE fib_sum
-> Recursive Union (cost=0.00..3.03 rows=31 width=68) (actual time=11.621..11.621 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=68) (actual time=0.001..0.001 rows=1 loops=1)
-> WorkTable Scan on fib_sum (cost=0.00..0.24 rows=3 width=68) (actual time=0.001..0.001 rows=1 loops=10001)
Filter: (iteration <= 10000)
Rows Removed by Filter: 0
Planning Time: 0.068 ms
Execution Time: 11.651 ms
(9 rows)
--------------------------------------------------------------------------------------------------------------------------
CTE Scan on fib_sum r (cost=3.03..3.65 rows=31 width=36) (actual time=11.626..11.627 rows=1 loops=1)
CTE fib_sum
-> Recursive Union (cost=0.00..3.03 rows=31 width=68) (actual time=11.621..11.621 rows=1 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=68) (actual time=0.001..0.001 rows=1 loops=1)
-> WorkTable Scan on fib_sum (cost=0.00..0.24 rows=3 width=68) (actual time=0.001..0.001 rows=1 loops=10001)
Filter: (iteration <= 10000)
Rows Removed by Filter: 0
Planning Time: 0.068 ms
Execution Time: 11.651 ms
(9 rows)
Jonah H. Harris
Attachment
Hello Jonah, Nice. > -- No ORDER/LIMIT is required with ITERATIVE as only a single tuple is > present > EXPLAIN ANALYZE > WITH ITERATIVE fib_sum (iteration, previous_number, new_number) > AS (SELECT 1, 0::numeric, 1::numeric > UNION ALL > SELECT (iteration + 1), new_number, (previous_number + new_number) > FROM fib_sum > WHERE iteration <= 10000) > SELECT r.iteration, r.new_number > FROM fib_sum r; Nice. My 0,02€ about the feature design: I'm wondering about how to use such a feature in the context of WITH query with several queries having different behaviors. Currently "WITH" introduces a named-query (like a view), "WITH RECURSIVE" introduces a mix of recursive and named queries, pg really sees whether each one is recursive or not, and "RECURSIVE" is required but could just be guessed. Now that there could be 3 variants in the mix, and for the feature to be orthogonal I think that it should be allowed. However, there is no obvious way to distinguish a RECURSIVE from an ITERATIVE, as it is more a behavioral thing than a structural one. This suggests allowing to tag each query somehow, eg before, which would be closer to the current approach: WITH foo(i) AS (simple select), RECURSIVE bla(i) AS (recursive select), ITERATIVE blup(i) AS (iterative select), or maybe after AS, which may be clearer because closer to the actual query, which looks better to me: WITH foo(i) AS (simple select), bla(i) AS RECURSIVE (recursive select), boo(i) AS ITERATIVE (iterative select), … Also, with 3 cases I would prefer that the default has a name so someone can talk about it otherwise than saying "default". Maybe SIMPLE would suffice, or something else. ISTM that as nothing is expected between AS and the open parenthesis, there is no need to have a reserved keyword, which is a good thing for the parser. -- Fabien.
On 2020-04-29 07:09, Fabien COELHO wrote: > I'm wondering about how to use such a feature in the context of WITH query > with several queries having different behaviors. Currently "WITH" > introduces a named-query (like a view), "WITH RECURSIVE" introduces a mix > of recursive and named queries, pg really sees whether each one is > recursive or not, and "RECURSIVE" is required but could just be guessed. Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here. As you say, the RECURSIVE keyword doesn't specify the processing but marks the fact that the specification of the query is recursive. I think a syntax that would fit better within the existing framework would be something like WITH RECURSIVE t AS ( SELECT base case REPLACE ALL -- instead of UNION ALL SELECT recursive case ) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Apr 29, 2020 at 7:22 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here. As you
say, the RECURSIVE keyword doesn't specify the processing but marks the
fact that the specification of the query is recursive.
Agreed. I started thinking through Fabien's response last night.
I think a syntax that would fit better within the existing framework
would be something like
WITH RECURSIVE t AS (
SELECT base case
REPLACE ALL -- instead of UNION ALL
SELECT recursive case
)
I was originally thinking more along the lines of Fabien's approach, but this is similarly interesting.
Jonah H. Harris
On Wed, Apr 29, 2020 at 10:34 AM Jonah H. Harris <jonah.harris@gmail.com> wrote:
On Wed, Apr 29, 2020 at 7:22 AM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:Yeah the RECURSIVE vs ITERATIVE is a bit of a red herring here. As you
say, the RECURSIVE keyword doesn't specify the processing but marks the
fact that the specification of the query is recursive.Agreed. I started thinking through Fabien's response last night.I think a syntax that would fit better within the existing framework
would be something like
WITH RECURSIVE t AS (
SELECT base case
REPLACE ALL -- instead of UNION ALL
SELECT recursive case
)I was originally thinking more along the lines of Fabien's approach, but this is similarly interesting.
Obviously I'm very concerned about doing something that the SQL Standard will clobber somewhere down the road. Having said that, the recursive syntax always struck me as awkward even by SQL standards.
Perhaps something like this would be more readable
WITH t AS (
UPDATE ( SELECT 1 AS ctr, 'x' as val )
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val
)
The notion of an UPDATE on an ephemeral subquery isn't that special, see "subquery2" in https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm , so the only syntax here without precedence is dropping a WHILE into an UPDATE statement.
Perhaps something like this would be more readable
WITH t AS (
UPDATE ( SELECT 1 AS ctr, 'x' as val )
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val
)
The notion of an UPDATE on an ephemeral subquery isn't that special, see "subquery2" in https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm , so the only syntax here without precedence is dropping a WHILE into an UPDATE statement.
Hello Corey, Hello Peter, My 0.02 € about the alternative syntaxes: Peter: > I think a syntax that would fit better within the existing framework > would be something like > > WITH RECURSIVE t AS ( > SELECT base case > REPLACE ALL -- instead of UNION ALL > SELECT recursive case > ) A good point about this approach is that the replacement semantics is clear, whereas using ITERATIVE with UNION is very misleading, as it is *not* a union at all. This said I'm wondering how the parser would react. Moreover, having a different syntax for normal queries and inside WITH query looks very undesirable from a language design point of view. This suggests that the user should be able to write it anywhere: SELECT 1 REPLACE SELECT 2; Well, maybe. I'm unclear whether "REPLACE ALL" vs "REPLACE" makes sense, ISTM that there could be only one replacement semantics (delete the content and insert a new one)? REPLACE should have an associativity defined wrt other operators: SELECT 1 UNION SELECT 2 REPLACE SELECT 3; -- how many rows? I do not see anything obvious. Probably 2 rows. Corey: > Obviously I'm very concerned about doing something that the SQL Standard > will clobber somewhere down the road. Having said that, the recursive > syntax always struck me as awkward even by SQL standards. Indeed! > Perhaps something like this would be more readable > > WITH t AS ( > UPDATE ( SELECT 1 AS ctr, 'x' as val ) > SET ctr = ctr + 1, val = val || 'x' > WHILE ctr <= 100 > RETURNING ctr, val > ) > > The notion of an UPDATE on an ephemeral subquery isn't that special, see > "subquery2" in > https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm , I must admit that I do not like much needing another level of subquery, but maybe it could just be another named query in the WITH statement. ISTM that UPDATE is quite restrictive as the number of rows cannot change, which does not seem desirable at all? How could I add or remove rows from one iteration to the next? ISTM that the WHILE would be checked before updating, so that WHILE FALSE does nothing, in which case its position after SET is odd. Having both WHERE and WHILE might look awkward. Also it looks much more procedural this way, which is the point, but also depart from the declarative SELECT approach of WITH RECURSIVE. -- Fabien.
> Perhaps something like this would be more readable
>
> WITH t AS (
> UPDATE ( SELECT 1 AS ctr, 'x' as val )
> SET ctr = ctr + 1, val = val || 'x'
> WHILE ctr <= 100
> RETURNING ctr, val
> )
>
> The notion of an UPDATE on an ephemeral subquery isn't that special, see
> "subquery2" in
> https://docs.oracle.com/cd/B19306_01/appdev.102/b14261/update_statement.htm ,
I must admit that I do not like much needing another level of subquery,
but maybe it could just be another named query in the WITH statement.
So like this:
WITH initial_conditions as (SELECT 1 as ctr, 'x' as val)
UPDATE initial_conditions
SET ctr = ctr + 1, val = val || 'x'
WHILE ctr <= 100
RETURNING ctr, val
ISTM that UPDATE is quite restrictive as the number of rows cannot
change, which does not seem desirable at all? How could I add or remove
rows from one iteration to the next?
My understanding was that maintaining a fixed number of rows was a desired feature.
ISTM that the WHILE would be checked before updating, so that WHILE FALSE
does nothing, in which case its position after SET is odd.
True, but having the SELECT before the FROM is equally odd.
Having both WHERE and WHILE might look awkward.
Maybe an UNTIL instead of WHILE?
Also it looks much more procedural this way, which is the point, but also
depart from the declarative SELECT approach of WITH RECURSIVE.
Yeah, just throwing it out as a possibility. Looking again at what I suggested, it looks a bit like the Oracle "CONNECT BY level <= x" idiom.
I suspect that the SQL standards body already has some preliminary work done, and we should ultimately follow that.
On Wed, Apr 29, 2020 at 4:50 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Having both WHERE and WHILE might look awkward.
Maybe an UNTIL instead of WHILE?
While I'm not a huge fan of it, one of the other databases implementing this functionality does so using the syntax:
WITH ITERATIVE R AS '(' R0 ITERATE Ri UNTIL N (ITERATIONS | UPDATES) ')' Qf
Where N in ITERATIONS represents termination at an explicit count and, in UPDATES, represents termination after Ri updates more than n rows on table R.
Jonah H. Harris
On Wed, Apr 29, 2020 at 5:54 PM Jonah H. Harris <jonah.harris@gmail.com> wrote:
On Wed, Apr 29, 2020 at 4:50 PM Corey Huinker <corey.huinker@gmail.com> wrote:Having both WHERE and WHILE might look awkward.
Maybe an UNTIL instead of WHILE?While I'm not a huge fan of it, one of the other databases implementing this functionality does so using the syntax:WITH ITERATIVE R AS '(' R0 ITERATE Ri UNTIL N (ITERATIONS | UPDATES) ')' QfWhere N in ITERATIONS represents termination at an explicit count and, in UPDATES, represents termination after Ri updates more than n rows on table R.
Sent too soon :) One of the main reasons I dislike the above is that it assumes N is known. In some cases, however, you really need termination upon a condition.
--
Jonah H. Harris
Hello, more random thoughts about syntax, semantics, and keeping it relational. > While I'm not a huge fan of it, one of the other databases implementing > this functionality does so using the syntax: > > WITH ITERATIVE R AS '(' R0 ITERATE Ri UNTIL N (ITERATIONS | UPDATES) ')' Qf > > Where N in ITERATIONS represents termination at an explicit count and, in > UPDATES, represents termination after Ri updates more than n rows on table > R. > > One of the main reasons I dislike the above is that it assumes N is > known. In some cases, however, you really need termination upon a > condition. Yes, definitely, a (boolean?) condition is really needed, but possibly above N could be an expression, maybe with some separator before the query. ISTM that using SELECT iterations is relational and close to the currently existing RECURSIVE. Separating the initialization and iterations with ITERATE is kind of the same approach than Peter's REPLACE, somehow, i.e. a new marker. The above approach bothers me because it changes the query syntax a lot. The inside-WITH syntax should be the same as the normal query syntax. First try. If we go to new markers, maybe the following, which kind of reuse Corey explicit condition, but replacing UPDATE with SELECT which makes it more generic: WITH R AS ( ITERATE [STARTING] FROM R0 WHILE/UNTIL condition REPEAT Ri ); Ok, it is quite procedural. It is really just a reordering of the syntax shown above, with a boolean condition thrown in and a heavy on (key)words SQL-like look and feel. It seems to make sense on a simple example: -- 1 by 1 count WITH counter(n) ( ITERATE STARTING FROM SELECT 1 WHILE n < 10 REPEAT SELECT n+1 FROM counter ); However I'm very unclear about the WHILE stuff, it makes some sense here because there is just one row, but what if there are severals? -- 2 by 2 count WITH counter(n) ( ITERATE [STARTING FROM? OVER? nothing?] SELECT 1 UNION SELECT 2 -- cannot be empty? why not? WHILE n < 10 REPEAT -- which n it is just above? -- shoult it add a ANY/ALL semantics? -- should it really be a sub-query returning a boolean? -- eg: WHILE TRUE = ANY/ALL (SELECT n < 10 FROM counter) -- which I find pretty ugly. -- what else could it be? SELECT n+2 FROM counter ); Also, the overall syntax does not make much sense outside a WITH because one cannot reference the initial query which has no name. Hmmm. Not very convincing:-) Let us try again. Basically iterating is a 3 select construct: one for initializing, one for iterating, one for the stopping condition, with naming issues, the last point being exactly what WITH should solve. by restricting the syntax to normal existing selects and moving things out: WITH stuff(n) AS ITERATE OVER/FROM/STARTING FROM '(' initial-sub-query ')' -- or a table? WHILE/UNTIL '(' condition-sub-query ')' -- what is TRUE/FALSE? non empty? other? -- WHILE/UNTIL [NOT] EXISTS '(' query ')' ?? REPEAT/DO/LOOP/... '(' sub-query-over-stuff ')' ); At least the 3 sub-queries are just standard queries, only the wrapping around (ITERATE ... WHILE/UNTIL ... REPEAT ...) is WITH specific, which is somehow better than having new separators in the query syntax itself. It is pretty relational inside, and procedural on the outside, the two levels are not mixed, which is the real win from my point of view. ISTM that the key take away from the above discussion is to keep the overhead syntax in WITH, it should not be moved inside the query in any way, like adding REPLACE or WHILE or whatever there. This way there is minimal interference with future query syntax extensions, there is only a specific WITH-level 3-query construct with pretty explicit markers. -- Fabien.
On Tue, 2020-04-28 at 11:57 -0400, Jonah H. Harris wrote: > Yeah, in that specific case, one of the other implementations seems > to carry the counters along in the executor itself. But, as not all > uses of this functionality are iteration-count-based, I think that's > a little limiting. Using a terminator expression (of some kind) seems > most adaptable, I think. I'll give some examples of both types of > cases. In my experience, graph algorithms or other queries doing more specialized analysis tend to get pretty complex with lots of special cases. Users may want to express these algorithms in a more familiar language (R, Python, etc.), and to version the code (probably in an extension). Have you considered taking this to the extreme and implementing something like User-Defined Table Operators[1]? Or is there a motivation for writing such algorithms inline in SQL? Regards, Jeff Davis [1] http://www.vldb.org/conf/1999/P47.pdf