Re: Puzzline CROSS JOIN when doing RECURSIVE CTE - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Puzzline CROSS JOIN when doing RECURSIVE CTE
Date
Msg-id 75A77395-B71B-438C-A48B-9731A19DDCDF@gmail.com
Whole thread Raw
In response to Puzzline CROSS JOIN when doing RECURSIVE CTE  (Pól Ua Laoínecháin <linehanp@tcd.ie>)
Responses Re: Puzzline CROSS JOIN when doing RECURSIVE CTE  (Pól Ua Laoínecháin <linehanp@tcd.ie>)
List pgsql-general
> On 18 Apr 2022, at 11:56, Pól Ua Laoínecháin <linehanp@tcd.ie> wrote:

(…)

> All  of the code below is available on the fiddle here:
>
> https://dbfiddle.uk/?rdbms=postgres_13&fiddle=0cc20c9081867131260e6e3550bd08ab

(…)

> OK, grand, now I wish to perform a RECURSIVE CTE on it. So, I start by
> trying something (I thought was) very simple. Obviously, I plan to do
> more, but I wanted to get the "mechanics" correct to start with. So,
> my query is:
>
> WITH RECURSIVE cte1 (n, ln) AS
> (
>  SELECT 1 AS n, string
>  FROM line

Here is your first problem, this will yield a result for each row in your line table, numbering it ‘1’. You seem to
haveexpected just a single result here, but that is something that you need to take care of in your query. 
This part is called the base case, base step or initial step.

>  UNION ALL
>  SELECT n + 1, ln
>  FROM cte1
>  WHERE n < (SELECT COUNT(*) FROM line)

And then for each of those rows, it will add all those rows (from the same CTE!) again.
This part is called the recursive step.

You did add a termination condition here, which indeed manages to terminate, but it does so too late.

It seems that you do understand some of the concepts of recursive CTE’s, but you appear to be missing some crucial
knowledge.

For example, it is actually possible to query multiple trees with a single recursive CTE. It is not limited to a single
tree.How many trees the CTE will navigate depends on how you selected the rows in the base case. 

> )
> SELECT * FROM cte1;
>
> i.e. have a counter variable and a string from the line table

My first question is why you’re using a recursive CTE here? This doesn’t appear to be hierarchical data (such as a
tree),unless perhaps you intended to actually traverse the HTML document hierarchy? 

>
> But, then to my horror, the result of this query is
>
> 1with t(x) as (values( XMLPARSE(DOCUMENT
> ('<root><NotificationServiceDetails NotificationNo="0"
> AlarmCode="mail" AlarmStartTime="10:00:00" AlarmTime="0" Id ="2"
>> <NotificationServiceDetail
> Id="2"><Title><![CDATA[aaaaaaaaaaaaa]]></Title><ContentJson><![CDATA[
> 1 <html lang="en">
> 1 <head>
> 1 <meta charset="utf-8"/>
> 1 more stuff
> 1 more stuff
> 1 </table>
> 1 </body>
> 1 </html>
> ...
> ... snipped for brevity
> ...
>
> 256 rows!        <<=== note 256!
>
>
> So, my simple recursive CTE is
>
> a) not incrementing n and

It should be, but only after the first 16 rows from your base case.

> b) obviously doing some sort of CROSS JOIN (16^2 = 256).

Not quite, it’s selecting the same 16 rows 16 times, incrementing the numbering each time. Effectively it is indeed a
Cartesianproduct. More on that below. 

> I would be grateful if somebody could explain what's going on here. As
> shown in the fiddle, I can do the basic 1 - 10 RCTE and I've done way
> more complex ones before, so I'm a bit baffled as to what's going on
> here.

For a recursive CTE to work, you need two things, and that seems to be where you have misunderstood some things:

1. You need a base case that selects the _initial_ rows to start the recursion from.
2. You need a recursive step, where you usually join the results of your CTE to your table to figure out what the next
resultshould be. 

(It is very similar to how this works in procedural languages, where you first call a function that initiates the
recursionand then that calls a second function (with more parameters usually) that calls itself again and again, with
updatedinput parameters, until some termination condition is reached and the function returns a value.) 

For the base case, you will want to make sure that you select the correct row(s) from your line table for the initial
set.My guess is that it should be the row containing the string '<root>'. 

For the recursive step, you will want to select the next child row from your line table that is (directly) related to
theone(s) you selected in the base case, or is (directly) related to the previous iteration of the recursive case. 
It is unclear to me what that should be in your case, I don’t understand the purpose of your CTE.

Your termination condition, WHERE n < (SELECT COUNT(*) FROM line), isn’t quite doing what you expect either I think.
Let’s say that c = (SELECT COUNT(*) FROM line), so c = 16 in your example.
- The first iteration of the CTE, the base case, will select 16 lines with n=1.
- The second iteration is the first time in the recursive step, and will select all lines from the base case where n <
c,or 1 < 16, which is all of them. 
- The next iteration will select all rows where 2 < 16, then 3 < 16, … until finally in the 16th iteration it stops
withthe test 16 < 16. 

So, you managed to make the recursion terminate at least, but probably not how you intended it to work.

Something along the lines of:
WITH RECURSIVE cte (n, ln) AS (
    SELECT 1, string
    FROM line
    WHERE string LIKE '%<root>%'
    UNION ALL
    SELECT n+1, string
    FROM cte
    JOIN line ON iHaveNoClueHere
)
SELECT * FROM cte;

> Any explanations, references to URLs or other advice gratefully received.

There are plenty of explanations about recursive CTE’s in SQL on the Internet. I would probably have a good look at how
it’sdescribed in the PG documentation. There are a couple of good articles on the topic by some PG bloggers too. 

You’ll find plenty of articles related to how to do this in MSSQL or later Oracle versions. Those are totally fine for
understandingthe principles, and even the syntax is almost entirely the same. 

For a PG specific explanation, this one looks pretty thorough:
https://towardsdatascience.com/recursive-sql-queries-with-postgresql-87e2a453f1b

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




pgsql-general by date:

Previous
From: Jayadevan M
Date:
Subject: Metadata and data lineage tool
Next
From: Ram Pratap Maurya
Date:
Subject: Huge archive log generate in Postgresql-13