Thread: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable
Hello I try to build a SQL for isolating hole pedigrees/families form a table with persons. Each person may have father_id and mother_id set to their parents as foreign keys on the same persons table. I was inspired by http://akretschmer.blogspot.com/2008/10/waiting-for-84.html and I tryed to develop the idea further to isolate the hole family of a given person, not only his direct parents or children. Despite the terrible bloating of the SQL... it works as long as no consanguinity will be encountered, else the SQL runs in an endless loop. I tryed to apply the trick with "path and cycle" from the "Postgresql 8.4 Documentation" <cite> whether we have reached the same row again while following a particular path of links. We add two columns path and cycle to the loop-prone query: WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS ( SELECT g.id, g.link, g.data, 1, ARRAY[g.id],false FROM graph gUNION ALL SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path) FROM graph g, search_graph sg WHERE g.id = sg.link AND NOT cycle ) SELECT * FROM search_graph; </cite> However it doesn't works like this: <cite> WITH RECURSIVE person (id,name,father_id,mother_id,level,path,cycle) AS ( SELECT f.id, f.name, f.father_id, f.mother_id,0, ARRAY[f.id], false FROM pedigree f </cite> I get: <cite> ERROR: could not implement recursive UNION DETAIL: All column datatypes must be hashable. ********** Error ********** ERROR: could not implement recursive UNION SQL state: 0A000 Detail: All column datatypes must be hashable. </cite> It works without the ARRAY... I would very appreciate any help, then I couldn't find anything on the net about "All column datatypes must be hashable", excepting the postgresql sources... Thanks rawi -- View this message in context: http://www.nabble.com/WITH-RECURSIVE%3A-ARRAY-id--All-column-datatypes-must-be-hashable-tp25167538p25167538.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
me again... I figured it out... The Error was caused because I used UNION in place of UNION ALL. I still don't understand why the ARRAY (path) could not be grouped... Bye rawi -- View this message in context: http://www.nabble.com/WITH-RECURSIVE%3A-ARRAY-id--All-column-datatypes-must-be-hashable-tp25167538p25168751.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
rawi <only4com@web.de> writes: > The Error was caused because I used UNION in place of UNION ALL. > I still don't understand why the ARRAY (path) could not be grouped... Yeah, it's an unimplemented feature --- there's no hashing support for arrays. I hope to get that done for 8.5. In the meantime you have to use UNION ALL there. If you really need to eliminate duplicate rows, you can do that via DISTINCT in the outer query. regards, tom lane
Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable
From
Rob Sargent
Date:
In so much as id-ma-pa is near and dear to my heart, I would really appreciate and performance metrics you might be able to share. Especially size of person table, typical pedigree size and pedigree retrieval time (tainted by hardware specs of course). Thanks rjs rawi wrote: > me again... > > I figured it out... > > The Error was caused because I used UNION in place of UNION ALL. > > I still don't understand why the ARRAY (path) could not be grouped... > > Bye > rawi >
Thank you very much for the answer! No, I'm not that good... I got finally a fully kinky SQL, which resolves nothing and runs endless for a couple of test records... I think, it would be better to open for this matter a new thread... Regards Rawi -- View this message in context: http://www.nabble.com/WITH-RECURSIVE%3A-ARRAY-id--All-column-datatypes-must-be-hashable-tp25167538p25190245.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable
From
Will Furnass
Date:
>> rawi <only4com@web.de> writes: > The Error was caused because I used UNION in place of UNION ALL. > > I still don't understand why the ARRAY (path) could not be grouped... Yeah, it's an unimplemented feature --- there's no hashing support for arrays. I hope to get that done for 8.5. In the meantime you have to use UNION ALL there. If you really need to eliminate duplicate rows, you can do that via DISTINCT in the outer query. I'm trying to do a similar sort of thing for a network containing ~9000 edges. I'm using a WITH RECURSIVE subquery, an array to track visited edges and a test to see whether the id of the 'current' edge is already in that array, as per the examples in the PostgreSQL 9.0 docs. Initially my main query seemed to run indefinitely so I introduced a LIMIT. I then found that as the LIMIT was increased the number of non-distinct edges returned by the query grew at a far greater rate than the number of distinct edges (with LIMIT 50000 the number of distinct edges returned is only 628). Am I right in thinking that until arrays can be hashed that this issue could well limit the size of the networks that I can analyse, given a particular hardware config? Can anyone think of a way to use a temporary table rather than an array to store visited entities during a graph traversal as a means for overcoming this problem? FYI I've been running the following query as a test on PostgreSQL 9.0. WITH RECURSIVE upstream_pipes( downstream_end, upstream_end, name, depth, path, cycle) AS ( SELECT p.down_node, p.up_node, p.name, 1, ARRAY[p.name], False FROM pipes_table AS p WHERE p.downstream_end = '61ESI5R0WC' UNIONALL SELECT p.downstream_end, p.upstream_end, p.name, roi.depth + 1, path || p.name, p.name = ANY(path) FROM pipes_table AS p, upstream_pipesAS usp WHERE p.downstream_end = usp.upstream_end AND NOT cycle AND ( NOT ( p.end_1_impassible AND p.node_1 = p.downstream_end ) ) AND ( NOT ( p.end_2_impassible AND p.node_2 = p.downstream_end ) ) ), q as ( SELECT name FROM build_upstream_roi limit 20000 ) SELECT COUNT(DISTINCT name), COUNT(name) FROM q Cheers, Will -- View this message in context: http://postgresql.1045698.n5.nabble.com/WITH-RECURSIVE-ARRAY-id-All-column-datatypes-must-be-hashable-tp2154712p3242676.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable
From
Will Furnass
Date:
> >> >>> rawi <only4com@web.de> writes: >>> The Error was caused because I used UNION in place of UNION ALL. >>> >>> I still don't understand why the ARRAY (path) could not be grouped... >> >> Yeah, it's an unimplemented feature --- there's no hashing support for >> arrays. I hope to get that done for 8.5. In the meantime you have >> to use UNION ALL there. If you really need to eliminate duplicate >> rows, you can do that via DISTINCT in the outer query. > > I'm trying to do a similar sort of thing for a network containing ~9000 > edges. I'm using a WITH RECURSIVE subquery, an array to track > visited edges and a test to see whether the id of the 'current' edge is > already in that array, as per the examples in the PostgreSQL 9.0 docs. > Initially my main query seemed to run indefinitely so I introduced a > LIMIT. I then found that as the LIMIT was increased the number of > non-distinct edges returned by the query grew at a far greater rate > than the number of distinct edges (with LIMIT 50000 the number of > distinct edges returned is only 628). Am I right in thinking that until > arrays can be hashed that this issue could well limit the size of the > networks that I can analyse, given a particular hardware config? Got around the problem by learning pg/plsql and writing a non-recursive breadth-first graph traversal function. It maintains an array of visited nodes and a double-ended queue, implemented as an array, of encountered nodes that require processing during the traversal. Not as fast as a 'WITH RECURSIVE' SQL-only traversal for graphs containing only a few nodes but much, much more efficient for graphs containing many nodes and a considerable number of back edges. -- View this message in context: http://postgresql.1045698.n5.nabble.com/WITH-RECURSIVE-ARRAY-id-All-column-datatypes-must-be-hashable-tp2154712p3253813.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.