Thread: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable

WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable

From
rawi
Date:
Hello

I try to build a SQL for isolating hole pedigrees/families form a table with
persons.
Each person may have father_id and mother_id set to their parents as foreign
keys on the same persons table.

I was inspired by
http://akretschmer.blogspot.com/2008/10/waiting-for-84.html and I tryed to
develop the idea further to isolate the hole family of a given person, not
only his direct parents or children.
Despite the terrible bloating of the SQL... it works as long as no
consanguinity will be encountered, else the SQL runs in an endless loop.
I tryed to apply the trick with "path and cycle" from the "Postgresql 8.4
Documentation"

<cite>
whether we have reached the same row again while following a particular path
of links. We add two
columns path and cycle to the loop-prone query:

WITH RECURSIVE search_graph(id, link, data, depth, path, cycle) AS (    SELECT g.id, g.link, g.data, 1,
ARRAY[g.id],false   FROM graph gUNION ALL    SELECT g.id, g.link, g.data, sg.depth + 1, path || g.id, g.id = ANY(path)
 FROM graph g, search_graph sg    WHERE g.id = sg.link AND NOT cycle
 
)
SELECT * FROM search_graph;
</cite>

However it doesn't works like this:
<cite>
WITH RECURSIVE person (id,name,father_id,mother_id,level,path,cycle) AS (    SELECT f.id, f.name, f.father_id,
f.mother_id,0, ARRAY[f.id], false    FROM pedigree f
 
</cite>
I get:
<cite>
ERROR:  could not implement recursive UNION
DETAIL:  All column datatypes must be hashable.
********** Error **********
ERROR: could not implement recursive UNION
SQL state: 0A000
Detail: All column datatypes must be hashable.
</cite>
It works without the ARRAY...

I would very appreciate any help, then I couldn't find anything on the net
about "All column datatypes must be hashable", excepting the postgresql
sources...

Thanks
rawi
-- 
View this message in context:
http://www.nabble.com/WITH-RECURSIVE%3A-ARRAY-id--All-column-datatypes-must-be-hashable-tp25167538p25167538.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



me again...

I figured it out...

The Error was caused because I used UNION in place of UNION ALL.

I still don't understand why the ARRAY (path) could not be grouped...

Bye
rawi
-- 
View this message in context:
http://www.nabble.com/WITH-RECURSIVE%3A-ARRAY-id--All-column-datatypes-must-be-hashable-tp25167538p25168751.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



rawi <only4com@web.de> writes:
> The Error was caused because I used UNION in place of UNION ALL.

> I still don't understand why the ARRAY (path) could not be grouped...

Yeah, it's an unimplemented feature --- there's no hashing support for
arrays.  I hope to get that done for 8.5.  In the meantime you have
to use UNION ALL there.  If you really need to eliminate duplicate
rows, you can do that via DISTINCT in the outer query.
        regards, tom lane


In so much as id-ma-pa is near and dear to my heart, I would really 
appreciate and performance metrics you might be able to share. 
Especially size of person table, typical pedigree size and pedigree 
retrieval time (tainted by hardware specs of course).

Thanks

rjs



rawi wrote:
> me again...
>
> I figured it out...
>
> The Error was caused because I used UNION in place of UNION ALL.
>
> I still don't understand why the ARRAY (path) could not be grouped...
>
> Bye
> rawi
>   


hopeless, SQL not resolved yet

From
rawi
Date:
Thank you very much for the answer!

No, I'm not that good...

I got finally a fully kinky SQL, which resolves nothing and runs endless for
a couple of test records...

I think, it would be better to open for this matter a new thread...

Regards
Rawi
-- 
View this message in context:
http://www.nabble.com/WITH-RECURSIVE%3A-ARRAY-id--All-column-datatypes-must-be-hashable-tp25167538p25190245.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.



Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable

From
Will Furnass
Date:

>> rawi <only4com@web.de> writes:
> The Error was caused because I used UNION in place of UNION ALL.
>
> I still don't understand why the ARRAY (path) could not be grouped...

Yeah, it's an unimplemented feature --- there's no hashing support for
arrays.  I hope to get that done for 8.5.  In the meantime you have
to use UNION ALL there.  If you really need to eliminate duplicate
rows, you can do that via DISTINCT in the outer query.

I'm trying to do a similar sort of thing for a network containing ~9000
edges.  I'm using a WITH RECURSIVE subquery, an array to track visited edges
and a test to see whether the id of the 'current' edge is already in that
array, as per the examples in the PostgreSQL 9.0 docs.  Initially my main
query seemed to run indefinitely so I introduced a LIMIT.  I then found that
as the LIMIT was increased the number of non-distinct edges returned by the
query grew at a far greater rate than the number of distinct edges (with
LIMIT 50000 the number of distinct edges returned is only 628).  Am I right
in thinking that until arrays can be hashed that this issue could well limit
the size of the networks that I can analyse, given a particular hardware
config?  Can anyone think of a way to use a temporary table rather than an
array to store visited entities during a graph traversal as a means for
overcoming this problem?

FYI I've been running the following query as a test on PostgreSQL 9.0.

WITH RECURSIVE upstream_pipes( downstream_end, upstream_end, name, depth,
path, cycle) AS (                  SELECT p.down_node, p.up_node, p.name, 1, ARRAY[p.name],
False                  FROM pipes_table AS p                  WHERE p.downstream_end = '61ESI5R0WC'
UNIONALL                  SELECT p.downstream_end, p.upstream_end, p.name,
 
roi.depth + 1,                               path || p.name, p.name = ANY(path)                  FROM pipes_table AS p,
upstream_pipesAS usp                  WHERE p.downstream_end = usp.upstream_end                  AND NOT cycle
       AND ( NOT ( p.end_1_impassible AND p.node_1 =
 
p.downstream_end ) )                  AND ( NOT ( p.end_2_impassible AND p.node_2 =
p.downstream_end ) )              ),              q as (                  SELECT name FROM build_upstream_roi limit
20000             )              SELECT COUNT(DISTINCT name), COUNT(name) FROM q
 

Cheers,

Will
-- 
View this message in context:
http://postgresql.1045698.n5.nabble.com/WITH-RECURSIVE-ARRAY-id-All-column-datatypes-must-be-hashable-tp2154712p3242676.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.


Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable

From
Will Furnass
Date:
> 
>> 
>>> rawi <only4com@web.de> writes:
>>> The Error was caused because I used UNION in place of UNION ALL.
>>>
>>> I still don't understand why the ARRAY (path) could not be grouped...
>> 
>> Yeah, it's an unimplemented feature --- there's no hashing support for
>> arrays.  I hope to get that done for 8.5.  In the meantime you have
>> to use UNION ALL there.  If you really need to eliminate duplicate
>> rows, you can do that via DISTINCT in the outer query.
> 
> I'm trying to do a similar sort of thing for a network containing ~9000 
> edges.  I'm using a WITH RECURSIVE subquery, an array to track 
> visited edges and a test to see whether the id of the 'current' edge is
> already in that array, as per the examples in the PostgreSQL 9.0 docs.
> Initially my main query seemed to run indefinitely so I introduced a 
> LIMIT.  I then found that as the LIMIT was increased the number of 
> non-distinct edges returned by the query grew at a far greater rate 
> than the number of distinct edges (with LIMIT 50000 the number of 
> distinct edges returned is only 628).  Am I right in thinking that until 
> arrays can be hashed that this issue could well limit the size of the 
> networks that I can analyse, given a particular hardware config?  

Got around the problem by learning pg/plsql and writing a non-recursive
breadth-first graph traversal function.  It maintains an array of visited
nodes and a double-ended queue, implemented as an array, of encountered
nodes that require processing during the traversal.  Not as fast as a 'WITH
RECURSIVE' SQL-only traversal for graphs containing only a few nodes but
much, much more efficient for graphs containing many nodes and a
considerable number of back edges.

-- 
View this message in context:
http://postgresql.1045698.n5.nabble.com/WITH-RECURSIVE-ARRAY-id-All-column-datatypes-must-be-hashable-tp2154712p3253813.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.