Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable - Mailing list pgsql-sql

From Will Furnass
Subject Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable
Date
Msg-id 1288380670846-3242676.post@n5.nabble.com
Whole thread Raw
In response to Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: resolved: WITH RECURSIVE: ARRAY[id] All column datatypes must be hashable
List pgsql-sql

>> rawi <only4com@web.de> writes:
> The Error was caused because I used UNION in place of UNION ALL.
>
> I still don't understand why the ARRAY (path) could not be grouped...

Yeah, it's an unimplemented feature --- there's no hashing support for
arrays.  I hope to get that done for 8.5.  In the meantime you have
to use UNION ALL there.  If you really need to eliminate duplicate
rows, you can do that via DISTINCT in the outer query.

I'm trying to do a similar sort of thing for a network containing ~9000
edges.  I'm using a WITH RECURSIVE subquery, an array to track visited edges
and a test to see whether the id of the 'current' edge is already in that
array, as per the examples in the PostgreSQL 9.0 docs.  Initially my main
query seemed to run indefinitely so I introduced a LIMIT.  I then found that
as the LIMIT was increased the number of non-distinct edges returned by the
query grew at a far greater rate than the number of distinct edges (with
LIMIT 50000 the number of distinct edges returned is only 628).  Am I right
in thinking that until arrays can be hashed that this issue could well limit
the size of the networks that I can analyse, given a particular hardware
config?  Can anyone think of a way to use a temporary table rather than an
array to store visited entities during a graph traversal as a means for
overcoming this problem?

FYI I've been running the following query as a test on PostgreSQL 9.0.

WITH RECURSIVE upstream_pipes( downstream_end, upstream_end, name, depth,
path, cycle) AS (                  SELECT p.down_node, p.up_node, p.name, 1, ARRAY[p.name],
False                  FROM pipes_table AS p                  WHERE p.downstream_end = '61ESI5R0WC'
UNIONALL                  SELECT p.downstream_end, p.upstream_end, p.name,
 
roi.depth + 1,                               path || p.name, p.name = ANY(path)                  FROM pipes_table AS p,
upstream_pipesAS usp                  WHERE p.downstream_end = usp.upstream_end                  AND NOT cycle
       AND ( NOT ( p.end_1_impassible AND p.node_1 =
 
p.downstream_end ) )                  AND ( NOT ( p.end_2_impassible AND p.node_2 =
p.downstream_end ) )              ),              q as (                  SELECT name FROM build_upstream_roi limit
20000             )              SELECT COUNT(DISTINCT name), COUNT(name) FROM q
 

Cheers,

Will
-- 
View this message in context:
http://postgresql.1045698.n5.nabble.com/WITH-RECURSIVE-ARRAY-id-All-column-datatypes-must-be-hashable-tp2154712p3242676.html
Sent from the PostgreSQL - sql mailing list archive at Nabble.com.


pgsql-sql by date:

Previous
From: Dmitriy Igrishin
Date:
Subject: Re: Using PL/pgSQL text argument in 'IN (INT,INT,...)' clause
Next
From: Viktor Bojović
Date:
Subject: large xml database