Thread: storing large graphs in postgres

storing large graphs in postgres

From
"Taher H. Haveliwala"
Date:
I need to store very large graphs structures in
postgres.  The graphs are close to 20GB when in
flatfile format.  I first tried using an adjacency
list representation, i.e.,

  graph (source INT8 PRIMARY KEY, dest INT8[]);

but operating on the array type seems a bit
inflexible. I took a look at the contrib/array stuff
as suggested in a previous post, but it seems like
that only allows for boolean predicates on the array.
I.e., I would like to be able to say 'return all nodes
within distance two from x' using purely sql.  Of
course I could use an edge-list format:

  graph (source INT8, dest INT8);

but this takes up almost double the space (which is
painful, given that the original input is close to
20GB).

Any way to get richer queries on array types, or some
other efficient way to store large graphs?

I suppose some python glue, making multiple db calls,
would do the trick, but it would be nicer if postgres
could take care of it all.

Thanks
Taher

__________________________________________________
Do You Yahoo!?
Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger
http://im.yahoo.com

Re: storing large graphs in postgres

From
"Timothy H. Keitt"
Date:
You could try one row per node, and a series of integer columns that
store adjacency at the bit-level.  (A function for extracting individual
bits would help here.)  Actually, I would probably go with the edge-list
layout in the end; storage is cheap.  BTW, isn't INT8 a tad small for
20GB of data?

T.

Taher H. Haveliwala wrote:

>I need to store very large graphs structures in
>postgres.  The graphs are close to 20GB when in
>flatfile format.  I first tried using an adjacency
>list representation, i.e.,
>
>  graph (source INT8 PRIMARY KEY, dest INT8[]);
>
>but operating on the array type seems a bit
>inflexible. I took a look at the contrib/array stuff
>as suggested in a previous post, but it seems like
>that only allows for boolean predicates on the array.
>I.e., I would like to be able to say 'return all nodes
>within distance two from x' using purely sql.  Of
>course I could use an edge-list format:
>
>  graph (source INT8, dest INT8);
>
>but this takes up almost double the space (which is
>painful, given that the original input is close to
>20GB).
>
>Any way to get richer queries on array types, or some
>other efficient way to store large graphs?
>
>I suppose some python glue, making multiple db calls,
>would do the trick, but it would be nicer if postgres
>could take care of it all.
>
>Thanks
>Taher
>
>__________________________________________________
>Do You Yahoo!?
>Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger
>http://im.yahoo.com
>
>---------------------------(end of broadcast)---------------------------
>TIP 5: Have you checked our extensive FAQ?
>
>http://www.postgresql.org/users-lounge/docs/faq.html
>

--
Timothy H. Keitt
Department of Ecology and Evolution
State University of New York at Stony Brook
Stony Brook, New York 11794 USA
Phone: 631-632-1101, FAX: 631-632-7626
http://life.bio.sunysb.edu/ee/keitt/




Re: storing large graphs in postgres

From
Stephan Szabo
Date:
On Wed, 5 Sep 2001, Taher H. Haveliwala wrote:

> I need to store very large graphs structures in
> postgres.  The graphs are close to 20GB when in
> flatfile format.  I first tried using an adjacency
> list representation, i.e.,
>
>   graph (source INT8 PRIMARY KEY, dest INT8[]);
>
> but operating on the array type seems a bit
> inflexible. I took a look at the contrib/array stuff
> as suggested in a previous post, but it seems like
> that only allows for boolean predicates on the array.
> I.e., I would like to be able to say 'return all nodes
> within distance two from x' using purely sql.  Of
> course I could use an edge-list format:
>
>   graph (source INT8, dest INT8);
>
> but this takes up almost double the space (which is
> painful, given that the original input is close to
> 20GB).
>
> Any way to get richer queries on array types, or some
> other efficient way to store large graphs?

You could probably write functions to do your operations
and then just use those from the sql queries.
If you use the C interface to make the functions the code
in contrib/array is probably a reasonable starting point to
look at.