Thread: storing large graphs in postgres
I need to store very large graphs structures in postgres. The graphs are close to 20GB when in flatfile format. I first tried using an adjacency list representation, i.e., graph (source INT8 PRIMARY KEY, dest INT8[]); but operating on the array type seems a bit inflexible. I took a look at the contrib/array stuff as suggested in a previous post, but it seems like that only allows for boolean predicates on the array. I.e., I would like to be able to say 'return all nodes within distance two from x' using purely sql. Of course I could use an edge-list format: graph (source INT8, dest INT8); but this takes up almost double the space (which is painful, given that the original input is close to 20GB). Any way to get richer queries on array types, or some other efficient way to store large graphs? I suppose some python glue, making multiple db calls, would do the trick, but it would be nicer if postgres could take care of it all. Thanks Taher __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger http://im.yahoo.com
You could try one row per node, and a series of integer columns that store adjacency at the bit-level. (A function for extracting individual bits would help here.) Actually, I would probably go with the edge-list layout in the end; storage is cheap. BTW, isn't INT8 a tad small for 20GB of data? T. Taher H. Haveliwala wrote: >I need to store very large graphs structures in >postgres. The graphs are close to 20GB when in >flatfile format. I first tried using an adjacency >list representation, i.e., > > graph (source INT8 PRIMARY KEY, dest INT8[]); > >but operating on the array type seems a bit >inflexible. I took a look at the contrib/array stuff >as suggested in a previous post, but it seems like >that only allows for boolean predicates on the array. >I.e., I would like to be able to say 'return all nodes >within distance two from x' using purely sql. Of >course I could use an edge-list format: > > graph (source INT8, dest INT8); > >but this takes up almost double the space (which is >painful, given that the original input is close to >20GB). > >Any way to get richer queries on array types, or some >other efficient way to store large graphs? > >I suppose some python glue, making multiple db calls, >would do the trick, but it would be nicer if postgres >could take care of it all. > >Thanks >Taher > >__________________________________________________ >Do You Yahoo!? >Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger >http://im.yahoo.com > >---------------------------(end of broadcast)--------------------------- >TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/users-lounge/docs/faq.html > -- Timothy H. Keitt Department of Ecology and Evolution State University of New York at Stony Brook Stony Brook, New York 11794 USA Phone: 631-632-1101, FAX: 631-632-7626 http://life.bio.sunysb.edu/ee/keitt/
On Wed, 5 Sep 2001, Taher H. Haveliwala wrote: > I need to store very large graphs structures in > postgres. The graphs are close to 20GB when in > flatfile format. I first tried using an adjacency > list representation, i.e., > > graph (source INT8 PRIMARY KEY, dest INT8[]); > > but operating on the array type seems a bit > inflexible. I took a look at the contrib/array stuff > as suggested in a previous post, but it seems like > that only allows for boolean predicates on the array. > I.e., I would like to be able to say 'return all nodes > within distance two from x' using purely sql. Of > course I could use an edge-list format: > > graph (source INT8, dest INT8); > > but this takes up almost double the space (which is > painful, given that the original input is close to > 20GB). > > Any way to get richer queries on array types, or some > other efficient way to store large graphs? You could probably write functions to do your operations and then just use those from the sql queries. If you use the C interface to make the functions the code in contrib/array is probably a reasonable starting point to look at.