Re: Graph datatype addition - Mailing list pgsql-hackers

From Atri Sharma
Subject Re: Graph datatype addition
Date
Msg-id CAOeZVie7enzrn-vV0m7nTw3HiR0CexQMs9r5CrCJok6O=o3gvw@mail.gmail.com
Whole thread Raw
In response to Re: Graph datatype addition  (Atri Sharma <atri.jiit@gmail.com>)
Responses Re: Graph datatype addition  (Jim Nasby <jim@nasby.net>)
List pgsql-hackers
On Thu, May 2, 2013 at 7:58 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
>
>
> Sent from my iPad
>
> On 02-May-2013, at 4:33, Misa Simic <misa.simic@gmail.com> wrote:
>
>
>
> On Wednesday, May 1, 2013, Atri Sharma wrote:
>>
>> Hi all,
>>
>> Please find a probable prototype for the same:
>>
>> struct GraphNode
>> {
>>     Oid NodeOid;    // Oid of the row which is the node here. We will
>> store an identifier to it here rather than the complete row(with data)
>> itself.
>>     AdjacencyList *list;   // Pointer to the node's adjacency list.
>> };
>>
>> struct AdjacencyList
>> {
>>       Oid[] neighbours_list;
>> };
>>
>> struct AdjacencyList is probably the 'hottest' data structure in our
>> entire implementation. We can think of making a cache of recently
>> accessed struct AdjacencyList instances, or the AdjacencyList(s) of
>> the neighbours of the recently accessed nodes, because, they are most
>> likely to be accessed in near future. Advice here, please?
>>
>> So.
>>
>> struct AdjacencyCache
>> {
>>      Oid[] cache_values;
>> };
>>
>> push and pop functions for AdjacencyCache follow.
>>
>> We need a replacement and invalidation algorithm for the cache. I feel
>> a version of LRU should be good here.
>>
>> I have not given a prototype for operations and algorithm implementations.
>>
>> I feel,as suggested by Peter and Jaime, we can look at pgRouting code
>> for algorithm implementations.
>>
>> Florian's concerns are mitigated here to some extent,IMO. Since the
>> nodes and linkings are loosely coupled, and not represented as a
>> single representation, updating or changing of any part or adding a
>> new edge is no longer an expensive operation, as it only requires a
>> lookup of GraphNode and then its AdjacencyList. If we use the cache as
>> well, it will further reduce the lookup costs.
>>
>> I have not yet thought of the user visible layer as suggested by Jim.
>> Probably. once we are ok with the internal layer, we can move to the
>> user visible layer.
>>
>> Advice/Comments/Feedback please?
>>
>
> Honestly - I think I dont understand proposal...
>
> Datatypes - are about values - what will be stored in that column in a
> table....
>
> Datatype - cant have any clue about "rows"
>
> How I understand what you described - you can achieve the same with pure SQL
> - struct are equvalent to graph tables... Instead od Oid column will store
> PKs of nodes table...
>
>
> Yes, I agree.I need to think more.
>
> Let me get back with a deeper proposal.
>
> Regards,
>
> Atri

Hi all,

In continuation of the above discussion,I have been thinking about the
design of the data type. I have been thinking on the lines of making a
multi dimensional data structure for the same:

Specifically, I have been thinking about making multi lists for
representing data. After some research, I think that the following
data structure may help:

Each node will be represented as:

[Down Pointer][Atom][Right Pointer]

Suppose, a graph is like(sorry for the bad drawing):
    B  /
A    D \  /  C   \    E

can be represented as:                            C's data                    E's data         D's data
          ^                               ^                ^
 
A's data
[|][1][---------------------->[|][1][------------------>[|][1][NULL]^                          ^
[|][1][------------------>[|][0][--------------------->[|][1][NULL]
    ^                                                          B's data
 


Essentially, the Atom flag denotes if the node has any out edges from
it. If it has no out edge, ATOM is 0 and Down Pointer points to an
auxiliary structure which holds the node's data(hence, the data can be
of arbitrary size).

If the node has some out degree, then, those nodes are added to a new
sublist which starts from the node which spawns those nodes.Node's
down pointer points to the head of the new sublist.

Essentially, a sublist has all the nodes directly spawning from the
head node of the sublist.

This approach has multiple advantages in term of memory and
efficiency. Also, isolating sub graphs based on some criteria is
pretty efficient, which is good for many analytics based operations.

Access time is restricted as well.

Thoughts/Comments/Feedback please?

Regards,

Atri

--
Regards,

Atri
l'apprenant



pgsql-hackers by date:

Previous
From: carla celiberti
Date:
Subject: Taking the "varattno" in "args" (where part of a query)
Next
From: Jim Nasby
Date:
Subject: Re: Graph datatype addition