Thread: Adjacency List or Nested Sets to model file system hierarchy?

Adjacency List or Nested Sets to model file system hierarchy?

From
Bill Moseley
Date:
I'm looking for a little guidance in representing a file system --
well just the file and directory structure of a file system.

Often articles on representing a hierarchy discuss the advantages of
using Nested Sets (or nested intervals) it seems.  I'm not clear how
well they apply to a file system-like hierarchy, though.

The examples (and my limited understanding) of Nested Sets have the
leaf nodes at the end of the branches, where in a file system a node
can have both leaf nodes (files) and branches (directories).

Also, the Nested Sets seem to solve problems I don't have -- such as
finding all descendants of a given node.

My simple requirements are:

    -- Quickly be able to lookup content by a full "path" name

    -- Provide "directory" views that shows parent, list of contents
       including any "sub-directories".

    -- To be able to easily move branches.

It will not be a large collection of "files" in the tree, so that's
not an issue.

Seems like an Adjacency List along with a de-normalized "path" column
in the leaf nodes would meet the requirements.  But, as I see nested
sets discussed so often I wonder which is a better approach.

I assume this is a reasonably common problem so I'm curious how others
have implemented it.

Thanks,


--
Bill Moseley
moseley@hank.org


Re: Adjacency List or Nested Sets to model file system hierarchy?

From
"Merlin Moncure"
Date:
On 2/11/07, Bill Moseley <moseley@hank.org> wrote:
> I'm looking for a little guidance in representing a file system --
> well just the file and directory structure of a file system.
>
> Often articles on representing a hierarchy discuss the advantages of
> using Nested Sets (or nested intervals) it seems.  I'm not clear how
> well they apply to a file system-like hierarchy, though.
>
> The examples (and my limited understanding) of Nested Sets have the
> leaf nodes at the end of the branches, where in a file system a node
> can have both leaf nodes (files) and branches (directories).
>
> Also, the Nested Sets seem to solve problems I don't have -- such as
> finding all descendants of a given node.
>
> My simple requirements are:
>
>     -- Quickly be able to lookup content by a full "path" name
>
>     -- Provide "directory" views that shows parent, list of contents
>        including any "sub-directories".
>
>     -- To be able to easily move branches.
>
> It will not be a large collection of "files" in the tree, so that's
> not an issue.
>
> Seems like an Adjacency List along with a de-normalized "path" column
> in the leaf nodes would meet the requirements.  But, as I see nested
> sets discussed so often I wonder which is a better approach.

Can you describe in a little bit more detail about what you mean by
'Adjaceny LIst'?

merlin

Re: Adjacency List or Nested Sets to model file system hierarchy?

From
Richard Broersma Jr
Date:
> Can you describe in a little bit more detail about what you mean by
> 'Adjaceny LIst'?

Adjaceny list is the term used in the celko book to refer to a table that is recurively related to
itself.

create table foo (
id        integer  primary key,
parentid  integer references foo (id),
name      varchar not null,
);


Re: Adjacency List or Nested Sets to model file system hierarchy?

From
"Merlin Moncure"
Date:
On 2/12/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> > Can you describe in a little bit more detail about what you mean by
> > 'Adjaceny LIst'?
>
> Adjaceny list is the term used in the celko book to refer to a table that is recurively related to
> itself.
>
> create table foo (
> id        integer  primary key,
> parentid  integer references foo (id),
> name      varchar not null,
> );

bleh.  requires 'n' queries to pull out data where n is nesting depth
(or a recursive function, or recursive query tricks which i dont
like).  or you can save of left, right extents on a key range (I've
seen that advocated by celko), but that appraoch is non-scalable imo.

Above approach is ok but I can think of at least two other methods
that are probably better.  First approach is to just store the whole
path in every record for each file.  Yes, this is a pain for updates
but searching and children discovery is simple.  in that case I would
define pkey as (path, file).

second approach which is a bit more complex but possibly a win if your
directories change a lot is to use array type to store segmented
paths.  These could be text segments (basically another spin on the
above approach) or integer keys out to another table.  In this case
you are buying a cheap rename in exchange for some complexity.  yes,
you can index an integer array type and yes, you can do children
discovery in a cheap operation.

merlin

Re: Adjacency List or Nested Sets to model file system hierarchy?

From
"Ian Harding"
Date:
On 2/12/07, Bill Moseley <moseley@hank.org> wrote:
> I'm looking for a little guidance in representing a file system --
> well just the file and directory structure of a file system.
>
> Often articles on representing a hierarchy discuss the advantages of
> using Nested Sets (or nested intervals) it seems.  I'm not clear how
> well they apply to a file system-like hierarchy, though.
>
> The examples (and my limited understanding) of Nested Sets have the
> leaf nodes at the end of the branches, where in a file system a node
> can have both leaf nodes (files) and branches (directories).
>
> Also, the Nested Sets seem to solve problems I don't have -- such as
> finding all descendants of a given node.
>
> My simple requirements are:
>
>     -- Quickly be able to lookup content by a full "path" name
>
>     -- Provide "directory" views that shows parent, list of contents
>        including any "sub-directories".
>
>     -- To be able to easily move branches.
>
> It will not be a large collection of "files" in the tree, so that's
> not an issue.
>

You don't mention the ltree contrib module, have you looked at it?  It
can easily meet your requirements without having to reinvent anything.
 It may be what you're referring to as Nested Sets, I don't know.  I
use it and like it a lot.

-Ian

Re: Adjacency List or Nested Sets to model file system hierarchy?

From
Bill Moseley
Date:
On Mon, Feb 12, 2007 at 05:36:37PM +0000, Ian Harding wrote:
> You don't mention the ltree contrib module, have you looked at it?  It
> can easily meet your requirements without having to reinvent anything.
> It may be what you're referring to as Nested Sets, I don't know.  I
> use it and like it a lot.

Yes, I have seen it.  I just thought it seemed like a very large
"hammer" to use form my task -- quite a few more query methods than I
need .  But, perhaps I should look at it again and get a better
understanding of what it can do.

Thanks,

--
Bill Moseley
moseley@hank.org


Re: Adjacency List or Nested Sets to model file system hierarchy?

From
Bill Moseley
Date:
On Mon, Feb 12, 2007 at 10:53:53AM -0500, Merlin Moncure wrote:
> On 2/12/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> >> Can you describe in a little bit more detail about what you mean by
> >> 'Adjaceny LIst'?
> >
> >Adjaceny list is the term used in the celko book to refer to a table that
> >is recurively related to
> >itself.
> >
> >create table foo (
> >id        integer  primary key,
> >parentid  integer references foo (id),
> >name      varchar not null,
> >);
>
> Above approach is ok but I can think of at least two other methods
> that are probably better.  First approach is to just store the whole
> path in every record for each file.  Yes, this is a pain for updates
> but searching and children discovery is simple.  in that case I would
> define pkey as (path, file).

Yes, that's what I meant by using a de-normalized table -- including
the full path in the row.  That would provide fast access to each row
via a path name.  And the parent id makes it easy to find all children
of a given node and, well, the parent too.

Separating the path and file as you suggest would make finding all
"files" at a given directory level simple, too.

But, I'm not thrilled about the possibility of the hard-coded path not
matching the path up the tree to the root node, though.  Which, of
course, is why I posted.  But, I'll give it a test.

Thanks,




--
Bill Moseley
moseley@hank.org


Re: Adjacency List or Nested Sets to model file system hierarchy?

From
"Ian Harding"
Date:
On 2/12/07, Bill Moseley <moseley@hank.org> wrote:
> On Mon, Feb 12, 2007 at 10:53:53AM -0500, Merlin Moncure wrote:
> > On 2/12/07, Richard Broersma Jr <rabroersma@yahoo.com> wrote:
> > >> Can you describe in a little bit more detail about what you mean by
> > >> 'Adjaceny LIst'?
> > >
> > >Adjaceny list is the term used in the celko book to refer to a table that
> > >is recurively related to
> > >itself.
> > >
> > >create table foo (
> > >id        integer  primary key,
> > >parentid  integer references foo (id),
> > >name      varchar not null,
> > >);
> >
> > Above approach is ok but I can think of at least two other methods
> > that are probably better.  First approach is to just store the whole
> > path in every record for each file.  Yes, this is a pain for updates
> > but searching and children discovery is simple.  in that case I would
> > define pkey as (path, file).
>
> Yes, that's what I meant by using a de-normalized table -- including
> the full path in the row.  That would provide fast access to each row
> via a path name.  And the parent id makes it easy to find all children
> of a given node and, well, the parent too.
>
> Separating the path and file as you suggest would make finding all
> "files" at a given directory level simple, too.
>
> But, I'm not thrilled about the possibility of the hard-coded path not
> matching the path up the tree to the root node, though.  Which, of
> course, is why I posted.  But, I'll give it a test.

The way I do it is to update the path to the parent's path, plus my id
on insert or update with a before trigger.  I have an after trigger
that simply updates any child record's parent_id, which forces an
update of the path, which forces update of their children, and so on.

You can, of course, cause a recursion problem if you're not careful...
Best to have a check for that too.

- Ian
>
> Thanks,
>
>
>
>
> --
> Bill Moseley
> moseley@hank.org
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>

Re: Adjacency List or Nested Sets to model file system hierarchy?

From
Alban Hertroys
Date:
Bill Moseley wrote:
> On Mon, Feb 12, 2007 at 05:36:37PM +0000, Ian Harding wrote:
>> You don't mention the ltree contrib module, have you looked at it?  It
>> can easily meet your requirements without having to reinvent anything.
>> It may be what you're referring to as Nested Sets, I don't know.  I
>> use it and like it a lot.
>
> Yes, I have seen it.  I just thought it seemed like a very large
> "hammer" to use form my task -- quite a few more query methods than I
> need .  But, perhaps I should look at it again and get a better
> understanding of what it can do.

What it can do is add indexed paths to your records ;)

You will still have to check consistency on updates to paths (fe. if the
root node changes name, all nodes need updating) though, but that's true
for any method including the entire path. And on delete (remove child
nodes as well) and insert (check parents existence), of course.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //

Re: Adjacency List or Nested Sets to model file system hierarchy?

From
"hubert depesz lubaczewski"
Date:
On 2/12/07, Bill Moseley <moseley@hank.org> wrote:
Also, the Nested Sets seem to solve problems I don't have -- such as
finding all descendants of a given node.

you can also check different way.
i described it here:
http://www.depesz.com/various/various-sqltrees-implementation.php
it is in polish, but it's full of examples and should be readable even for somebody with 0 knowledge of polish.
basically - this is extension of adjacency list which allows all kinds of access without recursive queries.

in case you'd have questions with this - please do not hesitate to ask,

best regards,

depesz