Thread: Tree structure

Tree structure

From
Kaare Rasmussen
Date:
Hi

I'm trying to determine the best way to represent a simple tree
structure (like a file/dir tree or a uri path). I guess that's done a
zillion times before; I just don't seem to be able to find the right
solution. I have one special request, that I'd like to find all
'shorter' paths, i.e. given 'a/b/c/d' it'll find

a
a/b
a/b/c
- but not
b
a/c
b/a

There are a number of options to test.

1. As strings
   There's no dedicated function (@>)
   WHERE clause should read something like 'a/b/c/d' LIKE column || '%',
which is both ugly and (I guess) non indexable
   Perhaps regex indexes would work, but not efficient and not optimal

2. As array of strings
   My favorite, would be elegant. A GIN index on the whole array would
make for fast performance
   Alas @> treats the arrays as a set, not an array
   WHERE col @> 'a/b/c/d' would find all of the above rows, including a,
a/c, b/a, etc.

3. ltree contrib
   The only option that actually works and uses index
   @> works as I want it to.
   But the single segments can only be alphanumeric and underscore
   ltree only supports GIST
   there's a length limit.

The last option is the one I'm using right now, but I hope that somebody
can point out where I'm wrong with regards to the other options, or tell
me that there is a better solution somewhere I didn't look.


Re: Tree structure

From
Alban Hertroys
Date:
> 1. As strings
>   There's no dedicated function (@>)
>   WHERE clause should read something like 'a/b/c/d' LIKE column || '%',
> which is both ugly and (I guess) non indexable
>   Perhaps regex indexes would work, but not efficient and not optimal
>
> 2. As array of strings
>   My favorite, would be elegant. A GIN index on the whole array would make
> for fast performance
>   Alas @> treats the arrays as a set, not an array
>   WHERE col @> 'a/b/c/d' would find all of the above rows, including a, a/c,
> b/a, etc.
>
> 3. ltree contrib
>   The only option that actually works and uses index
>   @> works as I want it to.
>   But the single segments can only be alphanumeric and underscore
>   ltree only supports GIST
>   there's a length limit.

4. Using a recursive common table expression (CTE).
  http://www.postgresql.org/docs/9.2/static/queries-with.html

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: Tree structure

From
hari.fuchs@gmail.com
Date:
Kaare Rasmussen <kaare@jasonic.dk> writes:

> Hi
>
> I'm trying to determine the best way to represent a simple tree
> structure (like a file/dir tree or a uri path). I guess that's done a
> zillion times before; I just don't seem to be able to find the right
> solution. I have one special request, that I'd like to find all
> shorter' paths, i.e. given 'a/b/c/d' it'll find
>
> a
> a/b
> a/b/c
> - but not
> b
> a/c
> b/a

If I understand you correctly, you want a prefix match, and sure there's
a PostgreSQL extension for that:

CREATE EXTENSION prefix;

CREATE TABLE t1 (
  id serial NOT NULL,
  p prefix_range NOT NULL,
  PRIMARY KEY (id)
);

CREATE INDEX pp ON t1 USING gist(p);

INSERT INTO t1 (p) VALUES
('a'),
('b'),
('a/c'),
('a/b'),
('b/a'),
('a/b/c');

EXPLAIN ANALYZE
SELECT id, p
FROM t1
WHERE p @> 'a/b/c/d'
;

Re: Tree structure

From
Kaare Rasmussen
Date:
Hi Alban

> 4. Using a recursive common table expression (CTE).
> http://www.postgresql.org/docs/9.2/static/queries-with.html

Yes, you're right. In fact that's what I'm testing a way to replace, as
I'm not confident in the performance in all situations. My fault
entirely; I should have told so from the start.




Re: Tree structure

From
Rémi Cura
Date:
BE carefull you have a number of limitation with recursive cte (I'm thinking of update and so.)
You can work around with plpgsql but it might be painfull.

You forgot a solution :
if you need powerfull graph features,
use postgres as a database and a SPARQL speaking frontend.
It may be a bit of overkill ;-)

Cheers,
Rémi-C


2013/9/23 Kaare Rasmussen <kaare@jasonic.dk>
Hi Alban


4. Using a recursive common table expression (CTE). http://www.postgresql.org/docs/9.2/static/queries-with.html

Yes, you're right. In fact that's what I'm testing a way to replace, as I'm not confident in the performance in all situations. My fault entirely; I should have told so from the start.





--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Tree structure

From
Chris Travers
Date:



On Sun, Sep 22, 2013 at 9:48 PM, Kaare Rasmussen <kaare@jasonic.dk> wrote:
Hi Alban


4. Using a recursive common table expression (CTE). http://www.postgresql.org/docs/9.2/static/queries-with.html

Yes, you're right. In fact that's what I'm testing a way to replace, as I'm not confident in the performance in all situations. My fault entirely; I should have told so from the start.

It might be helpful for you to discuss what sorts of concerns you have and how they fit into the specifics of your data.  Trees are an area where different uses may have different recommended solutions.  I gave my thoughts on performance on trees above.  There are a few really bad areas I can think of.  For example, if you had a ten-layer deep scan where each scan pulled around 10% or so of the table, you might be looking at 10 sequential scans and a fair bit of CPU time.  If the result set was very large, you might see things written to disk.  There are a number of gotchas.

This being said, *usually* I find that recursive CTE's are one of the better solutions out there for trees and I think they will perform better in more situations than many of the other solutions. 

--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: Tree structure

From
Merlin Moncure
Date:
On Fri, Sep 20, 2013 at 6:29 AM, Kaare Rasmussen <kaare@jasonic.dk> wrote:
> Hi
>
> I'm trying to determine the best way to represent a simple tree structure
> (like a file/dir tree or a uri path). I guess that's done a zillion times
> before; I just don't seem to be able to find the right solution. I have one
> special request, that I'd like to find all 'shorter' paths, i.e. given
> 'a/b/c/d' it'll find
>
> a
> a/b
> a/b/c
> - but not
> b
> a/c
> b/a
>
> There are a number of options to test.
>
> 1. As strings
>   There's no dedicated function (@>)
>   WHERE clause should read something like 'a/b/c/d' LIKE column || '%',
> which is both ugly and (I guess) non indexable
>   Perhaps regex indexes would work, but not efficient and not optimal
>
> 2. As array of strings
>   My favorite, would be elegant. A GIN index on the whole array would make
> for fast performance
>   Alas @> treats the arrays as a set, not an array
>   WHERE col @> 'a/b/c/d' would find all of the above rows, including a, a/c,
> b/a, etc.
>
> 3. ltree contrib
>   The only option that actually works and uses index
>   @> works as I want it to.
>   But the single segments can only be alphanumeric and underscore
>   ltree only supports GIST
>   there's a length limit.
>
> The last option is the one I'm using right now, but I hope that somebody can
> point out where I'm wrong with regards to the other options, or tell me that
> there is a better solution somewhere I didn't look.

All materialized path approaches will face index limit so be warned.
For my money, I'd be using ltree if you need complex indexed searching
or some TEXT+btree for simple searching (LIKE '/foo/bar%') on the
'full path' side.  maview type approaches are faster to search but
slower to update than a recursive type structure where you use
parent/child relationships + recursive CTE to query.

merlin


Re: Tree structure

From
Kaare Rasmussen
Date:
Sorry, got tangled up in this thing called 'real life'.

> If I understand you correctly, you want a prefix match, and sure there's
> a PostgreSQL extension for that:

OK, that seems to do the job, thanks a lot. The only small quibble is
that it's an extension.

I'm quite surprised there seem to be no way in core to treat an array as
an array. Using @> treats it as a set, AFAICT.



Re: Tree structure

From
Merlin Moncure
Date:
On Thu, Oct 10, 2013 at 1:00 AM, Kaare Rasmussen <kaare@jasonic.dk> wrote:
> Sorry, got tangled up in this thing called 'real life'.
>
>
>> If I understand you correctly, you want a prefix match, and sure there's
>> a PostgreSQL extension for that:
>
>
> OK, that seems to do the job, thanks a lot. The only small quibble is that
> it's an extension.
>
> I'm quite surprised there seem to be no way in core to treat an array as an
> array. Using @> treats it as a set, AFAICT.

can you elaborate on that?

merlin


Re: Tree structure

From
Kaare Rasmussen
Date:
Hi Merlin

> On Thu, Oct 10, 2013 at 1:00 AM, Kaare Rasmussen <kaare@jasonic.dk> wrote:
>> I'm quite surprised there seem to be no way in core to treat an array as an
>> array. Using @> treats it as a set, AFAICT.
> can you elaborate on that?
>
> merlin

To me, an array is a vector (or a vector of vectors). So I'm looking for
an operator where

ARRAY[1,4,3] doesn't contain ARRAY[3,1] and
ARRAY[2,7] isn't contained by ARRAY[1,7,4,2,6] (but ARRAY[1,7,4] is)

IOW order matters to me, but not to the array operators mentioned in
http://www.postgresql.org/docs/9.3/static/functions-array.html. Note
that index support is important.


Re: Tree structure

From
Rémi Cura
Date:
Hey sorry if my answer is stupid,
but there is an extension for array, even if it is limited to int (but int could be indexes of row)
It provides essential function, although lacking some (I re-implemented union of array with disjoint result).
I think this extension uses indexes

Cheers,
Rémi-C


2013/10/10 Kaare Rasmussen <kaare@jasonic.dk>
Hi Merlin

On Thu, Oct 10, 2013 at 1:00 AM, Kaare Rasmussen <kaare@jasonic.dk> wrote:
I'm quite surprised there seem to be no way in core to treat an array as an
array. Using @> treats it as a set, AFAICT.
can you elaborate on that?

merlin

To me, an array is a vector (or a vector of vectors). So I'm looking for an operator where

ARRAY[1,4,3] doesn't contain ARRAY[3,1] and
ARRAY[2,7] isn't contained by ARRAY[1,7,4,2,6] (but ARRAY[1,7,4] is)

IOW order matters to me, but not to the array operators mentioned in http://www.postgresql.org/docs/9.3/static/functions-array.html. Note that index support is important.



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Tree structure

From
Kaare Rasmussen
Date:
Hi Rémi
> Hey sorry if my answer is stupid,
> but there is an extension for array, even if it is limited to int (but
> int could be indexes of row)
> It's named http://www.postgresql.org/docs/9.3/static/intarray.html
> It provides essential function, although lacking some (I
> re-implemented union of array with disjoint result).
> I think this extension uses indexes
>

Thanks for your answer. If there were a similar strarray, it would be
top nice :-)