Thread: Trees: integer[] outperformed by ltree

Trees: integer[] outperformed by ltree

From: Jan Walter
Hi,

I am in need of a very robust (esp. fast in read, non-blocking in update) tree structure storage (95% of trees are of depth <4; the current max. is 12). We have 10k-100k trees now, millions in the future.
I made many tests and benchmarks of the usual operations, and after all, materialized path ('1.5.3' path notation) seems most promising.

My last candidates for its storage are ltree and integer[]. So I am comparing the following benchmark tables with exactly the same data (5 regular trees - each node except the leaves has 5 children, depth=9, total nodes=2,441,405, ids numbered according to a breadth-first traversal):


TABLES:
A/ integer[]
CREATE TABLE test (
    id SERIAL,
    apath INTEGER[],
    CONSTRAINT test_pkey PRIMARY KEY(id)
);

CREATE INDEX test_idx1 ON test USING gin (apath);

B/ ltree
CREATE TABLE test (
    id SERIAL,
    lpath ltree,
    CONSTRAINT test_pkey PRIMARY KEY(id)
);

CREATE INDEX test_idx1 ON test USING gist (lpath gist_ltree_ops);

Separate single-table databases, both VACUUM ANALYZEd.
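(A minimal sketch of how such test data can be generated for the integer[] variant - assuming explicit ids are inserted into the SERIAL column; with roots 1-5, breadth-first numbering gives node n the children 5n+1 .. 5n+5:)

WITH RECURSIVE tree(id, apath, depth) AS (
    SELECT id, ARRAY[id], 1
    FROM generate_series(1, 5) AS id           -- the 5 roots
    UNION ALL
    SELECT c, t.apath || c, t.depth + 1
    FROM tree t,
         LATERAL generate_series(5 * t.id + 1, 5 * t.id + 5) AS c
    WHERE t.depth < 9                          -- stop at depth 9
)
INSERT INTO test (id, apath)
SELECT id, apath FROM tree;                    -- 2,441,405 rows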


TESTING MACHINE:
Windows 7, postgres 9.3.0, 4GB RAM
effective_cache_size = 2GB
work_mem = 512MB
shared_buffers = 1GB


THE PROBLEM:
My intuition says integer[] should not be much worse than ltree (rather the other way around), but I am not able to reach such results. I believe I am making some trivial mistake rather than starting from a false hypothesis. My general question is: what more can I do to get better performance?


WHAT I DID:

1/ I checked a gist index for integer[] via the intarray extension. Query plans for the <@ and @> operators do not use it (reported as a bug/feature). That's why I am using gin.
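(For completeness, a sketch of the intarray-based index referred to above - gist__int_ops is intarray's default GiST operator class; gist__intbig_ops is the lossy variant meant for bigger arrays:)

CREATE EXTENSION intarray;

-- GiST index on the array column via the intarray operator class;
-- per the observation above, the planner did not pick it for <@ / @>
CREATE INDEX test_idx_gist ON test USING gist (apath gist__int_ops);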

2/ Getting descendants (all nodes whose path contains node 1's path) - same query plans; ltree slightly wins:
A:
select * from test where apath@>(select apath from test where id=1)

Bitmap Heap Scan on test  (cost=159.04..33950.48 rows=12206 width=60) (actual time=80.912..224.182 rows=488281 loops=1)
  Recheck Cond: (apath @> $0)
  Buffers: shared hit=18280
  InitPlan 1 (returns $0)
    ->  Index Scan using test_pkey on test test_1  (cost=0.43..8.45 rows=1 width=56) (actual time=0.022..0.023 rows=1 loops=1)
          Index Cond: (id = 1)
          Buffers: shared hit=4
  ->  Bitmap Index Scan on test_idx1  (cost=0.00..147.54 rows=12206 width=0) (actual time=76.896..76.896 rows=488281 loops=1)
        Index Cond: (apath @> $0)
        Buffers: shared hit=369
Total runtime: 240.408 ms

B:
select * from test where lpath<@(select lpath from test where id=1)

Bitmap Heap Scan on test  (cost=263.81..8674.72 rows=2445 width=83) (actual time=85.395..166.683 rows=488281 loops=1)
  Recheck Cond: (lpath <@ $0)
  Buffers: shared hit=22448
  InitPlan 1 (returns $0)
    ->  Index Scan using test_pkey on test test_1  (cost=0.43..8.45 rows=1 width=79) (actual time=0.024..0.025 rows=1 loops=1)
          Index Cond: (id = 1)
          Buffers: shared hit=4
  ->  Bitmap Index Scan on test_idx1  (cost=0.00..254.75 rows=2445 width=0) (actual time=83.029..83.029 rows=488281 loops=1)
        Index Cond: (lpath <@ $0)
        Buffers: shared hit=12269
Total runtime: 182.239 ms

3/ Getting chosen nodes (eo) having chosen ancestors (ea) - integer[] performs very poorly; its query plan uses an additional Bitmap Heap Scan, though indexes are used in both cases.

A:
select *
from test eo
where id in (select generate_series(3, 3000000, 5000)) and
    exists (
        select 1
        from test ea
        where ea.id in(select generate_series(1000, 3000, 3)) and
            ea.apath <@ eo.apath
    )

Nested Loop Semi Join  (cost=140.10..1302851104.53 rows=6103 width=60) (actual time=1768.862..210525.597 rows=104 loops=1)
  Buffers: shared hit=8420909
  ->  Nested Loop  (cost=17.94..1652.31 rows=1220554 width=60) (actual time=0.382..17.255 rows=489 loops=1)
        Buffers: shared hit=2292
        ->  HashAggregate  (cost=17.51..19.51 rows=200 width=4) (actual time=0.352..1.486 rows=600 loops=1)
              ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.009..0.100 rows=600 loops=1)
        ->  Index Scan using test_pkey on test eo  (cost=0.43..8.15 rows=1 width=60) (actual time=0.017..0.021 rows=1 loops=600)
              Index Cond: (id = (generate_series(3, 3000000, 5000)))
              Buffers: shared hit=2292
  ->  Hash Semi Join  (cost=122.16..1133.92 rows=6103 width=56) (actual time=430.482..430.482 rows=0 loops=489)
        Hash Cond: (ea.id = (generate_series(1000, 3000, 3)))
        Buffers: shared hit=8418617
        ->  Bitmap Heap Scan on test ea  (cost=94.65..251.23 rows=12206 width=60) (actual time=271.034..430.278 rows=8 loops=489)
              Recheck Cond: (apath <@ eo.apath)
              Rows Removed by Index Recheck: 444335
              Buffers: shared hit=8418617
              ->  Bitmap Index Scan on test_idx1  (cost=0.00..91.60 rows=12206 width=0) (actual time=155.355..155.355 rows=488281 loops=489)
                    Index Cond: (apath <@ eo.apath)
                    Buffers: shared hit=237668
        ->  Hash  (cost=15.01..15.01 rows=1000 width=4) (actual time=0.305..0.305 rows=667 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 24kB
              ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.003..0.116 rows=667 loops=1)
Total runtime: 210526.004 ms

B:
select *
from test eo
where id in (select generate_series(3, 3000000, 5000)) and
    exists (
        select 1
        from test ea
        where ea.id in(select generate_series(1000, 3000, 3)) and
            ea.lpath @> eo.lpath
    )
    
Nested Loop Semi Join  (cost=45.86..276756955.40 rows=1223 width=83) (actual time=2.985..226.161 rows=104 loops=1)
  Buffers: shared hit=27675
  ->  Nested Loop  (cost=17.94..1687.51 rows=1222486 width=83) (actual time=0.660..5.987 rows=489 loops=1)
        Buffers: shared hit=2297
        ->  HashAggregate  (cost=17.51..19.51 rows=200 width=4) (actual time=0.632..1.008 rows=600 loops=1)
              ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.007..0.073 rows=600 loops=1)
        ->  Index Scan using test_pkey on test eo  (cost=0.43..8.33 rows=1 width=83) (actual time=0.007..0.007 rows=1 loops=600)
              Index Cond: (id = (generate_series(3, 3000000, 5000)))
              Buffers: shared hit=2297
  ->  Hash Semi Join  (cost=27.92..242.30 rows=1223 width=79) (actual time=0.449..0.449 rows=0 loops=489)
        Hash Cond: (ea.id = (generate_series(1000, 3000, 3)))
        Buffers: shared hit=25378
        ->  Index Scan using test_idx1 on test ea  (cost=0.41..43.43 rows=2445 width=83) (actual time=0.060..0.445 rows=8 loops=489)
              Index Cond: (lpath @> eo.lpath)
              Buffers: shared hit=25378
        ->  Hash  (cost=15.01..15.01 rows=1000 width=4) (actual time=0.178..0.178 rows=667 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 24kB
              ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.003..0.071 rows=667 loops=1)
Total runtime: 226.308 ms

3.a/ If I turn off the index for A, the query is much faster:
Nested Loop Semi Join  (cost=35.88..20535547247.01 rows=6103 width=60) (actual time=17.257..1112.276 rows=104 loops=1)
  Join Filter: (ea.apath <@ eo.apath)
  Rows Removed by Join Filter: 287529
  Buffers: shared hit=1155469
  ->  Nested Loop  (cost=17.94..1652.31 rows=1220554 width=60) (actual time=0.334..5.307 rows=489 loops=1)
        Buffers: shared hit=2292
        ->  HashAggregate  (cost=17.51..19.51 rows=200 width=4) (actual time=0.297..0.598 rows=600 loops=1)
              ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.008..0.088 rows=600 loops=1)
        ->  Index Scan using test_pkey on test eo  (cost=0.43..8.15 rows=1 width=60) (actual time=0.007..0.007 rows=1 loops=600)
              Index Cond: (id = (generate_series(3, 3000000, 5000)))
              Buffers: shared hit=2292
  ->  Nested Loop  (cost=17.94..1652.31 rows=1220554 width=56) (actual time=0.004..1.946 rows=588 loops=489)
        Buffers: shared hit=1153177
        ->  HashAggregate  (cost=17.51..19.51 rows=200 width=4) (actual time=0.001..0.089 rows=588 loops=489)
              ->  Result  (cost=0.00..5.01 rows=1000 width=0) (actual time=0.005..0.103 rows=667 loops=1)
        ->  Index Scan using test_pkey on test ea  (cost=0.43..8.15 rows=1 width=60) (actual time=0.002..0.003 rows=1 loops=287633)
              Index Cond: (id = (generate_series(1000, 3000, 3)))
              Buffers: shared hit=1153177
Total runtime: 1112.411 ms

3.b/ With the index on, if I go down to effective_cache_size = 256MB, A also skips the index; the same query plan as in 3.a is used. The index is still used for both versions of query 2 and for the ltree version of query 3.


QUESTIONS:
1/ Is my hypothesis about the similar performance of ltree and integer[] correct?
2/ If so, what should I do to achieve it?
3/ Is there a way to improve the performance of the <@ and @> operators? In fact, since I keep a tree path in the column, I only need to check whether path_a 'starts with' path_b to get ancestors/descendants, so something more efficient than 'contains' could be used. Is there any way?
4/ Do I understand correctly that an index on integer[] is much more memory-consuming, and that this explains the differences in query plans / execution times?


Thanks for any help,

Jan

Re: Trees: integer[] outperformed by ltree

From: Merlin Moncure
On Tue, Nov 5, 2013 at 6:25 AM, Jan Walter <john@commontongue.com> wrote:
> Hi,
>
> I am in need of a very robust (esp. fast in read, non-blocking in update)
> tree structure storage (95% trees are of depth <4, current max. is 12). We
> have 10k-100k trees now, millions in the future.
> I made many tests, benchmarks of usual operations, and after all,
> materialized path ('1.5.3' path notation) seems most promising.

materialized path approaches tend to be ideal if the tree remains
relatively static and is not too deep.  The downside with matpath is
that if you have to move a node around in the tree, then all the
child elements' paths have to be expensively updated.  I bring this up
as it relates to your 'non-blocking in update' requirement: in matpath
an update to a parent can update an unbounded number of records.

merlin


Re: Trees: integer[] outperformed by ltree

From: Merlin Moncure
On Tue, Nov 5, 2013 at 11:30 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Nov 5, 2013 at 6:25 AM, Jan Walter <john@commontongue.com> wrote:
>> Hi,
>>
>> I am in need of a very robust (esp. fast in read, non-blocking in update)
>> tree structure storage (95% trees are of depth <4, current max. is 12). We
>> have 10k-100k trees now, millions in the future.
>> I made many tests, benchmarks of usual operations, and after all,
>> materialized path ('1.5.3' path notation) seems most promising.
>
> materialized path approaches tend to be ideal if the tree remains
> relatively static and is not too deep.  The downside with matpath is
> that if you have to move a node around in the tree, then all the
> child elements' paths have to be expensively updated.  I bring this up
> as it relates to your 'non-blocking in update' requirement: in matpath
> an update to a parent can update an unbounded number of records.


hm, why do you need gist/gin for the int[] formulation?  what are your
lookup requirements?  with int[] you can typically do contains with
simple btree.
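(A minimal sketch of the btree approach suggested here, elaborated later in the thread - assuming lookups anchored at the root, i.e. prefix ranges over the default btree array ordering rather than general @> containment:)

CREATE INDEX test_btree_idx ON test USING btree (apath);

-- all paths starting with {1,5}: an ordinary btree range scan
SELECT * FROM test
WHERE apath >= ARRAY[1,5] AND apath < ARRAY[1,6];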

merlin


Re: Trees: integer[] outperformed by ltree

From: Jan Walter
On 5.11.2013 20:51, Merlin Moncure wrote:
> On Tue, Nov 5, 2013 at 11:30 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> On Tue, Nov 5, 2013 at 6:25 AM, Jan Walter <john@commontongue.com> wrote:
>>> Hi,
>>>
>>> I am in need of a very robust (esp. fast in read, non-blocking in update)
>>> tree structure storage (95% trees are of depth <4, current max. is 12). We
>>> have 10k-100k trees now, millions in the future.
>>> I made many tests, benchmarks of usual operations, and after all,
>>> materialized path ('1.5.3' path notation) seems most promising.
>>
>> materialized path approaches tend to be ideal if the tree remains
>> relatively static and is not too deep.  The downside with matpath is
>> that if you have to move a node around in the tree, then all the
>> child elements' paths have to be expensively updated.  I bring this up
>> as it relates to your 'non-blocking in update' requirement: in matpath
>> an update to a parent can update an unbounded number of records.

Thanks for your remark.
Materialized path is still better in updates than the nested sets we are using currently.
Although adjacency lists with recursive CTEs were initially my favorite substitution (smallest lock space for node relocation), they fail completely in e.g. the order by path (depth) task (150s vs. 31ms via integer[]), and are twice as slow in simple descendant-based tasks.
I am still considering it (if I moved e.g. the ordering to the application server level), and I still need to rewrite a couple of more sophisticated scenarios to CTEs to be absolutely sure it fails; anyway MP seems more promising.
I also tend to keep the tree structure completely independent of the other data belonging to nodes.

Or did you have any other model in your mind?
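(For reference, a sketch of the kind of recursive CTE behind the order-by-path numbers above, assuming a hypothetical adjacency-list table nodes(id, parent_id):)

WITH RECURSIVE t AS (
    SELECT id, ARRAY[id] AS apath
    FROM nodes
    WHERE parent_id IS NULL          -- roots
    UNION ALL
    SELECT n.id, t.apath || n.id
    FROM nodes n
    JOIN t ON n.parent_id = t.id
)
SELECT id
FROM t
ORDER BY apath;    -- the whole forest must be materialized before sorting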

> hm, why do you need gist/gin for the int[] formulation?  what are your
> lookup requirements?  with int[] you can typically do contains with
> simple btree.

I do not think so; i.e., neither my tests nor http://www.postgresql.org/docs/current/static/indexes-types.html show that it works for <@ or @>. Using it for the <= operator puts a btree index into the query plan in some (not all) scenarios. Still, it needs to be accompanied by <@, and in more complex scenarios the performance goes down.

The 'starts with' operation I was mentioning would be ideal.


Jan

P.S. Just to give a feeling, here is a simple overview of my current benchmarks.

scenario                          adjacency list  nested set  ltree path  array path  neo4j
ancestors (42)                    16ms            31ms        31ms        15ms        50ms/5ms
ancestors (1.000.000)             16ms            50ms        31ms        15ms        -
descendants (node 42)             180ms           90ms        90ms        140ms       2s
descendants (node 1)              4s              2s          2s          2s          above all bounds
descendants 3 far                 15ms            20s         31ms        65ms        50ms
order by path (depth)             150s            40ms        31ms        31ms        -
order by path (width)             -               -           -           -           -
children_above_fitting w/ tags    -               850ms       270ms       950ms       -
ids as descendants below ids      -               850ms       250ms       950ms       -

Re: Trees: integer[] outperformed by ltree

From: Merlin Moncure
On Tue, Nov 5, 2013 at 3:52 PM, Jan Walter <john@commontongue.com> wrote:
>
> On 5.11.2013 20:51, Merlin Moncure wrote:
>
> On Tue, Nov 5, 2013 at 11:30 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
> On Tue, Nov 5, 2013 at 6:25 AM, Jan Walter <john@commontongue.com> wrote:
>
> Hi,
>
> I am in need of a very robust (esp. fast in read, non-blocking in update)
> tree structure storage (95% trees are of depth <4, current max. is 12). We
> have 10k-100k trees now, millions in the future.
> I made many tests, benchmarks of usual operations, and after all,
> materialized path ('1.5.3' path notation) seems most promising.
>
> materialized path approaches tend to be ideal if the tree remains
> relatively static and is not too deep.  The downside with matpath is
> that if you have to move a node around in the tree, then all the
> child elements' paths have to be expensively updated.  I bring this up
> as it relates to your 'non-blocking in update' requirement: in matpath
> an update to a parent can update an unbounded number of records.
>
>
> Thanks for your remark.
> Materialized path is still better in updates than the nested sets we are using currently.
> Although adjacency lists with recursive CTEs were initially my favorite substitution (smallest lock space for node
> relocation), they fail completely in e.g. the order by path (depth) task (150s vs. 31ms via integer[]), and are twice
> as slow in simple descendant-based tasks.
> I am still considering it (if I moved e.g. the ordering to the application server level), and I still need to rewrite a
> couple of more sophisticated scenarios to CTEs to be absolutely sure it fails; anyway MP seems more promising.
> I also tend to keep the tree structure completely independent of the other data belonging to nodes.
>
> Or did you have any other model in your mind?
>
>
> hm, why do you need gist/gin for the int[] formulation?  what are your
> lookup requirements?  with int[] you can typically do contains with
> simple btree.
>
>
> I do not think so; i.e., neither my tests nor http://www.postgresql.org/docs/current/static/indexes-types.html show that
> it works for <@ or @>. Using it for the <= operator puts a btree index into the query plan in some (not all) scenarios.
> Still, it needs to be accompanied by <@, and in more complex scenarios the performance goes down.
>
> The 'starts with' operation I was mentioning would be ideal.

This is trivial, especially if you don't have to deal with the null edge
case.  If you want to search a btree'd array column for all elements
starting with [1,2,3]:

SELECT * FROM foo
  WHERE
    id >= array[1,2,3]
    AND id < array[1,2,4];
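(A sketch building on this - not from the thread - of the supporting index plus a generic upper bound computed by bumping the last path element, assuming non-empty paths:)

CREATE INDEX test_apath_btree ON test USING btree (apath);

-- upper bound for a prefix search: increment the last element of the path
CREATE FUNCTION apath_upper(p int[]) RETURNS int[] AS $$
    SELECT p[1:array_length(p, 1) - 1] || (p[array_length(p, 1)] + 1)
$$ LANGUAGE sql IMMUTABLE;

-- all descendants of the node whose path is {1,2,3}, via a btree range scan
SELECT * FROM test
WHERE apath >= ARRAY[1,2,3]
  AND apath <  apath_upper(ARRAY[1,2,3]);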

merlin


Re: Trees: integer[] outperformed by ltree

From: Jan Walter
On 5.11.2013 23:19, Merlin Moncure wrote:
> On Tue, Nov 5, 2013 at 3:52 PM, Jan Walter <john@commontongue.com> wrote:
>> On 5.11.2013 20:51, Merlin Moncure wrote:
>>> On Tue, Nov 5, 2013 at 11:30 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>>> On Tue, Nov 5, 2013 at 6:25 AM, Jan Walter <john@commontongue.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I am in need of a very robust (esp. fast in read, non-blocking in update)
>>>>> tree structure storage (95% trees are of depth <4, current max. is 12). We
>>>>> have 10k-100k trees now, millions in the future.
>>>>> I made many tests, benchmarks of usual operations, and after all,
>>>>> materialized path ('1.5.3' path notation) seems most promising.
>>>>
>>>> materialized path approaches tend to be ideal if the tree remains
>>>> relatively static and is not too deep.  The downside with matpath is
>>>> that if you have to move a node around in the tree, then all the
>>>> child elements' paths have to be expensively updated.  I bring this up
>>>> as it relates to your 'non-blocking in update' requirement: in matpath
>>>> an update to a parent can update an unbounded number of records.
>>
>> Thanks for your remark.
>> Materialized path is still better in updates than the nested sets we are using currently.
>> Although adjacency lists with recursive CTEs were initially my favorite substitution (smallest lock space for node relocation), they fail completely in e.g. the order by path (depth) task (150s vs. 31ms via integer[]), and are twice as slow in simple descendant-based tasks.
>> I am still considering it (if I moved e.g. the ordering to the application server level), and I still need to rewrite a couple of more sophisticated scenarios to CTEs to be absolutely sure it fails; anyway MP seems more promising.
>> I also tend to keep the tree structure completely independent of the other data belonging to nodes.
>>
>> Or did you have any other model in your mind?
>>
>>> hm, why do you need gist/gin for the int[] formulation?  what are your
>>> lookup requirements?  with int[] you can typically do contains with
>>> simple btree.
>>
>> I do not think so; i.e., neither my tests nor http://www.postgresql.org/docs/current/static/indexes-types.html show that it works for <@ or @>. Using it for the <= operator puts a btree index into the query plan in some (not all) scenarios. Still, it needs to be accompanied by <@, and in more complex scenarios the performance goes down.
>>
>> The 'starts with' operation I was mentioning would be ideal.
>
> This is trivial, especially if you don't have to deal with the null edge
> case.  If you want to search a btree'd array column for all elements
> starting with [1,2,3]:
>
> SELECT * FROM foo WHERE   id >= array[1,2,3]   AND id < array[1,2,4];
>
> merlin

You meant path >= array[1,2,3], right?
Great for descendants. Almost the same performance (btree vs. gin) in simple cases.
For ancestors, the ids can be retrieved directly using unnest. I have to check how fast that is in real-life situations, as well as whether all my cases can be solved without the <@ operators (e.g. for descendants of filtered nodes, array manipulation has to be used).
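(Concretely - a sketch, relying on the path labels here being the node ids themselves:)

-- ancestors of node 42: every id on its own path (including the node itself)
SELECT a.*
FROM test a
WHERE a.id IN (SELECT unnest(apath) FROM test WHERE id = 42);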

Still, I do not understand the behavior of the gin index, and I read your recommendation as 'don't use it'.

Thanks a lot for your hints,
Jan


Re: Trees: integer[] outperformed by ltree

From: Merlin Moncure
On Tue, Nov 5, 2013 at 6:27 PM, Jan Walter <john@commontongue.com> wrote:
> On 5.11.2013 23:19, Merlin Moncure wrote:
> On Tue, Nov 5, 2013 at 3:52 PM, Jan Walter <john@commontongue.com> wrote:
> On 5.11.2013 20:51, Merlin Moncure wrote:
> On Tue, Nov 5, 2013 at 11:30 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Tue, Nov 5, 2013 at 6:25 AM, Jan Walter <john@commontongue.com> wrote:
>
> I am in need of a very robust (esp. fast in read, non-blocking in update)
> tree structure storage (95% trees are of depth <4, current max. is 12). We
> have 10k-100k trees now, millions in the future.
> I made many tests, benchmarks of usual operations, and after all,
> materialized path ('1.5.3' path notation) seems most promising.
>
> materialized path approaches tend to be ideal if the tree remains
> relatively static and is not too deep.  The downside with matpath is
> that if you have to move a node around in the tree, then all the
> child elements' paths have to be expensively updated.  I bring this up
> as it relates to your 'non-blocking in update' requirement: in matpath
> an update to a parent can update an unbounded number of records.
>
>
> Thanks for your remark.
> Materialized path is still better in updates than the nested sets we are
> using currently.
> Although adjacency lists with recursive CTEs were initially my favorite
> substitution (smallest lock space for node relocation), they fail completely
> in e.g. the order by path (depth) task (150s vs. 31ms via integer[]),
> and are twice as slow in simple descendant-based tasks.
> I am still considering it (if I moved e.g. the ordering to the application
> server level), and I still need to rewrite a couple of more sophisticated
> scenarios to CTEs to be absolutely sure it fails; anyway MP seems more
> promising.
> I also tend to keep the tree structure completely independent of the other
> data belonging to nodes.
>
> Or did you have any other model in your mind?
>
>
> hm, why do you need gist/gin for the int[] formulation?  what are your
> lookup requirements?  with int[] you can typically do contains with
> simple btree.
>
>
> I do not think so; i.e., neither my tests nor
> http://www.postgresql.org/docs/current/static/indexes-types.html show that
> it works for <@ or @>. Using it for the <= operator puts a btree index into
> the query plan in some (not all) scenarios. Still, it needs to be
> accompanied by <@, and in more complex scenarios the performance goes down.
>
> The 'starts with' operation I was mentioning would be ideal.
>
> This is trivial, especially if you don't have to deal with the null edge
> case.  If you want to search a btree'd array column for all elements
> starting with [1,2,3]:
>
> SELECT * FROM foo
>   WHERE
>     id >= array[1,2,3]
>     AND id < array[1,2,4];
>
>
> You meant path >= array[1,2,3], right?

right -- I think you get the idea.

> Great for descendants. Almost the same performance (btree vs. gin) in simple
> cases.
> For ancestors, the ids can be retrieved directly using unnest. I have to check
> how fast that is in real-life situations, as well as whether all my cases can
> be solved without the <@ operators (e.g. for descendants of filtered nodes,
> array manipulation has to be used).

That's the key.  Basically it comes down to this: if all your
searches are anchored from the root, then you can get away with a simple
btree.  By 'anchored from root', I mean you never query like this:
path = [*,*,3,4,5];  -- where * are wildcards.

The above is only indexable by GiST/GIN.  If you need to do that and
your dataset is large, I'd be looking at ltree.

The reason this works is that matpath exploits a property of a
tree such that root-based searches boil down to a range of nodes with
unambiguous endpoints in an ordered table.  Celko's 'nested sets'
approach (which, although it gets points for very basic SQL
requirements, I don't recommend due to very poor scalability) also
exploits this.
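(To illustrate the distinction - a sketch, not from the thread; the lquery wildcard syntax is ltree's:)

-- anchored from the root: a plain btree range scan suffices
SELECT * FROM test
WHERE apath >= ARRAY[1,2,3] AND apath < ARRAY[1,2,4];

-- not anchored (leading wildcard): only GiST/GIN can index this,
-- e.g. ltree's lquery matching
SELECT * FROM test
WHERE lpath ~ '*.3.4.5'::lquery;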

Another way to do matpath is via strings; then you can do prefix
operations with the SQL LIKE operator (path LIKE '1.2.3.%').  Yet
another way (which I've never tried but plan to eventually) is via
numerics: see http://arxiv.org/html/cs/0401014.
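(A sketch of the string variant - text_pattern_ops is what lets a btree index serve LIKE 'prefix%' scans under a non-C collation:)

CREATE TABLE tree_txt (
    id   serial PRIMARY KEY,
    path text NOT NULL
);

CREATE INDEX tree_txt_path_idx ON tree_txt (path text_pattern_ops);

-- descendants of node 3 under 1.2, via an index-backed prefix scan
SELECT * FROM tree_txt WHERE path LIKE '1.2.3.%';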

> Still, I do not understand the behavior of the gin index, and I read your
> recommendation as 'don't use it'.

I'm not saying that at all.  My point, though, is that if your
requirements are satisfied by btree, then use that.  GiST/GIN support a
much broader array of operations, but that support comes at a price.

merlin