Thread: [Patch] RBTree iteration interface improvement

[Patch] RBTree iteration interface improvement

From

Aleksander Alekseev

Date:

27 July 2016, 14:28:30

Hello

While working on some new feature for PostgreSQL (which is still in
development and is irrelevant in this context) I noticed that current
RBTree implementation interface is following:

```
extern void rb_begin_iterate(RBTree *rb, RBOrderControl ctrl);
extern RBNode *rb_iterate(RBTree *rb);
```

As you see it doesn't allow to do multiple iterations over a single
tree. I believe it's a major flaw for a general-purpose container.

Proposed patch introduces a new iteration interface that doesn't has
such a flaw. Naturally I wrote some unit tests, but I was not sure where
exactly to place them in PostgreSQL source code. For now I've just
uploaded them to GitHub:

https://github.com/afiskon/c-algorithms-examples/blob/master/rbtree_example.c

According to these tests new implementation works as fast as current
implementation and produces exactly same results.

I didn't dare to remove current interface since in theory some
extensions can use it. Still I believe such a move is worth considering.

--
Best regards,
Aleksander Alekseev

Attachment

rbtree-iteration-improvement.patch

Re: [Patch] RBTree iteration interface improvement

From

Aleksander Alekseev

Date:

27 July 2016, 14:53:18

And as always, I didn't re-check a patch before sending it :) Here is a
corrected version without any fprintf's.

--
Best regards,
Aleksander Alekseev

Attachment

rbtree-iteration-improvement-v2.patch

Re: [Patch] RBTree iteration interface improvement

From

Amit Kapila

Date:

28 July 2016, 13:48:19

On Wed, Jul 27, 2016 at 7:56 PM, Aleksander Alekseev
<a.alekseev@postgrespro.ru> wrote:
> Hello
>
> While working on some new feature for PostgreSQL (which is still in
> development and is irrelevant in this context) I noticed that current
> RBTree implementation interface is following:
>
> ```
> extern void rb_begin_iterate(RBTree *rb, RBOrderControl ctrl);
> extern RBNode *rb_iterate(RBTree *rb);
> ```
>
> As you see it doesn't allow to do multiple iterations over a single
> tree. I believe it's a major flaw for a general-purpose container.
>

Can you explain use case where you need it?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: [Patch] RBTree iteration interface improvement

From

Aleksander Alekseev

Date:

28 July 2016, 15:00:27

> Can you explain use case where you need it?

Sure. You can consider RBTree as a container that always keeps its
elements in sorted order.  Now imagine you would like to write a code
like this:

```
/* iterate over items in sorted order */
while(item1 = left_right_walk(tree))
{
 /* another iteration, probably even in different procedure */ while(item2 = left_right_walk(tree)) {   /* ... some
logic... */ }
 

}
```

Currently you can't do it.

Or maybe you have different objects, e.g. IndexScanDesc's, that should
iterate over some tree's independently somewhere in indexam.c
procedures. Exact order may depend on user's query so you don't even
control it.

-- 
Best regards,
Aleksander Alekseev

Re: [Patch] RBTree iteration interface improvement

From

Tom Lane

Date:

28 July 2016, 17:58:06

Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:
>> Can you explain use case where you need it?

> ... Or maybe you have different objects, e.g. IndexScanDesc's, that should
> iterate over some tree's independently somewhere in indexam.c
> procedures. Exact order may depend on user's query so you don't even
> control it.

It seems clear to me that the existing arrangement is hazardous for any
RBTree that hasn't got exactly one consumer.  I think Aleksander's plan to
decouple the iteration state is probably a good one (NB: I've not read the
patch, so this is not an endorsement of details).  I'd go so far as to say
that we should remove the old API as being dangerous, rather than preserve
it on backwards-compatibility grounds.  We make bigger changes than that
in internal APIs all the time.

Having said that, though: if the iteration state is not part of the
object, it's not very clear whether we can behave sanely if someone
changes the tree while an iteration is open.  It will need careful
thought as to what sort of guarantees we can make about that.  If it's
too weak, then a separated-state version would have enough hazards of
its own that it's not necessarily any safer.
        regards, tom lane

Re: [Patch] RBTree iteration interface improvement

From

Robert Haas

Date:

04 August 2016, 20:34:18

On Thu, Jul 28, 2016 at 1:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Aleksander Alekseev <a.alekseev@postgrespro.ru> writes:
>>> Can you explain use case where you need it?
>
>> ... Or maybe you have different objects, e.g. IndexScanDesc's, that should
>> iterate over some tree's independently somewhere in indexam.c
>> procedures. Exact order may depend on user's query so you don't even
>> control it.
>
> It seems clear to me that the existing arrangement is hazardous for any
> RBTree that hasn't got exactly one consumer.  I think Aleksander's plan to
> decouple the iteration state is probably a good one (NB: I've not read the
> patch, so this is not an endorsement of details).  I'd go so far as to say
> that we should remove the old API as being dangerous, rather than preserve
> it on backwards-compatibility grounds.  We make bigger changes than that
> in internal APIs all the time.
>
> Having said that, though: if the iteration state is not part of the
> object, it's not very clear whether we can behave sanely if someone
> changes the tree while an iteration is open.  It will need careful
> thought as to what sort of guarantees we can make about that.  If it's
> too weak, then a separated-state version would have enough hazards of
> its own that it's not necessarily any safer.

+1 to all of that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [Patch] RBTree iteration interface improvement

From

Aleksander Alekseev

Date:

17 August 2016, 15:40:15

> > Having said that, though: if the iteration state is not part of the
> > object, it's not very clear whether we can behave sanely if someone
> > changes the tree while an iteration is open.  It will need careful
> > thought as to what sort of guarantees we can make about that.  If
> > it's too weak, then a separated-state version would have enough
> > hazards of its own that it's not necessarily any safer.  
> 
> +1 to all of that.
> 

The old API assumes that tree doesn't change during iteration. And it
doesn't do any checks about this. The new API behaves the same way. In
this sense patch at least doesn't make anything worse.

> It seems clear to me that the existing arrangement is hazardous for
> any RBTree that hasn't got exactly one consumer.  I think
> Aleksander's plan to decouple the iteration state is probably a good
> one (NB: I've not read the patch, so this is not an endorsement of
> details).  I'd go so far as to say that we should remove the old API
> as being dangerous, rather than preserve it on
> backwards-compatibility grounds.  We make bigger changes than that in
> internal APIs all the time.

Glad to hear it! I would like to propose a patch that removes the old
API. I have a few questions though.

Would you like it to be a separate patch, or all-in-one patch is fine
in this case? I also would like to include unit/property-based tests in
this (these) patch (patches). However as I understand there are
currently no unit tests in PostgreSQL at all, only integration/system
tests. Any advice how to do it better?

-- 
Best regards,
Aleksander Alekseev

Re: [Patch] RBTree iteration interface improvement

From

Aleksander Alekseev

Date:

22 August 2016, 10:02:07

> > It seems clear to me that the existing arrangement is hazardous for
> > any RBTree that hasn't got exactly one consumer.  I think
> > Aleksander's plan to decouple the iteration state is probably a good
> > one (NB: I've not read the patch, so this is not an endorsement of
> > details).  I'd go so far as to say that we should remove the old API
> > as being dangerous, rather than preserve it on
> > backwards-compatibility grounds.  We make bigger changes than that in
> > internal APIs all the time.
>
> Glad to hear it! I would like to propose a patch that removes the old
> API. I have a few questions though.
>
> Would you like it to be a separate patch, or all-in-one patch is fine
> in this case? I also would like to include unit/property-based tests in
> this (these) patch (patches). However as I understand there are
> currently no unit tests in PostgreSQL at all, only integration/system
> tests. Any advice how to do it better?

OK, here is a patch. I think we could call it a _replacement_ of an old API, so
there is probably no need in two separate patches. I still don't know how to
include unit test and whether they should be included at all. Thus for now
unit/property-based tests could be found only on GitHub [1].

As usual, if you have any questions, suggestions or notes regarding this patch,
please let me know.

[1] https://github.com/afiskon/c-algorithms/tree/master/test

--
Best regards,
Aleksander Alekseev

Attachment

rbtree-iteration-improvement-v3.patch

Re: [Patch] RBTree iteration interface improvement

From

Heikki Linnakangas

Date:

26 August 2016, 11:53:17

On 08/22/2016 01:00 PM, Aleksander Alekseev wrote:
>>> It seems clear to me that the existing arrangement is hazardous for
>>> any RBTree that hasn't got exactly one consumer.  I think
>>> Aleksander's plan to decouple the iteration state is probably a good
>>> one (NB: I've not read the patch, so this is not an endorsement of
>>> details).  I'd go so far as to say that we should remove the old API
>>> as being dangerous, rather than preserve it on
>>> backwards-compatibility grounds.  We make bigger changes than that in
>>> internal APIs all the time.
>>
>> Glad to hear it! I would like to propose a patch that removes the old
>> API. I have a few questions though.
>>
>> Would you like it to be a separate patch, or all-in-one patch is fine
>> in this case? I also would like to include unit/property-based tests in
>> this (these) patch (patches). However as I understand there are
>> currently no unit tests in PostgreSQL at all, only integration/system
>> tests. Any advice how to do it better?
>
> OK, here is a patch. I think we could call it a _replacement_ of an old API, so
> there is probably no need in two separate patches. I still don't know how to
> include unit test and whether they should be included at all. Thus for now
> unit/property-based tests could be found only on GitHub [1].
>
> As usual, if you have any questions, suggestions or notes regarding this patch,
> please let me know.

This also seems to change the API so that instead of a single
rb_begin_iterate()+rb_iterate() pair, there is a separate begin and
iterate function for each traversal order. That seems like an unrelated
change. Was there a particular reason for that? I think the old
rb_begin_iterate()+rb_iterate() functions were fine, we just need to
have a RBTreeIterator struct that's different from the tree itself.

Note that the API actually used to be like that. rb_begin_iterate() used
to return a palloc'd RBTreeIterator struct. It was changed in commit
0454f131, because the implementation didn't support more than one
iterator at a time, which made the separate struct pointless. But we're
fixing that problem now.

Another unrelated change in this patch is the addition of
rb_rightmost(). It's not used for anything, so I'm not sure what the
point is. Then again, there don't seem to be any callers of
rb_leftmost() either.

I think we should something like in the attached patch. It seems to pass
your test suite, but I haven't done any other testing on this. Does it
look OK to you?

- Heikki

Attachment

rbtree-iteration-improvement-heikki-v1.patch

Re: [Patch] RBTree iteration interface improvement

From

Aleksander Alekseev

Date:

26 August 2016, 13:09:12

Hello, Heikki.

Thank you for your attention to this patch!

> This also seems to change the API so that instead of a single 
> rb_begin_iterate()+rb_iterate() pair, there is a separate begin and 
> iterate function for each traversal order. That seems like an unrelated 
> change. Was there a particular reason for that? I think the old 
> rb_begin_iterate()+rb_iterate() functions were fine, we just need to 
> have a RBTreeIterator struct that's different from the tree itself.

I'm trying to avoid calling procedures by a pointer, an old habit. When
I started to work on this patch I just needed a RB-tree implementation
for a pet project. I took one from PostgreSQL code. Then I found this
flaw of iteration interface and decided to fix it. The idea to merge
this fix back to PostgreSQL code appeared later so I just wrote code the
way I like.

These days code performance depends on many factors like whether code
fits into cache, i.e not only on whether you call a procedure directly
or using a pointer. Until someone finds a real bottleneck here I think
current rb_begin_iterate()+rb_iterate() interface should do just fine.
> Another unrelated change in this patch is the addition of 
> rb_rightmost(). It's not used for anything, so I'm not sure what the 
> point is. Then again, there don't seem to be any callers of 
> rb_leftmost() either.

It's just something I needed in tests to reach a good percent of code
coverage. Implementation of rb_rightmost is trivial so we probably can do
without it.

> I think we should something like in the attached patch. It seems to pass 
> your test suite, but I haven't done any other testing on this. Does it
> look OK to you?

Looks good to me.

-- 
Best regards,
Aleksander Alekseev

Re: [Patch] RBTree iteration interface improvement

From

Heikki Linnakangas

Date:

02 September 2016, 05:43:33

On 08/26/2016 04:07 PM, Aleksander Alekseev wrote:
>> Another unrelated change in this patch is the addition of
>> rb_rightmost(). It's not used for anything, so I'm not sure what the
>> point is. Then again, there don't seem to be any callers of
>> rb_leftmost() either.
>
> It's just something I needed in tests to reach a good percent of code
> coverage. Implementation of rb_rightmost is trivial so we probably can do
> without it.

Looking closer, we don't currently use any of the iterators besides the 
left-right iterator either. Nor rb_delete().

>> I think we should something like in the attached patch. It seems to pass
>> your test suite, but I haven't done any other testing on this. Does it
>> look OK to you?
>
> Looks good to me.

Ok, committed.

- Heikki

Re: [Patch] RBTree iteration interface improvement

From

Aleksander Alekseev

Date:

02 September 2016, 11:12:59

> Ok, committed.
> 
> - Heikki

Thanks a lot for committing this patch and also for your contribution to
it!

-- 
Best regards,
Aleksander Alekseev