Thread: [Patch] RBTree iteration interface improvement
Hello While working on some new feature for PostgreSQL (which is still in development and is irrelevant in this context) I noticed that current RBTree implementation interface is following: ``` extern void rb_begin_iterate(RBTree *rb, RBOrderControl ctrl); extern RBNode *rb_iterate(RBTree *rb); ``` As you see it doesn't allow to do multiple iterations over a single tree. I believe it's a major flaw for a general-purpose container. Proposed patch introduces a new iteration interface that doesn't has such a flaw. Naturally I wrote some unit tests, but I was not sure where exactly to place them in PostgreSQL source code. For now I've just uploaded them to GitHub: https://github.com/afiskon/c-algorithms-examples/blob/master/rbtree_example.c According to these tests new implementation works as fast as current implementation and produces exactly same results. I didn't dare to remove current interface since in theory some extensions can use it. Still I believe such a move is worth considering. -- Best regards, Aleksander Alekseev
Attachment
And as always, I didn't re-check a patch before sending it :) Here is a corrected version without any fprintf's. -- Best regards, Aleksander Alekseev
Attachment
On Wed, Jul 27, 2016 at 7:56 PM, Aleksander Alekseev <a.alekseev@postgrespro.ru> wrote: > Hello > > While working on some new feature for PostgreSQL (which is still in > development and is irrelevant in this context) I noticed that current > RBTree implementation interface is following: > > ``` > extern void rb_begin_iterate(RBTree *rb, RBOrderControl ctrl); > extern RBNode *rb_iterate(RBTree *rb); > ``` > > As you see it doesn't allow to do multiple iterations over a single > tree. I believe it's a major flaw for a general-purpose container. > Can you explain use case where you need it? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
> Can you explain use case where you need it? Sure. You can consider RBTree as a container that always keeps its elements in sorted order. Now imagine you would like to write a code like this: ``` /* iterate over items in sorted order */ while(item1 = left_right_walk(tree)) { /* another iteration, probably even in different procedure */ while(item2 = left_right_walk(tree)) { /* ... some logic... */ } } ``` Currently you can't do it. Or maybe you have different objects, e.g. IndexScanDesc's, that should iterate over some tree's independently somewhere in indexam.c procedures. Exact order may depend on user's query so you don't even control it. -- Best regards, Aleksander Alekseev
Aleksander Alekseev <a.alekseev@postgrespro.ru> writes: >> Can you explain use case where you need it? > ... Or maybe you have different objects, e.g. IndexScanDesc's, that should > iterate over some tree's independently somewhere in indexam.c > procedures. Exact order may depend on user's query so you don't even > control it. It seems clear to me that the existing arrangement is hazardous for any RBTree that hasn't got exactly one consumer. I think Aleksander's plan to decouple the iteration state is probably a good one (NB: I've not read the patch, so this is not an endorsement of details). I'd go so far as to say that we should remove the old API as being dangerous, rather than preserve it on backwards-compatibility grounds. We make bigger changes than that in internal APIs all the time. Having said that, though: if the iteration state is not part of the object, it's not very clear whether we can behave sanely if someone changes the tree while an iteration is open. It will need careful thought as to what sort of guarantees we can make about that. If it's too weak, then a separated-state version would have enough hazards of its own that it's not necessarily any safer. regards, tom lane
On Thu, Jul 28, 2016 at 1:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Aleksander Alekseev <a.alekseev@postgrespro.ru> writes: >>> Can you explain use case where you need it? > >> ... Or maybe you have different objects, e.g. IndexScanDesc's, that should >> iterate over some tree's independently somewhere in indexam.c >> procedures. Exact order may depend on user's query so you don't even >> control it. > > It seems clear to me that the existing arrangement is hazardous for any > RBTree that hasn't got exactly one consumer. I think Aleksander's plan to > decouple the iteration state is probably a good one (NB: I've not read the > patch, so this is not an endorsement of details). I'd go so far as to say > that we should remove the old API as being dangerous, rather than preserve > it on backwards-compatibility grounds. We make bigger changes than that > in internal APIs all the time. > > Having said that, though: if the iteration state is not part of the > object, it's not very clear whether we can behave sanely if someone > changes the tree while an iteration is open. It will need careful > thought as to what sort of guarantees we can make about that. If it's > too weak, then a separated-state version would have enough hazards of > its own that it's not necessarily any safer. +1 to all of that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> > Having said that, though: if the iteration state is not part of the > > object, it's not very clear whether we can behave sanely if someone > > changes the tree while an iteration is open. It will need careful > > thought as to what sort of guarantees we can make about that. If > > it's too weak, then a separated-state version would have enough > > hazards of its own that it's not necessarily any safer. > > +1 to all of that. > The old API assumes that tree doesn't change during iteration. And it doesn't do any checks about this. The new API behaves the same way. In this sense patch at least doesn't make anything worse. > It seems clear to me that the existing arrangement is hazardous for > any RBTree that hasn't got exactly one consumer. I think > Aleksander's plan to decouple the iteration state is probably a good > one (NB: I've not read the patch, so this is not an endorsement of > details). I'd go so far as to say that we should remove the old API > as being dangerous, rather than preserve it on > backwards-compatibility grounds. We make bigger changes than that in > internal APIs all the time. Glad to hear it! I would like to propose a patch that removes the old API. I have a few questions though. Would you like it to be a separate patch, or all-in-one patch is fine in this case? I also would like to include unit/property-based tests in this (these) patch (patches). However as I understand there are currently no unit tests in PostgreSQL at all, only integration/system tests. Any advice how to do it better? -- Best regards, Aleksander Alekseev
> > It seems clear to me that the existing arrangement is hazardous for > > any RBTree that hasn't got exactly one consumer. I think > > Aleksander's plan to decouple the iteration state is probably a good > > one (NB: I've not read the patch, so this is not an endorsement of > > details). I'd go so far as to say that we should remove the old API > > as being dangerous, rather than preserve it on > > backwards-compatibility grounds. We make bigger changes than that in > > internal APIs all the time. > > Glad to hear it! I would like to propose a patch that removes the old > API. I have a few questions though. > > Would you like it to be a separate patch, or all-in-one patch is fine > in this case? I also would like to include unit/property-based tests in > this (these) patch (patches). However as I understand there are > currently no unit tests in PostgreSQL at all, only integration/system > tests. Any advice how to do it better? OK, here is a patch. I think we could call it a _replacement_ of an old API, so there is probably no need in two separate patches. I still don't know how to include unit test and whether they should be included at all. Thus for now unit/property-based tests could be found only on GitHub [1]. As usual, if you have any questions, suggestions or notes regarding this patch, please let me know. [1] https://github.com/afiskon/c-algorithms/tree/master/test -- Best regards, Aleksander Alekseev
Attachment
On 08/22/2016 01:00 PM, Aleksander Alekseev wrote: >>> It seems clear to me that the existing arrangement is hazardous for >>> any RBTree that hasn't got exactly one consumer. I think >>> Aleksander's plan to decouple the iteration state is probably a good >>> one (NB: I've not read the patch, so this is not an endorsement of >>> details). I'd go so far as to say that we should remove the old API >>> as being dangerous, rather than preserve it on >>> backwards-compatibility grounds. We make bigger changes than that in >>> internal APIs all the time. >> >> Glad to hear it! I would like to propose a patch that removes the old >> API. I have a few questions though. >> >> Would you like it to be a separate patch, or all-in-one patch is fine >> in this case? I also would like to include unit/property-based tests in >> this (these) patch (patches). However as I understand there are >> currently no unit tests in PostgreSQL at all, only integration/system >> tests. Any advice how to do it better? > > OK, here is a patch. I think we could call it a _replacement_ of an old API, so > there is probably no need in two separate patches. I still don't know how to > include unit test and whether they should be included at all. Thus for now > unit/property-based tests could be found only on GitHub [1]. > > As usual, if you have any questions, suggestions or notes regarding this patch, > please let me know. This also seems to change the API so that instead of a single rb_begin_iterate()+rb_iterate() pair, there is a separate begin and iterate function for each traversal order. That seems like an unrelated change. Was there a particular reason for that? I think the old rb_begin_iterate()+rb_iterate() functions were fine, we just need to have a RBTreeIterator struct that's different from the tree itself. Note that the API actually used to be like that. rb_begin_iterate() used to return a palloc'd RBTreeIterator struct. It was changed in commit 0454f131, because the implementation didn't support more than one iterator at a time, which made the separate struct pointless. But we're fixing that problem now. Another unrelated change in this patch is the addition of rb_rightmost(). It's not used for anything, so I'm not sure what the point is. Then again, there don't seem to be any callers of rb_leftmost() either. I think we should something like in the attached patch. It seems to pass your test suite, but I haven't done any other testing on this. Does it look OK to you? - Heikki
Attachment
Hello, Heikki. Thank you for your attention to this patch! > This also seems to change the API so that instead of a single > rb_begin_iterate()+rb_iterate() pair, there is a separate begin and > iterate function for each traversal order. That seems like an unrelated > change. Was there a particular reason for that? I think the old > rb_begin_iterate()+rb_iterate() functions were fine, we just need to > have a RBTreeIterator struct that's different from the tree itself. I'm trying to avoid calling procedures by a pointer, an old habit. When I started to work on this patch I just needed a RB-tree implementation for a pet project. I took one from PostgreSQL code. Then I found this flaw of iteration interface and decided to fix it. The idea to merge this fix back to PostgreSQL code appeared later so I just wrote code the way I like. These days code performance depends on many factors like whether code fits into cache, i.e not only on whether you call a procedure directly or using a pointer. Until someone finds a real bottleneck here I think current rb_begin_iterate()+rb_iterate() interface should do just fine. > Another unrelated change in this patch is the addition of > rb_rightmost(). It's not used for anything, so I'm not sure what the > point is. Then again, there don't seem to be any callers of > rb_leftmost() either. It's just something I needed in tests to reach a good percent of code coverage. Implementation of rb_rightmost is trivial so we probably can do without it. > I think we should something like in the attached patch. It seems to pass > your test suite, but I haven't done any other testing on this. Does it > look OK to you? Looks good to me. -- Best regards, Aleksander Alekseev
On 08/26/2016 04:07 PM, Aleksander Alekseev wrote: >> Another unrelated change in this patch is the addition of >> rb_rightmost(). It's not used for anything, so I'm not sure what the >> point is. Then again, there don't seem to be any callers of >> rb_leftmost() either. > > It's just something I needed in tests to reach a good percent of code > coverage. Implementation of rb_rightmost is trivial so we probably can do > without it. Looking closer, we don't currently use any of the iterators besides the left-right iterator either. Nor rb_delete(). >> I think we should something like in the attached patch. It seems to pass >> your test suite, but I haven't done any other testing on this. Does it >> look OK to you? > > Looks good to me. Ok, committed. - Heikki
> Ok, committed. > > - Heikki Thanks a lot for committing this patch and also for your contribution to it! -- Best regards, Aleksander Alekseev