[CFReview] Red-Black Tree - Mailing list pgsql-hackers

From Mark Cave-Ayland
Subject [CFReview] Red-Black Tree
Date
Msg-id 4B62E9F7.6000002@siriusit.co.uk
Whole thread Raw
Responses Re: [CFReview] Red-Black Treey
Re: [CFReview] Red-Black Tree
List pgsql-hackers
Hi Robert,

I've also spent some time reviewing this patch since it is a
pre-requisite to the KNNGiST patch. I did have a much more comprehensive
list of suggestions, but it seems you've managed to resolve most of
these with your latest re-write. Please find some more comments inline:

> Here's an edited version, which I've now reviewed more fully.  Some
> more substantive review comments:

Firstly: the re-worked patch that you have posted seems to contain
remnants of at least 2 other patches. I have extracted the rbtree-only
sections and re-attached to this email.

The patch was tested against git head 124a3cc... and applied without any
  fuzz or other issues.

> 1. I'm pretty satisfied that the rbtree code is generally sane,
> although I wonder if we should think about putting it in access/common
> rather than utils/misc.  I'm not sure that I have a sufficiently
> clearly defined notion of what each subdirectory is for to draw a
> definitive conclusion on this point; hopefully someone else will weigh
> in.

I'm happy that the code is a reasonable implementation of an RB-Tree, at
least with respect to the link to the related public domain source that
was posted. In terms of location, I think utils/misc is a reasonable
place for it to live since I see it as analogous to the hash table
implementation, i.e. it's a template RB-Tree implementation designed to
be used as required throughout the codebase. backend/access seems to be
the home of index AMs only.

Other code points:

- The new names for the iterator enum seem much better to me - or at
least it helped understand the meaning of the iterator code.

- You correctly picked up on the namespace issue, although I noticed
that you left RED and BLACK as they were. Maybe RBRED and RBBLACK would
be better, though not that there are any colour related defines around
in a database backend to make a name clash probable.

- I found the name of the "appendator" method misleading - perhaps
"updater" would make more sense?

> 2. I'm a little concerned about the performance implications of this
> patch.  Looking at http://www.sai.msu.su/~megera/wiki/2009-04-03, it's
> clear that the patch is a huge win in some cases.  But I'm also
> surprised by how much performance is lost in some of the other cases
> that you tested.  I suspect, on balance, that it's probably still a
> good idea to put this in, but I wonder if you've profiled this at all
> to see where the extra time is going and/or explored possible ways of
> squashing that overhead down a little more.
>
> 3. In ginInsertEntry(), the "else" branch of the if statement really
> looks like magic when you first read it.  I wonder if it would make
> sense to pull the contents of EAAllocate() directly into this
> function, since there's only one call site anyhow.

God yes. This is not a good example of maintainable code - even with
your comment I struggled for a while to try and figure it out :(  I
would suggest that this is refactored appropriately before commit.

> I still have not done any performance testing of my own on this code,
> and it probably needs that.  If anyone else has time to step up and
> help with that, it would be much appreciated.  It would be useful to
> have some plain old benchmarks as well as some profiling data as
> mentioned above.

As part of my testing, I attempted to duplicate some of the benchmarks
at http://www.sai.msu.su/~megera/wiki/2009-04-03. Unfortunately I was
not really able to reproduce the RND (teodor's) dataset, nor the random
array test as the SQL used to test the implementation was not present on
the page above.

For each test, I dropped and recreated the database to ensure that any
autovacuum impact would be the same.


1) Fixed length random & sequential string arrays

SELECT array_to_string(ARRAY(select '' || a || '.' || b from
generate_series(1,50) b), ' ')::tsvector AS i INTO seq FROM
generate_series(1,100000) a;

SELECT array_to_string(ARRAY(select '' || random() from
generate_series(1,50) b), ' ')::tsvector AS i INTO rnd FROM
generate_series(1,100000) a;


Before patch:

test=# create index seq_idx on seq using gin (i);
CREATE INDEX
Time: 103205.790 ms
test=# create index rnd_idx on rnd using gin (i);
CREATE INDEX
Time: 6770.131 ms


After patch:

test=# create index seq_idx on seq using gin (i);
CREATE INDEX
Time: 87724.953 ms
test=# create index rnd_idx on rnd using gin (i);
CREATE INDEX
Time: 5596.666 ms


2) Identical records, variable length test

select ARRAY(select generate_series(1,len)) as a50  into arr50 from
generate_series(1,100000) b;


Before patch:

i) len=3

select ARRAY(select generate_series(1,3)) as a3 into arr3 from
generate_series(1,100000) b;

test=# create index arr3_idx on arr3 using gin (a3);
CREATE INDEX
Time: 324.251 ms


ii) len=30

select ARRAY(select generate_series(1,30)) as a30 into arr30 from
generate_series(1,100000) b;

test=# create index arr30_idx on arr30 using gin (a30);
CREATE INDEX
Time: 2163.786 ms


iii) len=50

select ARRAY(select generate_series(1,50)) as a50 into arr50 from
generate_series(1,100000) b;

test=# create index arr50_idx on arr50 using gin (a50);
CREATE INDEX
Time: 3306.074 ms


iv) len=random

select ARRAY(select generate_series(1, (random() * 100)::int)) as arand
into arrrand from generate_series(1,100000) b;

test=# create index arrrand_idx on arrrand using gin (arand);
CREATE INDEX
Time: 4725.556 ms


After patch:

i) len=3

select ARRAY(select generate_series(1,3)) as a3 into arr3 from
generate_series(1,100000) b;

test=# create index arr3_idx on arr3 using gin (a3);
CREATE INDEX
Time: 299.090 ms


ii) len=30

select ARRAY(select generate_series(1,30)) as a30 into arr30 from
generate_series(1,100000) b;

test=# create index arr30_idx on arr30 using gin (a30);
CREATE INDEX
Time: 2828.424 ms


iii) len=50

select ARRAY(select generate_series(1,50)) as a50 into arr50 from
generate_series(1,100000) b;

test=# create index arr50_idx on arr50 using gin (a50);
CREATE INDEX
Time: 3984.456 ms


iv) len=random

select ARRAY(select generate_series(1, (random() * 100)::int)) as arand
into arrrand from generate_series(1,100000) b;

test=# create index arrrand_idx on arrrand using gin (arand);
CREATE INDEX
Time: 3514.972 ms


Summary
=======

I believe Robert has done a good deal of the groundwork required to get
this patch ready for inclusion. With the current version, I was able to
see a measurable performance improvement in some test cases with no
significant performance regression. It would have been nice to be able
to reproduce the whole set of test cases but this was not possible from
the information specified.

With perhaps some minor tweaks to some of the names and a rework of the
else clause in ginInsertEntry(), I feel this patch is reasonably close
to commit.

I shall now continue my review of the associated KNNGiST patch.


ATB,

Mark.

--
Mark Cave-Ayland - Senior Technical Architect
PostgreSQL - PostGIS
Sirius Corporation plc - control through freedom
http://www.siriusit.co.uk
t: +44 870 608 0063

Sirius Labs: http://www.siriusit.co.uk/labs

diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c
index 954884d..e043713 100644
--- a/src/backend/access/gin/ginbulk.c
+++ b/src/backend/access/gin/ginbulk.c
@@ -22,15 +22,60 @@
 #define DEF_NENTRY    2048
 #define DEF_NPTR    4

+static void*
+ginAppendData(void *old, void *new, void *arg)
+{
+    EntryAccumulator    *eo = (EntryAccumulator*)old,
+                        *en = (EntryAccumulator*)new;
+
+    BuildAccumulator    *accum = (BuildAccumulator*)arg;
+
+    if (eo->number >= eo->length)
+    {
+        accum->allocatedMemory -= GetMemoryChunkSpace(eo->list);
+        eo->length *= 2;
+        eo->list = (ItemPointerData *) repalloc(eo->list,
+                                    sizeof(ItemPointerData) * eo->length);
+        accum->allocatedMemory += GetMemoryChunkSpace(eo->list);
+    }
+
+    /* If item pointers are not ordered, they will need to be sorted. */
+    if (eo->shouldSort == FALSE)
+    {
+        int            res;
+
+        res = compareItemPointers(eo->list + eo->number - 1, en->list);
+        Assert(res != 0);
+
+        if (res > 0)
+            eo->shouldSort = TRUE;
+    }
+
+    eo->list[eo->number] = en->list[0];
+    eo->number++;
+
+    return old;
+}
+
+static int
+cmpEntryAccumulator(const void *a, const void *b, void *arg)
+{
+    EntryAccumulator    *ea = (EntryAccumulator*)a;
+    EntryAccumulator    *eb = (EntryAccumulator*)b;
+    BuildAccumulator    *accum = (BuildAccumulator*)arg;
+
+    return compareAttEntries(accum->ginstate, ea->attnum, ea->value,
+                             eb->attnum, eb->value);
+}
+
 void
 ginInitBA(BuildAccumulator *accum)
 {
-    accum->maxdepth = 1;
-    accum->stackpos = 0;
-    accum->entries = NULL;
-    accum->stack = NULL;
     accum->allocatedMemory = 0;
     accum->entryallocator = NULL;
+    accum->tree = rb_create(cmpEntryAccumulator, ginAppendData, NULL, accum);
+    accum->iterator = NULL;
+    accum->tmpList = NULL;
 }

 static EntryAccumulator *
@@ -48,36 +93,6 @@ EAAllocate(BuildAccumulator *accum)
 }

 /*
- * Stores heap item pointer. For robust, it checks that
- * item pointer are ordered
- */
-static void
-ginInsertData(BuildAccumulator *accum, EntryAccumulator *entry, ItemPointer heapptr)
-{
-    if (entry->number >= entry->length)
-    {
-        accum->allocatedMemory -= GetMemoryChunkSpace(entry->list);
-        entry->length *= 2;
-        entry->list = (ItemPointerData *) repalloc(entry->list,
-                                    sizeof(ItemPointerData) * entry->length);
-        accum->allocatedMemory += GetMemoryChunkSpace(entry->list);
-    }
-
-    if (entry->shouldSort == FALSE)
-    {
-        int            res = compareItemPointers(entry->list + entry->number - 1, heapptr);
-
-        Assert(res != 0);
-
-        if (res > 0)
-            entry->shouldSort = TRUE;
-    }
-
-    entry->list[entry->number] = *heapptr;
-    entry->number++;
-}
-
-/*
  * This is basically the same as datumCopy(), but modified to count
  * palloc'd space in accum.
  */
@@ -103,57 +118,38 @@ getDatumCopy(BuildAccumulator *accum, OffsetNumber attnum, Datum value)
 static void
 ginInsertEntry(BuildAccumulator *accum, ItemPointer heapptr, OffsetNumber attnum, Datum entry)
 {
-    EntryAccumulator *ea = accum->entries,
-               *pea = NULL;
-    int            res = 0;
-    uint32        depth = 1;
+    EntryAccumulator     *key = EAAllocate(accum),
+                        *ea;

-    while (ea)
-    {
-        res = compareAttEntries(accum->ginstate, attnum, entry, ea->attnum, ea->value);
-        if (res == 0)
-            break;                /* found */
-        else
-        {
-            pea = ea;
-            if (res < 0)
-                ea = ea->left;
-            else
-                ea = ea->right;
-        }
-        depth++;
-    }
+    key->attnum = attnum;
+    key->value = entry;
+    if (accum->tmpList == NULL)
+        accum->tmpList =
+            (ItemPointerData *) palloc(sizeof(ItemPointerData) * DEF_NPTR);
+    key->list = accum->tmpList;
+    key->list[0] = *heapptr;

-    if (depth > accum->maxdepth)
-        accum->maxdepth = depth;
+    ea = rb_insert(accum->tree, key);

     if (ea == NULL)
     {
-        ea = EAAllocate(accum);
-
-        ea->left = ea->right = NULL;
-        ea->attnum = attnum;
-        ea->value = getDatumCopy(accum, attnum, entry);
-        ea->length = DEF_NPTR;
-        ea->number = 1;
-        ea->shouldSort = FALSE;
-        ea->list = (ItemPointerData *) palloc(sizeof(ItemPointerData) * DEF_NPTR);
-        accum->allocatedMemory += GetMemoryChunkSpace(ea->list);
-        ea->list[0] = *heapptr;
-
-        if (pea == NULL)
-            accum->entries = ea;
-        else
-        {
-            Assert(res != 0);
-            if (res < 0)
-                pea->left = ea;
-            else
-                pea->right = ea;
-        }
+        /*
+         * The key has been inserted, so continue initialization.
+         */
+        key->value = getDatumCopy(accum, attnum, entry);
+        key->length = DEF_NPTR;
+        key->number = 1;
+        key->shouldSort = FALSE;
+        accum->allocatedMemory += GetMemoryChunkSpace(key->list);
+        accum->tmpList = NULL;
     }
     else
-        ginInsertData(accum, ea, heapptr);
+    {
+        /*
+         * The key has been appended, so reset
+         */
+        accum->length--;
+    }
 }

 /*
@@ -219,86 +215,16 @@ qsortCompareItemPointers(const void *a, const void *b)
     return res;
 }

-/*
- * walk on binary tree and returns ordered nodes
- */
-static EntryAccumulator *
-walkTree(BuildAccumulator *accum)
-{
-    EntryAccumulator *entry = accum->stack[accum->stackpos];
-
-    if (entry->list != NULL)
-    {
-        /* return entry itself: we already was at left sublink */
-        return entry;
-    }
-    else if (entry->right && entry->right != accum->stack[accum->stackpos + 1])
-    {
-        /* go on right sublink */
-        accum->stackpos++;
-        entry = entry->right;
-
-        /* find most-left value */
-        for (;;)
-        {
-            accum->stack[accum->stackpos] = entry;
-            if (entry->left)
-            {
-                accum->stackpos++;
-                entry = entry->left;
-            }
-            else
-                break;
-        }
-    }
-    else
-    {
-        /* we already return all left subtree, itself and  right subtree */
-        if (accum->stackpos == 0)
-            return 0;
-        accum->stackpos--;
-        return walkTree(accum);
-    }
-
-    return entry;
-}
-
 ItemPointerData *
 ginGetEntry(BuildAccumulator *accum, OffsetNumber *attnum, Datum *value, uint32 *n)
 {
     EntryAccumulator *entry;
     ItemPointerData *list;

-    if (accum->stack == NULL)
-    {
-        /* first call */
-        accum->stack = palloc0(sizeof(EntryAccumulator *) * (accum->maxdepth + 1));
-        accum->allocatedMemory += GetMemoryChunkSpace(accum->stack);
-        entry = accum->entries;
-
-        if (entry == NULL)
-            return NULL;
-
-        /* find most-left value */
-        for (;;)
-        {
-            accum->stack[accum->stackpos] = entry;
-            if (entry->left)
-            {
-                accum->stackpos++;
-                entry = entry->left;
-            }
-            else
-                break;
-        }
-    }
-    else
-    {
-        accum->allocatedMemory -= GetMemoryChunkSpace(accum->stack[accum->stackpos]->list);
-        pfree(accum->stack[accum->stackpos]->list);
-        accum->stack[accum->stackpos]->list = NULL;
-        entry = walkTree(accum);
-    }
+    if (accum->iterator == NULL)
+        accum->iterator = rb_begin_iterate(accum->tree, LeftRightWalk);
+
+    entry = rb_iterate(accum->iterator);

     if (entry == NULL)
         return NULL;
diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c
index 8d48cdf..3fb4441 100644
--- a/src/backend/access/gin/ginfast.c
+++ b/src/backend/access/gin/ginfast.c
@@ -765,8 +765,7 @@ ginInsertCleanup(Relation index, GinState *ginstate,
          */
         if (GinPageGetOpaque(page)->rightlink == InvalidBlockNumber ||
             (GinPageHasFullRow(page) &&
-             (accum.allocatedMemory >= maintenance_work_mem * 1024L ||
-              accum.maxdepth > GIN_MAX_TREE_DEPTH)))
+             (accum.allocatedMemory >= maintenance_work_mem * 1024L)))
         {
             ItemPointerData *list;
             uint32        nlist;
diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c
index 902e361..97fc417 100644
--- a/src/backend/access/gin/gininsert.c
+++ b/src/backend/access/gin/gininsert.c
@@ -247,9 +247,7 @@ ginBuildCallback(Relation index, HeapTuple htup, Datum *values,
                                                             &htup->t_self);

     /* If we've maxed out our available memory, dump everything to the index */
-    /* Also dump if the tree seems to be getting too unbalanced */
-    if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L ||
-        buildstate->accum.maxdepth > GIN_MAX_TREE_DEPTH)
+    if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L)
     {
         ItemPointerData *list;
         Datum        entry;
diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile
index 03a155c..e8866df 100644
--- a/src/backend/utils/misc/Makefile
+++ b/src/backend/utils/misc/Makefile
@@ -14,7 +14,8 @@ include $(top_builddir)/src/Makefile.global

 override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS)

-OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o
+OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o \
+       rbtree.o

 # This location might depend on the installation directories. Therefore
 # we can't subsitute it into pg_config.h.
diff --git a/src/backend/utils/misc/rbtree.c b/src/backend/utils/misc/rbtree.c
new file mode 100644
index 0000000..b10c669
--- /dev/null
+++ b/src/backend/utils/misc/rbtree.c
@@ -0,0 +1,814 @@
+/*-------------------------------------------------------------------------
+ *
+ * rbtree.c
+ *      implementation for PostgreSQL generic Red-Black binary tree package
+ *      Adopted from http://algolist.manual.ru/ds/rbtree.php
+ *
+ * This code comes from Thomas Niemann's "Sorting and Searching Algorithms:
+ * a Cookbook".
+ *
+ * See http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm for
+ * license terms: "Source code, when part of a software project, may be used
+ * freely without reference to the author."
+ *
+ * Red-black trees are a type of balanced binary tree wherein (1) any child of
+ * a red node is always black, and (2) every path from root to leaf traverses
+ * an equal number of black nodes.  From these properties, it follows that the
+ * longest path from root to leaf is only about twice as long as the shortest,
+ * so lookups are guaranteed to run in O(lg n) time.
+ *
+ * Copyright (c) 1996-2009, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *      $PostgreSQL: pgsql/src/backend/nodes/rbtree.c,v 1.69 2008/01/01 19:45:50 momjian Exp $
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "utils/rbtree.h"
+
+/**********************************************************************
+ *                         Declarations                                  *
+ **********************************************************************/
+
+/*
+ * Values for RBNode->iteratorState
+ */
+#define InitialState     (0)
+#define FirstStepDone    (1)
+#define SecondStepDone    (2)
+#define ThirdStepDone    (3)
+
+/*
+ * Colors of node
+ */
+#define BLACK        (0)
+#define RED            (1)
+
+typedef struct RBNode
+{
+    uint32        iteratorState:2,
+                color:    1 ,
+                unused: 29;
+    struct RBNode *left;
+    struct RBNode *right;
+    struct RBNode *parent;
+    void       *data;
+}    RBNode;
+
+struct RBTree
+{
+    RBNode       *root;
+    rb_comparator comparator;
+    rb_appendator appendator;
+    rb_freefunc freefunc;
+    void       *arg;
+};
+
+struct RBTreeIterator
+{
+    RBNode       *node;
+    void       *(*iterate) (RBTreeIterator *iterator);
+};
+
+/*
+ * all leafs are sentinels, use castimized NIL name to prevent
+ * collision with sytem-wide NIL which is actually NULL
+ */
+#define RBNIL &sentinel
+
+RBNode        sentinel = {InitialState, BLACK, 0, RBNIL, RBNIL, NULL, NULL};
+
+/**********************************************************************
+ *                          Create/free                                  *
+ **********************************************************************/
+
+RBTree *
+rb_create(rb_comparator comparator, rb_appendator appendator,
+                  rb_freefunc freefunc, void *arg)
+{
+    RBTree       *tree = palloc(sizeof(RBTree));
+
+    tree->root = RBNIL;
+    tree->comparator = comparator;
+    tree->appendator = appendator;
+    tree->freefunc = freefunc;
+    tree->arg = arg;
+
+    return tree;
+}
+
+static void
+rb_free_recursive(RBNode *node, rb_freefunc freefunc)
+{
+    if (node->left != RBNIL)
+        rb_free_recursive(node->left, freefunc);
+    if (node->right != RBNIL)
+        rb_free_recursive(node->right, freefunc);
+    if (freefunc && node->data)
+        freefunc(node->data);
+    pfree(node);
+}
+
+void
+rb_free(RBTree *rb)
+{
+    if (!rb)
+        return;
+
+    if (rb->root != RBNIL)
+        rb_free_recursive(rb->root, rb->freefunc);
+
+    pfree(rb);
+}
+
+/**********************************************************************
+ *                          Search                                      *
+ **********************************************************************/
+
+void *
+rb_find(RBTree *rb, void *data)
+{
+    RBNode       *node = rb->root;
+    int            cmp;
+
+    while (node != RBNIL)
+    {
+        cmp = rb->comparator(data, node->data, rb->arg);
+
+        if (cmp == 0)
+            return node->data;
+        else if (cmp < 0)
+            node = node->left;
+        else
+            node = node->right;
+    }
+
+    return NULL;
+}
+
+/**********************************************************************
+ *                              Insertion                                  *
+ **********************************************************************/
+
+/*
+ * Rotate node x to left.
+ *
+ * x's right child takes its place in the tree, and x becomes the left
+ * child of that node.
+ */
+static void
+rb_rotate_left(RBTree *rb, RBNode *x)
+{
+    RBNode       *y = x->right;
+
+    /* establish x->right link */
+    x->right = y->left;
+    if (y->left != RBNIL)
+        y->left->parent = x;
+
+    /* establish y->parent link */
+    if (y != RBNIL)
+        y->parent = x->parent;
+    if (x->parent)
+    {
+        if (x == x->parent->left)
+            x->parent->left = y;
+        else
+            x->parent->right = y;
+    }
+    else
+    {
+        rb->root = y;
+    }
+
+    /* link x and y */
+    y->left = x;
+    if (x != RBNIL)
+        x->parent = y;
+}
+
+/*
+ * Rotate node x to right.
+ *
+ * x's left right child takes its place in the tree, and x becomes the right
+ * child of that node.
+ */
+static void
+rb_rotate_right(RBTree *rb, RBNode *x)
+{
+    RBNode       *y = x->left;
+
+    /* establish x->left link */
+    x->left = y->right;
+    if (y->right != RBNIL)
+        y->right->parent = x;
+
+    /* establish y->parent link */
+    if (y != RBNIL)
+        y->parent = x->parent;
+    if (x->parent)
+    {
+        if (x == x->parent->right)
+            x->parent->right = y;
+        else
+            x->parent->left = y;
+    }
+    else
+    {
+        rb->root = y;
+    }
+
+    /* link x and y */
+    y->right = x;
+    if (x != RBNIL)
+        x->parent = y;
+}
+
+/*
+ * Maintain Red-Black tree balance after inserting node x.
+ *
+ * The newly inserted node is always initially marked red.  That may lead to
+ * a situation where a red node has a red child, which is prohibited.  We can
+ * always fix the problem by a series of color changes and/or "rotations",
+ * which move the problem progressively higher up in the tree.  If one of the
+ * two red nodes is the root, we can always fix the problem by changing the
+ * root from red to black.
+ *
+ * (This does not work lower down in the tree because we must also maintain
+ * the invariant that every leaf has equal black-height.)
+ */
+static void
+rb_insert_fixup(RBTree *rb, RBNode *x)
+{
+    /*
+     * x is always a red node.  Initially, it is the newly inserted node.
+     * Each iteration of this loop moves it higher up in the tree.
+     */
+    while (x != rb->root && x->parent->color == RED)
+    {
+        /*
+         * x and x->parent are both red.  Fix depends on whether x->parent is
+         * a left or right child.  In either case, we define y to be the
+         * "uncle" of x, that is, the other child of x's grandparent.
+         *
+         * If the uncle is red, we flip the grandparent to red and its two
+         * children to black.  Then we loop around again to check whether the
+         * grandparent still has a problem.
+         *
+         * If the uncle is black, we will perform one or two "rotations" to
+         * balance the tree.  Either x or x->parent will take the grandparent's
+         * position in the tree and recolored black, and the original
+         * grandparent will be recolored red and become a child of that node.
+         * This always leaves us with a valid red-black tree, so the loop
+         * will terminate.
+         */
+        if (x->parent == x->parent->parent->left)
+        {
+            RBNode       *y = x->parent->parent->right;
+
+            if (y->color == RED)
+            {
+                /* uncle is RED */
+                x->parent->color = BLACK;
+                y->color = BLACK;
+                x->parent->parent->color = RED;
+                x = x->parent->parent;
+            }
+            else
+            {
+                /* uncle is BLACK */
+                if (x == x->parent->right)
+                {
+                    /* make x a left child */
+                    x = x->parent;
+                    rb_rotate_left(rb, x);
+                }
+
+                /* recolor and rotate */
+                x->parent->color = BLACK;
+                x->parent->parent->color = RED;
+                rb_rotate_right(rb, x->parent->parent);
+            }
+        }
+        else
+        {
+            /* mirror image of above code */
+            RBNode       *y = x->parent->parent->left;
+
+            if (y->color == RED)
+            {
+                /* uncle is RED */
+                x->parent->color = BLACK;
+                y->color = BLACK;
+                x->parent->parent->color = RED;
+                x = x->parent->parent;
+            }
+            else
+            {
+                /* uncle is BLACK */
+                if (x == x->parent->left)
+                {
+                    x = x->parent;
+                    rb_rotate_right(rb, x);
+                }
+                x->parent->color = BLACK;
+                x->parent->parent->color = RED;
+                rb_rotate_left(rb, x->parent->parent);
+            }
+        }
+    }
+
+    /*
+     * The root may already have been black; if not, the black-height of every
+     * node in the tree increases by one.
+     */
+    rb->root->color = BLACK;
+}
+
+/*
+ * Allocate node for data and insert in tree.
+ *
+ * Return old data (or result of appendator method) if it exists and NULL
+ * otherwise.
+ */
+void *
+rb_insert(RBTree *rb, void *data)
+{
+    RBNode       *current,
+               *parent,
+               *x;
+    int            cmp;
+
+    /* find where node belongs */
+    current = rb->root;
+    parent = NULL;
+    while (current != RBNIL)
+    {
+        cmp = rb->comparator(data, current->data, rb->arg);
+        if (cmp == 0)
+        {
+            /*
+             * Found node with given key.  If appendator method is provided,
+             * call it to join old and new data; else, new data replaces old
+             * data.
+             */
+            if (rb->appendator)
+            {
+                current->data = rb->appendator(current->data, data, rb->arg);
+                return current->data;
+            }
+            else
+            {
+                void       *old = current->data;
+
+                current->data = data;
+                return old;
+            }
+        }
+        parent = current;
+        current = (cmp < 0) ? current->left : current->right;
+    }
+
+    /* setup new node in tree */
+    x = palloc(sizeof(RBNode));
+    x->data = data;
+    x->parent = parent;
+    x->left = RBNIL;
+    x->right = RBNIL;
+    x->color = RED;
+    x->iteratorState = InitialState;
+
+    /* insert node in tree */
+    if (parent)
+    {
+        if (cmp < 0)
+            parent->left = x;
+        else
+            parent->right = x;
+    }
+    else
+    {
+        rb->root = x;
+    }
+
+    rb_insert_fixup(rb, x);
+    return NULL;
+}
+
+/**********************************************************************
+ *                            Deletion                                  *
+ **********************************************************************/
+
+/*
+ * Maintain Red-Black tree balance after deleting a black node.
+ */
+static void
+rb_delete_fixup(RBTree *rb, RBNode *x)
+{
+    /*
+     * x is always a black node.  Initially, it is the former child of the
+     * deleted node.  Each iteration of this loop moves it higher up in the
+     * tree.
+     */
+    while (x != rb->root && x->color == BLACK)
+    {
+        /*
+         * Left and right cases are symmetric.  Any nodes that are children
+         * of x have a black-height one less than the remainder of the nodes
+         * in the tree.  We rotate and recolor nodes to move the problem up
+         * the tree: at some stage we'll either fix the problem, or reach the
+         * root (where the black-height is allowed to decrease).
+         */
+        if (x == x->parent->left)
+        {
+            RBNode       *w = x->parent->right;
+
+            if (w->color == RED)
+            {
+                w->color = BLACK;
+                x->parent->color = RED;
+                rb_rotate_left(rb, x->parent);
+                w = x->parent->right;
+            }
+
+            if (w->left->color == BLACK && w->right->color == BLACK)
+            {
+                w->color = RED;
+                x = x->parent;
+            }
+            else
+            {
+                if (w->right->color == BLACK)
+                {
+                    w->left->color = BLACK;
+                    w->color = RED;
+                    rb_rotate_right(rb, w);
+                    w = x->parent->right;
+                }
+                w->color = x->parent->color;
+                x->parent->color = BLACK;
+                w->right->color = BLACK;
+                rb_rotate_left(rb, x->parent);
+                x = rb->root;        /* Arrange for loop to terminate. */
+            }
+        }
+        else
+        {
+            RBNode       *w = x->parent->left;
+
+            if (w->color == RED)
+            {
+                w->color = BLACK;
+                x->parent->color = RED;
+                rb_rotate_right(rb, x->parent);
+                w = x->parent->left;
+            }
+
+            if (w->right->color == BLACK && w->left->color == BLACK)
+            {
+                w->color = RED;
+                x = x->parent;
+            }
+            else
+            {
+                if (w->left->color == BLACK)
+                {
+                    w->right->color = BLACK;
+                    w->color = RED;
+                    rb_rotate_left(rb, w);
+                    w = x->parent->left;
+                }
+                w->color = x->parent->color;
+                x->parent->color = BLACK;
+                w->left->color = BLACK;
+                rb_rotate_right(rb, x->parent);
+                x = rb->root;        /* Arrange for loop to terminate. */
+            }
+        }
+    }
+    x->color = BLACK;
+}
+
+/*
+ * Delete node z from tree.
+ */
+static void
+rb_delete_node(RBTree *rb, RBNode *z)
+{
+    RBNode       *x,
+               *y;
+
+    if (!z || z == RBNIL)
+        return;
+
+    /*
+     * y is the node that will actually be removed from the tree.  This will
+     * be z if z has fewer than two children, or the tree successor of z
+     * otherwise.
+     */
+    if (z->left == RBNIL || z->right == RBNIL)
+    {
+        /* y has a RBNIL node as a child */
+        y = z;
+    }
+    else
+    {
+        /* find tree successor */
+        y = z->right;
+        while (y->left != RBNIL)
+            y = y->left;
+    }
+
+    /* x is y's only child */
+    if (y->left != RBNIL)
+        x = y->left;
+    else
+        x = y->right;
+
+    /* Remove y from the tree. */
+    x->parent = y->parent;
+    if (y->parent)
+    {
+        if (y == y->parent->left)
+            y->parent->left = x;
+        else
+            y->parent->right = x;
+    }
+    else
+    {
+        rb->root = x;
+    }
+
+    /*
+     * If we removed the tree successor of z rather than z itself, then
+     * attach the data for the removed node to the one we were supposed to
+     * remove.
+     */
+    if (y != z)
+        z->data = y->data;
+
+    /*
+     * Removing a black node might make some paths from root to leaf contain
+     * fewer black nodes than others, or it might make two red nodes adjacent.
+     */
+    if (y->color == BLACK)
+        rb_delete_fixup(rb, x);
+
+    pfree(y);
+}
+
+extern void
+rb_delete(RBTree *rb, void *data)
+{
+    RBNode       *node = rb->root;
+    int            cmp;
+
+    while (node != RBNIL)
+    {
+        cmp = rb->comparator(data, node->data, rb->arg);
+
+        if (cmp == 0)
+        {
+            /* found node to delete */
+            if (rb->freefunc)
+                rb->freefunc(node->data);
+            node->data = NULL;
+            rb_delete_node(rb, node);
+            return;
+        }
+        else if (cmp < 0)
+            node = node->left;
+        else
+            node = node->right;
+    }
+}
+
+/*
+ * Return data on left most node and delete
+ * that node
+ */
+extern void *
+rb_leftmost(RBTree *rb)
+{
+    RBNode       *node = rb->root;
+    RBNode       *leftmost = rb->root;
+    void       *res = NULL;
+
+    while (node != RBNIL)
+    {
+        leftmost = node;
+        node = node->left;
+    }
+
+    if (leftmost != RBNIL)
+    {
+        res = leftmost->data;
+        leftmost->data = NULL;
+        rb_delete_node(rb, leftmost);
+    }
+
+    return res;
+}
+
+/**********************************************************************
+ *                          Traverse                                      *
+ **********************************************************************/
+
+static void *
+rb_next_node(RBTreeIterator *iterator, RBNode *node)
+{
+    node->iteratorState = InitialState;
+    iterator->node = node;
+    return iterator->iterate(iterator);
+}
+
+static void *
+rb_left_right_iterator(RBTreeIterator *iterator)
+{
+    RBNode       *node = iterator->node;
+
+    switch (node->iteratorState)
+    {
+        case InitialState:
+            if (node->left != RBNIL)
+            {
+                node->iteratorState = FirstStepDone;
+                return rb_next_node(iterator, node->left);
+            }
+        case FirstStepDone:
+            node->iteratorState = SecondStepDone;
+            return node->data;
+        case SecondStepDone:
+            if (node->right != RBNIL)
+            {
+                node->iteratorState = ThirdStepDone;
+                return rb_next_node(iterator, node->right);
+            }
+        case ThirdStepDone:
+            if (node->parent)
+            {
+                iterator->node = node->parent;
+                return iterator->iterate(iterator);
+            }
+            break;
+        default:
+            elog(ERROR, "Unknow node state: %d", node->iteratorState);
+    }
+
+    return NULL;
+}
+
+static void *
+rb_right_left_iterator(RBTreeIterator *iterator)
+{
+    RBNode       *node = iterator->node;
+
+    switch (node->iteratorState)
+    {
+        case InitialState:
+            if (node->right != RBNIL)
+            {
+                node->iteratorState = FirstStepDone;
+                return rb_next_node(iterator, node->right);
+            }
+        case FirstStepDone:
+            node->iteratorState = SecondStepDone;
+            return node->data;
+        case SecondStepDone:
+            if (node->left != RBNIL)
+            {
+                node->iteratorState = ThirdStepDone;
+                return rb_next_node(iterator, node->left);
+            }
+        case ThirdStepDone:
+            if (node->parent)
+            {
+                iterator->node = node->parent;
+                return iterator->iterate(iterator);
+            }
+            break;
+        default:
+            elog(ERROR, "Unknow node state: %d", node->iteratorState);
+    }
+
+    return NULL;
+}
+
+static void *
+rb_direct_iterator(RBTreeIterator *iterator)
+{
+    RBNode       *node = iterator->node;
+
+    switch (node->iteratorState)
+    {
+        case InitialState:
+            node->iteratorState = FirstStepDone;
+            return node->data;
+        case FirstStepDone:
+            if (node->left != RBNIL)
+            {
+                node->iteratorState = SecondStepDone;
+                return rb_next_node(iterator, node->left);
+            }
+        case SecondStepDone:
+            if (node->right != RBNIL)
+            {
+                node->iteratorState = ThirdStepDone;
+                return rb_next_node(iterator, node->right);
+            }
+        case ThirdStepDone:
+            if (node->parent)
+            {
+                iterator->node = node->parent;
+                return iterator->iterate(iterator);
+            }
+            break;
+        default:
+            elog(ERROR, "Unknow node state: %d", node->iteratorState);
+    }
+
+    return NULL;
+}
+
+static void *
+rb_inverted_iterator(RBTreeIterator *iterator)
+{
+    RBNode       *node = iterator->node;
+
+    switch (node->iteratorState)
+    {
+        case InitialState:
+            if (node->left != RBNIL)
+            {
+                node->iteratorState = FirstStepDone;
+                return rb_next_node(iterator, node->left);
+            }
+        case FirstStepDone:
+            if (node->right != RBNIL)
+            {
+                node->iteratorState = SecondStepDone;
+                return rb_next_node(iterator, node->right);
+            }
+        case SecondStepDone:
+            node->iteratorState = ThirdStepDone;
+            return node->data;
+        case ThirdStepDone:
+            if (node->parent)
+            {
+                iterator->node = node->parent;
+                return iterator->iterate(iterator);
+            }
+            break;
+        default:
+            elog(ERROR, "Unknow node state: %d", node->iteratorState);
+    }
+
+    return NULL;
+}
+
+RBTreeIterator *
+rb_begin_iterate(RBTree *rb, RBOrderControl ctrl)
+{
+    RBTreeIterator *iterator = palloc(sizeof(RBTreeIterator));
+
+    iterator->node = rb->root;
+    if (iterator->node != RBNIL)
+        iterator->node->iteratorState = InitialState;
+
+    switch (ctrl)
+    {
+        case LeftRightWalk:            /* visit left, then self, then right */
+            iterator->iterate = rb_left_right_iterator;
+            break;
+        case RightLeftWalk:            /* visit right, then self, then left */
+            iterator->iterate = rb_right_left_iterator;
+            break;
+        case DirectWalk:            /* visit self, then left, then right */
+            iterator->iterate = rb_direct_iterator;
+            break;
+        case InvertedWalk:            /* visit left, then right, then self */
+            iterator->iterate = rb_inverted_iterator;
+            break;
+        default:
+            elog(ERROR, "Unknown iterator order: %d", ctrl);
+    }
+
+    return iterator;
+}
+
+void *
+rb_iterate(RBTreeIterator *iterator)
+{
+    if (iterator->node == RBNIL)
+        return NULL;
+
+    return iterator->iterate(iterator);
+}
+
+void
+rb_free_iterator(RBTreeIterator *iterator)
+{
+    pfree(iterator);
+}
diff --git a/src/include/access/gin.h b/src/include/access/gin.h
index b96ff95..fcc5371 100644
--- a/src/include/access/gin.h
+++ b/src/include/access/gin.h
@@ -13,6 +13,7 @@
 #include "access/genam.h"
 #include "access/itup.h"
 #include "access/xlog.h"
+#include "utils/rbtree.h"
 #include "fmgr.h"


@@ -27,14 +28,6 @@
 #define GINNProcs                       5

 /*
- * Max depth allowed in search tree during bulk inserts.  This is to keep from
- * degenerating to O(N^2) behavior when the tree is unbalanced due to sorted
- * or nearly-sorted input.    (Perhaps it would be better to use a balanced-tree
- * algorithm, but in common cases that would only add useless overhead.)
- */
-#define GIN_MAX_TREE_DEPTH 100
-
-/*
  * Page opaque data in a inverted index page.
  *
  * Note: GIN does not include a page ID word as do the other index types.
@@ -571,26 +564,22 @@ extern Datum ginarrayconsistent(PG_FUNCTION_ARGS);
 typedef struct EntryAccumulator
 {
     OffsetNumber attnum;
+    bool        shouldSort;
     Datum        value;
     uint32        length;
     uint32        number;
     ItemPointerData *list;
-    bool        shouldSort;
-    struct EntryAccumulator *left;
-    struct EntryAccumulator *right;
 } EntryAccumulator;

 typedef struct
 {
     GinState   *ginstate;
-    EntryAccumulator *entries;
-    uint32        maxdepth;
-    EntryAccumulator **stack;
-    uint32        stackpos;
     long        allocatedMemory;
-
     uint32        length;
-    EntryAccumulator *entryallocator;
+    EntryAccumulator   *entryallocator;
+    ItemPointerData       *tmpList;
+    RBTree       *tree;
+    RBTreeIterator *iterator;
 } BuildAccumulator;

 extern void ginInitBA(BuildAccumulator *accum);
diff --git a/src/include/utils/rbtree.h b/src/include/utils/rbtree.h
new file mode 100644
index 0000000..6ce74b0
--- /dev/null
+++ b/src/include/utils/rbtree.h
@@ -0,0 +1,47 @@
+/*-------------------------------------------------------------------------
+ *
+ * rbtree.h
+ *    interface for PostgreSQL generic Red-Black binary tree package
+ *
+ * Copyright (c) 1996-2009, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *         $PostgreSQL: pgsql/src/backend/nodes/list.c,v 1.69 2008/01/01 19:45:50 momjian Exp $
+ *
+ *-------------------------------------------------------------------------
+ */
+
+#ifndef RBTREE_H
+#define RBTREE_H
+
+typedef struct RBTree RBTree;
+typedef struct RBTreeIterator RBTreeIterator;
+
+typedef int (*rb_comparator) (const void *a, const void *b, void *arg);
+typedef void* (*rb_appendator) (void *current, void *new, void *arg);
+typedef void (*rb_freefunc) (void *a);
+
+extern RBTree *rb_create(rb_comparator comparator,
+                            rb_appendator appendator,
+                            rb_freefunc freefunc,
+                            void *arg);
+extern void    rb_free(RBTree *rb);
+
+extern void *rb_find(RBTree *rb, void *data);
+extern void *rb_insert(RBTree *rb, void *data);
+extern void rb_delete(RBTree *rb, void *data);
+extern void *rb_leftmost(RBTree *rb);
+
+typedef enum RBOrderControl
+{
+    LeftRightWalk,
+    RightLeftWalk,
+    DirectWalk,
+    InvertedWalk
+} RBOrderControl;
+
+extern RBTreeIterator* rb_begin_iterate(RBTree *rb, RBOrderControl ctrl);
+extern void *rb_iterate(RBTreeIterator *iterator);
+extern void rb_free_iterator(RBTreeIterator *iterator);
+
+#endif


pgsql-hackers by date:

Previous
From: Tim Bunce
Date:
Subject: Add on_perl_init and proper destruction to plperl UPDATE v3 [PATCH]
Next
From: Simon Riggs
Date:
Subject: Re: Hot Standby: Relation-specific deferred conflict resolution