Thread: [CFReview] Red-Black Tree
Hi Robert, I've also spent some time reviewing this patch since it is a pre-requisite to the KNNGiST patch. I did have a much more comprehensive list of suggestions, but it seems you've managed to resolve most of these with your latest re-write. Please find some more comments inline: > Here's an edited version, which I've now reviewed more fully. Some > more substantive review comments: Firstly: the re-worked patch that you have posted seems to contain remnants of at least 2 other patches. I have extracted the rbtree-only sections and re-attached to this email. The patch was tested against git head 124a3cc... and applied without any fuzz or other issues. > 1. I'm pretty satisfied that the rbtree code is generally sane, > although I wonder if we should think about putting it in access/common > rather than utils/misc. I'm not sure that I have a sufficiently > clearly defined notion of what each subdirectory is for to draw a > definitive conclusion on this point; hopefully someone else will weigh > in. I'm happy that the code is a reasonable implementation of an RB-Tree, at least with respect to the link to the related public domain source that was posted. In terms of location, I think utils/misc is a reasonable place for it to live since I see it as analogous to the hash table implementation, i.e. it's a template RB-Tree implementation designed to be used as required throughout the codebase. backend/access seems to be the home of index AMs only. Other code points: - The new names for the iterator enum seem much better to me - or at least it helped understand the meaning of the iterator code. - You correctly picked up on the namespace issue, although I noticed that you left RED and BLACK as they were. Maybe RBRED and RBBLACK would be better, though not that there are any colour related defines around in a database backend to make a name clash probable. - I found the name of the "appendator" method misleading - perhaps "updater" would make more sense? > 2. I'm a little concerned about the performance implications of this > patch. Looking at http://www.sai.msu.su/~megera/wiki/2009-04-03, it's > clear that the patch is a huge win in some cases. But I'm also > surprised by how much performance is lost in some of the other cases > that you tested. I suspect, on balance, that it's probably still a > good idea to put this in, but I wonder if you've profiled this at all > to see where the extra time is going and/or explored possible ways of > squashing that overhead down a little more. > > 3. In ginInsertEntry(), the "else" branch of the if statement really > looks like magic when you first read it. I wonder if it would make > sense to pull the contents of EAAllocate() directly into this > function, since there's only one call site anyhow. God yes. This is not a good example of maintainable code - even with your comment I struggled for a while to try and figure it out :( I would suggest that this is refactored appropriately before commit. > I still have not done any performance testing of my own on this code, > and it probably needs that. If anyone else has time to step up and > help with that, it would be much appreciated. It would be useful to > have some plain old benchmarks as well as some profiling data as > mentioned above. As part of my testing, I attempted to duplicate some of the benchmarks at http://www.sai.msu.su/~megera/wiki/2009-04-03. Unfortunately I was not really able to reproduce the RND (teodor's) dataset, nor the random array test as the SQL used to test the implementation was not present on the page above. For each test, I dropped and recreated the database to ensure that any autovacuum impact would be the same. 1) Fixed length random & sequential string arrays SELECT array_to_string(ARRAY(select '' || a || '.' || b from generate_series(1,50) b), ' ')::tsvector AS i INTO seq FROM generate_series(1,100000) a; SELECT array_to_string(ARRAY(select '' || random() from generate_series(1,50) b), ' ')::tsvector AS i INTO rnd FROM generate_series(1,100000) a; Before patch: test=# create index seq_idx on seq using gin (i); CREATE INDEX Time: 103205.790 ms test=# create index rnd_idx on rnd using gin (i); CREATE INDEX Time: 6770.131 ms After patch: test=# create index seq_idx on seq using gin (i); CREATE INDEX Time: 87724.953 ms test=# create index rnd_idx on rnd using gin (i); CREATE INDEX Time: 5596.666 ms 2) Identical records, variable length test select ARRAY(select generate_series(1,len)) as a50 into arr50 from generate_series(1,100000) b; Before patch: i) len=3 select ARRAY(select generate_series(1,3)) as a3 into arr3 from generate_series(1,100000) b; test=# create index arr3_idx on arr3 using gin (a3); CREATE INDEX Time: 324.251 ms ii) len=30 select ARRAY(select generate_series(1,30)) as a30 into arr30 from generate_series(1,100000) b; test=# create index arr30_idx on arr30 using gin (a30); CREATE INDEX Time: 2163.786 ms iii) len=50 select ARRAY(select generate_series(1,50)) as a50 into arr50 from generate_series(1,100000) b; test=# create index arr50_idx on arr50 using gin (a50); CREATE INDEX Time: 3306.074 ms iv) len=random select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into arrrand from generate_series(1,100000) b; test=# create index arrrand_idx on arrrand using gin (arand); CREATE INDEX Time: 4725.556 ms After patch: i) len=3 select ARRAY(select generate_series(1,3)) as a3 into arr3 from generate_series(1,100000) b; test=# create index arr3_idx on arr3 using gin (a3); CREATE INDEX Time: 299.090 ms ii) len=30 select ARRAY(select generate_series(1,30)) as a30 into arr30 from generate_series(1,100000) b; test=# create index arr30_idx on arr30 using gin (a30); CREATE INDEX Time: 2828.424 ms iii) len=50 select ARRAY(select generate_series(1,50)) as a50 into arr50 from generate_series(1,100000) b; test=# create index arr50_idx on arr50 using gin (a50); CREATE INDEX Time: 3984.456 ms iv) len=random select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into arrrand from generate_series(1,100000) b; test=# create index arrrand_idx on arrrand using gin (arand); CREATE INDEX Time: 3514.972 ms Summary ======= I believe Robert has done a good deal of the groundwork required to get this patch ready for inclusion. With the current version, I was able to see a measurable performance improvement in some test cases with no significant performance regression. It would have been nice to be able to reproduce the whole set of test cases but this was not possible from the information specified. With perhaps some minor tweaks to some of the names and a rework of the else clause in ginInsertEntry(), I feel this patch is reasonably close to commit. I shall now continue my review of the associated KNNGiST patch. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs diff --git a/src/backend/access/gin/ginbulk.c b/src/backend/access/gin/ginbulk.c index 954884d..e043713 100644 --- a/src/backend/access/gin/ginbulk.c +++ b/src/backend/access/gin/ginbulk.c @@ -22,15 +22,60 @@ #define DEF_NENTRY 2048 #define DEF_NPTR 4 +static void* +ginAppendData(void *old, void *new, void *arg) +{ + EntryAccumulator *eo = (EntryAccumulator*)old, + *en = (EntryAccumulator*)new; + + BuildAccumulator *accum = (BuildAccumulator*)arg; + + if (eo->number >= eo->length) + { + accum->allocatedMemory -= GetMemoryChunkSpace(eo->list); + eo->length *= 2; + eo->list = (ItemPointerData *) repalloc(eo->list, + sizeof(ItemPointerData) * eo->length); + accum->allocatedMemory += GetMemoryChunkSpace(eo->list); + } + + /* If item pointers are not ordered, they will need to be sorted. */ + if (eo->shouldSort == FALSE) + { + int res; + + res = compareItemPointers(eo->list + eo->number - 1, en->list); + Assert(res != 0); + + if (res > 0) + eo->shouldSort = TRUE; + } + + eo->list[eo->number] = en->list[0]; + eo->number++; + + return old; +} + +static int +cmpEntryAccumulator(const void *a, const void *b, void *arg) +{ + EntryAccumulator *ea = (EntryAccumulator*)a; + EntryAccumulator *eb = (EntryAccumulator*)b; + BuildAccumulator *accum = (BuildAccumulator*)arg; + + return compareAttEntries(accum->ginstate, ea->attnum, ea->value, + eb->attnum, eb->value); +} + void ginInitBA(BuildAccumulator *accum) { - accum->maxdepth = 1; - accum->stackpos = 0; - accum->entries = NULL; - accum->stack = NULL; accum->allocatedMemory = 0; accum->entryallocator = NULL; + accum->tree = rb_create(cmpEntryAccumulator, ginAppendData, NULL, accum); + accum->iterator = NULL; + accum->tmpList = NULL; } static EntryAccumulator * @@ -48,36 +93,6 @@ EAAllocate(BuildAccumulator *accum) } /* - * Stores heap item pointer. For robust, it checks that - * item pointer are ordered - */ -static void -ginInsertData(BuildAccumulator *accum, EntryAccumulator *entry, ItemPointer heapptr) -{ - if (entry->number >= entry->length) - { - accum->allocatedMemory -= GetMemoryChunkSpace(entry->list); - entry->length *= 2; - entry->list = (ItemPointerData *) repalloc(entry->list, - sizeof(ItemPointerData) * entry->length); - accum->allocatedMemory += GetMemoryChunkSpace(entry->list); - } - - if (entry->shouldSort == FALSE) - { - int res = compareItemPointers(entry->list + entry->number - 1, heapptr); - - Assert(res != 0); - - if (res > 0) - entry->shouldSort = TRUE; - } - - entry->list[entry->number] = *heapptr; - entry->number++; -} - -/* * This is basically the same as datumCopy(), but modified to count * palloc'd space in accum. */ @@ -103,57 +118,38 @@ getDatumCopy(BuildAccumulator *accum, OffsetNumber attnum, Datum value) static void ginInsertEntry(BuildAccumulator *accum, ItemPointer heapptr, OffsetNumber attnum, Datum entry) { - EntryAccumulator *ea = accum->entries, - *pea = NULL; - int res = 0; - uint32 depth = 1; + EntryAccumulator *key = EAAllocate(accum), + *ea; - while (ea) - { - res = compareAttEntries(accum->ginstate, attnum, entry, ea->attnum, ea->value); - if (res == 0) - break; /* found */ - else - { - pea = ea; - if (res < 0) - ea = ea->left; - else - ea = ea->right; - } - depth++; - } + key->attnum = attnum; + key->value = entry; + if (accum->tmpList == NULL) + accum->tmpList = + (ItemPointerData *) palloc(sizeof(ItemPointerData) * DEF_NPTR); + key->list = accum->tmpList; + key->list[0] = *heapptr; - if (depth > accum->maxdepth) - accum->maxdepth = depth; + ea = rb_insert(accum->tree, key); if (ea == NULL) { - ea = EAAllocate(accum); - - ea->left = ea->right = NULL; - ea->attnum = attnum; - ea->value = getDatumCopy(accum, attnum, entry); - ea->length = DEF_NPTR; - ea->number = 1; - ea->shouldSort = FALSE; - ea->list = (ItemPointerData *) palloc(sizeof(ItemPointerData) * DEF_NPTR); - accum->allocatedMemory += GetMemoryChunkSpace(ea->list); - ea->list[0] = *heapptr; - - if (pea == NULL) - accum->entries = ea; - else - { - Assert(res != 0); - if (res < 0) - pea->left = ea; - else - pea->right = ea; - } + /* + * The key has been inserted, so continue initialization. + */ + key->value = getDatumCopy(accum, attnum, entry); + key->length = DEF_NPTR; + key->number = 1; + key->shouldSort = FALSE; + accum->allocatedMemory += GetMemoryChunkSpace(key->list); + accum->tmpList = NULL; } else - ginInsertData(accum, ea, heapptr); + { + /* + * The key has been appended, so reset + */ + accum->length--; + } } /* @@ -219,86 +215,16 @@ qsortCompareItemPointers(const void *a, const void *b) return res; } -/* - * walk on binary tree and returns ordered nodes - */ -static EntryAccumulator * -walkTree(BuildAccumulator *accum) -{ - EntryAccumulator *entry = accum->stack[accum->stackpos]; - - if (entry->list != NULL) - { - /* return entry itself: we already was at left sublink */ - return entry; - } - else if (entry->right && entry->right != accum->stack[accum->stackpos + 1]) - { - /* go on right sublink */ - accum->stackpos++; - entry = entry->right; - - /* find most-left value */ - for (;;) - { - accum->stack[accum->stackpos] = entry; - if (entry->left) - { - accum->stackpos++; - entry = entry->left; - } - else - break; - } - } - else - { - /* we already return all left subtree, itself and right subtree */ - if (accum->stackpos == 0) - return 0; - accum->stackpos--; - return walkTree(accum); - } - - return entry; -} - ItemPointerData * ginGetEntry(BuildAccumulator *accum, OffsetNumber *attnum, Datum *value, uint32 *n) { EntryAccumulator *entry; ItemPointerData *list; - if (accum->stack == NULL) - { - /* first call */ - accum->stack = palloc0(sizeof(EntryAccumulator *) * (accum->maxdepth + 1)); - accum->allocatedMemory += GetMemoryChunkSpace(accum->stack); - entry = accum->entries; - - if (entry == NULL) - return NULL; - - /* find most-left value */ - for (;;) - { - accum->stack[accum->stackpos] = entry; - if (entry->left) - { - accum->stackpos++; - entry = entry->left; - } - else - break; - } - } - else - { - accum->allocatedMemory -= GetMemoryChunkSpace(accum->stack[accum->stackpos]->list); - pfree(accum->stack[accum->stackpos]->list); - accum->stack[accum->stackpos]->list = NULL; - entry = walkTree(accum); - } + if (accum->iterator == NULL) + accum->iterator = rb_begin_iterate(accum->tree, LeftRightWalk); + + entry = rb_iterate(accum->iterator); if (entry == NULL) return NULL; diff --git a/src/backend/access/gin/ginfast.c b/src/backend/access/gin/ginfast.c index 8d48cdf..3fb4441 100644 --- a/src/backend/access/gin/ginfast.c +++ b/src/backend/access/gin/ginfast.c @@ -765,8 +765,7 @@ ginInsertCleanup(Relation index, GinState *ginstate, */ if (GinPageGetOpaque(page)->rightlink == InvalidBlockNumber || (GinPageHasFullRow(page) && - (accum.allocatedMemory >= maintenance_work_mem * 1024L || - accum.maxdepth > GIN_MAX_TREE_DEPTH))) + (accum.allocatedMemory >= maintenance_work_mem * 1024L))) { ItemPointerData *list; uint32 nlist; diff --git a/src/backend/access/gin/gininsert.c b/src/backend/access/gin/gininsert.c index 902e361..97fc417 100644 --- a/src/backend/access/gin/gininsert.c +++ b/src/backend/access/gin/gininsert.c @@ -247,9 +247,7 @@ ginBuildCallback(Relation index, HeapTuple htup, Datum *values, &htup->t_self); /* If we've maxed out our available memory, dump everything to the index */ - /* Also dump if the tree seems to be getting too unbalanced */ - if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L || - buildstate->accum.maxdepth > GIN_MAX_TREE_DEPTH) + if (buildstate->accum.allocatedMemory >= maintenance_work_mem * 1024L) { ItemPointerData *list; Datum entry; diff --git a/src/backend/utils/misc/Makefile b/src/backend/utils/misc/Makefile index 03a155c..e8866df 100644 --- a/src/backend/utils/misc/Makefile +++ b/src/backend/utils/misc/Makefile @@ -14,7 +14,8 @@ include $(top_builddir)/src/Makefile.global override CPPFLAGS := -I. -I$(srcdir) $(CPPFLAGS) -OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o +OBJS = guc.o help_config.o pg_rusage.o ps_status.o superuser.o tzparser.o \ + rbtree.o # This location might depend on the installation directories. Therefore # we can't subsitute it into pg_config.h. diff --git a/src/backend/utils/misc/rbtree.c b/src/backend/utils/misc/rbtree.c new file mode 100644 index 0000000..b10c669 --- /dev/null +++ b/src/backend/utils/misc/rbtree.c @@ -0,0 +1,814 @@ +/*------------------------------------------------------------------------- + * + * rbtree.c + * implementation for PostgreSQL generic Red-Black binary tree package + * Adopted from http://algolist.manual.ru/ds/rbtree.php + * + * This code comes from Thomas Niemann's "Sorting and Searching Algorithms: + * a Cookbook". + * + * See http://www.cs.auckland.ac.nz/software/AlgAnim/niemann/s_man.htm for + * license terms: "Source code, when part of a software project, may be used + * freely without reference to the author." + * + * Red-black trees are a type of balanced binary tree wherein (1) any child of + * a red node is always black, and (2) every path from root to leaf traverses + * an equal number of black nodes. From these properties, it follows that the + * longest path from root to leaf is only about twice as long as the shortest, + * so lookups are guaranteed to run in O(lg n) time. + * + * Copyright (c) 1996-2009, PostgreSQL Global Development Group + * + * IDENTIFICATION + * $PostgreSQL: pgsql/src/backend/nodes/rbtree.c,v 1.69 2008/01/01 19:45:50 momjian Exp $ + * + *------------------------------------------------------------------------- + */ +#include "postgres.h" + +#include "utils/rbtree.h" + +/********************************************************************** + * Declarations * + **********************************************************************/ + +/* + * Values for RBNode->iteratorState + */ +#define InitialState (0) +#define FirstStepDone (1) +#define SecondStepDone (2) +#define ThirdStepDone (3) + +/* + * Colors of node + */ +#define BLACK (0) +#define RED (1) + +typedef struct RBNode +{ + uint32 iteratorState:2, + color: 1 , + unused: 29; + struct RBNode *left; + struct RBNode *right; + struct RBNode *parent; + void *data; +} RBNode; + +struct RBTree +{ + RBNode *root; + rb_comparator comparator; + rb_appendator appendator; + rb_freefunc freefunc; + void *arg; +}; + +struct RBTreeIterator +{ + RBNode *node; + void *(*iterate) (RBTreeIterator *iterator); +}; + +/* + * all leafs are sentinels, use castimized NIL name to prevent + * collision with sytem-wide NIL which is actually NULL + */ +#define RBNIL &sentinel + +RBNode sentinel = {InitialState, BLACK, 0, RBNIL, RBNIL, NULL, NULL}; + +/********************************************************************** + * Create/free * + **********************************************************************/ + +RBTree * +rb_create(rb_comparator comparator, rb_appendator appendator, + rb_freefunc freefunc, void *arg) +{ + RBTree *tree = palloc(sizeof(RBTree)); + + tree->root = RBNIL; + tree->comparator = comparator; + tree->appendator = appendator; + tree->freefunc = freefunc; + tree->arg = arg; + + return tree; +} + +static void +rb_free_recursive(RBNode *node, rb_freefunc freefunc) +{ + if (node->left != RBNIL) + rb_free_recursive(node->left, freefunc); + if (node->right != RBNIL) + rb_free_recursive(node->right, freefunc); + if (freefunc && node->data) + freefunc(node->data); + pfree(node); +} + +void +rb_free(RBTree *rb) +{ + if (!rb) + return; + + if (rb->root != RBNIL) + rb_free_recursive(rb->root, rb->freefunc); + + pfree(rb); +} + +/********************************************************************** + * Search * + **********************************************************************/ + +void * +rb_find(RBTree *rb, void *data) +{ + RBNode *node = rb->root; + int cmp; + + while (node != RBNIL) + { + cmp = rb->comparator(data, node->data, rb->arg); + + if (cmp == 0) + return node->data; + else if (cmp < 0) + node = node->left; + else + node = node->right; + } + + return NULL; +} + +/********************************************************************** + * Insertion * + **********************************************************************/ + +/* + * Rotate node x to left. + * + * x's right child takes its place in the tree, and x becomes the left + * child of that node. + */ +static void +rb_rotate_left(RBTree *rb, RBNode *x) +{ + RBNode *y = x->right; + + /* establish x->right link */ + x->right = y->left; + if (y->left != RBNIL) + y->left->parent = x; + + /* establish y->parent link */ + if (y != RBNIL) + y->parent = x->parent; + if (x->parent) + { + if (x == x->parent->left) + x->parent->left = y; + else + x->parent->right = y; + } + else + { + rb->root = y; + } + + /* link x and y */ + y->left = x; + if (x != RBNIL) + x->parent = y; +} + +/* + * Rotate node x to right. + * + * x's left right child takes its place in the tree, and x becomes the right + * child of that node. + */ +static void +rb_rotate_right(RBTree *rb, RBNode *x) +{ + RBNode *y = x->left; + + /* establish x->left link */ + x->left = y->right; + if (y->right != RBNIL) + y->right->parent = x; + + /* establish y->parent link */ + if (y != RBNIL) + y->parent = x->parent; + if (x->parent) + { + if (x == x->parent->right) + x->parent->right = y; + else + x->parent->left = y; + } + else + { + rb->root = y; + } + + /* link x and y */ + y->right = x; + if (x != RBNIL) + x->parent = y; +} + +/* + * Maintain Red-Black tree balance after inserting node x. + * + * The newly inserted node is always initially marked red. That may lead to + * a situation where a red node has a red child, which is prohibited. We can + * always fix the problem by a series of color changes and/or "rotations", + * which move the problem progressively higher up in the tree. If one of the + * two red nodes is the root, we can always fix the problem by changing the + * root from red to black. + * + * (This does not work lower down in the tree because we must also maintain + * the invariant that every leaf has equal black-height.) + */ +static void +rb_insert_fixup(RBTree *rb, RBNode *x) +{ + /* + * x is always a red node. Initially, it is the newly inserted node. + * Each iteration of this loop moves it higher up in the tree. + */ + while (x != rb->root && x->parent->color == RED) + { + /* + * x and x->parent are both red. Fix depends on whether x->parent is + * a left or right child. In either case, we define y to be the + * "uncle" of x, that is, the other child of x's grandparent. + * + * If the uncle is red, we flip the grandparent to red and its two + * children to black. Then we loop around again to check whether the + * grandparent still has a problem. + * + * If the uncle is black, we will perform one or two "rotations" to + * balance the tree. Either x or x->parent will take the grandparent's + * position in the tree and recolored black, and the original + * grandparent will be recolored red and become a child of that node. + * This always leaves us with a valid red-black tree, so the loop + * will terminate. + */ + if (x->parent == x->parent->parent->left) + { + RBNode *y = x->parent->parent->right; + + if (y->color == RED) + { + /* uncle is RED */ + x->parent->color = BLACK; + y->color = BLACK; + x->parent->parent->color = RED; + x = x->parent->parent; + } + else + { + /* uncle is BLACK */ + if (x == x->parent->right) + { + /* make x a left child */ + x = x->parent; + rb_rotate_left(rb, x); + } + + /* recolor and rotate */ + x->parent->color = BLACK; + x->parent->parent->color = RED; + rb_rotate_right(rb, x->parent->parent); + } + } + else + { + /* mirror image of above code */ + RBNode *y = x->parent->parent->left; + + if (y->color == RED) + { + /* uncle is RED */ + x->parent->color = BLACK; + y->color = BLACK; + x->parent->parent->color = RED; + x = x->parent->parent; + } + else + { + /* uncle is BLACK */ + if (x == x->parent->left) + { + x = x->parent; + rb_rotate_right(rb, x); + } + x->parent->color = BLACK; + x->parent->parent->color = RED; + rb_rotate_left(rb, x->parent->parent); + } + } + } + + /* + * The root may already have been black; if not, the black-height of every + * node in the tree increases by one. + */ + rb->root->color = BLACK; +} + +/* + * Allocate node for data and insert in tree. + * + * Return old data (or result of appendator method) if it exists and NULL + * otherwise. + */ +void * +rb_insert(RBTree *rb, void *data) +{ + RBNode *current, + *parent, + *x; + int cmp; + + /* find where node belongs */ + current = rb->root; + parent = NULL; + while (current != RBNIL) + { + cmp = rb->comparator(data, current->data, rb->arg); + if (cmp == 0) + { + /* + * Found node with given key. If appendator method is provided, + * call it to join old and new data; else, new data replaces old + * data. + */ + if (rb->appendator) + { + current->data = rb->appendator(current->data, data, rb->arg); + return current->data; + } + else + { + void *old = current->data; + + current->data = data; + return old; + } + } + parent = current; + current = (cmp < 0) ? current->left : current->right; + } + + /* setup new node in tree */ + x = palloc(sizeof(RBNode)); + x->data = data; + x->parent = parent; + x->left = RBNIL; + x->right = RBNIL; + x->color = RED; + x->iteratorState = InitialState; + + /* insert node in tree */ + if (parent) + { + if (cmp < 0) + parent->left = x; + else + parent->right = x; + } + else + { + rb->root = x; + } + + rb_insert_fixup(rb, x); + return NULL; +} + +/********************************************************************** + * Deletion * + **********************************************************************/ + +/* + * Maintain Red-Black tree balance after deleting a black node. + */ +static void +rb_delete_fixup(RBTree *rb, RBNode *x) +{ + /* + * x is always a black node. Initially, it is the former child of the + * deleted node. Each iteration of this loop moves it higher up in the + * tree. + */ + while (x != rb->root && x->color == BLACK) + { + /* + * Left and right cases are symmetric. Any nodes that are children + * of x have a black-height one less than the remainder of the nodes + * in the tree. We rotate and recolor nodes to move the problem up + * the tree: at some stage we'll either fix the problem, or reach the + * root (where the black-height is allowed to decrease). + */ + if (x == x->parent->left) + { + RBNode *w = x->parent->right; + + if (w->color == RED) + { + w->color = BLACK; + x->parent->color = RED; + rb_rotate_left(rb, x->parent); + w = x->parent->right; + } + + if (w->left->color == BLACK && w->right->color == BLACK) + { + w->color = RED; + x = x->parent; + } + else + { + if (w->right->color == BLACK) + { + w->left->color = BLACK; + w->color = RED; + rb_rotate_right(rb, w); + w = x->parent->right; + } + w->color = x->parent->color; + x->parent->color = BLACK; + w->right->color = BLACK; + rb_rotate_left(rb, x->parent); + x = rb->root; /* Arrange for loop to terminate. */ + } + } + else + { + RBNode *w = x->parent->left; + + if (w->color == RED) + { + w->color = BLACK; + x->parent->color = RED; + rb_rotate_right(rb, x->parent); + w = x->parent->left; + } + + if (w->right->color == BLACK && w->left->color == BLACK) + { + w->color = RED; + x = x->parent; + } + else + { + if (w->left->color == BLACK) + { + w->right->color = BLACK; + w->color = RED; + rb_rotate_left(rb, w); + w = x->parent->left; + } + w->color = x->parent->color; + x->parent->color = BLACK; + w->left->color = BLACK; + rb_rotate_right(rb, x->parent); + x = rb->root; /* Arrange for loop to terminate. */ + } + } + } + x->color = BLACK; +} + +/* + * Delete node z from tree. + */ +static void +rb_delete_node(RBTree *rb, RBNode *z) +{ + RBNode *x, + *y; + + if (!z || z == RBNIL) + return; + + /* + * y is the node that will actually be removed from the tree. This will + * be z if z has fewer than two children, or the tree successor of z + * otherwise. + */ + if (z->left == RBNIL || z->right == RBNIL) + { + /* y has a RBNIL node as a child */ + y = z; + } + else + { + /* find tree successor */ + y = z->right; + while (y->left != RBNIL) + y = y->left; + } + + /* x is y's only child */ + if (y->left != RBNIL) + x = y->left; + else + x = y->right; + + /* Remove y from the tree. */ + x->parent = y->parent; + if (y->parent) + { + if (y == y->parent->left) + y->parent->left = x; + else + y->parent->right = x; + } + else + { + rb->root = x; + } + + /* + * If we removed the tree successor of z rather than z itself, then + * attach the data for the removed node to the one we were supposed to + * remove. + */ + if (y != z) + z->data = y->data; + + /* + * Removing a black node might make some paths from root to leaf contain + * fewer black nodes than others, or it might make two red nodes adjacent. + */ + if (y->color == BLACK) + rb_delete_fixup(rb, x); + + pfree(y); +} + +extern void +rb_delete(RBTree *rb, void *data) +{ + RBNode *node = rb->root; + int cmp; + + while (node != RBNIL) + { + cmp = rb->comparator(data, node->data, rb->arg); + + if (cmp == 0) + { + /* found node to delete */ + if (rb->freefunc) + rb->freefunc(node->data); + node->data = NULL; + rb_delete_node(rb, node); + return; + } + else if (cmp < 0) + node = node->left; + else + node = node->right; + } +} + +/* + * Return data on left most node and delete + * that node + */ +extern void * +rb_leftmost(RBTree *rb) +{ + RBNode *node = rb->root; + RBNode *leftmost = rb->root; + void *res = NULL; + + while (node != RBNIL) + { + leftmost = node; + node = node->left; + } + + if (leftmost != RBNIL) + { + res = leftmost->data; + leftmost->data = NULL; + rb_delete_node(rb, leftmost); + } + + return res; +} + +/********************************************************************** + * Traverse * + **********************************************************************/ + +static void * +rb_next_node(RBTreeIterator *iterator, RBNode *node) +{ + node->iteratorState = InitialState; + iterator->node = node; + return iterator->iterate(iterator); +} + +static void * +rb_left_right_iterator(RBTreeIterator *iterator) +{ + RBNode *node = iterator->node; + + switch (node->iteratorState) + { + case InitialState: + if (node->left != RBNIL) + { + node->iteratorState = FirstStepDone; + return rb_next_node(iterator, node->left); + } + case FirstStepDone: + node->iteratorState = SecondStepDone; + return node->data; + case SecondStepDone: + if (node->right != RBNIL) + { + node->iteratorState = ThirdStepDone; + return rb_next_node(iterator, node->right); + } + case ThirdStepDone: + if (node->parent) + { + iterator->node = node->parent; + return iterator->iterate(iterator); + } + break; + default: + elog(ERROR, "Unknow node state: %d", node->iteratorState); + } + + return NULL; +} + +static void * +rb_right_left_iterator(RBTreeIterator *iterator) +{ + RBNode *node = iterator->node; + + switch (node->iteratorState) + { + case InitialState: + if (node->right != RBNIL) + { + node->iteratorState = FirstStepDone; + return rb_next_node(iterator, node->right); + } + case FirstStepDone: + node->iteratorState = SecondStepDone; + return node->data; + case SecondStepDone: + if (node->left != RBNIL) + { + node->iteratorState = ThirdStepDone; + return rb_next_node(iterator, node->left); + } + case ThirdStepDone: + if (node->parent) + { + iterator->node = node->parent; + return iterator->iterate(iterator); + } + break; + default: + elog(ERROR, "Unknow node state: %d", node->iteratorState); + } + + return NULL; +} + +static void * +rb_direct_iterator(RBTreeIterator *iterator) +{ + RBNode *node = iterator->node; + + switch (node->iteratorState) + { + case InitialState: + node->iteratorState = FirstStepDone; + return node->data; + case FirstStepDone: + if (node->left != RBNIL) + { + node->iteratorState = SecondStepDone; + return rb_next_node(iterator, node->left); + } + case SecondStepDone: + if (node->right != RBNIL) + { + node->iteratorState = ThirdStepDone; + return rb_next_node(iterator, node->right); + } + case ThirdStepDone: + if (node->parent) + { + iterator->node = node->parent; + return iterator->iterate(iterator); + } + break; + default: + elog(ERROR, "Unknow node state: %d", node->iteratorState); + } + + return NULL; +} + +static void * +rb_inverted_iterator(RBTreeIterator *iterator) +{ + RBNode *node = iterator->node; + + switch (node->iteratorState) + { + case InitialState: + if (node->left != RBNIL) + { + node->iteratorState = FirstStepDone; + return rb_next_node(iterator, node->left); + } + case FirstStepDone: + if (node->right != RBNIL) + { + node->iteratorState = SecondStepDone; + return rb_next_node(iterator, node->right); + } + case SecondStepDone: + node->iteratorState = ThirdStepDone; + return node->data; + case ThirdStepDone: + if (node->parent) + { + iterator->node = node->parent; + return iterator->iterate(iterator); + } + break; + default: + elog(ERROR, "Unknow node state: %d", node->iteratorState); + } + + return NULL; +} + +RBTreeIterator * +rb_begin_iterate(RBTree *rb, RBOrderControl ctrl) +{ + RBTreeIterator *iterator = palloc(sizeof(RBTreeIterator)); + + iterator->node = rb->root; + if (iterator->node != RBNIL) + iterator->node->iteratorState = InitialState; + + switch (ctrl) + { + case LeftRightWalk: /* visit left, then self, then right */ + iterator->iterate = rb_left_right_iterator; + break; + case RightLeftWalk: /* visit right, then self, then left */ + iterator->iterate = rb_right_left_iterator; + break; + case DirectWalk: /* visit self, then left, then right */ + iterator->iterate = rb_direct_iterator; + break; + case InvertedWalk: /* visit left, then right, then self */ + iterator->iterate = rb_inverted_iterator; + break; + default: + elog(ERROR, "Unknown iterator order: %d", ctrl); + } + + return iterator; +} + +void * +rb_iterate(RBTreeIterator *iterator) +{ + if (iterator->node == RBNIL) + return NULL; + + return iterator->iterate(iterator); +} + +void +rb_free_iterator(RBTreeIterator *iterator) +{ + pfree(iterator); +} diff --git a/src/include/access/gin.h b/src/include/access/gin.h index b96ff95..fcc5371 100644 --- a/src/include/access/gin.h +++ b/src/include/access/gin.h @@ -13,6 +13,7 @@ #include "access/genam.h" #include "access/itup.h" #include "access/xlog.h" +#include "utils/rbtree.h" #include "fmgr.h" @@ -27,14 +28,6 @@ #define GINNProcs 5 /* - * Max depth allowed in search tree during bulk inserts. This is to keep from - * degenerating to O(N^2) behavior when the tree is unbalanced due to sorted - * or nearly-sorted input. (Perhaps it would be better to use a balanced-tree - * algorithm, but in common cases that would only add useless overhead.) - */ -#define GIN_MAX_TREE_DEPTH 100 - -/* * Page opaque data in a inverted index page. * * Note: GIN does not include a page ID word as do the other index types. @@ -571,26 +564,22 @@ extern Datum ginarrayconsistent(PG_FUNCTION_ARGS); typedef struct EntryAccumulator { OffsetNumber attnum; + bool shouldSort; Datum value; uint32 length; uint32 number; ItemPointerData *list; - bool shouldSort; - struct EntryAccumulator *left; - struct EntryAccumulator *right; } EntryAccumulator; typedef struct { GinState *ginstate; - EntryAccumulator *entries; - uint32 maxdepth; - EntryAccumulator **stack; - uint32 stackpos; long allocatedMemory; - uint32 length; - EntryAccumulator *entryallocator; + EntryAccumulator *entryallocator; + ItemPointerData *tmpList; + RBTree *tree; + RBTreeIterator *iterator; } BuildAccumulator; extern void ginInitBA(BuildAccumulator *accum); diff --git a/src/include/utils/rbtree.h b/src/include/utils/rbtree.h new file mode 100644 index 0000000..6ce74b0 --- /dev/null +++ b/src/include/utils/rbtree.h @@ -0,0 +1,47 @@ +/*------------------------------------------------------------------------- + * + * rbtree.h + * interface for PostgreSQL generic Red-Black binary tree package + * + * Copyright (c) 1996-2009, PostgreSQL Global Development Group + * + * IDENTIFICATION + * $PostgreSQL: pgsql/src/backend/nodes/list.c,v 1.69 2008/01/01 19:45:50 momjian Exp $ + * + *------------------------------------------------------------------------- + */ + +#ifndef RBTREE_H +#define RBTREE_H + +typedef struct RBTree RBTree; +typedef struct RBTreeIterator RBTreeIterator; + +typedef int (*rb_comparator) (const void *a, const void *b, void *arg); +typedef void* (*rb_appendator) (void *current, void *new, void *arg); +typedef void (*rb_freefunc) (void *a); + +extern RBTree *rb_create(rb_comparator comparator, + rb_appendator appendator, + rb_freefunc freefunc, + void *arg); +extern void rb_free(RBTree *rb); + +extern void *rb_find(RBTree *rb, void *data); +extern void *rb_insert(RBTree *rb, void *data); +extern void rb_delete(RBTree *rb, void *data); +extern void *rb_leftmost(RBTree *rb); + +typedef enum RBOrderControl +{ + LeftRightWalk, + RightLeftWalk, + DirectWalk, + InvertedWalk +} RBOrderControl; + +extern RBTreeIterator* rb_begin_iterate(RBTree *rb, RBOrderControl ctrl); +extern void *rb_iterate(RBTreeIterator *iterator); +extern void rb_free_iterator(RBTreeIterator *iterator); + +#endif
Mark, do you need my data to reproduce results from http://www.sai.msu.su/~megera/wiki/2009-07-27 ? Oleg On Fri, 29 Jan 2010, Mark Cave-Ayland wrote: > Hi Robert, > > I've also spent some time reviewing this patch since it is a > pre-requisite to the KNNGiST patch. I did have a much more comprehensive > list of suggestions, but it seems you've managed to resolve most of > these with your latest re-write. Please find some more comments inline: > >> Here's an edited version, which I've now reviewed more fully. Some >> more substantive review comments: > > Firstly: the re-worked patch that you have posted seems to contain > remnants of at least 2 other patches. I have extracted the rbtree-only > sections and re-attached to this email. > > The patch was tested against git head 124a3cc... and applied without any > fuzz or other issues. > >> 1. I'm pretty satisfied that the rbtree code is generally sane, >> although I wonder if we should think about putting it in access/common >> rather than utils/misc. I'm not sure that I have a sufficiently >> clearly defined notion of what each subdirectory is for to draw a >> definitive conclusion on this point; hopefully someone else will weigh >> in. > > I'm happy that the code is a reasonable implementation of an RB-Tree, at > least with respect to the link to the related public domain source that > was posted. In terms of location, I think utils/misc is a reasonable > place for it to live since I see it as analogous to the hash table > implementation, i.e. it's a template RB-Tree implementation designed to > be used as required throughout the codebase. backend/access seems to be > the home of index AMs only. > > Other code points: > > - The new names for the iterator enum seem much better to me - or at > least it helped understand the meaning of the iterator code. > > - You correctly picked up on the namespace issue, although I noticed > that you left RED and BLACK as they were. Maybe RBRED and RBBLACK would > be better, though not that there are any colour related defines around > in a database backend to make a name clash probable. > > - I found the name of the "appendator" method misleading - perhaps > "updater" would make more sense? > >> 2. I'm a little concerned about the performance implications of this >> patch. Looking at http://www.sai.msu.su/~megera/wiki/2009-04-03, it's >> clear that the patch is a huge win in some cases. But I'm also >> surprised by how much performance is lost in some of the other cases >> that you tested. I suspect, on balance, that it's probably still a >> good idea to put this in, but I wonder if you've profiled this at all >> to see where the extra time is going and/or explored possible ways of >> squashing that overhead down a little more. >> >> 3. In ginInsertEntry(), the "else" branch of the if statement really >> looks like magic when you first read it. I wonder if it would make >> sense to pull the contents of EAAllocate() directly into this >> function, since there's only one call site anyhow. > > God yes. This is not a good example of maintainable code - even with your > comment I struggled for a while to try and figure it out :( I would suggest > that this is refactored appropriately before commit. > >> I still have not done any performance testing of my own on this code, >> and it probably needs that. If anyone else has time to step up and >> help with that, it would be much appreciated. It would be useful to >> have some plain old benchmarks as well as some profiling data as >> mentioned above. > > As part of my testing, I attempted to duplicate some of the benchmarks > at http://www.sai.msu.su/~megera/wiki/2009-04-03. Unfortunately I was not > really able to reproduce the RND (teodor's) dataset, nor the random array > test as the SQL used to test the implementation was not present on the page > above. > > For each test, I dropped and recreated the database to ensure that any > autovacuum impact would be the same. > > > 1) Fixed length random & sequential string arrays > > SELECT array_to_string(ARRAY(select '' || a || '.' || b from > generate_series(1,50) b), ' ')::tsvector AS i INTO seq FROM > generate_series(1,100000) a; > > SELECT array_to_string(ARRAY(select '' || random() from > generate_series(1,50) b), ' ')::tsvector AS i INTO rnd FROM > generate_series(1,100000) a; > > > Before patch: > > test=# create index seq_idx on seq using gin (i); > CREATE INDEX > Time: 103205.790 ms > test=# create index rnd_idx on rnd using gin (i); > CREATE INDEX > Time: 6770.131 ms > > > After patch: > > test=# create index seq_idx on seq using gin (i); > CREATE INDEX > Time: 87724.953 ms > test=# create index rnd_idx on rnd using gin (i); > CREATE INDEX > Time: 5596.666 ms > > > 2) Identical records, variable length test > > select ARRAY(select generate_series(1,len)) as a50 into arr50 from > generate_series(1,100000) b; > > > Before patch: > > i) len=3 > > select ARRAY(select generate_series(1,3)) as a3 into arr3 from > generate_series(1,100000) b; > > test=# create index arr3_idx on arr3 using gin (a3); > CREATE INDEX > Time: 324.251 ms > > > ii) len=30 > > select ARRAY(select generate_series(1,30)) as a30 into arr30 from > generate_series(1,100000) b; > > test=# create index arr30_idx on arr30 using gin (a30); > CREATE INDEX > Time: 2163.786 ms > > > iii) len=50 > > select ARRAY(select generate_series(1,50)) as a50 into arr50 from > generate_series(1,100000) b; > > test=# create index arr50_idx on arr50 using gin (a50); > CREATE INDEX > Time: 3306.074 ms > > > iv) len=random > > select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into > arrrand from generate_series(1,100000) b; > > test=# create index arrrand_idx on arrrand using gin (arand); > CREATE INDEX > Time: 4725.556 ms > > > After patch: > > i) len=3 > > select ARRAY(select generate_series(1,3)) as a3 into arr3 from > generate_series(1,100000) b; > > test=# create index arr3_idx on arr3 using gin (a3); > CREATE INDEX > Time: 299.090 ms > > > ii) len=30 > > select ARRAY(select generate_series(1,30)) as a30 into arr30 from > generate_series(1,100000) b; > > test=# create index arr30_idx on arr30 using gin (a30); > CREATE INDEX > Time: 2828.424 ms > > > iii) len=50 > > select ARRAY(select generate_series(1,50)) as a50 into arr50 from > generate_series(1,100000) b; > > test=# create index arr50_idx on arr50 using gin (a50); > CREATE INDEX > Time: 3984.456 ms > > > iv) len=random > > select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into > arrrand from generate_series(1,100000) b; > > test=# create index arrrand_idx on arrrand using gin (arand); > CREATE INDEX > Time: 3514.972 ms > > > Summary > ======= > > I believe Robert has done a good deal of the groundwork required to get this > patch ready for inclusion. With the current version, I was able to see a > measurable performance improvement in some test cases with no significant > performance regression. It would have been nice to be able to reproduce the > whole set of test cases but this was not possible from the information > specified. > > With perhaps some minor tweaks to some of the names and a rework of the else > clause in ginInsertEntry(), I feel this patch is reasonably close to commit. > > I shall now continue my review of the associated KNNGiST patch. > > > ATB, > > Mark. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Fri, Jan 29, 2010 at 9:00 AM, Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk> wrote: > I'm happy that the code is a reasonable implementation of an RB-Tree, at > least with respect to the link to the related public domain source that > was posted. In terms of location, I think utils/misc is a reasonable > place for it to live since I see it as analogous to the hash table > implementation, i.e. it's a template RB-Tree implementation designed to > be used as required throughout the codebase. backend/access seems to be > the home of index AMs only. Not really. access/common has things in it like reloptions.c and printtup.c, which are clearly not index AMs. I suppose another option is to put it in lib. The only things there right now are dllinfo.c and stringinfo.c, but in some ways generic in-memory red-black trees seem analagous to generic string buffers. > - You correctly picked up on the namespace issue, although I noticed > that you left RED and BLACK as they were. Maybe RBRED and RBBLACK would > be better, though not that there are any colour related defines around > in a database backend to make a name clash probable. Yeah, I thought about that. Since you came up with it independently, that's enough to make me think it's a good idea. > - I found the name of the "appendator" method misleading - perhaps > "updater" would make more sense? I like the existing name better. It's a little weird but I find it descriptive. >> 2. I'm a little concerned about the performance implications of this >> patch. Looking at http://www.sai.msu.su/~megera/wiki/2009-04-03, it's >> clear that the patch is a huge win in some cases. But I'm also >> surprised by how much performance is lost in some of the other cases >> that you tested. I suspect, on balance, that it's probably still a >> good idea to put this in, but I wonder if you've profiled this at all >> to see where the extra time is going and/or explored possible ways of >> squashing that overhead down a little more. >> >> 3. In ginInsertEntry(), the "else" branch of the if statement really >> looks like magic when you first read it. I wonder if it would make >> sense to pull the contents of EAAllocate() directly into this >> function, since there's only one call site anyhow. > > God yes. This is not a good example of maintainable code - even with your > comment I struggled for a while to try and figure it out :( I would suggest > that this is refactored appropriately before commit. Yeah it took me a while. I think we need Teodor and Oleg to prepare a new version of this patch based on these suggestions. This part, in particular, I feel like they need to rework rather than us. I don't know the GIN code well enough to be confident of what I'm doing. I have to say that it would be easier to understand what's going on here if there were a few more comments... > With perhaps some minor tweaks to some of the names and a rework of the else > clause in ginInsertEntry(), I feel this patch is reasonably close to commit. Yeah, I think it can get there, but only if Oleg and Teodor provide an updated version pretty soon... > I shall now continue my review of the associated KNNGiST patch. ...especially if there is to be any thought of getting ANOTHER patch after this one committed, too. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jan 29, 2010 at 9:00 AM, Mark Cave-Ayland > <mark.cave-ayland@siriusit.co.uk> wrote: >> ... In terms of location, I think utils/misc is a reasonable >> place for it to live since I see it as analogous to the hash table >> implementation, i.e. it's a template RB-Tree implementation designed to >> be used as required throughout the codebase. backend/access seems to be >> the home of index AMs only. > Not really. access/common has things in it like reloptions.c and > printtup.c, which are clearly not index AMs. That may be, but it's not a place for random generic data structures, at least not ones that might be useful outside access/. > I suppose another option is to put it in lib. The only things there > right now are dllinfo.c and stringinfo.c, but in some ways generic > in-memory red-black trees seem analagous to generic string buffers. I could live with either lib or utils/misc/. regards, tom lane
>> > With perhaps some minor tweaks to some of the names and a rework of the else >> > clause in ginInsertEntry(), I feel this patch is reasonably close to commit. > Yeah, I think it can get there, but only if Oleg and Teodor provide an > updated version pretty soon... > Updated version of patch based on version 0.7 from Mark (thank you for review!) I removed EAAollocate() function and improved comments in ginInsertEntry(). -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
2010/2/2 Teodor Sigaev <teodor@sigaev.ru>: >>> > With perhaps some minor tweaks to some of the names and a rework of >>> > the else >>> > clause in ginInsertEntry(), I feel this patch is reasonably close to >>> > commit. >> >> Yeah, I think it can get there, but only if Oleg and Teodor provide an >> updated version pretty soon... >> > > Updated version of patch based on version 0.7 from Mark (thank you for > review!) > I removed EAAollocate() function and improved comments in ginInsertEntry(). Can you rename RED and BLACK to RBRED and RBBLACK? ...Robert
> Can you rename RED and BLACK to RBRED and RBBLACK? Yes, of course, done. Any objections to commit? -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
2010/2/3 Teodor Sigaev <teodor@sigaev.ru>: >> Can you rename RED and BLACK to RBRED and RBBLACK? > > Yes, of course, done. > > Any objections to commit? I would like to see point #2 of the following email addressed before commit. As things stand, it is not clear (at least to me) whether this is a win. http://archives.postgresql.org/pgsql-hackers/2010-01/msg02552.php ...Robert
On Wed, Feb 3, 2010 at 8:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: > 2010/2/3 Teodor Sigaev <teodor@sigaev.ru>: >>> Can you rename RED and BLACK to RBRED and RBBLACK? >> >> Yes, of course, done. >> >> Any objections to commit? > > I would like to see point #2 of the following email addressed before > commit. As things stand, it is not clear (at least to me) whether > this is a win. > > http://archives.postgresql.org/pgsql-hackers/2010-01/msg02552.php Specifically, on this web page: http://www.sai.msu.su/~megera/wiki/2009-04-03 There is a section that begins with this line of text: Repeat test with 100,000 identical records varying array length (len). That test shows rbtree being a third slower than HEAD. But there's not enough information on that web page to replicate that test, so it's hard to speculate on what may be going wrong. I don't think we should commit this until we understand that. ...Robert
On Wed, 3 Feb 2010, Robert Haas wrote: > On Wed, Feb 3, 2010 at 8:48 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> 2010/2/3 Teodor Sigaev <teodor@sigaev.ru>: >>>> Can you rename RED and BLACK to RBRED and RBBLACK? >>> >>> Yes, of course, done. >>> >>> Any objections to commit? >> >> I would like to see point #2 of the following email addressed before >> commit. As things stand, it is not clear (at least to me) whether >> this is a win. >> >> http://archives.postgresql.org/pgsql-hackers/2010-01/msg02552.php > > Specifically, on this web page: > > http://www.sai.msu.su/~megera/wiki/2009-04-03 > > There is a section that begins with this line of text: > > Repeat test with 100,000 identical records varying array length (len). > > That test shows rbtree being a third slower than HEAD. But there's > not enough information on that web page to replicate that test, so > it's hard to speculate on what may be going wrong. I don't think we > should commit this until we understand that. Robert, Mark described the test he did http://archives.postgresql.org/pgsql-hackers/2010-01/msg02927.php Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
2010/2/3 Oleg Bartunov <oleg@sai.msu.su>: >>> I would like to see point #2 of the following email addressed before >>> commit. As things stand, it is not clear (at least to me) whether >>> this is a win. >>> >>> http://archives.postgresql.org/pgsql-hackers/2010-01/msg02552.php >> >> Specifically, on this web page: >> >> http://www.sai.msu.su/~megera/wiki/2009-04-03 >> >> There is a section that begins with this line of text: >> >> Repeat test with 100,000 identical records varying array length (len). >> >> That test shows rbtree being a third slower than HEAD. But there's >> not enough information on that web page to replicate that test, so >> it's hard to speculate on what may be going wrong. I don't think we >> should commit this until we understand that. > > Robert, Mark described the test he did > http://archives.postgresql.org/pgsql-hackers/2010-01/msg02927.php So why did he get totally different answers than you? It's not enough to say "somebody else did a test and got better numbers than we did, so let's use theirs". ...Robert
On Wed, 3 Feb 2010, Robert Haas wrote: >> Robert, Mark described the test he did >> http://archives.postgresql.org/pgsql-hackers/2010-01/msg02927.php > > So why did he get totally different answers than you? Because I did tests with 8.3 and HEAD+rbtree, while Mark compared HEAD and HEAD+rbtree. Also, my HEAD and his HEAD are very different :) I will not mention, that we used totally different setup. > > It's not enough to say "somebody else did a test and got better > numbers than we did, so let's use theirs". I'll repeat my tests with current CVS HEAD. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Wed, Feb 3, 2010 at 4:17 PM, Oleg Bartunov <oleg@sai.msu.su> wrote: > On Wed, 3 Feb 2010, Robert Haas wrote: > >>> Robert, Mark described the test he did >>> http://archives.postgresql.org/pgsql-hackers/2010-01/msg02927.php >> >> So why did he get totally different answers than you? > > Because I did tests with 8.3 and HEAD+rbtree, while Mark compared > HEAD and HEAD+rbtree. Also, my HEAD and his HEAD are very different :) > I will not mention, that we used totally different setup. > >> >> It's not enough to say "somebody else did a test and got better >> numbers than we did, so let's use theirs". > > I'll repeat my tests with current CVS HEAD. OK... can you post the exact queries that you are used for the previous round of testing? ...Robert
On Wed, 3 Feb 2010, Robert Haas wrote: >> I'll repeat my tests with current CVS HEAD. > > OK... can you post the exact queries that you are used for the > previous round of testing? Robert, Mark posted all queries in his post ! See http://archives.postgresql.org/pgsql-hackers/2010-01/msg02927.php Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Wed, Feb 3, 2010 at 4:51 PM, Oleg Bartunov <oleg@sai.msu.su> wrote: > On Wed, 3 Feb 2010, Robert Haas wrote: > >>> I'll repeat my tests with current CVS HEAD. >> >> OK... can you post the exact queries that you are used for the >> previous round of testing? > > Robert, Mark posted all queries in his post ! See > http://archives.postgresql.org/pgsql-hackers/2010-01/msg02927.php Maybe we are now getting to the heart of the confusion. Mark wrote in his email: "Unfortunately I was not really able to reproduce the RND (teodor's) dataset, nor the random array test as the SQL used to test the implementation was not present on the page above." The SQL for the fixed-length tests is posted, but the SQL for the variable length test is not - so Mark was just guessing on that one. Or am I just totally confused? ...Robert
Robert Haas wrote: > Maybe we are now getting to the heart of the confusion. Mark wrote in > his email: "Unfortunately I was not really able to reproduce the RND > (teodor's) dataset, nor the random array test as the SQL used to test > the implementation was not present on the page above." The SQL for > the fixed-length tests is posted, but the SQL for the variable length > test is not - so Mark was just guessing on that one. > > Or am I just totally confused? > > ...Robert No, that's correct. In the "Repeat test with 100,000 identical records varying array length (len)" section, it's fairly easy to substitute in the varying values of len where len = 3, 30 and 50. As documented in my review email I had a guess at generating the contents of RND (teodor's) column with this query: select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into arrrand from generate_series(1,100000) b; However, unlike the other figures this is quite a bit different from Oleg/Teodor's results which make me think this is the wrong query (3.5s v 9s). Obviously Robert's concern here is that it is this column that shows one of the largest performance decreases compared to head. I've also finished benchmarking the index creation scripts yesterday on Oleg's test dataset from http://www.sai.msu.su/~megera/postgres/files/links2.sql.gz. With maintenance_work_mem set to 256Mb, the times I got with the rbtree patch applied were: rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); CREATE INDEX Time: 1910741.352 ms rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); CREATE INDEX Time: 1647609.300 ms Without the patch applied, I ended up having to shutdown my laptop after around 90 mins before the first index had even been created. So there is a definite order of magnitude speed increase with this patch applied. ATB, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs
I'm in progress of preparing this page http://www.sai.msu.su/~megera/wiki/rbtree_test Hope, tests are easy to reproduce. This is slightly improved version of rbtree patch, Teodor didn't commit yet. Random array test and real-life examples are ok, I still working on test #1, which is quite artificial test, but still I want to understand if the results are in accuracy of test. Oleg On Thu, 4 Feb 2010, Mark Cave-Ayland wrote: > Robert Haas wrote: > >> Maybe we are now getting to the heart of the confusion. Mark wrote in >> his email: "Unfortunately I was not really able to reproduce the RND >> (teodor's) dataset, nor the random array test as the SQL used to test >> the implementation was not present on the page above." The SQL for >> the fixed-length tests is posted, but the SQL for the variable length >> test is not - so Mark was just guessing on that one. >> >> Or am I just totally confused? >> >> ...Robert > > No, that's correct. In the "Repeat test with 100,000 identical records > varying array length (len)" section, it's fairly easy to substitute in the > varying values of len where len = 3, 30 and 50. As documented in my review > email I had a guess at generating the contents of RND (teodor's) column with > this query: > > select ARRAY(select generate_series(1, (random() * 100)::int)) as arand into > arrrand from generate_series(1,100000) b; > > However, unlike the other figures this is quite a bit different from > Oleg/Teodor's results which make me think this is the wrong query (3.5s v > 9s). Obviously Robert's concern here is that it is this column that shows one > of the largest performance decreases compared to head. > > I've also finished benchmarking the index creation scripts yesterday on > Oleg's test dataset from > http://www.sai.msu.su/~megera/postgres/files/links2.sql.gz. With > maintenance_work_mem set to 256Mb, the times I got with the rbtree patch > applied were: > > > rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); > CREATE INDEX > Time: 1910741.352 ms > > rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); > CREATE INDEX > Time: 1647609.300 ms > > > Without the patch applied, I ended up having to shutdown my laptop after > around 90 mins before the first index had even been created. So there is a > definite order of magnitude speed increase with this patch applied. > > > ATB, > > Mark. > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
> I would like to see point #2 of the following email addressed before > commit. As things stand, it is not clear (at least to me) whether > this is a win. Reimplementation of ginInsertRecordBA reduces difference of HEAD and HEAD+rbtree in regular case. Test suite is taken from http://www.sai.msu.su/~megera/wiki/2009-04-03: SEQ: SELECT array_to_string(ARRAY(select '' || a || '.' || b from generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM generate_series(1,100000) a; RND: SELECT array_to_string(ARRAY(select '' || random() from generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM generate_series(1,100000) a; Times in seconds: HEAD 0.9 0.11 SEQ 130 113 111 RND 11.4 12.6 11.5 The ides was to change order of insertion - now insertion order decreases number of rebalancing. Oleg's test (http://www.sai.msu.su/~megera/wiki/rbtree_test) are made with v0.10 which is differ from 0.11 only by comments around ginInsertRecordBA() -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
Teodor Sigaev wrote: >> I would like to see point #2 of the following email addressed before >> commit. As things stand, it is not clear (at least to me) whether >> this is a win. > > Reimplementation of ginInsertRecordBA reduces difference of HEAD and > HEAD+rbtree in regular case. > Test suite is taken from http://www.sai.msu.su/~megera/wiki/2009-04-03: > > SEQ: SELECT array_to_string(ARRAY(select '' || a || '.' || b from > generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM > generate_series(1,100000) a; > RND: SELECT array_to_string(ARRAY(select '' || random() from > generate_series(1,50) b), ' ')::tsvector AS i INTO foo FROM > generate_series(1,100000) a; > > Times in seconds: > HEAD 0.9 0.11 > SEQ 130 113 111 > RND 11.4 12.6 11.5 > > The ides was to change order of insertion - now insertion order > decreases number of rebalancing. > > Oleg's test (http://www.sai.msu.su/~megera/wiki/rbtree_test) are made > with v0.10 which is differ from 0.11 only by comments around > ginInsertRecordBA() Here is a quick comparison between the current 0.11 patch against my original 0.7 patch when building Oleg's simple data. (Note: due to time constraints, this is just a single run to get a feel for performance) 0.7 patch ========= rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); CREATE INDEX Time: 1910741.352 ms rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); CREATE INDEX Time: 1647609.300 ms 0.11 patch ========== rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); CREATE INDEX Time: 1864013.526 ms rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); CREATE INDEX Time: 1661200.454 ms HTH, Mark. -- Mark Cave-Ayland - Senior Technical Architect PostgreSQL - PostGIS Sirius Corporation plc - control through freedom http://www.siriusit.co.uk t: +44 870 608 0063 Sirius Labs: http://www.siriusit.co.uk/labs
That's all around 1% > 0.7 patch > ========= > > rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); > CREATE INDEX > Time: 1910741.352 ms > > rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); > CREATE INDEX > Time: 1647609.300 ms > > > 0.11 patch > ========== > > rbtest=# CREATE INDEX idin_rbtree_idx ON links2 USING gin (idin); > CREATE INDEX > Time: 1864013.526 ms > > rbtest=# CREATE INDEX idout_rbtree_idx ON links2 USING gin (idout); > CREATE INDEX > Time: 1661200.454 ms -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
2010/2/4 Teodor Sigaev <teodor@sigaev.ru>: > Oleg's test (http://www.sai.msu.su/~megera/wiki/rbtree_test) are made with > v0.10 which is differ from 0.11 only by comments around ginInsertRecordBA() That looks pretty good. I confess I don't fully understand why it works. If we're inserting a bunch of equal-key entries, why does it matter what order we insert them in? Is there some code in here (where?) that breaks ties on the basis of where they are in the input data? I think that the code in ginInsertRecordBA() is needlessly complex. As far as I can see, nNodesOnCurrentLevel is always exactly one more than nNodesOnPreviousLevel, and I think step is also basically redundant with both of these although the relationship is a little more complex. What I would suggest is something like: - initialize step to the largest power of 2 s.t. step < nentry - while step > 0 -- for (i = step; true; i += 2 * step) --- insert entry #i-1 --- if i > nentry - (2 * step) /* must test before incrementing i, to guard against overflow */ ---- break -- step = step / 2 Typos: bunary -> binary This insertion order decreases number of rebalancing for tree -> should be "number of rebalancings" castomized -> customized ...Robert
> That looks pretty good. I confess I don't fully understand why it > works. If we're inserting a bunch of equal-key entries, why does it > matter what order we insert them in? Is there some code in here > (where?) that breaks ties on the basis of where they are in the input > data? Entries to insert into GIN are unique by extractEntriesSU() call. So, instead of '{50,50,50}' array only one element 50 will be inserted. > > I think that the code in ginInsertRecordBA() is needlessly complex. > As far as I can see, nNodesOnCurrentLevel is always exactly one more > than nNodesOnPreviousLevel, and I think step is also basically > redundant with both of these although the relationship is a little > more complex. What I would suggest is something like: > > - initialize step to the largest power of 2 s.t. step< nentry > - while step> 0 > -- for (i = step; true; i += 2 * step) > --- insert entry #i-1 > --- if i> nentry - (2 * step) /* must test before incrementing i, to > guard against overflow */ > ---- break > -- step = step / 2 Good idea, implemented. > > Typos: > > bunary -> binary > This insertion order decreases number of rebalancing for tree -> > should be "number of rebalancings" > castomized -> customized Fixed -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
It seems a bit strange to have all the rb_free_recursive support and not use it anywhere ... and a freefunc callback even, whose only caller seems to set as null currently. Hmm, even in the knngist patch the rb_freefunc stuff is unused. How do we now that it works? (What, for example, if we were to allocate multiple nodes in a single palloc chunk? I'm not familiar with this stuff but that seems plausible) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Mon, Feb 8, 2010 at 3:05 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > It seems a bit strange to have all the rb_free_recursive support and not > use it anywhere ... and a freefunc callback even, whose only caller > seems to set as null currently. Hmm, even in the knngist patch the > rb_freefunc stuff is unused. I don't think it's inappropriate; it doesn't seem implausible that someone might want to free an rbtree someday. I suppose we could comment it out but I guess I don't see the point. > How do we now that it works? Visual inspection? It's not very complicated. > (What, for example, if we were to allocate multiple nodes in a single > palloc chunk? I'm not familiar with this stuff but that seems > plausible) Well, then you could have the freefunc do something ((MyStruct *) a)->is_allocated = false. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Feb 8, 2010 at 3:05 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: >> It seems a bit strange to have all the rb_free_recursive support and not >> use it anywhere ... and a freefunc callback even, whose only caller >> seems to set as null currently. �Hmm, even in the knngist patch the >> rb_freefunc stuff is unused. > I don't think it's inappropriate; it doesn't seem implausible that > someone might want to free an rbtree someday. I suppose we could > comment it out but I guess I don't see the point. I think the suggestion was to *remove* it not comment it out. I'm skeptical of carrying dead code. If the functionality is not used in the proposed gist patches then it's very fair to question whether it ever will be used. regards, tom lane
On Mon, Feb 8, 2010 at 10:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 8, 2010 at 3:05 PM, Alvaro Herrera >> <alvherre@commandprompt.com> wrote: >>> It seems a bit strange to have all the rb_free_recursive support and not >>> use it anywhere ... and a freefunc callback even, whose only caller >>> seems to set as null currently. Hmm, even in the knngist patch the >>> rb_freefunc stuff is unused. > >> I don't think it's inappropriate; it doesn't seem implausible that >> someone might want to free an rbtree someday. I suppose we could >> comment it out but I guess I don't see the point. > > I think the suggestion was to *remove* it not comment it out. I'm > skeptical of carrying dead code. If the functionality is not used > in the proposed gist patches then it's very fair to question whether > it ever will be used. I don't think the question is unfair; I just don't happen to agree with the conclusion. But I don't care enough to argue about it either... ...Robert
2010/2/8 Teodor Sigaev <teodor@sigaev.ru>: >> I think that the code in ginInsertRecordBA() is needlessly complex. >> As far as I can see, nNodesOnCurrentLevel is always exactly one more >> than nNodesOnPreviousLevel, and I think step is also basically >> redundant with both of these although the relationship is a little >> more complex. What I would suggest is something like: >> >> - initialize step to the largest power of 2 s.t. step< nentry >> - while step> 0 >> -- for (i = step; true; i += 2 * step) >> --- insert entry #i-1 >> --- if i> nentry - (2 * step) /* must test before incrementing i, to >> guard against overflow */ >> ---- break >> -- step = step / 2 > > Good idea, implemented. Hmm. I think your implementation is prone to overflow in two places - both when computing step, and also when stepping through the array. Proposed revision attached, with also some rewriting of the comment for that function. ...Robert
Attachment
Robert Haas escribió: > On Mon, Feb 8, 2010 at 3:05 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: > > How do we now that it works? > > Visual inspection? It's not very complicated. Well, that works if you assume the trivial/usual malloc/free coding style, but it fails in the hypothetical scenario I described earlier. You could as well say that each rbtree must provide a memory context that is going to be deleted when the tree is freed, instead of freeing nodes one by one (and in fact it looks more efficient to do it that way ... except that we'd have to get in the business of strcpy'ing the node's data). There's no way to know how this stuff is going to be used, so if it's not going to be used now, I think we shouldn't implement it. That's why I looked at the knngist patch too. But hey, not that i care all that much either -- it's not a lot of code; a couple dozen lines at most, and not complex. > > (What, for example, if we were to allocate multiple nodes in a single > > palloc chunk? I'm not familiar with this stuff but that seems > > plausible) > > Well, then you could have the freefunc do something ((MyStruct *) > a)->is_allocated = false. Hmm, but isn't "a" gone at that point? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
>> Good idea, implemented. > > Hmm. I think your implementation is prone to overflow in two places - > both when computing step, and also when stepping through the array. Pls, point me, I don't see that ! step |= (step >> 1); ! step |= (step >> 2); ! step |= (step >> 4); ! step |= (step >> 8); ! step |= (step >> 16); ! step ++; ! step >>= 1; ! ! while(step > 0) { ! int i; ! for (i = step-1; i < nentry; i += 2 * step) ! ginInsertEntry(accum, heapptr, attnum, entries[i]); ! step >>= 1; /* /2 */ ! } > Proposed revision attached, with also some rewriting of the comment > for that function. make check fails with your patch: #3 0x083d2b50 in ExceptionalCondition (conditionName=Could not find the frame base for "ExceptionalCondition". ) at assert.c:57 #4 0x081086b6 in ginAppendData (old=0x287f2030, new=0x287f2044, arg=0xbfbfd5e4) at ginbulk.c:48 #5 0x083f5632 in rb_insert (rb=0x2acfe610, data=0x287f2044) at rbtree.c:359 #6 0x08108968 in ginInsertEntry (accum=0xbfbfd5e4, heapptr=0x28711af4, attnum=1, entry=2139062143) at ginbulk.c:135 #7 0x08108ad9 in ginInsertRecordBA (accum=0xbfbfd5e4, heapptr=0x28711af4, attnum=1, entries=0x2ac77068, nentry=6) at ginbulk.c:202 -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Mon, 8 Feb 2010, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Mon, Feb 8, 2010 at 3:05 PM, Alvaro Herrera >> <alvherre@commandprompt.com> wrote: >>> It seems a bit strange to have all the rb_free_recursive support and not >>> use it anywhere ... and a freefunc callback even, whose only caller >>> seems to set as null currently. ═Hmm, even in the knngist patch the >>> rb_freefunc stuff is unused. > >> I don't think it's inappropriate; it doesn't seem implausible that >> someone might want to free an rbtree someday. I suppose we could >> comment it out but I guess I don't see the point. > > I think the suggestion was to *remove* it not comment it out. I'm > skeptical of carrying dead code. If the functionality is not used > in the proposed gist patches then it's very fair to question whether > it ever will be used. ok, it's not a big deal to remove code. I think it's time to submit rbtree. > > regards, tom lane > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
2010/2/9 Teodor Sigaev <teodor@sigaev.ru>: >>> Good idea, implemented. >> >> Hmm. I think your implementation is prone to overflow in two places - >> both when computing step, and also when stepping through the array. > > Pls, point me, I don't see that > ! step |= (step >> 1); > ! step |= (step >> 2); > ! step |= (step >> 4); > ! step |= (step >> 8); > ! step |= (step >> 16); So suppose at this point that step is the largest integer that can be represented... > ! step ++; Boom. > ! step >>= 1; > ! > ! while(step > 0) { > ! int i; > > ! for (i = step-1; i < nentry; i += 2 * step) And similarly here... if nentry is greater than maxint/2, then i += 2 * step will overflow, no? > ! ginInsertEntry(accum, heapptr, attnum, entries[i]); > > ! step >>= 1; /* /2 */ > ! } > > >> Proposed revision attached, with also some rewriting of the comment >> for that function. > > make check fails with your patch: > > #3 0x083d2b50 in ExceptionalCondition (conditionName=Could not find the > frame base for "ExceptionalCondition". > ) at assert.c:57 > #4 0x081086b6 in ginAppendData (old=0x287f2030, new=0x287f2044, > arg=0xbfbfd5e4) at ginbulk.c:48 > #5 0x083f5632 in rb_insert (rb=0x2acfe610, data=0x287f2044) at rbtree.c:359 > #6 0x08108968 in ginInsertEntry (accum=0xbfbfd5e4, heapptr=0x28711af4, > attnum=1, entry=2139062143) at ginbulk.c:135 > #7 0x08108ad9 in ginInsertRecordBA (accum=0xbfbfd5e4, heapptr=0x28711af4, > attnum=1, entries=0x2ac77068, nentry=6) at ginbulk.c:202 Gaah, sorry. Presumably I'm running off the end of the array, though I don't see what I did wrong. My brain is fried. ...Robert
> So suppose at this point that step is the largest integer that can be > represented... >> ! step ++; > Boom. >> ! step>>= 1; step>>= 1; step ++' Unboom? >> ! >> ! while(step> 0) { >> ! int i; >> >> ! for (i = step-1; i< nentry; i += 2 * step) > > And similarly here... if nentry is greater than maxint/2, then i += 2 > * step will overflow, no? Agree, so for (i = step - 1; i < nentry && i >= 0; i += step << 1 /* *2 */) Also, rb_free is removed per Tom's comment. Can I commit the patch? -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
Teodor Sigaev escribió: > Also, rb_free is removed per Tom's comment. Can I commit the patch? Hey, but rb_freefunc is still there ... One very minor quibble: please make the $PostgreSQL$ lines be just that (i.e. remove everything between the : to the terminating $, keeping the latter) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
2010/2/10 Teodor Sigaev <teodor@sigaev.ru>: >> So suppose at this point that step is the largest integer that can be >> represented... >>> >>> ! step ++; >> >> Boom. >>> >>> ! step>>= 1; > > step>>= 1; > step ++' > > Unboom? Yeah, that'll work. >>> ! >>> ! while(step> 0) { >>> ! int i; >>> >>> ! for (i = step-1; i< nentry; i += 2 * step) >> >> And similarly here... if nentry is greater than maxint/2, then i += 2 >> * step will overflow, no? > > Agree, so > for (i = step - 1; i < nentry && i >= 0; i += step << 1 /* *2 */) I don't think you should do it this way. I can't immediately say whether it's safe on all platforms, but it's certainly not clear. Just put the test at the bottom of the loop the way I did it (after fixing whatever I screwed up). > Also, rb_free is removed per Tom's comment. Can I commit the patch? Pending the above, go for it. ...Robert
> Hey, but rb_freefunc is still there ... It will be reintroduced when ts_stat will be rewrited to use rbtree.... > One very minor quibble: please make the $PostgreSQL$ lines be just that > (i.e. remove everything between the : to the terminating $, keeping the > latter) ok -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/