Re: intarray internals - Mailing list pgsql-hackers
From | Volkan YAZICI |
---|---|
Subject | Re: intarray internals |
Date | |
Msg-id | 20060511082529.GA223@alamut Whole thread Raw |
In response to | Re: [GENERAL] intarray internals (Teodor Sigaev <teodor@sigaev.ru>) |
List | pgsql-hackers |
Hi, First, thanks so much for your reply. On May 10 04:01, Teodor Sigaev wrote: > > Again, in g_int_decompress(), I couldn't figure out the functionality of > > below lines: > > gist__int_ops use rangeset compression technique, read about in "THE > RD-TREE: AN INDEX STRUCTURE FOR SETS", Joseph M. Hellerstein, > http://www.sai.msu.su/~megera/postgres/gist/papers/rd-tree.ps Thanks so much for the papers. I read the related section (and will read whole today or tomorrow). > * intarray_union.patch.0 - doesn't applied, but make small optimization to > reduce number non-unique values. I don't believe that one pass through > array with a lot of ifs will be faster than two pass with simple ifs. Did > you some tests? IMHO, the only significant improvement in my proposal about _int_union() is that this method will visit arrays only once (with extra price of x2 condition checks), while current one will also make a second visit to arrays to remove duplicates (with small condition checks). You can be right, maybe it doesn't worth for worrying about. Improvement (if there's any) will probably be available to see for very long arrays. (Sorry, no tests for this proposal.) > * intarray_same.patch.0 - move SORT as you suggest, but don't touch > algorithm. > 1) if (A[0] == B[0] && A[1] == B[1] && ...) > > 2) if (A[0] == B[0] && A[ N] == B[ N] && > A[1] == B[1] && A[N-1] == B[N-1] && > ...) > > Why are you sure that second a much faster? Did you make tests? Number of > comparisons is the same... Yep, both algorithms have O(n) comparisions in their worst cases. But for general purposes, AFAICS, second one will perform better. For instance consider below examples: [Best case for 2nd algo.]Input : 1, 2, 3, ..., 6, 7, *91st algo.: O(n)2nd algo.: O(1) [Worst case for 2nd algo.]Input : 1, 2, 3, 4, *4, 6, 7, 8, 91st algo.: O(n/2)2nd algo.: O(n) But as you can see, because of our arrays are sorted, any missing (or additional) element in the target array will produce a padding in the end of the array --- assuming that arrays generally don't hold duplicate values. Therefore, making comparisons for the tail elements will perform better beucause of the unmatched values caused by padding. Hope I managed to explain what I try to mean. Actually, IIRC, I saw this method (both hacks for small sized arrays and comparisons for the tail elements of a sorted array) in another FOSS project's source code --- probably PHP, but I'm not sure. For about testing, if you'd supply suitable inputs there occurs a quite much performance improve. > * intarray_sort.patch.0 - doesn't applied. isort() is very often called for > already sorted and unique arrays (which comes from index), so it should be > fast as possible for sorted arrays. Uh, sorry. I missed that point. > As I remember ordered array is a worst > case for qsort(). May be, it will be better choice to use mergesort. I'll investigate alternative methods to sort already sorted arrays. Regards.
pgsql-hackers by date: