Re: intarray internals - Mailing list pgsql-general

From Volkan YAZICI
Subject Re: intarray internals
Date
Msg-id 20060507102136.GA202@alamut
Whole thread Raw
In response to Re: intarray internals  (Volkan YAZICI <yazicivo@ttnet.net.tr>)
List pgsql-general
Hi,

I've prepared a Quick & Dirty patch serie for some missing parts in
intarray contrib module. Here they go with some explanation...


On May 06 12:38, Volkan YAZICI wrote:
> [4]
> In the inner_int_contains() function of _int_tool.c, there's a while
> loop like
>
> [Code assumes that arrays are sorted.]
>
> na = ARRNELEMS(a);
> nb = ARRNELEMS(b);
> da = ARRPTR(a);
> db = ARRPTR(b);
>
> i = j = n = 0;
> while (i < na && j < nb)
>     if (da[i] < db[j])
>         i++;
>     else if (da[i] == db[j])
>     {
>         n++;
>         i++;
>         j++;
>     }
>     else
>         j++;
>
> return (n == nb) ? TRUE : FALSE;
>
> AFAICS, last "j++" should be replaced with "return FALSE". Because, "n"
> cannot be equal to "nb" no more, if "j" gets incremented without
> incrementing "n" (remember "j < nb" in the "while" condition).

intarray_contains.patch.0 is for above problem.


[5]
ISTM, inner_int_union() of _int_tool.c, makes some redundant visits to
array:

    ...
    /* union */
    i = j = 0;
    while (i < na && j < nb)
        if (da[i] < db[j])
            *dr++ = da[i++];
        else
            *dr++ = db[j++];

    while (i < na)
        *dr++ = da[i++];
    while (j < nb)
        *dr++ = db[j++];

}

if (ARRNELEMS(r) > 1)
    r = _int_unique(r);

IMHO, uniting unique values (instead of uniting and then removing
duplicates) should work faster. intarray_union.patch.0 is for this
problem. (Patched code, handles uniting for unique values.)


[6]
There's a seperate sorting algorithm isort() in _int_tool.c. Looks like
it executes some kind of shell sort on the array and returns true if
array has any duplicates. It's used for common sorting and deciding on
executing _int_unique() on the related array if isort() says it has
duplicates.

IIRC, our inner qsort() has a smaller O(N) degree when compared to above
sorting algorithm. Also for the code's sake it'd be better to switch
using qsort() in all sorting related stuff. For these reasons,
intarray_sort.patch.0 addresses this (probable) gotcha.


All 3 patches passed regression tests for intarray contrib module. But
these are just for addressing some gotchas I found while reading code.
Your comments for these problems(?) are welcome.


Regards.

Attachment

pgsql-general by date:

Previous
From: Lincoln Yeoh
Date:
Subject: Re: Can't Figure Out Where Rows Are Going
Next
From: "Marcelo Fabiano da Silveira"
Date:
Subject: Where I find examples of pqtrace ???