Tom Lane wrote:
> Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>> vm=# create index "gist70000" on tmp_intarray_test using GIST (my_int_array gist__int_ops);
>> CREATE INDEX
>> Time: 2069836.856 ms
>
>> Is that expected, or does it sound like a bug to take over
>> half an hour to index 70000 rows of mostly 5 and 6-element
>> integer arrays?
>
> I poked at this example with oprofile. It's entirely CPU-bound AFAICT,
Oleg pointed out to me (off-list I now see) that it's not totally
unexpected behavior and I should have been using gist__intbig_ops,
since the "big" refers to the cardinality of the entire set (which
was large, in my case) and not the length of the arrays.
Oleg Bartunov wrote:
OB:> it's not about short or long arrays, it's about small or big
OB:> cardinality of the whole set (the number of unique elements)
I'm re-reading the docs and still wasn't obvious to me. A
potential docs patch is attached below.
> and the CPU utilization is approximately
>
> 55% g_int_compress
> 35% memmove/memcpy (difficult to distinguish these)
> 1% pg_qsort
> <1% anything else
>
> Probably need to look at reducing the number of calls to g_int_compress
> ... it must be getting called a whole lot more than once per new index
> entry, and I wonder why that should need to be.
Perhaps that's a separate issue, but we're working
fine with gist__intbig_ops for the time being.
Here's a proposed docs patch that makes this more obvious.
*** a/doc/src/sgml/intarray.sgml
--- b/doc/src/sgml/intarray.sgml
***************
*** 239,245 ****
<literal>gist__int_ops</> (used by default) is suitable for
small and medium-size arrays, while
<literal>gist__intbig_ops</> uses a larger signature and is more
! suitable for indexing large arrays.
</para>
<para>
--- 239,247 ----
<literal>gist__int_ops</> (used by default) is suitable for
small and medium-size arrays, while
<literal>gist__intbig_ops</> uses a larger signature and is more
! suitable for indexing high-cardinality data sets - where there
! are a large number of unique elements across all rows being
! indexed.
</para>
<para>