Thread: GiST index performance

From:
Matthew Wakeling
Date:

Revisiting the thread a month back or so, I'm still investigating
performance problems with GiST indexes in Postgres.

Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
pathological behaviour which is fixed by my second patch, which can be
viewed as complete. Contrib/cube, being multi-dimensional, is not affected
to any significant degree, so should not need alteration.

A second quite distinct issue is the general performance of GiST indexes
which is also mentioned in the old thread linked from Open Items. For
that, we have a test case at
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
btree_gist indexes. I have a similar example with the bioseg GiST index. I
have completely reimplemented the same algorithms in Java for algorithm
investigation and instrumentation purposes, and it runs about a hundred
times faster than in Postgres. I think this is a problem, and I'm willing
to do some investigation to try and solve it.

Do you have a recommendation for how to go about profiling Postgres, what
profiler to use, etc? I'm running on Debian Linux x86_64.

Matthew

--
 Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
 Quark:  That's the 35th.
 Jadzia: Oh yes, that's right. What's the 34th again?
 Quark:  War is good for business. It's easy to get them mixed up.

From:
Robert Haas
Date:

On Thu, Jun 4, 2009 at 12:33 PM, Matthew Wakeling<> wrote:
> Do you have a recommendation for how to go about profiling Postgres, what
> profiler to use, etc? I'm running on Debian Linux x86_64.

I mostly compile with --enable-profiling and use gprof.  I know Tom
Lane has had success with oprofile for quick and dirty measurements
but I haven't quite figured out how to make all the bits work for that
yet.
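For reference, the --enable-profiling/gprof route sketched above looks roughly like this; the install prefix and source paths are assumptions for this setup, not tested commands:

```shell
# Sketch: build Postgres with gprof instrumentation.  Prefix and source
# directory are assumptions; adjust to your tree.
cd postgresql-8.4beta2
./configure --prefix=$HOME/pgsql-prof --enable-profiling
make && make install
# Run the workload, then stop the server cleanly; each backend writes a
# gmon.out under its per-database directory in $PGDATA.
gprof $HOME/pgsql-prof/bin/postgres $PGDATA/gmon.out | head -n 40
```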

...Robert

From:
Matthew Wakeling
Date:

On Fri, 5 Jun 2009, Robert Haas wrote:
> On Thu, Jun 4, 2009 at 12:33 PM, Matthew Wakeling<> wrote:
>> Do you have a recommendation for how to go about profiling Postgres, what
>> profiler to use, etc? I'm running on Debian Linux x86_64.
>
> I mostly compile with --enable-profiling and use gprof.  I know Tom
> Lane has had success with oprofile for quick and dirty measurements
> but I haven't quite figured out how to make all the bits work for that
> yet.

Okay, I recompiled Postgres 8.4 beta 2 with profiling, installed
btree_gist and bioseg, and did a large multi-column (btree_gist, bioseg)
index search.

EXPLAIN ANALYSE SELECT *
FROM location l1, location l2
WHERE l1.objectid = l2.objectid
     AND bioseg_create(l1.intermine_start, l1.intermine_end) && bioseg_create(l2.intermine_start, l2.intermine_end);

                                QUERY PLAN
-----------------------------------------------------------------------
  Nested Loop  (cost=0.01..9292374.77 rows=19468831 width=130)
               (actual time=0.337..24538315.569 rows=819811624 loops=1)
    ->  Seq Scan on location l1
                      (cost=0.00..90092.17 rows=4030117 width=65)
                      (actual time=0.033..2561.142 rows=4030122 loops=1)
    ->  Index Scan using location_object_bioseg on location l2
                      (cost=0.01..1.58 rows=35 width=65)
                      (actual time=0.467..5.990 rows=203 loops=4030122)
          Index Cond: ((l2.objectid = l1.objectid) AND (bioseg_create(l1.intermine_start, l1.intermine_end) &&
bioseg_create(l2.intermine_start,l2.intermine_end))) 
  Total runtime: 24613918.894 ms
(5 rows)

Here is the top of the profiling result:

Flat profile:

Each sample counts as 0.01 seconds.
   %   cumulative   self              self     total
  time   seconds   seconds    calls  Ks/call  Ks/call  name
  35.41   2087.04  2087.04 823841746     0.00     0.00  gistnext
  15.36   2992.60   905.56 8560743382     0.00     0.00  fmgr_oldstyle
   8.65   3502.37   509.77 3641723296     0.00     0.00  FunctionCall5
   7.08   3919.87   417.50 3641723296     0.00     0.00  gistdentryinit
   5.03   4216.59   296.72     6668     0.00     0.00  DirectFunctionCall1
   3.84   4443.05   226.46 3641724371     0.00     0.00  FunctionCall1
   2.32   4579.94   136.89 1362367466     0.00     0.00  hash_search_with_hash_value
   1.89   4691.15   111.21 827892583     0.00     0.00  FunctionCall2
   1.83   4799.27   108.12                             FunctionCall6
   1.77   4903.56   104.30 2799321398     0.00     0.00  LWLockAcquire
   1.45   4989.24    85.68 1043922430     0.00     0.00  PinBuffer
   1.37   5070.15    80.91 823844102     0.00     0.00  index_getnext
   1.33   5148.29    78.15 1647683886     0.00     0.00  slot_deform_tuple
   0.95   5204.36    56.07 738604164     0.00     0.00  heap_page_prune_opt


The final cumulative time is 5894.06 seconds, which doesn't seem to match
the query run time at all. Also, no calls to anything including "bioseg"
in the name are recorded, although they are definitely called as the GiST
support functions for that data type.

Could someone give me a hand deciphering this result? It seems from this
that the time is spent in the gistnext function (in
src/backend/access/gist/gistget.c) and not its children. However, it's
quite a large function containing several loops - is there a way to make
the profiling result more specific?
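One option that might make it more specific is gprof's line-by-line mode, which attributes samples to individual source lines rather than whole functions, provided the binary also carries -g; a sketch, with paths assumed:

```shell
# Line-by-line flat profile (-l) and annotated source (-A); both need
# the binary to have been built with -g as well as -pg.  Paths assumed.
gprof -l $HOME/pgsql-prof/bin/postgres $PGDATA/gmon.out | head -n 40
gprof -l -A $HOME/pgsql-prof/bin/postgres $PGDATA/gmon.out > annotated.txt
```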

Matthew

--
 If you let your happiness depend upon how somebody else feels about you,
 now you have to control how somebody else feels about you. -- Abraham Hicks

From:
Tom Lane
Date:

Matthew Wakeling <> writes:
> The final cumulative time is 5894.06 seconds, which doesn't seem to match
> the query run time at all.

Depending on the platform you're using, gprof might have the wrong idea
about the kernel's tick rate, leading to its numbers being some multiple
or fraction of the true elapsed time.  I had that problem awhile back
https://bugzilla.redhat.com/show_bug.cgi?id=151763
although I'd like to think it's fixed on all recent Linuxen.  However,
your bigger problem is here:

> Also, no calls to anything including "bioseg"
> in the name are recorded, although they are definitely called as the GiST
> support functions for that data type.

I have never had any success in getting gprof to profile functions that
are in loadable libraries, which of course is exactly what you need to do
here.  I have found hints on the web claiming that it's possible, but
they don't work for me.  gprof just ignores both the functions and the
time spent in them.

Personally I generally use oprofile these days, because it doesn't have
that problem and also doesn't require any special build options.  If you
don't have that available to you, the best bet is probably to make a
test build of Postgres in which your functions are linked directly into
the main postgres executable.

            regards, tom lane

From:
Matthew Wakeling
Date:

On Wed, 10 Jun 2009, Tom Lane wrote:
>> Also, no calls to anything including "bioseg"
>> in the name are recorded, although they are definitely called as the GiST
>> support functions for that data type.
>
> I have never had any success in getting gprof to profile functions that
> are in loadable libraries, which of course is exactly what you need to do
> here.

That sucks. However, as another observation, no calls to "gistfindnext"
are recorded in the profile either, and that's in the same source file as
"gistnext" which is recorded. Could it have been inlined? Shouldn't
inlining be switched off on a profiling build?

> ...the best bet is probably to make a test build of Postgres in which
> your functions are linked directly into the main postgres executable.

I'll give that a try. Oprofile scares me with the sheer number of options.

Matthew

--
 Prolog doesn't have enough parentheses. -- Computer Science Lecturer

From:
Tom Lane
Date:

Matthew Wakeling <> writes:
> That sucks. However, as another observation, no calls to "gistfindnext"
> are recorded in the profile either, and that's in the same source file as
> "gistnext" which is recorded. Could it have been inlined?

Probably.

> Shouldn't inlining be switched off on a profiling build?

Why?  You generally want to profile the code's actual behavior, or as
near as you can get to observing that.  Defeating compiler optimizations
doesn't sound like something that -pg should do on its own.  If you
really want it, there's a switch for it.
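The switch alluded to here is presumably GCC's -fno-inline; passing it through configure might look like this sketch (flags are an assumption about the compiler in use):

```shell
# Rebuild with inlining defeated but other optimizations kept, so the
# profile still resembles real behavior.  GCC flags assumed.
cd postgresql-8.4beta2
./configure --prefix=$HOME/pgsql-prof --enable-profiling \
    CFLAGS="-O2 -fno-inline"
make clean && make && make install
```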

> Oprofile scares me with the sheer number of options.

You can ignore practically all of them; the defaults are pretty sane.
The recipe I usually follow is:


Initial setup (only needed once per system boot):

sudo opcontrol --init
sudo opcontrol --setup --no-vmlinux

(If you need detail about what the kernel is doing, you need kernel
debug symbols and then specify them on the previous line)

Start/stop profiling

sudo opcontrol --start
sudo opcontrol --reset
... exercise your debug-enabled program here ...
sudo opcontrol --dump ; sudo opcontrol --shutdown

The test case should run at least a minute or two to get numbers with
reasonable accuracy.

Analysis:

opreport --long-filenames | more

opreport -l image:/path/to/postgres | more

if you really want detail:

opannotate --source /path/to/postgres >someplace

            regards, tom lane

From:
Matthew Wakeling
Date:

On Thu, 11 Jun 2009, Tom Lane wrote:
>> Oprofile scares me with the sheer number of options.
>
> You can ignore practically all of them; the defaults are pretty sane.

Thanks, that was helpful. Here is the top of opreport --long-filenames:

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
CPU_CLK_UNHALT...|
   samples|      %|
------------------
   5114464 61.5404 /lib/libc-2.9.so
   1576829 18.9734 /changeable/pgsql_8.4_profiling/bin/postgres
         CPU_CLK_UNHALT...|
           samples|      %|
         ------------------
           1572346 99.7157 /changeable/pgsql_8.4_profiling/bin/postgres
              4482  0.2842 [vdso] (tgid:13593 range:0x7fff8dbff000-0x7fff8dc00000)
                 1 6.3e-05 [vdso] (tgid:13193 range:0x7fff8dbff000-0x7fff8dc00000)
    409534  4.9278 /no-vmlinux
    309990  3.7300 /changeable/pgsql_8.4_profiling/lib/btree_gist.so
    203684  2.4509 /changeable/pgsql_8.4_profiling/lib/bioseg.so

So it seems that btree_gist and bioseg are not using that much CPU at all,
compared to core postgres code. In fact, the majority of time seems to be
spent in libc. Unfortunately my libc doesn't have any debugging symbols.

Anyway, running opannotate seems to make it clear that time *is* spent in
the gistnext function, but almost all of that is in children of the
function. Lots of time is actually spent in fmgr_oldstyle though.

Here is the top of opreport -l:

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               app name                 symbol name
5114464  61.5404  libc-2.9.so              libc-2.9.so              (no symbols)
496215    5.9708  postgres                 postgres                 gistnext
409534    4.9278  no-vmlinux               no-vmlinux               (no symbols)
404037    4.8616  postgres                 postgres                 fmgr_oldstyle
170079    2.0465  btree_gist.so            btree_gist.so            gbt_int4_consistent
160016    1.9254  postgres                 postgres                 gistdentryinit
153266    1.8442  nvidia_drv.so            nvidia_drv.so            (no symbols)
152463    1.8345  postgres                 postgres                 FunctionCall5
149374    1.7974  postgres                 postgres                 FunctionCall1
131112    1.5776  libxul.so                libxul.so                (no symbols)
120871    1.4544  postgres                 postgres                 .plt
94506     1.1372  bioseg.so                bioseg.so                bioseg_gist_consistent

I'm guessing my next step is to install a version of libc with debugging
symbols?

Matthew

--
 Some people, when confronted with a problem, think "I know, I'll use regular
 expressions." Now they have two problems.                  -- Jamie Zawinski

From:
Tom Lane
Date:

Matthew Wakeling <> writes:
> So it seems that btree_gist and bioseg are not using that much CPU at all,
> compared to core postgres code. In fact, the majority of time seems to be
> spent in libc. Unfortunately my libc doesn't have any debugging symbols.

hmm ... memcpy or qsort maybe?

> Anyway, running opannotate seems to make it clear that time *is* spent in
> the gistnext function, but almost all of that is in children of the
> function. Lots of time is actually spent in fmgr_oldstyle though.

So it'd be worth converting your functions to V1 style.

> I'm guessing my next step is to install a version of libc with debugging
> symbols?

Yeah, if you want to find out what's happening in libc, that's what you
need.
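On Debian that generally means installing the separate glibc debug-symbol package; a sketch (the package name is an assumption for the glibc release in use):

```shell
# Install detached debug symbols for glibc; recent oprofile finds the
# /usr/lib/debug copies automatically when resolving symbols.
sudo apt-get install libc6-dbg
# Re-run the report; -g (--debug-info) adds source file/line detail
# where symbols are available.
opreport -g -l image:/lib/libc-2.9.so | head -n 20
```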

            regards, tom lane

From:
Matthew Wakeling
Date:

On Thu, 11 Jun 2009, Tom Lane wrote:
> So it'd be worth converting your functions to V1 style.

Does that produce a significant reduction in overhead? (You'll probably
say "yes, that's the whole point").

> hmm ... memcpy or qsort maybe?

Surprise:

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               app name                 symbol name
3005354  40.2868  libc-2.9.so              libc-2.9.so              __mcount_internal
1195592  16.0269  libc-2.9.so              libc-2.9.so              mcount
549998    7.3727  postgres                 postgres                 gistnext
420465    5.6363  postgres                 postgres                 fmgr_oldstyle
376333    5.0447  no-vmlinux               no-vmlinux               (no symbols)
210984    2.8282  postgres                 postgres                 FunctionCall5
182509    2.4465  postgres                 postgres                 gistdentryinit
174356    2.3372  btree_gist.so            btree_gist.so            gbt_int4_consistent
142829    1.9146  postgres                 postgres                 FunctionCall1
129800    1.7400  postgres                 postgres                 .plt
119180    1.5976  nvidia_drv.so            nvidia_drv.so            (no symbols)
96351     1.2916  libxul.so                libxul.so                (no symbols)
91726     1.2296  btree_gist.so            btree_gist.so            gbt_num_consistent

A quick grep in the postgres source for mcount reveals no hits. No idea
what it does - there is no man page for it.

Matthew

--
 I pause for breath to allow you to get over your shock that I really did cover
 all that in only five minutes...        -- Computer Science Lecturer

From:
Matthew Wakeling
Date:

On Thu, 11 Jun 2009, Matthew Wakeling wrote:
> A quick grep in the postgres source for mcount reveals no hits. No idea what
> it does - there is no man page for it.

Ah - that's part of gprof. I'll recompile without --enable-profiling and
try again. Duh.

Matthew

--
 What goes up must come down. Ask any system administrator.

From:
Matthew Wakeling
Date:

Okay, I don't know quite what's happening here. Tom, perhaps you could
advise. Running opannotate --source, I get this sort of stuff:

/*
  * Total samples for file : ".../postgresql-8.4beta2/src/backend/access/gist/gistget.c"
  *
  *   6880  0.2680
  */

and then:

                :static int64
                :gistnext(IndexScanDesc scan, TIDBitmap *tbm)
     81  0.0032 :{ /* gistnext total: 420087 16.3649 */
                :        Page            p;



The gistnext total doesn't seem to correspond to the amount I get by
adding up all the individual lines in gistnext. Moreover, it is greater
than the total samples attributed to the whole file, and greater than the
samples assigned to all the lines where gistnext is called.

However, yes it does seem like fmgr.c accounts for a large proportion of
samples. Also, I still seem to be getting mcount, even after recompiling
without --enable-profiling.

CPU: Core 2, speed 1998 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples  %        image name               app name                 symbol name
460213   17.9280  postgres                 postgres                 fmgr_oldstyle
420087   16.3649  postgres                 postgres                 gistnext
254975    9.9328  postgres                 postgres                 FunctionCall5
239572    9.3327  libc-2.9.so              libc-2.9.so              mcount
219963    8.5689  libc-2.9.so              libc-2.9.so              __mcount_internal
125674    4.8957  no-vmlinux               no-vmlinux               (no symbols)
117184    4.5650  postgres                 postgres                 gistdentryinit
106967    4.1670  btree_gist.so            btree_gist.so            gbt_int4_consistent
95677     3.7272  postgres                 postgres                 FunctionCall1
75397     2.9372  bioseg.so                bioseg.so                bioseg_gist_consistent
58832     2.2919  btree_gist.so            btree_gist.so            gbt_num_consistent
39128     1.5243  bioseg.so                bioseg.so                bioseg_overlap
33874     1.3196  libxul.so                libxul.so                (no symbols)
32008     1.2469  bioseg.so                bioseg.so                bioseg_gist_leaf_consistent
20890     0.8138  nvidia_drv.so            nvidia_drv.so            (no symbols)
19321     0.7527  bioseg.so                bioseg.so                bioseg_gist_decompress
17365     0.6765  libmozjs.so.1d           libmozjs.so.1d           (no symbols)

Matthew

--
 A good programmer is one who looks both ways before crossing a one-way street.
 Considering the quality and quantity of one-way streets in Cambridge, it
 should be no surprise that there are so many good programmers there.

From:
Tom Lane
Date:

Matthew Wakeling <> writes:
> The gistnext total doesn't seem to correspond to the amount I get by
> adding up all the individual lines in gistnext.

Hmm, hadn't you determined that some other stuff was being inlined into
gistnext?  I'm not really sure how opannotate displays such cases, but
this might be an artifact of that.

> However, yes it does seem like fmgr.c accounts for a large proportion of
> samples. Also, I still seem to be getting mcount, even after recompiling
> without --enable-profiling.

You must still have some -pg code in there someplace.  Maybe you didn't
recompile bioseg.so, or even psql?  Remember the libc counts you are
looking at are for system-wide usage of libc.
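One quick way to hunt for leftover -pg objects: anything compiled for profiling pulls in mcount, so the symbol tables give it away. A sketch, with paths assumed:

```shell
# Flag any binary or loadable module that still references the gprof
# profiling hook (mcount).  Install paths are assumptions.
for f in $HOME/pgsql-prof/bin/postgres $HOME/pgsql-prof/lib/*.so; do
    if nm "$f" 2>/dev/null | grep -q 'mcount'; then
        echo "still built with -pg: $f"
    fi
done
```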

            regards, tom lane

From:
Adam Gundy
Date:

Matthew Wakeling wrote:
> Okay, I don't know quite what's happening here. Tom, perhaps you could
> advise. Running opannotate --source, I get this sort of stuff:
>
> /*
>  * Total samples for file :
> ".../postgresql-8.4beta2/src/backend/access/gist/gistget.c"
>  *
>  *   6880  0.2680
>  */
>
> and then:
>
>                :static int64
>                :gistnext(IndexScanDesc scan, TIDBitmap *tbm)
>     81  0.0032 :{ /* gistnext total: 420087 16.3649 */
>                :        Page            p;
>
>
>
> The gistnext total doesn't seem to correspond to the amount I get by
> adding up all the individual lines in gistnext. Moreover, it is greater
> than the total samples attributed to the whole file, and greater than
> the samples assigned to all the lines where gistnext is called.

there's another alternative for profiling that you might try if you
can't get sensible results out of oprofile - cachegrind (which is part
of the valgrind toolset).

basically it runs the code in an emulated environment, but records every
access (reads/writes/CPU cycles/cache hits/misses/etc). it's *extremely*
good at finding hotspots, even when they are due to 'cache flushing'
behavior in your code (for example, trawling a linked list is touching a
bunch of pages and effectively blowing your CPU cache..)

there's an associated graphical tool called kcachegrind which takes the
dumped output and lets you drill down, even to the source code level
(with cycle count/percentage annotations on the source lines)

all you need to do is compile postgres with debug symbols (full
optimization ON, otherwise you end up reaching the wrong conclusions).

there's an example of running valgrind on postgres here:

   http://blog.cleverelephant.ca/2008/08/valgrinding-postgis.html

for cachegrind, you basically need to use 'cachegrind' instead of
'valgrind', and don't disable optimization when you build..
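Following the linked article, a minimal cachegrind run against a single backend might look like the sketch below; the paths, database name, and the single-user-mode approach are all assumptions:

```shell
# Run one backend in single-user mode under cachegrind so the whole
# session is profiled; type the test query at the backend> prompt,
# then exit with Ctrl-D.
valgrind --tool=cachegrind \
    $HOME/pgsql-prof/bin/postgres --single -D $PGDATA testdb
# Summarize the dump, or load cachegrind.out.* into kcachegrind.
cg_annotate cachegrind.out.*
```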

From:
Heikki Linnakangas
Date:

Tom Lane wrote:
> Matthew Wakeling <> writes:
>> I'm guessing my next step is to install a version of libc with debugging
>> symbols?
>
> Yeah, if you want to find out what's happening in libc, that's what you
> need.

Getting callgraph information from oprofile would also help. Although it
won't directly tell what function in libc is being called, you would see
where the calls are coming from, which is usually enough to guess what
the libc function is.

You can also get the oprofile data, including callgraph, into
kcachegrind, which is *very* helpful. Here's a script I use:
http://roberts.vorpus.org/~njs/op2calltree.py
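Collecting the callgraph data in the first place needs an extra opcontrol option on top of Tom's recipe; a sketch (the depth value is arbitrary):

```shell
# Enable callgraph sampling (depth 10, chosen arbitrarily), then report
# with callers/callees included (-c).
sudo opcontrol --setup --no-vmlinux --callgraph=10
sudo opcontrol --start ; sudo opcontrol --reset
# ... exercise the workload ...
sudo opcontrol --dump ; sudo opcontrol --shutdown
opreport -c -l image:/path/to/postgres | more
```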

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

From:
Greg Smith
Date:

On Thu, 11 Jun 2009, Tom Lane wrote:

> Matthew Wakeling <> writes:
>> Oprofile scares me with the sheer number of options.
>
> You can ignore practically all of them; the defaults are pretty sane.
> The recipe I usually follow is:

An excellent brain dump from Tom and lots of other good stuff in this
thread.  I just dumped a summary of all the profiling lore discussed onto
http://wiki.postgresql.org/wiki/Profiling_with_OProfile as I don't know
that I've ever seen a concise intro to this subject before.

--
* Greg Smith  http://www.gregsmith.com Baltimore, MD

From:
Robert Haas
Date:

On Fri, Jun 26, 2009 at 10:33 AM, Greg Smith<> wrote:
> On Thu, 11 Jun 2009, Tom Lane wrote:
>
>> Matthew Wakeling <> writes:
>>>
>>> Oprofile scares me with the sheer number of options.
>>
>> You can ignore practically all of them; the defaults are pretty sane.
>> The recipe I usually follow is:
>
> An excellent brain dump from Tom and lots of other good stuff in this
> thread.  I just dumped a summary of all the profiling lore discussed onto
> http://wiki.postgresql.org/wiki/Profiling_with_OProfile as I don't know that
> I've ever seen a concise intro to this subject before.

Nice, thanks!

...Robert

From:
Bruce Momjian
Date:

Was there ever any conclusion on this issue?

---------------------------------------------------------------------------

Matthew Wakeling wrote:
>
> Revisiting the thread a month back or so, I'm still investigating
> performance problems with GiST indexes in Postgres.
>
> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
> pathological behaviour which is fixed by my second patch, which can be
> viewed as complete. Contrib/cube, being multi-dimensional, is not affected
> to any significant degree, so should not need alteration.
>
> A second quite distinct issue is the general performance of GiST indexes
> which is also mentioned in the old thread linked from Open Items. For
> that, we have a test case at
> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
> btree_gist indexes. I have a similar example with the bioseg GiST index. I
> have completely reimplemented the same algorithms in Java for algorithm
> investigation and instrumentation purposes, and it runs about a hundred
> times faster than in Postgres. I think this is a problem, and I'm willing
> to do some investigation to try and solve it.
>
> Do you have a recommendation for how to go about profiling Postgres, what
> profiler to use, etc? I'm running on Debian Linux x86_64.
>
> Matthew
>
> --
>  Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
>  Quark:  That's the 35th.
>  Jadzia: Oh yes, that's right. What's the 34th again?
>  Quark:  War is good for business. It's easy to get them mixed up.
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +

From:
Robert Haas
Date:

On Thu, Feb 25, 2010 at 6:42 PM, Bruce Momjian <> wrote:
> Was there ever any conclusion on this issue?

I don't think so.

...Robert

> ---------------------------------------------------------------------------
>
> Matthew Wakeling wrote:
>>
>> Revisiting the thread a month back or so, I'm still investigating
>> performance problems with GiST indexes in Postgres.
>>
>> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
>> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
>> pathological behaviour which is fixed by my second patch, which can be
>> viewed as complete. Contrib/cube, being multi-dimensional, is not affected
>> to any significant degree, so should not need alteration.
>>
>> A second quite distinct issue is the general performance of GiST indexes
>> which is also mentioned in the old thread linked from Open Items. For
>> that, we have a test case at
>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
>> btree_gist indexes. I have a similar example with the bioseg GiST index. I
>> have completely reimplemented the same algorithms in Java for algorithm
>> investigation and instrumentation purposes, and it runs about a hundred
>> times faster than in Postgres. I think this is a problem, and I'm willing
>> to do some investigation to try and solve it.
>>
>> Do you have a recommendation for how to go about profiling Postgres, what
>> profiler to use, etc? I'm running on Debian Linux x86_64.
>>
>> Matthew
>>
>> --
>>  Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
>>  Quark:  That's the 35th.
>>  Jadzia: Oh yes, that's right. What's the 34th again?
>>  Quark:  War is good for business. It's easy to get them mixed up.
>>
>
> --
>  Bruce Momjian  <>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
>  + If your life is a hard drive, Christ can be your backup. +
>

From:
Matthew Wakeling
Date:

On Thu, 25 Feb 2010, Bruce Momjian wrote:
> Was there ever any conclusion on this issue?

Not really. Comments inline:

> Matthew Wakeling wrote:
>> Revisiting the thread a month back or so, I'm still investigating
>> performance problems with GiST indexes in Postgres.
>>
>> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
>> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
>> pathological behaviour which is fixed by my second patch, which can be
>> viewed as complete. Contrib/cube, being multi-dimensional, is not affected
>> to any significant degree, so should not need alteration.

This issue is addressed by my patch, which AFAIK no one has reviewed.
However, that patch was derived from a patch that I applied to bioseg,
which is itself a derivative of seg. This patch works very well indeed,
and gave an approximate 100 times speed improvement in the one test I ran.

So you could say that the sister patch of the one I submitted is tried and
tested in production.

>> A second quite distinct issue is the general performance of GiST indexes
>> which is also mentioned in the old thread linked from Open Items. For
>> that, we have a test case at
>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php for
>> btree_gist indexes. I have a similar example with the bioseg GiST index. I
>> have completely reimplemented the same algorithms in Java for algorithm
>> investigation and instrumentation purposes, and it runs about a hundred
>> times faster than in Postgres. I think this is a problem, and I'm willing
>> to do some investigation to try and solve it.

I have not made any progress on this issue. I think Oleg and Teodor would
be better placed working it out. All I can say is that I implemented the
exact same indexing algorithm in Java, and it performed 100 times faster
than Postgres. Now, Postgres has to do a lot of additional work, like
mapping the index onto disc, locking pages, and abstracting to plugin user
functions, so I would expect some difference - I'm not sure 100 times is
reasonable though. I tried to do some profiling, but couldn't see any one
section of code that was taking too much time. Not sure what I can further
do.

Matthew

--
 Some people, when confronted with a problem, think "I know, I'll use regular
 expressions." Now they have two problems.                  -- Jamie Zawinski

From:
Robert Haas
Date:

On Mon, Mar 15, 2010 at 11:58 AM, Matthew Wakeling <> wrote:
> On Thu, 25 Feb 2010, Bruce Momjian wrote:
>>
>> Was there ever any conclusion on this issue?
>
> Not really. Comments inline:
>
>> Matthew Wakeling wrote:
>>>
>>> Revisiting the thread a month back or so, I'm still investigating
>>> performance problems with GiST indexes in Postgres.
>>>
>>> Looking at http://wiki.postgresql.org/wiki/PostgreSQL_8.4_Open_Items I'd
>>> like to clarify the contrib/seg issue. Contrib/seg is vulnerable to
>>> pathological behaviour which is fixed by my second patch, which can be
>>> viewed as complete. Contrib/cube, being multi-dimensional, is not
>>> affected
>>> to any significant degree, so should not need alteration.
>
> This issue is addressed by my patch, which AFAIK no one has reviewed.
> However, that patch was derived from a patch that I applied to bioseg, which
> is itself a derivative of seg. This patch works very well indeed, and gave
> an approximate 100 times speed improvement in the one test I ran.
>
> So you could say that the sister patch of the one I submitted is tried and
> tested in production.

We rely fairly heavily on the commitfest app to track which patches
need review; perhaps it would be good to add it here.

https://commitfest.postgresql.org/action/commitfest_view/open

I seem to recall thinking that this patch wasn't ready to apply for
some reason, but I'm not sure why I thought that.

...Robert

From:
Yeb Havinga
Date:

Matthew Wakeling wrote:
>> Matthew Wakeling wrote:
>>> A second quite distinct issue is the general performance of GiST
>>> indexes
>>> which is also mentioned in the old thread linked from Open Items. For
>>> that, we have a test case at
>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
>>> for
>>> btree_gist indexes. I have a similar example with the bioseg GiST
>>> index. I
>>> have completely reimplemented the same algorithms in Java for algorithm
>>> investigation and instrumentation purposes, and it runs about a hundred
>>> times faster than in Postgres. I think this is a problem, and I'm
>>> willing
>>> to do some investigation to try and solve it.
> I have not made any progress on this issue. I think Oleg and Teodor
> would be better placed working it out. All I can say is that I
> implemented the exact same indexing algorithm in Java, and it
> performed 100 times faster than Postgres. Now, Postgres has to do a
> lot of additional work, like mapping the index onto disc, locking
> pages, and abstracting to plugin user functions, so I would expect
> some difference - I'm not sure 100 times is reasonable though. I tried
> to do some profiling, but couldn't see any one section of code that
> was taking too much time. Not sure what I can further do.
Hello Matthew and list,

A lot of time is spent in the gistget.c code, and in a lot of
FunctionCall5 invocations of the gist consistent function, which is out
of sight for gprof.
Something different but related, since it also concerns gist: we noticed
before that gist indexes that use a compressed form for index entries
suffer from repeated compress calls on query operands (see
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00078.php).

The btree_gist int4 compress function calls the generic
gbt_num_compress, which does a palloc. Maybe this palloc is also hit a
lot when scanning the index, because the constants that are queried
with are repeatedly compressed and palloced.
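If the compress call really is repeated per comparison, one obvious mitigation would be to cache the compressed form of the query constant in scan-local state, so compress (and its palloc) runs once per scan rather than once per key comparison. A purely hypothetical sketch - the names and the trivial int4 "compression" are made up, not the btree_gist API:

```c
/* Hypothetical scan-local cache for the compressed form of a query
 * constant.  Not the btree_gist API; just illustrates the idea. */
typedef struct QueryKeyCache
{
    int raw;            /* the query constant as supplied */
    int compressed;     /* stand-in for the compressed representation */
    int valid;          /* has `compressed` been computed yet? */
} QueryKeyCache;

static int ncompress_calls = 0;

/* Stands in for gbt_num_compress plus its palloc. */
static int
compress_key(int raw)
{
    ncompress_calls++;
    return raw;         /* int4 "compression" is the identity here */
}

/* Compress lazily, once, instead of on every comparison. */
static int
get_compressed(QueryKeyCache *c)
{
    if (!c->valid)
    {
        c->compressed = compress_key(c->raw);
        c->valid = 1;
    }
    return c->compressed;
}
```

However many times the consistent function asks for the key, the expensive call happens only on the first request.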

regards,
Yeb Havinga



From:
Yeb Havinga
Date:

Yeb Havinga wrote:
> Matthew Wakeling wrote:
>>> Matthew Wakeling wrote:
>>>> A second quite distinct issue is the general performance of GiST
>>>> indexes
>>>> which is also mentioned in the old thread linked from Open Items. For
>>>> that, we have a test case at
>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
>>>> for
>>>> btree_gist indexes. I have a similar example with the bioseg GiST
>>>> index. I
>>>> have completely reimplemented the same algorithms in Java for
>>>> algorithm
>>>> investigation and instrumentation purposes, and it runs about a
>>>> hundred
>>>> times faster than in Postgres. I think this is a problem, and I'm
>>>> willing
>>>> to do some investigation to try and solve it.
>> I have not made any progress on this issue. I think Oleg and Teodor
>> would be better placed working it out. All I can say is that I
>> implemented the exact same indexing algorithm in Java, and it
>> performed 100 times faster than Postgres. Now, Postgres has to do a
>> lot of additional work, like mapping the index onto disc, locking
>> pages, and abstracting to plugin user functions, so I would expect
>> some difference - I'm not sure 100 times is reasonable though. I
>> tried to do some profiling, but couldn't see any one section of code
>> that was taking too much time. Not sure what I can further do.
> Hello Matthew and list,
>
> A lot of time is spent in the gistget.c code, and in a lot of
> FunctionCall5 invocations of the gist consistent function, which is
> out of sight for gprof.
> Something different but related, since it also concerns gist: we
> noticed before that gist indexes that use a compressed form for index
> entries suffer from repeated compress calls on query operands (see
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00078.php).
>
> The btree_gist int4 compress function calls the generic
> gbt_num_compress, which does a palloc. Maybe this palloc is also hit
> a lot when scanning the index, because the constants that are queried
> with are repeatedly compressed and palloced.
Looked in the code a bit more - only the index nodes are compressed at
index creation; the consistent function does not compress queries, so
no pallocs there. However, when running Matthew's example from
http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
with the gist index, the coverage shows in gistget.c: 1000000 palloc0's
of the gistsearchstack at line 152 and 2010982 pallocs, also of the
gistsearchstack, on line 342. Two pfrees are also hit a lot: line 195,
1010926 times for a stack entry, and line 293, 200056 times. My $0.02
is that the pain is here. My knowledge of gistget and the other sources
in access/gist is zero, but couldn't it be possible to determine the
maximum needed size of the stack, allocate it at once, and use a
push/pop kind of API?
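The push/pop idea could look something like this minimal sketch - entirely hypothetical, not the actual gistget.c data structures: one allocation reused and grown geometrically, instead of a palloc per visited page.

```c
#include <stdlib.h>

/* Hypothetical search stack: a single buffer with push/pop, grown
 * rarely, instead of one palloc/pfree per stack entry. */
typedef struct SearchEntry
{
    unsigned int blkno;     /* block number of the page to visit */
} SearchEntry;

typedef struct SearchStack
{
    SearchEntry *entries;   /* single allocation, reused across pushes */
    int          top;       /* number of entries currently on the stack */
    int          capacity;
} SearchStack;

static SearchStack *
stack_create(int capacity)
{
    SearchStack *s = malloc(sizeof(SearchStack));
    s->entries = malloc(capacity * sizeof(SearchEntry));
    s->top = 0;
    s->capacity = capacity;
    return s;
}

static void
stack_push(SearchStack *s, unsigned int blkno)
{
    if (s->top == s->capacity)  /* grow geometrically, not per push */
    {
        s->capacity *= 2;
        s->entries = realloc(s->entries,
                             s->capacity * sizeof(SearchEntry));
    }
    s->entries[s->top++].blkno = blkno;
}

/* Returns 0 when the stack is empty, 1 otherwise. */
static int
stack_pop(SearchStack *s, unsigned int *blkno)
{
    if (s->top == 0)
        return 0;
    *blkno = s->entries[--s->top].blkno;
    return 1;
}
```

A depth-first index scan would push child pages and pop the next page to visit, with allocator traffic only on the rare grow.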

regards,
Yeb Havinga






From:
Yeb Havinga
Date:

Yeb Havinga wrote:
> Yeb Havinga wrote:
>> Matthew Wakeling wrote:
>>>> Matthew Wakeling wrote:
>>>>> A second quite distinct issue is the general performance of GiST
>>>>> indexes
>>>>> which is also mentioned in the old thread linked from Open Items. For
>>>>> that, we have a test case at
>>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
>>>>> for
>>>>> btree_gist indexes. I have a similar example with the bioseg GiST
>>>>> index. I
>>>>> have completely reimplemented the same algorithms in Java for
>>>>> algorithm
>>>>> investigation and instrumentation purposes, and it runs about a
>>>>> hundred
>>>>> times faster than in Postgres. I think this is a problem, and I'm
>>>>> willing
>>>>> to do some investigation to try and solve it.
>>> I have not made any progress on this issue. I think Oleg and Teodor
>>> would be better placed working it out. All I can say is that I
>>> implemented the exact same indexing algorithm in Java, and it
>>> performed 100 times faster than Postgres. Now, Postgres has to do a
>>> lot of additional work, like mapping the index onto disc, locking
>>> pages, and abstracting to plugin user functions, so I would expect
>>> some difference - I'm not sure 100 times is reasonable though. I
>>> tried to do some profiling, but couldn't see any one section of code
>>> that was taking too much time. Not sure what I can further do.
> Looked in the code a bit more - only the index nodes are compressed at
> index creation; the consistent function does not compress queries, so
> no pallocs there. However, when running Matthew's example from
> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
> with the gist index, the coverage shows in gistget.c: 1000000 palloc0's
> of the gistsearchstack at line 152 and 2010982 pallocs, also of the
> gistsearchstack, on line 342. Two pfrees are also hit a lot: line 195,
> 1010926 times for a stack entry, and line 293, 200056 times. My $0.02
> is that the pain is here. My knowledge of gistget and the other
> sources in access/gist is zero, but couldn't it be possible to
> determine the maximum needed size of the stack, allocate it at once,
> and use a push/pop kind of API?
Wasted some time today on a ghost chase... I thought that removing the
millions of pallocs would help, so I wrote an alternative to the
gistsearchstack stack, only to find out that it was not the index
scanning itself that caused millions of pallocs, but the scan being in
the inner loop, which was called 1000000 times. The actual scanning
time was not changed significantly.
The actual scanning time in my VM is (actual time=0.006..0.008) for
btree and (actual time=0.071..0.074) for gist. An error in my
searchstack alternative caused pages to be scanned twice, returning
twice the number of rows (6 instead of 3 each time). This resulted in a
corresponding increase in ms (actual time=0.075..0.150). Somewhere I
hit something that costs ~0.070 ms, twice. For a single index scan,
0.070 ms startup time for gist vs 0.006 for btree doesn't seem like a
big problem, but yeah, when calling it a million times...
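As a sanity check on these figures: a ~0.070 ms per-scan startup cost paid a million times accounts for almost exactly the ~70 second total runtime seen for this query, and the btree figure scales the same way.

```c
/* Back-of-envelope: total time in seconds for `loops` inner index
 * scans, each costing per_scan_ms milliseconds. */
static double
total_seconds(double per_scan_ms, long loops)
{
    return per_scan_ms * (double) loops / 1000.0;
}
```

So 0.070 ms x 1000000 loops gives 70 seconds for gist, versus 6 seconds' worth of startup cost for btree.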

regards,
Yeb Havinga


From:
Yeb Havinga
Date:

Yeb Havinga wrote:
>>>>> Matthew Wakeling wrote:
>>>>>> A second quite distinct issue is the general performance of GiST
>>>>>> indexes
>>>>>> which is also mentioned in the old thread linked from Open Items.
>>>>>> For
>>>>>> that, we have a test case at
>>>>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php
>>>>>> for
>>>>>> btree_gist indexes. I have a similar example with the bioseg GiST
>>>>>> index. I
>>>>>> have completely reimplemented the same algorithms in Java for
>>>>>> algorithm
>>>>>> investigation and instrumentation purposes, and it runs about a
>>>>>> hundred
>>>>>> times faster than in Postgres. I think this is a problem, and I'm
>>>>>> willing
>>>>>> to do some investigation to try and solve it.
More gist performance...

Some background: I've been involved in developing several datatypes that
make use of gist indexes (we just published a paper on it, see
http://arxiv.org/abs/1003.3370); that's the main reason I'm very much
interested in possible improvements in gist index speed. One of the
datatypes was 'very badly' indexable: non-leaf pages were getting very
general keys, so in the end we could see from the scanning times
compared to sequential scans that the whole index was being scanned. One
thing I remember was the idea that somehow it would be nice if the depth
of the gist tree could be fiddled with: in that case keys of non-leaf
nodes would be more specific again. In the original Guttman R-tree paper
there was mention of a parameter that determined the size of entries in
nodes and thereby indirectly the depth of the tree. I missed that in the
PostgreSQL gist API.

One thing gist scanning does very often is key comparisons. So another
approach is to try to limit those, and again this might be possible by
increasing the depth / decreasing the number of entries per page. I
just did a test where in gistfitpage the gistpagesize was divided by 5,
and something similar in gistnospace.

Scantime before adjustment: about 70 seconds.
After adjustment: 45 seconds.

With gist_stat from the gevel package I confirmed that the depth was
now 4 (originally 3). Drawback: bigger index size, because pages are
not filled completely anymore.

The explain shows (actual time=0.030..0.032) for the inner loop times,
which is less than half of the original scan time.

Since the gistpagesize is derived from the database blocksize, it might
be wise to set the blocksize low for this case. I'm going to play with
this a bit more.

regards,
Yeb Havinga



From:
Yeb Havinga
Date:

Yeb Havinga wrote:
>
> Since the gistpagesize is derived from the database blocksize, it
> might be wise to set the blocksize low for this case, I'm going to
> play with this a bit more.
Ok, one last mail before it turns into spam: with a 1KB database
blocksize, the query now runs in 30 seconds (originally 70 on my
machine, shared buffers 240MB).
The per-inner-loop access time is now 24 microseconds, compared to the
original 74 microseconds with the 8KB blocksize on my machine, and 8
microseconds for the btree scan. Not a bad speedup for such a simple
parameter :-)

postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND
b.b + 2;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..4169159462.20 rows=111109777668 width=8)
(actual time=0.184..29540.355 rows=2999997 loops=1)
   ->  Seq Scan on b  (cost=0.00..47037.62 rows=999962 width=4) (actual
time=0.024..1783.484 rows=1000000 loops=1)
   ->  Index Scan using a_a on a  (cost=0.00..2224.78 rows=111114
width=4) (actual time=0.021..0.024 rows=3 loops=1000000)
         Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2)))
 Total runtime: 30483.303 ms
(5 rows)


postgres=# select gist_stat('a_a');
                 gist_stat
-------------------------------------------
 Number of levels:          5             +
 Number of pages:           47618         +
 Number of leaf pages:      45454         +
 Number of tuples:          1047617       +
 Number of invalid tuples:  0             +
 Number of leaf tuples:     1000000       +
 Total size of tuples:      21523756 bytes+
 Total size of leaf tuples: 20545448 bytes+
 Total size of index:       48760832 bytes+

(1 row)


From:
Kenneth Marshall
Date:

Hi Yeb,

I have not looked at the gist code, but would it be possible to
make virtual pages that have a size that is 1/power-of-2 * blocksize?
Then the leaf node could be 1/8 or even 1/16 the size of the full
pagesize.

Regards,
Ken

On Fri, Mar 19, 2010 at 09:49:30PM +0100, Yeb Havinga wrote:
> Yeb Havinga wrote:
>>
>> Since the gistpagesize is derived from the database blocksize, it might be
>> wise to set the blocksize low for this case, I'm going to play with this a
>> bit more.
> Ok, one last mail before it turns into spam: with a 1KB database blocksize,
> the query now runs in 30 seconds (original 70 on my machine, shared buffers
> 240MB).
> The per inner loop access time now 24 microsecs compared to on my machine
> original 74 microsecs with 8KB size and 8 for the btree scan. Not a bad
> speedup with such a simple parameter :-)
>
> postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b
> + 2;
>                                                        QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------
> Nested Loop  (cost=0.00..4169159462.20 rows=111109777668 width=8) (actual
> time=0.184..29540.355 rows=2999997 loops=1)
>   ->  Seq Scan on b  (cost=0.00..47037.62 rows=999962 width=4) (actual
> time=0.024..1783.484 rows=1000000 loops=1)
>   ->  Index Scan using a_a on a  (cost=0.00..2224.78 rows=111114 width=4)
> (actual time=0.021..0.024 rows=3 loops=1000000)
>         Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2)))
> Total runtime: 30483.303 ms
> (5 rows)
>
>
> postgres=# select gist_stat('a_a');
>                 gist_stat
> -------------------------------------------
> Number of levels:          5             +
> Number of pages:           47618         +
> Number of leaf pages:      45454         +
> Number of tuples:          1047617       +
> Number of invalid tuples:  0             +
> Number of leaf tuples:     1000000       +
> Total size of tuples:      21523756 bytes+
> Total size of leaf tuples: 20545448 bytes+
> Total size of index:       48760832 bytes+
> (1 row)
>
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

From:
Yeb Havinga
Date:

On Fri, Mar 19, 2010 at 10:16 PM, Kenneth Marshall <> wrote:
Hi Yeb,

I have not looked at the gist code, but would it be possible to
make virtual pages that have a size that is 1/power-of-2 * blocksize.
Then the leaf node could be 1/8 or even 1/16 the size of the full
pagesize.

Hello Ken,

The gist virtual pages would then more closely match the original blocksizes that were used in Guttman's R-tree paper (first Google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys are not changed, it might be that gist indexing has become unexpectedly inefficient as disk pages have grown larger.

I am not really into the core gist code either, but I do have motivation to dive into it (more than a 200% performance increase in Matthew's test case). However, I'd like to check for community support before working on it.

Maybe Teodor or Oleg could say something about how easy or hard it would be to do?

regards,
Yeb Havinga
 


From:
Matthew Wakeling
Date:

On Sat, 20 Mar 2010, Yeb Havinga wrote:
> The gist virtual pages would then match more the original blocksizes that
> were used in Guttman's R-tree paper (first google result, then figure 4.5).
> Since the nature/characteristics of the underlying datatypes and keys is not
> changed, it might be that with the disk pages getting larger, gist indexing
> has therefore become unexpectedly inefficient.

Yes, that is certainly a factor. For example, the page size for bioseg
which we use here is 130 entries, which is very excessive, and doesn't
allow very deep trees. On the other hand, it means that a single disc seek
performs quite a lot of work.

> But I am also not really into the core-gist code, but do have a motivation
> to dive into it (more than 200% performance increase in Mathew's test case).
> However I'd like to verify for community support before working on it.

I'd also love to dive into the core gist code, but am rather daunted by
it. I believe that there is something there that is taking more time than
I can account for. The indexing algorithm itself is good.

Matthew

--
 "The problem with defending the purity of the English language is that
 English is about as pure as a cribhouse whore. We don't just borrow words;
 on occasion, English has pursued other languages down alleyways to beat
 them unconscious and rifle their pockets for new vocabulary." - James Nicoll

From:
Yeb Havinga
Date:

Matthew Wakeling wrote:
> On Sat, 20 Mar 2010, Yeb Havinga wrote:
>> The gist virtual pages would then match more the original blocksizes
>> that
>> were used in Guttman's R-tree paper (first google result, then figure
>> 4.5).
>> Since the nature/characteristics of the underlying datatypes and keys
>> is not
>> changed, it might be that with the disk pages getting larger, gist
>> indexing
>> has therefore become unexpectedly inefficient.
>
> Yes, that is certainly a factor. For example, the page size for bioseg
> which we use here is 130 entries, which is very excessive, and doesn't
> allow very deep trees. On the other hand, it means that a single disc
> seek performs quite a lot of work.
Yeah, I only did in-memory fitting tests and wondered about increased
I/Os. However, I bet that even for bigger-than-RAM databases, the
benefit of having to fan out to fewer pages still outweighs the
over-general non-leaf nodes, and might still result in fewer disk I/Os.
I redid some earlier benchmarking with other datatypes with a 1kB block
size, and also multicolumn gist, and the multicolumn variant had an
even greater benefit than the single-column indexes, for both equality
and range scans (like execution times down to 20% of original). If gist
is important to you, I really recommend doing a test with 1kB blocks.

regards,
Yeb Havinga

From:
Matthew Wakeling
Date:

On Mon, 22 Mar 2010, Yeb Havinga wrote:
>> Yes, that is certainly a factor. For example, the page size for bioseg
>> which we use here is 130 entries, which is very excessive, and doesn't
>> allow very deep trees. On the other hand, it means that a single disc seek
>> performs quite a lot of work.

> Yeah, I only did in-memory fitting tests and wondered about increased
> I/Os. However, I bet that even for bigger-than-RAM databases, the
> benefit of having to fan out to fewer pages still outweighs the
> over-general non-leaf nodes, and might still result in fewer disk
> I/Os. I redid some earlier benchmarking with other datatypes with a
> 1kB block size, and also multicolumn gist, and the multicolumn
> variant had an even greater benefit than the single-column indexes,
> for both equality and range scans (like execution times down to 20%
> of original). If gist is important to you, I really recommend doing a
> test with 1kB blocks.

Purely from a disc seek count point of view, assuming an infinite CPU
speed and infinite disc transfer rate, the larger the index pages the
better. The number of seeks per fetch will be equivalent to the depth of
the tree.

If you take disc transfer rate into account, the break-even point is when
you spend an equal time transferring as seeking, which places the page
size around 500kB on a modern disc, assuming RAID stripe alignment doesn't
make that into two seeks instead of one.
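The ~500kB figure follows from equating the two costs: the break-even page size is simply seek time multiplied by transfer rate. For instance, with an 8 ms seek and 64 MB/s sustained transfer (hypothetical round numbers for a modern disc):

```c
/* Page size, in bytes, at which transfer time equals seek time:
 * transferring this many bytes takes exactly one seek's worth of time. */
static double
breakeven_page_bytes(double seek_seconds, double bytes_per_second)
{
    return seek_seconds * bytes_per_second;
}
```

That gives 512000 bytes, right around the half-megabyte mark; faster seeks or slower transfer move the break-even point down proportionally.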

However, for efficient CPU usage, the ideal page size for a tree index is
much smaller - between two and ten entries, depending on the type of the
data.

There may be some mileage in reorganising indexes into a two-level system.
That is, have an index format where the page size is 512kB or similar, but
each page is internally a CPU-efficient tree itself.

However, this is beyond the scope of the problem of speeding up gist.

Matthew

--
 If you let your happiness depend upon how somebody else feels about you,
 now you have to control how somebody else feels about you. -- Abraham Hicks