Re: GiST index performance - Mailing list pgsql-performance

From Yeb Havinga
Subject Re: GiST index performance
Date
Msg-id e4961fb71003200916r6c9913dag28c04aef290f877d@mail.gmail.com
Whole thread Raw
In response to Re: GiST index performance  (Kenneth Marshall <ktm@rice.edu>)
Responses Re: GiST index performance
List pgsql-performance
On Fri, Mar 19, 2010 at 10:16 PM, Kenneth Marshall <ktm@rice.edu> wrote:
Hi Yeb,

I have not looked at the gist code, but would it be possible to
make virtual pages that have a size that is 1/power-of-2 * blocksize.
Then the leaf node could be 1/8 or even 1/16 the size of the full
pagesize.

Hello Ken,

The gist virtual pages would then match more the original blocksizes that were used in Guttman's R-tree paper (first google result, then figure 4.5). Since the nature/characteristics of the underlying datatypes and keys is not changed, it might be that with the disk pages getting larger, gist indexing has therefore become unexpectedly inefficient.

But I am also not really into the core-gist code, but do have a motivation to dive into it (more than 200% performance increase in Mathew's test case). However I'd like to verify for community support before working on it.

Maybe Theodor or Oleg could say something about how easy or hard it is to do?

regards,
Yeb Havinga
 

Regards,
Ken

On Fri, Mar 19, 2010 at 09:49:30PM +0100, Yeb Havinga wrote:
> Yeb Havinga wrote:
>>
>> Since the gistpagesize is derived from the database blocksize, it might be
>> wise to set the blocksize low for this case, I'm going to play with this a
>> bit more.
> Ok, one last mail before it turns into spam: with a 1KB database blocksize,
> the query now runs in 30 seconds (original 70 on my machine, shared buffers
> 240MB).
> The per inner loop access time now 24 microsecs compared to on my machine
> original 74 microsecs with 8KB size and 8 for the btree scan. Not a bad
> speedup with such a simple parameter :-)
>
> postgres=# EXPLAIN ANALYSE SELECT * FROM a, b WHERE a.a BETWEEN b.b AND b.b
> + 2;
>                                                        QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------
> Nested Loop  (cost=0.00..4169159462.20 rows=111109777668 width=8) (actual
> time=0.184..29540.355 rows=2999997 loops=1)
>   ->  Seq Scan on b  (cost=0.00..47037.62 rows=999962 width=4) (actual
> time=0.024..1783.484 rows=1000000 loops=1)
>   ->  Index Scan using a_a on a  (cost=0.00..2224.78 rows=111114 width=4)
> (actual time=0.021..0.024 rows=3 loops=1000000)
>         Index Cond: ((a.a >= b.b) AND (a.a <= (b.b + 2)))
> Total runtime: 30483.303 ms
> (5 rows)
>
>
> postgres=# select gist_stat('a_a');
>                 gist_stat
> -------------------------------------------
> Number of levels:          5             +
> Number of pages:           47618         +
> Number of leaf pages:      45454         +
> Number of tuples:          1047617       +
> Number of invalid tuples:  0             +
> Number of leaf tuples:     1000000       +
> Total size of tuples:      21523756 bytes+
> Total size of leaf tuples: 20545448 bytes+
> Total size of index:       48760832 bytes+
> (1 row)
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

pgsql-performance by date:

Previous
From: Reydan Cankur
Date:
Subject: pgbench installation
Next
From: Andy Colson
Date:
Subject: Re: mysql to postgresql, performance questions