Re: GiST index performance - Mailing list pgsql-performance
From | Yeb Havinga |
---|---|
Subject | Re: GiST index performance |
Date | |
Msg-id | 4BA0A036.1070508@gmail.com Whole thread Raw |
In response to | Re: GiST index performance (Yeb Havinga <yebhavinga@gmail.com>) |
Responses |
Re: GiST index performance
(Yeb Havinga <yebhavinga@gmail.com>)
|
List | pgsql-performance |
Yeb Havinga wrote: > Matthew Wakeling wrote: >>> Matthew Wakeling wrote: >>>> A second quite distinct issue is the general performance of GiST >>>> indexes >>>> which is also mentioned in the old thread linked from Open Items. For >>>> that, we have a test case at >>>> http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php >>>> for >>>> btree_gist indexes. I have a similar example with the bioseg GiST >>>> index. I >>>> have completely reimplemented the same algorithms in Java for >>>> algorithm >>>> investigation and instrumentation purposes, and it runs about a >>>> hundred >>>> times faster than in Postgres. I think this is a problem, and I'm >>>> willing >>>> to do some investigation to try and solve it. >> I have not made any progress on this issue. I think Oleg and Teodor >> would be better placed working it out. All I can say is that I >> implemented the exact same indexing algorithm in Java, and it >> performed 100 times faster than Postgres. Now, Postgres has to do a >> lot of additional work, like mapping the index onto disc, locking >> pages, and abstracting to plugin user functions, so I would expect >> some difference - I'm not sure 100 times is reasonable though. I >> tried to do some profiling, but couldn't see any one section of code >> that was taking too much time. Not sure what I can further do. > Hello Mathew and list, > > A lot of time spent in gistget.c code and a lot of functioncall5's to > the gist's consistent function which is out of sight for gprof. > Something different but related since also gist: we noticed before > that gist indexes that use a compressed form for index entries suffer > from repeated compress calls on query operands (see > http://archives.postgresql.org/pgsql-hackers/2009-05/msg00078.php). > > The btree_gist int4 compress function calls the generic > gbt_num_compress, which does a palloc. Maybe this palloc is allso hit > al lot when scanning the index, because the constants that are queries > with are repeatedly compressed and palloced. Looked in the code a bit more - only the index nodes are compressed at index creation, the consistent functions does not compress queries, so not pallocs there. However when running Mathews example from http://archives.postgresql.org/pgsql-performance/2009-04/msg00276.php with the gist index, the coverage shows in gistget.c: 1000000 palloc0 's of gistsearchstack at line 152 and 2010982 palloc's also of the gistsearchstack on line 342. Two pfrees are also hit a lot: line 195: 1010926 of a stackentry and line 293: 200056 times. My $0.02 cents is that the pain is here. My knowledge of gistget or the other sources in access/gist is zero, but couldn't it be possible to determine the maximum needed size of the stack and then allocate it at once and use a pop/push kind off api? regards, Yeb Havinga > > regards, > Yeb Havinga > >
pgsql-performance by date: