Re: GiST index performance

From: Bruce Momjian
Subject: Re: GiST index performance
Date: ,
Msg-id: 201002252344.o1PNi2007379@momjian.us
(view: Whole thread, Raw)
In response to: Re: GiST index performance  (Matthew Wakeling)
Responses: Re: GiST index performance  (Robert Haas)
List: pgsql-performance

Tree view

GiST index performance  (Matthew Wakeling, )
 Re: GiST index performance  ("Kevin Grittner", )
  Re: GiST index performance  (Matthew Wakeling, )
  Re: GiST index performance  (Tom Lane, )
   Re: GiST index performance  (Matthew Wakeling, )
    Re: GiST index performance  (Tom Lane, )
     Re: GiST index performance  (Matthew Wakeling, )
   Re: GiST index performance  (Matthew Wakeling, )
    Re: GiST index performance  (Matthew Wakeling, )
     Re: GiST index performance  (Tom Lane, )
 Re: GiST index performance  (Matthew Wakeling, )
 Re: GiST index performance  (dforum, )
  Re: GiST index performance  (Tom Lane, )
  Re: GiST index performance  (Craig Ringer, )
 Re: GiST index performance  (Matthew Wakeling, )
  Re: GiST index performance  (Matthew Wakeling, )
   Re: GiST index performance  (Matthew Wakeling, )
    Re: GiST index performance  (Tom Lane, )
     Re: GiST index performance  (Oleg Bartunov, )
 Re: GiST index performance  (Matthew Wakeling, )
  Re: GiST index performance  (Bruce Momjian, )
   Re: GiST index performance  (Robert Haas, )
    Re: GiST index performance  (Bruce Momjian, )

Was this corrected?  I don't see any commits to seg.c.

---------------------------------------------------------------------------

Matthew Wakeling wrote:
> On Thu, 7 May 2009, Oleg Bartunov wrote:
> > Did you try Guttman quadratic split algorithm ? We also found linear
> > split algorithm for Rtree.
>
> The existing (bugfixed) seg split algorithm is the Guttman quadratic split
> algorithm. Guttman did all his work on two-dimensional and above data,
> dismissing one-dimensional data as being handled adequately by B-trees,
> which is not true for segment overlaps. It turns out that the algorithm
> has a weakness with certain types of data, and one-dimensional data is
> almost certain to exercise that weakness. The greater the number of
> dimensions, the less the weakness is exercised.
>
> The problem is that the algorithm does not calculate a split pivot.
> Instead it finds two suitable entries, and adds the remaining entries to
> those two in turn. This can lead to the majority of the entries being
> added to just one side. In fact, I saw lots of cases where 367 entries
> were being split into two pages of 366 and one entry.
>
> Guttman's linear split algorithm has the same weakness.
>
> >> One thing I am seeing is a really big difference in performance between
> >> Postgres/GiST and a Java implementation I have written, using the same
> >> algorithms. Postgres takes three minutes to perform a set of index lookups
> >> while java takes six seconds. The old version of bioseg took an hour. I
> >> can't see anything in the GiST support code that could account for this.
> >
> > is the number of index lookups different, or just index lookup time is very
> > big ?
>
> Same number of index lookups. Same algorithms. I have a set of 681879
> segments, and I load them all into the index. I then query the index for
> overlaps for each one in turn. For some reason, GiST lookups seem to be
> slow, even if they are using a good algorithm. I have seen that problem
> with btree_gist on integers too. I can't see any reason for this is the
> GiST code - it all seems pretty tight to me. We probably need to do some
> profiling.
>
> Matthew
>
> --
>  I suppose some of you have done a Continuous Maths course. Yes? Continuous
>  Maths? <menacing stares from audience> Whoah, it was like that, was it!
>                                         -- Computer Science Lecturer
>
> --
> Sent via pgsql-performance mailing list ()
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance

--
  Bruce Momjian  <>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +


pgsql-performance by date:

From: Tom Lane
Date:
Subject: Re: No hash join across partitioned tables?
From: Tory M Blue
Date:
Subject: bgwriter, checkpoints, curious (seeing delays)