Thread: GiST -- making my index faster makes it slower

GiST -- making my index faster makes it slower

From: David Blasby
I just tried an experiment that I thought would improve the performance 
of the PostGIS spatial index.

The old way used a BOX (bounding box of 4 doubles) as the key in the index.

The new way uses a bounding box of 4 single-precision floats (it's only
16 bytes long - a BOX is 32).  I thought that it would significantly
reduce the size of the index (it did) and that the indexing would be
faster, since there are fewer disk pages to fiddle through and those
pages would be more likely to fit in the disk cache.
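
For concreteness, a minimal sketch of the two key layouts (assuming BOX
is effectively four doubles; the real BOX struct holds two Points, which
comes to the same 32 bytes, and the BOXKEY name here is made up):

#include <stdio.h>

/* Simplified stand-in for the existing BOX-based key: 4 doubles = 32 bytes. */
typedef struct
{
    double xmin, ymin, xmax, ymax;
} BOXKEY;

/* The new compact key: 4 single-precision floats = 16 bytes. */
typedef struct
{
    float xmin, ymin, xmax, ymax;
} BOX2DFLOAT4;

int main(void)
{
    printf("double key: %zu bytes, float key: %zu bytes\n",
           sizeof(BOXKEY), sizeof(BOX2DFLOAT4));
    return 0;
}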

It turns out this is not the case - it's significantly slower.  I think
it has to do with more keys fitting in the leaf tuples.

As far as I can tell, the GiST index looks through the internal nodes,
then when it hits a leaf node, it searches through each of the keys in
the leaf.

I save time searching through the internal nodes (because there are
fewer of them) - this is an O(log n) saving.

I lose time searching through the leaves (because there are more keys
in a leaf) - this is an O(n) cost.

For a query that does a nested loop of 10,000 index scans (on the same
table), the old method takes about 2 seconds; the new "faster" method
takes about 20 seconds.

I'm still trying to verify that this is the actual reason why it's
slower - anyone have any other ideas?  Anyone know a good way to
profile this?

dave


Re: GiST -- making my index faster makes it slower

From: David Blasby
> Couple of observations:
> 
> 1. Are you sure your data is handled as 32 bit all the way through?  Run
> time casting will offset performance gains on 32 bit floats.  Is your
> comparison routine casting to double?

I thought this might be the case, but I expected the effect to be
small.  The only place it might be doing hidden casts would be in
statements like "query->xmin <= key->xmax".


> 2. Math CPUs usually crunch at 80 bits, can't save much by using 32 bit
> floats, although cache coherency will be better.
> 
> 3. 64 bit cpu will probably run better on 64 bit floats.
> 
> 4. Is your dataset congested enough that you now have duplicate values
> by loss of precision?   This would of course impact performance.  How
> big is your dataset?  How big is your avg. result set?

My test datasets are quite spatially separate, so I don't expect there
to be any "accidental" overlaps.  There are about 12 million rows (in
total) in the datasets - I've only noticed about 2 of these overlaps.
My test datasets are 1000 rows, 10,000 rows, 10,000,000 rows, and a few
different ones in the 200,000 row range.

I'm testing queries that return anywhere from 1 geometry to about 10,000.

The actual index search is only a few milliseconds longer using the
float32 bounding box for a mid-sized table returning a handful of rows.
When the result sets get bigger (or you start doing nested queries),
the performance differences become greater.

dave


Re: GiST -- making my index faster makes it slower

From: Tom Lane
David Blasby <dblasby@refractions.net> writes:
> The old way used a BOX (bounding box of 4 doubles) as the key in the index.
> The new way uses a bounding box of 4 single-precision floats (it's only
> 16 bytes long - a BOX is 32).

> It turns out this is not the case - it's significantly slower.  I think
> it has to do with more keys fitting in the leaf tuples.

Hm.  With twice as many keys per page, you'd think that it could be no
more than 2x slower, if the problem is one of O(n) search through the
keys on the page.  That doesn't explain a 10x slowdown.  Could it be
that some part of the GIST code is O(n^2), or worse, in the number of
keys per page?  If so, perhaps it's fixable.

I'd suggest profiling the backend with both key types to get an idea of
where the time is going.
        regards, tom lane

PS: actually, allowing for the 12-byte index tuple overhead, you
couldn't have even twice as many keys per page.  So there's something
mighty odd here.  Keep us posted.
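
(Working that out with the numbers above: a 12-byte tuple overhead plus
a 16-byte key is 28 bytes per index tuple, versus 12 + 32 = 44 bytes for
the double version, so only about 1.6x as many keys per page, not 2x.)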


Re: GiST -- making my index faster makes it slower

From: David Blasby
Tom Lane wrote:

> I'd suggest profiling the backend with both key types to get an idea of
> where the time is going.

I've been trying to use gprof to do some profiling, but I'm having
trouble.  What's the best way to profile?


> PS: actually, allowing for the 12-byte index tuple overhead, you
> couldn't have even twice as many keys per page.  So there's something
> mighty odd here.  Keep us posted.

Using the old system, I'd get about 7 internal node hits and about 110 
"leaf" hits.  Under the new one, I get about 4 internal node hits and 
about 160 "leaf" hits.

I'm just in the process of changing the key to:

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    char  junk[16];   /* make the 16 byte type into 32! */
} BOX2DFLOAT4;

To see what happens.

dave


Re: GiST -- making my index faster makes it slower

From: Tom Lane
David Blasby <dblasby@refractions.net> writes:
> Tom Lane wrote:
>> I'd suggest profiling the backend with both key types to get an idea of
>> where the time is going.

> I've been trying to use gprof to do some profiling, but I'm having
> trouble.  What's the best way to profile?

It's not hard; the only real gotcha is that on Linux systems you need to
compile with -DLINUX_PROFILE so that the postmaster works around some
Linux brain damage with dropping the profile status at fork().
I usually do (in an already configured and built source tree)
    cd src/backend
    make clean
    make PROFILE="-pg -DLINUX_PROFILE" all
    make install-bin

Don't forget that the backends will drop their gmon.out files in
$PGDATA/data/DBOID/gmon.out.
        regards, tom lane


Re: GiST -- making my index faster makes it slower

From: David Blasby
Humm -- strange results here:


typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
} BOX2DFLOAT4;

This takes about 18,000 ms to do a nested query with 10,000 iterations.

typedef struct
{
    float xmin;
    float ymin;
    float xmax;
    float ymax;
    char  junk[16];
} BOX2DFLOAT4;

This takes about 15,000 ms to do a nested query with 10,000 iterations.


typedef struct
{
    double xmin;
    double ymin;
    double xmax;
    double ymax;
} BOX2DFLOAT4;

This takes about 1500 ms to do a nested query with 10,000 iterations.
Yes - that's almost 14 seconds faster!

This doesn't make a lot of sense, since the only part of the GiST index
that's being called that actually looks at the bounding boxes is this:

retval = (((key->xmax >= query->xmax) && (key->xmin <= query->xmax)) ||
          ((query->xmax >= key->xmax) && (query->xmin <= key->xmax)))
         &&
         (((key->ymax >= query->ymax) && (key->ymin <= query->ymax)) ||
          ((query->ymax >= key->ymax) && (query->ymin <= key->ymax)));

dave


Re: GiST -- making my index faster makes it slower

From: Tom Lane
David Blasby <dblasby@refractions.net> writes:
> Humm -- strange results here:
> typedef struct
> {
>     float xmin;
>     float ymin;
>     float xmax;
>     float ymax;
> } BOX2DFLOAT4;

> This takes about 18,000 ms to do a nested query with 10,000 iterations.

> typedef struct
> {
>     float xmin;
>     float ymin;
>     float xmax;
>     float ymax;
>     char  junk[16];
> } BOX2DFLOAT4;

> This takes about 15,000 ms to do a nested query with 10,000 iterations.

That is strange --- I'm still suspecting an O(n^2) bit of logic
somewhere.  But the size of the discrepancy is a little bit more
reasonable.  Could you profile these two cases and see where the extra
time goes?

> typedef struct
> {
>     double xmin;
>     double ymin;
>     double xmax;
>     double ymax;
> } BOX2DFLOAT4;

> This takes about 1500 ms to do a nested query with 10,000 iterations.
> Yes - that's almost 14 seconds faster!

AFAICS there are only two possible explanations:

1. float-vs-float comparison is a lot slower than double-vs-double on
your hardware.  Actually, the comparisons might be float-vs-double
(is the "query" struct the same as the "key"??).  The compiler could
well be promoting one or both floats to double and then doing double
comparison, but even so, that's a heck of a large overhead.

2. The float bounding boxes are presumably slightly inaccurate.  If they
are a tad too large then perhaps you are visiting more leaf pages than
you need to.  (A worrisome possibility is that a float box could be
slightly too small, causing you to fail to visit an entry you should
:-()
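
If explanation 2 turns out to be the issue, one way to keep the float
boxes strictly conservative is to round the mins down and the maxes up
when converting from double.  A sketch (nextafterf is C99; the helper
names here are made up):

#include <math.h>

/* Round a double down/up to a float that still brackets it, so the
 * converted float box can never be smaller than the original double box. */
static float float_down(double d)
{
    float f = (float) d;
    return ((double) f > d) ? nextafterf(f, -INFINITY) : f;
}

static float float_up(double d)
{
    float f = (float) d;
    return ((double) f < d) ? nextafterf(f, INFINITY) : f;
}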
        regards, tom lane


Re: GiST -- making my index faster makes it slower

From: David Blasby
I tracked the problem down to the "penalty" function used to build the 
tree.   Basically, it compares the area of the bounding boxes.

There wasn't enough precision in the area calculations - instead of
giving 0.0 it would give numbers in the ±10^-6 range.  This really
screwed up how the tree was being built.

I cast the floats to doubles when calculating the area, and it all
works well now.
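
Roughly what that looks like (a sketch only - the function names and
the exact penalty formula here are illustrative, not the actual PostGIS
code; BOX2DFLOAT4 is the struct from earlier in the thread):

/* Area of a float box, computed in double so that identical boxes really
 * do come out with a difference of exactly 0.0. */
static double box2df_area(const BOX2DFLOAT4 *b)
{
    return ((double) b->xmax - (double) b->xmin) *
           ((double) b->ymax - (double) b->ymin);
}

/* Penalty = growth in area if 'add' were merged into 'orig'. */
static double box2df_penalty(const BOX2DFLOAT4 *orig, const BOX2DFLOAT4 *add)
{
    BOX2DFLOAT4 u = *orig;

    if (add->xmin < u.xmin) u.xmin = add->xmin;
    if (add->ymin < u.ymin) u.ymin = add->ymin;
    if (add->xmax > u.xmax) u.xmax = add->xmax;
    if (add->ymax > u.ymax) u.ymax = add->ymax;

    return box2df_area(&u) - box2df_area(orig);
}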

dave