Re: cube operations slower than geo_distance() on production server - Mailing list pgsql-performance

From: Mark Stosberg
Subject: Re: cube operations slower than geo_distance() on production server
Date:
Msg-id: eqq3k3$1mpu$1@news.hub.org
In response to: Re: cube operations slower than geo_distance() on production server ("Merlin Moncure" <mmoncure@gmail.com>)
Responses: Re: cube operations slower than geo_distance() on production server
List: pgsql-performance
Merlin Moncure wrote:
> On 2/10/07, Mark Stosberg <mark@summersault.com> wrote:
>>
>> With the help of some on this list, I was able to successfully set up
>> and benchmark a cube-based replacement for geo_distance() calculations.
>>
>> On a development box, the cube-based variations consistently benchmarked
>> at about 1/3 of the time of the geo_distance() equivalents.
>>
>> After setting up the same columns and indexes on a production
>> database, it's a different story. All the cube operations show
>> themselves to be about the same as, or noticeably slower than, the same
>> operations done with geo_distance().
>>
>> I've stared at the EXPLAIN ANALYZE output as much as I can to figure out
>> what's gone wrong. Could you help?
>>
>> Here's the plan on the production server, which seems too slow. Below is
>> the plan I get on the development server, which is much faster.
>>
>> I tried "set enable_nestloop = off", which did change the plan, but not
>> the performance.
>>
>> The production DB has much more data in it, but I still expected
>> comparable results relative to using geo_distance() calculations.
>
> any objection to posting the query (and maybe tables, keys, indexes, etc)?

Here's the basic query I'm using:
SELECT
    -- 1609.344 is a constant for "meters per mile"
    cube_distance(
        (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'),
        earth_coords
    ) / 1609.344 AS RADIUS
FROM pets
    -- "shelters_active" is a view where "shelter_state = 'active'"
    JOIN shelters_active AS shelters USING (shelter_id)
    -- The zipcode fields here are varchars
    JOIN zipcodes ON (shelters.postal_code_for_joining = zipcodes.zipcode)
WHERE
    -- search for just 'dogs'
    species_id = 1
    AND pet_state = 'available'
    AND earth_box(
        (SELECT earth_coords FROM zipcodes WHERE zipcode = '90210'),
        10 * 1609.344
    ) @ earth_coords
ORDER BY RADIUS;
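
In case it's relevant, here is roughly how the earth_coords column and its
GiST index were set up, using the contrib "cube" and "earthdistance"
modules. This is from memory, so the latitude/longitude column names below
may not be exact; the index name matches the one in the plan further down:

    -- earth_coords holds ll_to_earth(latitude, longitude) for each zipcode
    -- ("lat" and "lon" are stand-ins for the real column names)
    ALTER TABLE zipcodes ADD COLUMN earth_coords cube;
    UPDATE zipcodes SET earth_coords = ll_to_earth(lat, lon);
    CREATE INDEX zip_earth_coords_idx ON zipcodes USING gist (earth_coords);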

All the related columns are indexed:
   pets.species_id
   pets.shelter_id
   pets.pet_state

   shelters.shelter_id (pk)
   shelters.postal_code_for_joining
   shelters.active

   zipcodes.zipcode (pk)
   zipcodes.earth_coords
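
(The GiST index on zipcodes.earth_coords is the one shown above. The rest
are plain btree indexes, created with statements roughly like these; the
names are approximate except where they appear in the EXPLAIN output.)

    CREATE INDEX pets_species_id_idx ON pets (species_id);
    CREATE INDEX pets_shelter_id_idx ON pets (shelter_id);
    CREATE INDEX pets_pet_state_idx ON pets (pet_state);
    CREATE INDEX shelters_postal_code_for_joining_idx
        ON shelters (postal_code_for_joining);
    CREATE INDEX shelters_active_idx ON shelters (active);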

The pets table has about 300,000 rows, but only about 10% are
"available". It sees regular updates and is "vacuum analyzed" every
couple of hours now. The rest of the tables get "vacuum analyzed"
nightly. The shelters table is about 99% "shelter_state = active".
It's updated infrequently.
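
In practice that maintenance is just scheduled runs of:

    -- every couple of hours:
    VACUUM ANALYZE pets;
    -- nightly:
    VACUUM ANALYZE shelters;
    VACUUM ANALYZE zipcodes;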

The zipcodes table has about 40,000 rows in it and doesn't change.

I tried a partial index on the pets table "WHERE pet_state =
'available'". I could see the index was used, but the performance was
unaffected.
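
The partial index was something along these lines (the leading column is
a guess from memory; the WHERE clause is the important part):

    CREATE INDEX pets_available_partial_idx
        ON pets (shelter_id)
        WHERE pet_state = 'available';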

The "EXPLAIN ANALYZE" output is included below. The query is running 10
times slower today than on Friday, perhaps because of server load, or
because we are at the end of a VACUUM cycle.

Thanks for any help!

    Mark

Sort  (cost=6887.03..6887.10 rows=27 width=32) (actual time=17925.098..17927.979 rows=1324 loops=1)
  Sort Key: (cube_distance($0, zipcodes.earth_coords) / 1609.344::double precision)
  InitPlan
    ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.028..0.031 rows=1 loops=1)
          Index Cond: ((zipcode)::text = '90210'::text)
    ->  Index Scan using zipcodes_pkey on zipcodes  (cost=0.00..3.01 rows=1 width=32) (actual time=0.687..0.692 rows=1 loops=1)
          Index Cond: ((zipcode)::text = '90210'::text)
  ->  Nested Loop  (cost=568.82..6880.36 rows=27 width=32) (actual time=346.932..17919.697 rows=1324 loops=1)
        ->  Nested Loop  (cost=2.15..572.14 rows=9 width=36) (actual time=8.321..43.378 rows=136 loops=1)
              ->  Bitmap Heap Scan on zipcodes  (cost=2.15..150.05 rows=42 width=41) (actual time=3.442..4.402 rows=240 loops=1)
                    Recheck Cond: (cube_enlarge(($1)::cube, 16093.4357308298::double precision, 3) @ earth_coords)
                    ->  Bitmap Index Scan on zip_earth_coords_idx  (cost=0.00..2.15 rows=42 width=0) (actual time=3.426..3.426 rows=240 loops=1)
                          Index Cond: (cube_enlarge(($1)::cube, 16093.4357308298::double precision, 3) @ earth_coords)
              ->  Index Scan using shelters_postal_code_for_joining_idx on shelters  (cost=0.00..10.02 rows=2 width=12) (actual time=0.093..0.155 rows=1 loops=240)
                    Index Cond: ((shelters.postal_code_for_joining)::text = ("outer".zipcode)::text)
        ->  Bitmap Heap Scan on pets  (cost=566.67..700.47 rows=34 width=4) (actual time=130.363..131.367 rows=10 loops=136)
              Recheck Cond: ((pets.shelter_id = "outer".shelter_id) AND ((pets.pet_state)::text = 'available'::text))
              Filter: (species_id = 1)
              ->  BitmapAnd  (cost=566.67..566.67 rows=34 width=0) (actual time=129.333..129.333 rows=0 loops=136)
                    ->  Bitmap Index Scan on pets_shelter_id_idx  (cost=0.00..3.92 rows=263 width=0) (actual time=0.164..0.164 rows=178 loops=136)
                          Index Cond: (pets.shelter_id = "outer".shelter_id)
                    ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..562.50 rows=39571 width=0) (actual time=213.620..213.620 rows=195599 loops=82)
                          Index Cond: ((pet_state)::text = 'available'::text)
Total runtime: 17933.675 ms
