Re: cube operations slower than geo_distance() on production server - Mailing list pgsql-performance

From: Tom Lane
Subject: Re: cube operations slower than geo_distance() on production server
Msg-id: 18714.1171437344@sss.pgh.pa.us
In response to: Re: cube operations slower than geo_distance() on production server (Mark Stosberg <mark@summersault.com>)
Responses: Re: cube operations slower than geo_distance() on production server ("Merlin Moncure" <mmoncure@gmail.com>)
List: pgsql-performance

Mark Stosberg <mark@summersault.com> writes:
> For reference, here's two versions of the query. The first uses
> the old geo_distance(), and the second one is the new cube query I'm
> trying, inspired by your suggested refactoring.

You didn't show EXPLAIN ANALYZE output :-(

Looking back in the thread, the last E.A. output I see is in your
message of 2/12 11:11, and the striking thing there is that it seems all
the time is going into one indexscan:

    ->  Bitmap Index Scan on pets_pet_state_idx  (cost=0.00..562.50 rows=39571 width=0) (actual time=213.620..213.620 rows=195599 loops=82)
          Index Cond: ((pet_state)::text = 'available'::text)
 Total runtime: 17933.675 ms

213.620 * 82 = 17516.840, so this step is all but 400msec of the run.

There are two things wrong here: first, that the estimated row count is
only 20% of actual; it should certainly not be that far off for such a
simple condition.  I wonder if your vacuum/analyze procedures are
actually working.  Second, you mentioned somewhere along the line that
'available' pets are about 10% of all the entries, which means that this
indexscan is more than likely entirely counterproductive: it would be
cheaper to ignore this index altogether.
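
A quick cross-check, guessing from the index name that the table is
called "pets" (adjust if not):

    -- the planner's notion of total table size vs. reality
    SELECT reltuples FROM pg_class WHERE relname = 'pets';
    SELECT count(*) FROM pets;

    -- what the stored stats think the pet_state distribution is
    SELECT most_common_vals, most_common_freqs
      FROM pg_stats
     WHERE tablename = 'pets' AND attname = 'pet_state';

If reltuples or the frequency shown for 'available' is nowhere near
reality, the stats are stale.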

Suggestions:

1. ANALYZE the table by hand, try the explain again and see if this
rowcount estimate gets better.  If so, you need to look into why your
existing procedures aren't keeping the stats up to date.
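
That is, something like (same table-name guess as before):

    ANALYZE pets;
    EXPLAIN ANALYZE <your query here>;

and see whether the bitmap scan's estimated rows moves closer to the
actual rows.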

2. If, with a more accurate rowcount estimate, the planner still wants
to use this index, try discouraging it.  Brute force would be to drop
the index.  If there are less-common pet_states that are actually worth
searching for, maybe keep the index but change it to a partial index
WHERE pet_state != 'available'.
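
Roughly like this (the replacement index name is only illustrative):

    DROP INDEX pets_pet_state_idx;
    CREATE INDEX pets_pet_state_notavail_idx ON pets (pet_state)
        WHERE pet_state <> 'available';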

Also, I don't see that you mentioned anywhere what PG version you are
running, but if it's not the latest then an update might help.  I recall
having fixed a bug that made the planner too eager to AND on an index
that wouldn't actually help much ... which seems to fit this problem
description pretty well.
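
("SELECT version();" will tell you what you're running, if you're not
sure.)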

            regards, tom lane
