Thread: Bad query optimisation

Bad query optimisation

From
Greg Stark
Date:
This is weird, it seems like min and max aren't being optimised symmetrically.
It seems like both of these should result in similar plans and run equally
fast. Instead the first is actually really slow and the second is perfectly
quick.



foo=# explain select max(postalcode) from postalcodes where postalcode < 'K0C1N2';

Aggregate  (cost=123.59..123.59 rows=1 width=10) ->  Index Scan using postalcodes_pkey on postalcodes
(cost=0.00..120.50rows=1234 width=10)
 


foo=# explain select min(postalcode) from postalcodes where postalcode > 'K0C1N2';

Aggregate  (cost=10373.45..10373.45 rows=1 width=10) ->  Seq Scan on postalcodes  (cost=0.00..9697.11 rows=270535
width=10)

-- 
greg



Re: Bad query optimisation

From
"Christopher Kings-Lynne"
Date:
> This is weird, it seems like min and max aren't being optimised
symmetrically.
> It seems like both of these should result in similar plans and run equally
> fast. Instead the first is actually really slow and the second is
perfectly
> quick.

Without knowing anything about your data, if Postgres knows (from its stats
tables) that 90% of the values in your column are above 'K0C1N2' then it
will of course do a seq scan for the second query.

If that is incorrect, then have your gone 'ANALYZE postalcodes' recently?

Cheers,

Chris

> foo=# explain select max(postalcode) from postalcodes where postalcode <
'K0C1N2';
>
> Aggregate  (cost=123.59..123.59 rows=1 width=10)
>   ->  Index Scan using postalcodes_pkey on postalcodes  (cost=0.00..120.50
rows=1234 width=10)
>
>
> foo=# explain select min(postalcode) from postalcodes where postalcode >
'K0C1N2';
>
> Aggregate  (cost=10373.45..10373.45 rows=1 width=10)
>   ->  Seq Scan on postalcodes  (cost=0.00..9697.11 rows=270535 width=10)
>
> --
> greg
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>



Re: Bad query optimisation

From
Greg Stark
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:

> > This is weird, it seems like min and max aren't being optimised
> > symmetrically. It seems like both of these should result in similar plans
> > and run equally fast. Instead the first is actually really slow and the
> > second is perfectly quick.
> 
> Without knowing anything about your data, if Postgres knows (from its stats
> tables) that 90% of the values in your column are above 'K0C1N2' then it
> will of course do a seq scan for the second query.

Oops, you're right that was a bad diagnosis. When I use the midpoint of the
data set they both get optimized into the same plan.

However I still think there's something wrong. It looks like postgres doesn't
know that it's possible to calculate min and max without scanning every
record.

When I run the queries below there's a big variance between the 2.3s required
to find the minimum for the whole dataset, and the .098s required to find the
maximum of the small subset below K0C1N2. Because there's an index on
postalcode it should be possible to find the minimum of any range in a single
index lookup.

It seems like this should be an important optimization given the number of
applications that request max(foo) in a broken attempt to implement sequences.
Occasionally it's not even a broken attempt too.

Incidentally, this is Postgres 7.2. Is this improved in 7.3?


bash-2.05b$ time psql -d salesoutlook -c "select min(postalcode) from postalcodes" min   
--------K0A1A0
(1 row)


real    0m2.334s
user    0m0.040s
sys    0m0.010s

bash-2.05b$ time psql -d salesoutlook -c "select max(postalcode) from postalcodes where postalcode < 'K0C1N2'" max   
--------K0C1N0
(1 row)


real    0m0.098s
user    0m0.030s
sys    0m0.020s

bash-2.05b$ time psql -d salesoutlook -c "select max(postalcode) from postalcodes where postalcode < 'L9J1J2'" max   
--------L9J1J1
(1 row)


real    0m2.128s
user    0m0.030s
sys    0m0.010s

-- 
greg



Re: Bad query optimisation

From
Bruno Wolff III
Date:
On Sat, Nov 30, 2002 at 18:23:56 -0500, Greg Stark <gsstark@mit.edu> wrote:
> 
> It seems like this should be an important optimization given the number of
> applications that request max(foo) in a broken attempt to implement sequences.
> Occasionally it's not even a broken attempt too.

This has been discussed on the mailing lists many times. You can rewrite
the queries using order by and limit clauses if the column is appropiately
indexed.