Thread: Bad query optimisation
This is weird, it seems like min and max aren't being optimised symmetrically. It seems like both of these should result in similar plans and run equally fast. Instead the first is actually really slow and the second is perfectly quick. foo=# explain select max(postalcode) from postalcodes where postalcode < 'K0C1N2'; Aggregate (cost=123.59..123.59 rows=1 width=10) -> Index Scan using postalcodes_pkey on postalcodes (cost=0.00..120.50rows=1234 width=10) foo=# explain select min(postalcode) from postalcodes where postalcode > 'K0C1N2'; Aggregate (cost=10373.45..10373.45 rows=1 width=10) -> Seq Scan on postalcodes (cost=0.00..9697.11 rows=270535 width=10) -- greg
> This is weird, it seems like min and max aren't being optimised symmetrically. > It seems like both of these should result in similar plans and run equally > fast. Instead the first is actually really slow and the second is perfectly > quick. Without knowing anything about your data, if Postgres knows (from its stats tables) that 90% of the values in your column are above 'K0C1N2' then it will of course do a seq scan for the second query. If that is incorrect, then have your gone 'ANALYZE postalcodes' recently? Cheers, Chris > foo=# explain select max(postalcode) from postalcodes where postalcode < 'K0C1N2'; > > Aggregate (cost=123.59..123.59 rows=1 width=10) > -> Index Scan using postalcodes_pkey on postalcodes (cost=0.00..120.50 rows=1234 width=10) > > > foo=# explain select min(postalcode) from postalcodes where postalcode > 'K0C1N2'; > > Aggregate (cost=10373.45..10373.45 rows=1 width=10) > -> Seq Scan on postalcodes (cost=0.00..9697.11 rows=270535 width=10) > > -- > greg > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: Have you searched our list archives? > > http://archives.postgresql.org >
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: > > This is weird, it seems like min and max aren't being optimised > > symmetrically. It seems like both of these should result in similar plans > > and run equally fast. Instead the first is actually really slow and the > > second is perfectly quick. > > Without knowing anything about your data, if Postgres knows (from its stats > tables) that 90% of the values in your column are above 'K0C1N2' then it > will of course do a seq scan for the second query. Oops, you're right that was a bad diagnosis. When I use the midpoint of the data set they both get optimized into the same plan. However I still think there's something wrong. It looks like postgres doesn't know that it's possible to calculate min and max without scanning every record. When I run the queries below there's a big variance between the 2.3s required to find the minimum for the whole dataset, and the .098s required to find the maximum of the small subset below K0C1N2. Because there's an index on postalcode it should be possible to find the minimum of any range in a single index lookup. It seems like this should be an important optimization given the number of applications that request max(foo) in a broken attempt to implement sequences. Occasionally it's not even a broken attempt too. Incidentally, this is Postgres 7.2. Is this improved in 7.3? bash-2.05b$ time psql -d salesoutlook -c "select min(postalcode) from postalcodes" min --------K0A1A0 (1 row) real 0m2.334s user 0m0.040s sys 0m0.010s bash-2.05b$ time psql -d salesoutlook -c "select max(postalcode) from postalcodes where postalcode < 'K0C1N2'" max --------K0C1N0 (1 row) real 0m0.098s user 0m0.030s sys 0m0.020s bash-2.05b$ time psql -d salesoutlook -c "select max(postalcode) from postalcodes where postalcode < 'L9J1J2'" max --------L9J1J1 (1 row) real 0m2.128s user 0m0.030s sys 0m0.010s -- greg
On Sat, Nov 30, 2002 at 18:23:56 -0500, Greg Stark <gsstark@mit.edu> wrote: > > It seems like this should be an important optimization given the number of > applications that request max(foo) in a broken attempt to implement sequences. > Occasionally it's not even a broken attempt too. This has been discussed on the mailing lists many times. You can rewrite the queries using order by and limit clauses if the column is appropiately indexed.