Re: slow plan for min/max - Mailing list pgsql-performance

From Christopher Browne
Subject Re: slow plan for min/max
Date
Msg-id 60n0df5b5r.fsf@dev6.int.libertyrms.info
Whole thread Raw
In response to slow plan for min/max  (Pailloncy Jean-Gérard <pailloncy@ifrance.com>)
List pgsql-performance
scott.marlowe@ihs.com ("scott.marlowe") writes:
> On Sun, 7 Sep 2003, Pailloncy Jean-Gérard wrote:
>
> Asking a question about why max(id) is so much slower than select id order
> by id desc limit 1, Pailloncy said:
>
>> I ask for the same thing.  That's better !
>
> This is a Frequently asked question about something that isn't
> likely to change any time soon.
>
> Basically, Postgresql uses an MVCC locking system that makes
> massively parallel operation possible, but costs in certain areas,
> and one of those areas is aggregate performance over large sets.
> MVCC makes it very hard to optimize all but the simplest of
> aggregates, and even those optimzations which are possible would
> wind up being quite ugly at the parser level.

MVCC makes it difficult to optimize aggregates resembling COUNT(*) or
SUM(*), at least vis-a-vis having this available for a whole table
(e.g. - you have to be doing 'SELECT COUNT(*), SUM(SOMEFIELD) FROM
THIS_TABLE' with NO "WHERE" clause).

But there is nothing about MVCC that makes it particularly difficult
to handle the transformation:

 select max(field) from some_table where another_field <
    still_another_field;

   (which isn't particularly efficient) into

 select field from some_table where another_field <
    still_another_field order by field desc limit 1;

The problems observed are thus:

 1.  If the query asks for other data, it might be necessary to scan
     the table to get the other data, making the optimization
     irrelevant;

 2.  If there's a good index to key on, the transformed version might
     be a bunch quicker, but it is nontrivial to determine that, a
     priori;

 3.  It would be a fairly hairy optimization to throw into the query
     optimizer, so people are reluctant to try to do so.

Note that MVCC has _nothing_ to do with any of those three problems.

The MVCC-related point is that there is reluctance to create some
special case that will be troublesome to maintain instead of having
some comprehensive handling of _all_ aggregates.  It seems a better
idea to "fix them all" rather than to kludge things up by fixing one
after another.
--
let name="cbbrowne" and tld="cbbrowne.com" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/lisp.html
Signs of a Klingon Programmer -  10. "A TRUE  Klingon Warrior does not
comment his code!"

pgsql-performance by date:

Previous
From: Neil Conway
Date:
Subject: Re: slow plan for min/max
Next
From: "scott.marlowe"
Date:
Subject: Re: slow plan for min/max