Re: "Big O" notation for postgres? - Mailing list pgsql-performance

From H. Hall
Subject Re: "Big O" notation for postgres?
Date
Msg-id 483AD082.8020104@reedyriver.com
Whole thread Raw
In response to Re: "Big O" notation for postgres?  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-performance
Gregory Stark wrote:
> "Richard Huxton" <dev@archonet.com> writes:
>
>
>> Jonah H. Harris wrote:
>>
>>> On Wed, May 21, 2008 at 10:10 AM, H. Hall wrote:
>>>
>>>> Does anyone know if there is a source that provides "Big O" notation for
>>>> postgres's aggregate functions and operations?  For example is count(*) =
>>>> O(1) or O(n)?
>>>>
>>> I don't know of any document containing the complexity of each
>>> aggregate, but it's sometimes left as a comment in the souce code.
>>>
>> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
>> presence of an index too, just to muddy the waters.
>>
When I first read the above from Jonah I just assumed some Postgres
magic was producing O(1). After seeing this again, I believe that
Postgres must be doing something like the following for max and min:
Max:     ORDER BY colName  DESC LIMIT 1
Min:      ORDER BY coName ASC LIMIT 1

Thus Jonah's caveat about using an index. If postgres is using an index
as in the above then the Max and Min functions would both be O(log N) ,
this is  log base2 of N, which is the time it takes to search a balanced
binary tree, not O(1) which implies a constant time to perform,
regardless of the size of the dataset N.

>
> Hm, true. But excluding those special cases all Postgres aggregate functions
> will be O(n) unless they're doing something very odd. None of the built-in
> functions (except min/max as noted) do anything odd like that.
>
> The reason way is because of the basic design of Postgres aggregate functions.
> They are fed every tuple one by one and have to keep their state in a single
> variable. Most of the aggregate functions like count(*) etc just keep a static
> non-growing state and the state transition function is a simple arithmetic
> operation which is O(1). So the resulting operation is O(n).
>
> Actually one exception would be something like
>
> CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND='{}');
>
> Since the state variable has to keep accumulating more and more stuff the
> array_append becomes more and more expensive (it has to generate a new array
> so it has to copy the existing stuff). So actually it woul dbe O(n^2).
>
> The only builtin aggregate which looks like it falls in this category would be
> xmlagg()
>
>
Functions with O(N^2) scale very badly of course.

It would be very handy if the Planner could kick out Big O with its
estimates.  This would allow one to quickly tell how a query scales with
a large number of rows.

Thanks,
HH

--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com


pgsql-performance by date:

Previous
From: "Campbell, Lance"
Date:
Subject: Symbolic Links to Tablespaces
Next
From: Tom Lane
Date:
Subject: Re: Symbolic Links to Tablespaces