Thread: "Big O" notation for postgres?
Does anyone know if there is a source that provides "Big O" notation for postgres's aggregate functions and operations? For example is count(*) = O(1) or O(n)?

Do the developers for postgres use Big O when selecting algorithms? If so, is the info easily available?

Thanks,
HH

-- 
H. Hall
ReedyRiver Group LLC
site: reedyriver.com
On Wed, May 21, 2008 at 10:10 AM, H. Hall <hhall1001@reedyriver.com> wrote:
> Does anyone know if there is a source that provides "Big O" notation for
> postgres's aggregate functions and operations? For example is count(*) =
> O(1) or O(n)?

I don't know of any document containing the complexity of each aggregate, but it's sometimes left as a comment in the source code.

IIRC, COUNT (non-distinct) is currently O(n), where n also includes evaluation of tuples not represented in the final count (due to Postgres' MVCC design).

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com
Edison, NJ 08837 | http://www.enterprisedb.com/
Jonah H. Harris wrote:
> On Wed, May 21, 2008 at 10:10 AM, H. Hall <hhall1001@reedyriver.com> wrote:
>> Does anyone know if there is a source that provides "Big O" notation for
>> postgres's aggregate functions and operations? For example is count(*) =
>> O(1) or O(n)?
>
> I don't know of any document containing the complexity of each
> aggregate, but it's sometimes left as a comment in the source code.

Recent max() and min() can be O(n) or O(1) depending on the where-clause and presence of an index too, just to muddy the waters.

-- 
Richard Huxton
Archonet Ltd
On Wed, 21 May 2008 16:10:53 +0200, H. Hall <hhall1001@reedyriver.com> wrote:
> Does anyone know if there is a source that provides "Big O" notation for
> postgres's aggregate functions and operations? For example is count(*)
> = O(1) or O(n)?
>
> Do the developers for postgres use Big O when selecting algorithms? If
> so, is the info easily available?

You can't do any better than O( n rows examined by the aggregate ), except for max() and min() on an indexed expression, which in this case aren't really aggregates anymore since they are internally rewritten as an index lookup to get the value you want. But stuff like sum() or avg() or count() will always have to see all the rows selected (and some more), unless you use clever hacks like materialized views etc., in which case the thing in the O() will change, or at least the O() constant will change.
PFC wrote:
> On Wed, 21 May 2008 16:10:53 +0200, H. Hall wrote:
>> Does anyone know if there is a source that provides "Big O" notation
>> for postgres's aggregate functions and operations? For example is
>> count(*) = O(1) or O(n)?
>>
>> Do the developers for postgres use Big O when selecting algorithms?
>> If so, is the info easily available?
>
> You can't do any better than O( n rows examined by the aggregate )
> except for max() and min() on an indexed expression, which in this
> case aren't really aggregates anymore since they are internally
> rewritten as an index lookup to get the value you want... but stuff
> like sum() or avg() or count() will always have to see all the rows
> selected (and some more) unless you use clever hacks like materialized
> views etc, in which case the thing in the O() will change, or at least
> the O() constant will change...

Thank you PFC, and also Jonah and Richard, for your replies.

It occurs to me that Big O might be a useful way to understand/explain what is happening with situations like Albert's earlier today:

> I've got a query similar to this:
>
>   select * from t1, t2 where t1.id > 158507 and t1.id = t2.id;
>
> That took > 84 minutes (the query was a bit longer but this is the part
> that made the difference). After a little change the query took ~1 second:
>
>   select * from t1, t2 where t1.id > 158507 and t2.id > 158507 and t1.id = t2.id;

BTW, anyone reading this and not familiar with Big O notation might want to check out these links. All are intro-type articles:

* An informal introduction to O(N) notation:
  http://www.perlmonks.org/?node_id=227909
* Analysis of Algorithms and Selection of Algorithms:
  http://www.cs.utk.edu/~parker/Courses/CS302-Fall06/Notes/complexity.html
* Complexity and Big-O Notation:
  http://pages.cs.wisc.edu/~hasti/cs367-common/notes/COMPLEXITY.html

-- 
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com
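[Editor's note: a toy sketch, in Python rather than SQL, of why Albert's redundant predicate helps. It is not Postgres internals; all names here are hypothetical. The idea is simply that applying the filter to both join inputs shrinks one side *before* the nested loop runs, so far fewer row pairs are examined — the constant (and for some plans the exponent) inside the O() drops.]

```python
def nested_loop_join(left, right, threshold, filter_right):
    """Naive nested-loop equi-join on ids, counting row comparisons."""
    left = [x for x in left if x > threshold]        # t1.id > threshold
    if filter_right:
        right = [y for y in right if y > threshold]  # the "redundant" t2.id > threshold
    comparisons = 0
    result = []
    for x in left:                                   # O(len(left) * len(right))
        for y in right:
            comparisons += 1
            if x == y:                               # t1.id = t2.id
                result.append(x)
    return result, comparisons

t1 = list(range(2_000))
t2 = list(range(2_000))

rows_a, cost_a = nested_loop_join(t1, t2, 1_800, filter_right=False)
rows_b, cost_b = nested_loop_join(t1, t2, 1_800, filter_right=True)

assert rows_a == rows_b      # same answer either way
print(cost_a, cost_b)        # the pre-filtered join does a fraction of the work
```

With the extra predicate the inner input shrinks from 2,000 rows to 199, so the comparison count falls from 199 x 2,000 to 199 x 199.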
"Richard Huxton" <dev@archonet.com> writes:
> Jonah H. Harris wrote:
>> On Wed, May 21, 2008 at 10:10 AM, H. Hall <hhall1001@reedyriver.com> wrote:
>>> Does anyone know if there is a source that provides "Big O" notation for
>>> postgres's aggregate functions and operations? For example is count(*) =
>>> O(1) or O(n)?
>>
>> I don't know of any document containing the complexity of each
>> aggregate, but it's sometimes left as a comment in the source code.
>
> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
> presence of an index too, just to muddy the waters.

Hm, true. But excluding those special cases, all Postgres aggregate functions will be O(n) unless they're doing something very odd. None of the built-in functions (except min/max as noted) do anything odd like that.

The reason why is the basic design of Postgres aggregate functions. They are fed every tuple one by one and have to keep their state in a single variable. Most of the aggregate functions, like count(*) etc., just keep a static non-growing state, and the state transition function is a simple arithmetic operation which is O(1). So the resulting operation is O(n).

Actually, one exception would be something like

  CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND = '{}');

Since the state variable has to keep accumulating more and more stuff, array_append becomes more and more expensive (it has to generate a new array, so it has to copy the existing stuff). So it would actually be O(n^2). The only builtin aggregate which looks like it falls in this category would be xmlagg().

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's RemoteDBA services!
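[Editor's note: the aggregate design Gregory describes — an initial state plus a transition function called once per tuple — can be sketched as a fold. This is illustrative Python, not Postgres's C implementation; `run_aggregate` is a made-up name.]

```python
def run_aggregate(rows, init, transition):
    """Feed every tuple through the state transition function, one by one."""
    state = init
    for row in rows:              # one pass: n transition calls
        state = transition(state, row)
    return state

rows = list(range(1000))

# count(*): the state is a single integer, each step is O(1),
# so the whole aggregate is O(n).
count = run_aggregate(rows, 0, lambda s, _: s + 1)

# An array_append-style aggregate whose state copies itself each step:
# step i costs O(i), so the total is O(n^2).
appended = run_aggregate(rows, [], lambda s, r: s + [r])  # s + [r] copies s

assert count == len(rows)
assert appended == rows
```

The two calls return the same kind of result as the real aggregates would; the difference is entirely in how the cost of each transition grows with the size of the accumulated state.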
Gregory Stark wrote:
> "Richard Huxton" <dev@archonet.com> writes:
>> Jonah H. Harris wrote:
>>> On Wed, May 21, 2008 at 10:10 AM, H. Hall wrote:
>>>> Does anyone know if there is a source that provides "Big O" notation for
>>>> postgres's aggregate functions and operations? For example is count(*) =
>>>> O(1) or O(n)?
>>>
>>> I don't know of any document containing the complexity of each
>>> aggregate, but it's sometimes left as a comment in the source code.
>>
>> Recent max() and min() can be O(n) or O(1) depending on the where-clause and
>> presence of an index too, just to muddy the waters.

When I first read the above from Jonah I just assumed some Postgres magic was producing O(1). After seeing this again, I believe that Postgres must be doing something like the following for max and min:

  Max: ORDER BY colName DESC LIMIT 1
  Min: ORDER BY colName ASC LIMIT 1

Thus Jonah's caveat about using an index. If Postgres is using an index as in the above, then the max and min functions would both be O(log N) (log base 2 of N), which is the time it takes to search a balanced binary tree — not O(1), which implies a constant time to perform regardless of the size of the dataset N.

> Hm, true. But excluding those special cases all Postgres aggregate functions
> will be O(n) unless they're doing something very odd. None of the built-in
> functions (except min/max as noted) do anything odd like that.
>
> The reason why is the basic design of Postgres aggregate functions.
> They are fed every tuple one by one and have to keep their state in a single
> variable. Most of the aggregate functions like count(*) etc just keep a static
> non-growing state and the state transition function is a simple arithmetic
> operation which is O(1). So the resulting operation is O(n).
>
> Actually one exception would be something like
>
>   CREATE AGGREGATE array_agg(anyelement) (SFUNC = array_append, STYPE = anyarray, INITCOND = '{}');
>
> Since the state variable has to keep accumulating more and more stuff the
> array_append becomes more and more expensive (it has to generate a new array
> so it has to copy the existing stuff). So it would actually be O(n^2).
>
> The only builtin aggregate which looks like it falls in this category would be
> xmlagg()

Functions with O(N^2) scale very badly, of course. It would be very handy if the planner could kick out Big O with its estimates. This would allow one to quickly tell how a query scales with a large number of rows.

Thanks,
HH

-- 
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com
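[Editor's note: a rough analogy, not Postgres code, for the index-backed min() discussed above. A sorted list stands in for a btree index, and a binary search plays the role of the O(log N) index descent, versus an O(n) scan that examines every row.]

```python
import bisect

indexed_col = sorted(range(0, 1_000_000, 3))   # stand-in for an indexed column

def min_via_scan(values, lower_bound):
    """SELECT min(col) WHERE col > bound, examining every row: O(n)."""
    candidates = [v for v in values if v > lower_bound]
    return min(candidates) if candidates else None

def min_via_index(sorted_values, lower_bound):
    """The same query via binary search on the sorted data: O(log n)."""
    i = bisect.bisect_right(sorted_values, lower_bound)
    return sorted_values[i] if i < len(sorted_values) else None

assert min_via_scan(indexed_col, 158507) == min_via_index(indexed_col, 158507)
```

Both functions return the same answer; the index version touches O(log N) entries instead of all N, which is the gap between Richard's "O(n) or O(1)" and H. Hall's more precise O(log N).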